mmax performance compared to native implementation (orders of

sannysanoff1 · November 7, 2018, 8:40pm

I like fast performance, that’s why I like kdb+.

However sometimes kdb is slow in basic ops. I met few already, but easiest to nail is following.

I am implementing “high was how many days ago” window function, and as a side effect I reimplemented mmax (called mmax0).

Here are metrics:

/ Correctness:

q)t:`float$2000000?100000

q)mmax[3;t]~mmax0[3]t

1b

q)mmax[1000;t]~mmax0[1000]t

1b

/ Short window:

q)\t mmax[10;t]

122

q)\t mmax0[10;t]

19

/ Large window:

q)\t mmax0[1000;t]

16

q)\t mmax[1000;t]

11450

/ Worst case:

q)tt:desc t

q)\t mmax[1000;tt]

11837

q)\t mmax0[1000;tt]

3263

q)mmax[1000;tt]~mmax0[1000;tt]

1b

Other Edge case:

q)tt:asc t

q)\t mmax[1000;tt]

12253

q)\t mmax0[1000;tt]

13

q)mmax[1000;tt]~mmax0[1000;tt]

1b

In worst case, mine is 6x faster, on real-life it’s orders of magnitude faster.

What I want to say: why not include this implementation it in kdb+ ? ;)

Second question: why “no”? (probably goal is to keep executable size small, I don’t know how big impact does size make in real-life scenarios, would be interesting to hear some general comments from authors)

Also, there’s a chance my 6x speedup is due to AVX512, but general case cannot be attributed to AVX512.

I hope for some discussion here.

Regards,

S.S.

p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; line-height: 15.0px; font: 13.0px Arial; color: #232323; -webkit-text-stroke: #232323; min-height: 15.0px} p.p2 {margin: 0.0px 0.0px 0.0px 0.0px; line-height: 15.0px; font: 13.0px Arial; color: #232323; -webkit-text-stroke: #232323} span.s1 {font-kerning: none}

ajay1 · November 7, 2018, 10:24pm

Hard to comment without looking at your mmax0 implementation, mmax which is written in K4 is a more general implementation based on monadic over (for controlling outer iteration window size-1) and |': This is bound to be slower for very large size n and window size compared to a more vectorized solution. Perhaps if Arthur could write it in C I bet it can go much faster.

sannysanoff1 · November 8, 2018, 9:52am

On Thursday, November 8, 2018 at 12:29:12 AM UTC+2, Ajay Rathore wrote:

Hard to comment without looking at your mmax0 implementation, mmax which is written in K4 is a more general implementation based on monadic over (for controlling outer iteration window size-1) and |': This is bound to be slower for very large size n and window size compared to a more vectorized solution. Perhaps if Arthur could write it in C I bet it can go much faster.

My implementation keeps track of maximum value between iterations.

If during one of the following iterations maximum value “leaves” the window, I make full search in window once.

If bigger value “enters” the window, it becomes tracked.

So the bigger window and the more distinct values are, the more O(N) my mmax0 becomes.

Worst case is same complexity as K one, and 6 times difference is due to composed K primitives opposed to fused ones (gcc case). Thanks for pointing that out, I thought it is native for some reason, forgot to check.

My hat is off to genius who wrote it that way, it’s true programming gem. I think that keeping kdb written this way is one of the priorities they have. However after acknowledging the beauty I personally have to move on into more practical dimension (native reimplementations of important paths).

On Wed, 7 Nov 2018 at 21:44, Sanny Sanoff <sanny…@gmail.com> wrote:

I like fast performance, that’s why I like kdb+.

However sometimes kdb is slow in basic ops. I met few already, but easiest to nail is following.

I am implementing “high was how many days ago” window function, and as a side effect I reimplemented mmax (called mmax0).

Here are metrics:

/ Correctness:

q)t:`float$2000000?100000

q)mmax[3;t]~mmax0[3]t

1b

q)mmax[1000;t]~mmax0[1000]t

1b

/ Short window:

q)\t mmax[10;t]

122

q)\t mmax0[10;t]

19

/ Large window:

q)\t mmax0[1000;t]

16

q)\t mmax[1000;t]

11450

/ Worst case:

q)tt:desc t

q)\t mmax[1000;tt]

11837

q)\t mmax0[1000;tt]

3263

q)mmax[1000;tt]~mmax0[1000;tt]

1b

Other Edge case:

q)tt:asc t

q)\t mmax[1000;tt]

12253

q)\t mmax0[1000;tt]

13

q)mmax[1000;tt]~mmax0[1000;tt]

1b

In worst case, mine is 6x faster, on real-life it’s orders of magnitude faster.

What I want to say: why not include this implementation it in kdb+ ? ;)

Second question: why “no”? (probably goal is to keep executable size small, I don’t know how big impact does size make in real-life scenarios, would be interesting to hear some general comments from authors)

Also, there’s a chance my 6x speedup is due to AVX512, but general case cannot be attributed to AVX512.

I hope for some discussion here.

Regards,

S.S.

–
Submitted via Google Groups

JP14 · June 24, 2019, 2:29am

Hi Sanny,
Would you mind sharing your implementation? I am also looking at much faster mmax/mmin versions than the ones currently implemented… Many thanks

JP

sannysanoff1 · June 29, 2019, 6:15pm

Please have fun.

https://pastebin.com/nyEVxhHp

sannysanoff1 · June 29, 2019, 6:34pm

also https://pastebin.com/mKzHbsAY

JP14 · July 1, 2019, 1:53am

Thx:-)
Cheers

JP

Topic		Replies	Views
slow performance of win32 version of KDB Community Support kdb-and-q	7	0	September 15, 2015
Inconsistent run-time performance Community Support kdb-and-q	2	0	January 22, 2017
kdb+ 4.0 Community Support kdb-and-q	0	0	March 23, 2020
Find (?) performance Community Support kdb-and-q	0	0	October 14, 2018
Sliding window max/min Community Support kdb-and-q	3	1	June 5, 2015

mmax performance compared to native implementation (orders of

Related topics