However sometimes kdb is slow in basic ops. I met few already, but easiest to nail is following.
I am implementing “high was how many days ago” window function, and as a side effect I reimplemented mmax (called mmax0).
Here are metrics:
/ Correctness:
q)t:`float$2000000?100000
q)mmax[3;t]~mmax0[3]t
1b
q)mmax[1000;t]~mmax0[1000]t
1b
/ Short window:
q)\t mmax[10;t]
122
q)\t mmax0[10;t]
19
/ Large window:
q)\t mmax0[1000;t]
16
q)\t mmax[1000;t]
11450
/ Worst case:
q)tt:desc t
q)\t mmax[1000;tt]
11837
q)\t mmax0[1000;tt]
3263
q)mmax[1000;tt]~mmax0[1000;tt]
1b
Other Edge case:
q)tt:asc t
q)\t mmax[1000;tt]
12253
q)\t mmax0[1000;tt]
13
q)mmax[1000;tt]~mmax0[1000;tt]
1b
In worst case, mine is 6x faster, on real-life it’s orders of magnitude faster.
What I want to say: why not include this implementation it in kdb+ ? ;)
Second question: why “no”? (probably goal is to keep executable size small, I don’t know how big impact does size make in real-life scenarios, would be interesting to hear some general comments from authors)
Also, there’s a chance my 6x speedup is due to AVX512, but general case cannot be attributed to AVX512.
Hard to comment without looking at your mmax0 implementation, mmax which is written in K4 is a more general implementation based on monadic over (for controlling outer iteration window size-1) and |': This is bound to be slower for very large size n and window size compared to a more vectorized solution. Perhaps if Arthur could write it in C I bet it can go much faster.
On Thursday, November 8, 2018 at 12:29:12 AM UTC+2, Ajay Rathore wrote:
Hard to comment without looking at your mmax0 implementation, mmax which is written in K4 is a more general implementation based on monadic over (for controlling outer iteration window size-1) and |': This is bound to be slower for very large size n and window size compared to a more vectorized solution. Perhaps if Arthur could write it in C I bet it can go much faster.
My implementation keeps track of maximum value between iterations.
If during one of the following iterations maximum value “leaves” the window, I make full search in window once.
If bigger value “enters” the window, it becomes tracked.
So the bigger window and the more distinct values are, the more O(N) my mmax0 becomes.
Worst case is same complexity as K one, and 6 times difference is due to composed K primitives opposed to fused ones (gcc case). Thanks for pointing that out, I thought it is native for some reason, forgot to check.
My hat is off to genius who wrote it that way, it’s true programming gem. I think that keeping kdb written this way is one of the priorities they have. However after acknowledging the beauty I personally have to move on into more practical dimension (native reimplementations of important paths).
On Wed, 7 Nov 2018 at 21:44, Sanny Sanoff <sanny…@gmail.com> wrote:
I like fast performance, that’s why I like kdb+.
However sometimes kdb is slow in basic ops. I met few already, but easiest to nail is following.
I am implementing “high was how many days ago” window function, and as a side effect I reimplemented mmax (called mmax0).
Here are metrics:
/ Correctness:
q)t:`float$2000000?100000
q)mmax[3;t]~mmax0[3]t
1b
q)mmax[1000;t]~mmax0[1000]t
1b
/ Short window:
q)\t mmax[10;t]
122
q)\t mmax0[10;t]
19
/ Large window:
q)\t mmax0[1000;t]
16
q)\t mmax[1000;t]
11450
/ Worst case:
q)tt:desc t
q)\t mmax[1000;tt]
11837
q)\t mmax0[1000;tt]
3263
q)mmax[1000;tt]~mmax0[1000;tt]
1b
Other Edge case:
q)tt:asc t
q)\t mmax[1000;tt]
12253
q)\t mmax0[1000;tt]
13
q)mmax[1000;tt]~mmax0[1000;tt]
1b
In worst case, mine is 6x faster, on real-life it’s orders of magnitude faster.
What I want to say: why not include this implementation it in kdb+ ? ;)
Second question: why “no”? (probably goal is to keep executable size small, I don’t know how big impact does size make in real-life scenarios, would be interesting to hear some general comments from authors)
Also, there’s a chance my 6x speedup is due to AVX512, but general case cannot be attributed to AVX512.
Hi Sanny,
Would you mind sharing your implementation? I am also looking at much faster mmax/mmin versions than the ones currently implemented… Many thanks