improving native lj performance

hijibijbij73 · December 28, 2017, 12:38am

I need to update a handful of rows in a large table, where both tables share a unique identifier as key and are sorted.

In below, t is the large table, and u is the small table.

t:update s#id from ([] id:til 10000; foo:a; bar:b; baz:c; quux:`d)

u:id xkey update s#id from ( id:5000 5050; foo:e1e2; bar:f1f2; foobar:g1g2)

First tried using left-join - the performance is about 117ms (for 1000 iterations) on my machine:

\ts:1000 t lj u / 117ms, 394k

Then noticed that update does this much faster

\ts:1000 update foo:e1e2,bar:f1f2,foobar:g1g2 from t where id in (5000;5050) / 15ms,198k

And it’s fairly straightforward to write a generic function that does the update using functional queries:

.my.lj:{[t;u;k]

![t;enlist(in;`id;u[k]);0b;enlist each k _ flip u]

}

\ts:1000 .my.lj[t;0!u;`id] / 17ms, 198k

Seems like at least in this restricted case, it’s fairly straightforward to get a 10x improvement on native left-join on what i think is a fairly common use case for left-join.

Am I missing something? Is there a better native way of doing this than lj?

Thanks!

Hiji

quintanar4011 · December 28, 2017, 9:05am

I guess lj is designed for cases where size(t)~size(u). If you try your example for u with 9000 rows for example your fn will be much slower. This may happen sometimes. For example the optimal sym/time or time/sym order in the where clause (with `g# on sym) depends on sym size.

???, 28 ??? 2017 ?., 11:07:49 UTC+3 ??? hijibi...@gmail.com ???:

I need to update a handful of rows in a large table, where both tables share a unique identifier as key and are sorted.

In below, t is the large table, and u is the small table.

t:update s#id from ([] id:til 10000; foo:a; bar:b; baz:c; quux:`d)

u:id xkey update s#id from ( id:5000 5050; foo:e1e2; bar:f1f2; foobar:g1g2)

First tried using left-join - the performance is about 117ms (for 1000 iterations) on my machine:

\ts:1000 t lj u / 117ms, 394k

Then noticed that update does this much faster

\ts:1000 update foo:e1e2,bar:f1f2,foobar:<wbr>g1g2 from t where id in (5000;5050) / 15ms,198k

And it’s fairly straightforward to write a generic function that does the update using functional queries:

.my.lj:{[t;u;k]

![t;enlist(in;`id;u[k]);0b;enlist each k _ flip u]

}

\ts:1000 .my.lj[t;0!u;`id] / 17ms, 198k

Seems like at least in this restricted case, it’s fairly straightforward to get a 10x improvement on native left-join on what i think is a fairly common use case for left-join.

Am I missing something? Is there a better native way of doing this than lj?

Thanks!

Hiji

Flying1 · January 3, 2018, 2:48am

The difference in speed comes from the fact that lj has to perform look-up in u for every single row in t, whereas .my.lj only performs that for 2 rows out of the large t.

So I think Andrew pointed out quite correctly that your own “lj” works much better than the default implementation.

pressjonny0 · January 3, 2018, 8:43am

Also, lj is doing a bit more work here to give the correct output. .my.lj is dependent on the order of t and u being the same, really the lj equivalent update should be something like:

update foo:(5000 5050!e1e2)id,bar:(5000 5050!f1f2)id,foobar:(5000 5050!g1g2)id from t where id in (5000;5050)

If you reverse the order of t the results are incorrect:

q)select from .my.lj[t;0!u;`id] where id in 5000 5050

id foo bar baz quux foobar

5000 e1 f1 c d g1

5050 e2 f2 c d g2

q)select from .my.lj[reverse t;0!u;`id] where id in 5000 5050

id foo bar baz quux foobar

5050 e1 f1 c d g1

5000 e2 f2 c d g2

Also if you have a record in u which isnt in t it fails:

q).my.lj[t;0!u,1!enlist idfoobarfoobar!(20000;e3;f3;g3);id]

{[t;u;k] ![t;enlist(in;`id;u[k]);0b;enlist each k _ flip u]}

'length

I guess the bottom line is that there will always be certain datasets which can be special cased and beat the native generic q operators, but its got to be worth the dev effort to think through and handle all the cases.

Topic		Views
Merge joins and other performance related questions Community Support kdb-and-q	1	September 28, 2018
aj on a large data set Community Support kdb-and-q	2	June 10, 2019
optimize ej (equijoin) Community Support kdb-and-q	0	March 10, 2016
Large Scale WindowJoins Questions Community Support imported , kdb-and-q	9	December 2, 2022
asof join Community Support kdb-and-q	0	February 8, 2012

improving native lj performance

Related topics