improving native lj performance

I need to update a handful of rows in a large table, where both tables share a unique identifier as key and are sorted.

In below, t is the large table, and u is the small table.

t:update s#id from ([] id:til 10000; foo:a; bar:b; baz:c; quux:`d)

u:id xkey update s#id from ( id:5000 5050; foo:e1e2; bar:f1f2; foobar:g1g2)

First tried using left-join - the performance is about 117ms (for 1000 iterations) on my machine:

\ts:1000 t lj u / 117ms, 394k

Then noticed that update does this much faster

\ts:1000 update foo:e1e2,bar:f1f2,foobar:g1g2 from t where id in (5000;5050) / 15ms,198k

And it’s fairly straightforward to write a generic function that does the update using functional queries:

.my.lj:{[t;u;k]

    ![t;enlist(in;`id;u[k]);0b;enlist each k _ flip u]

    }

\ts:1000 .my.lj[t;0!u;`id] / 17ms, 198k

Seems like at least in this restricted case, it’s fairly straightforward to get a 10x improvement on native left-join on what i think is a fairly common use case for left-join.

Am I missing something? Is there a better native way of doing this than lj?

Thanks!

Hiji

I guess lj is designed for cases where size(t)~size(u). If you try your example for u with 9000 rows for example your fn will be much slower. This may happen sometimes. For example the optimal sym/time or time/sym order in the where clause (with `g# on sym) depends on sym size.

???, 28 ??? 2017 ?., 11:07:49 UTC+3 ??? hijibi...@gmail.com ???:

I need to update a handful of rows in a large table, where both tables share a unique identifier as key and are sorted.

In below, t is the large table, and u is the small table.

t:update s#id from ([] id:til 10000; foo:a; bar:b; baz:c; quux:`d)

u:id xkey update s#id from ( id:5000 5050; foo:e1e2; bar:f1f2; foobar:g1g2)

First tried using left-join - the performance is about 117ms (for 1000 iterations) on my machine:

\ts:1000 t lj u / 117ms, 394k

Then noticed that update does this much faster

\ts:1000 update foo:e1e2,bar:f1f2,foobar:<wbr>g1g2 from t where id in (5000;5050) / 15ms,198k

And it’s fairly straightforward to write a generic function that does the update using functional queries:

.my.lj:{[t;u;k]

    ![t;enlist(in;`id;u[k]);0b;enlist each k _ flip u]

    }

\ts:1000 .my.lj[t;0!u;`id] / 17ms, 198k

Seems like at least in this restricted case, it’s fairly straightforward to get a 10x improvement on native left-join on what i think is a fairly common use case for left-join.

Am I missing something? Is there a better native way of doing this than lj?

Thanks!

Hiji

The difference in speed comes from the fact that lj has to perform look-up in u for every single row in t, whereas .my.lj only performs that for 2 rows out of the large t.

So I think Andrew pointed out quite correctly that your own “lj” works much better than the default implementation.

Also, lj is doing a bit more work here to give the correct output.  .my.lj is dependent on the order of t and u being the same, really the lj equivalent update should be something like:

update foo:(5000 5050!e1e2)id,bar:(5000 5050!f1f2)id,foobar:(5000 5050!g1g2)id from t where id in (5000;5050)

If you reverse the order of t the results are incorrect:

q)select from  .my.lj[t;0!u;`id] where id in 5000 5050                                                                                                    

id   foo bar baz quux foobar


5000 e1  f1  c   d    g1    

5050 e2  f2  c   d    g2    

q)select from  .my.lj[reverse t;0!u;`id] where id in 5000 5050                                                                                            

id   foo bar baz quux foobar


5050 e1  f1  c   d    g1    

5000 e2  f2  c   d    g2

Also if you have a record in u which isn’t in t it fails:

q).my.lj[t;0!u,1!enlist idfoobarfoobar!(20000;e3;f3;g3);id]                                                                                       

{[t;u;k]  ![t;enlist(in;`id;u[k]);0b;enlist each k _ flip u]}

'length

I guess the bottom line is that there will always be certain datasets which can be special cased and beat the native generic q operators, but it’s got to be worth the dev effort to think through and handle all the cases.