I need to update a handful of rows in a large table, where both tables share a unique identifier as key and are sorted.
In below, t is the large table, and u is the small table.
t:update s#id from ([] id:til 10000; foo:
a; bar:b; baz:
c; quux:`d)
u:id xkey update
s#id from ( id:5000 5050; foo:e1
e2; bar:f1
f2; foobar:g1
g2)
First tried using left-join - the performance is about 117ms (for 1000 iterations) on my machine:
\ts:1000 t lj u / 117ms, 394k
Then noticed that update does this much faster
\ts:1000 update foo:e1
e2,bar:f1
f2,foobar:g1
g2 from t where id in (5000;5050) / 15ms,198k
And it’s fairly straightforward to write a generic function that does the update using functional queries:
.my.lj:{[t;u;k]
![t;enlist(in;`id;u[k]);0b;enlist each k _ flip u]
}
\ts:1000 .my.lj[t;0!u;`id] / 17ms, 198k
Seems like at least in this restricted case, it’s fairly straightforward to get a 10x improvement on native left-join on what i think is a fairly common use case for left-join.
Am I missing something? Is there a better native way of doing this than lj?
Thanks!
Hiji
I guess lj is designed for cases where size(t)~size(u). If you try your example for u with 9000 rows for example your fn will be much slower. This may happen sometimes. For example the optimal sym/time or time/sym order in the where clause (with `g# on sym) depends on sym size.
???, 28 ??? 2017 ?., 11:07:49 UTC+3 ??? hijibi...@gmail.com ???:
I need to update a handful of rows in a large table, where both tables share a unique identifier as key and are sorted.
In below, t is the large table, and u is the small table.
t:update s#id from ([] id:til 10000; foo:
a; bar:b; baz:
c; quux:`d)
u:id xkey update
s#id from ( id:5000 5050; foo:e1
e2; bar:f1
f2; foobar:g1
g2)
First tried using left-join - the performance is about 117ms (for 1000 iterations) on my machine:
\ts:1000 t lj u / 117ms, 394k
Then noticed that update does this much faster
\ts:1000 update foo:e1
e2,bar:f1
f2,foobar:<wbr>g1
g2 from t where id in (5000;5050) / 15ms,198k
And it’s fairly straightforward to write a generic function that does the update using functional queries:
.my.lj:{[t;u;k]
![t;enlist(in;`id;u[k]);0b;enlist each k _ flip u]
}
\ts:1000 .my.lj[t;0!u;`id] / 17ms, 198k
Seems like at least in this restricted case, it’s fairly straightforward to get a 10x improvement on native left-join on what i think is a fairly common use case for left-join.
Am I missing something? Is there a better native way of doing this than lj?
Thanks!
Hiji
The difference in speed comes from the fact that lj has to perform look-up in u for every single row in t, whereas .my.lj only performs that for 2 rows out of the large t.
So I think Andrew pointed out quite correctly that your own “lj” works much better than the default implementation.
Also, lj is doing a bit more work here to give the correct output. .my.lj is dependent on the order of t and u being the same, really the lj equivalent update should be something like:
update foo:(5000 5050!e1
e2)id,bar:(5000 5050!f1
f2)id,foobar:(5000 5050!g1
g2)id from t where id in (5000;5050)
If you reverse the order of t the results are incorrect:
q)select from .my.lj[t;0!u;`id] where id in 5000 5050
id foo bar baz quux foobar
5000 e1 f1 c d g1
5050 e2 f2 c d g2
q)select from .my.lj[reverse t;0!u;`id] where id in 5000 5050
id foo bar baz quux foobar
5050 e1 f1 c d g1
5000 e2 f2 c d g2
Also if you have a record in u which isnt in t it fails:
q).my.lj[t;0!u,1!enlist id
foobar
foobar!(20000;e3;
f3;g3);
id]
{[t;u;k] ![t;enlist(in;`id;u[k]);0b;enlist each k _ flip u]}
'length
I guess the bottom line is that there will always be certain datasets which can be special cased and beat the native generic q operators, but its got to be worth the dev effort to think through and handle all the cases.