I’m evaluating kdb against an existing J application and was surprised to see how slow an aggregate of multiple columns is - both in memory and hdb
The application is aggregating approximately 51 million rows down to 13 million customer product combinations
The kdb code takes approximately 31 seconds vs 3 seconds with J
Is there a faster way to do the sum in kdb?
/test 1 - using symbols
n: 13000000;
cust: n?`8;
prod: n?`8;
v: n?100
a:(cust:cust; prod:prod ;v:v)
q)\t select sum(v) by cust, prod from a
31058
/test 2 - using strings, very slow
n: 13000000;
cust: string n?`8;
prod: string n?`8;
v: n?100
a:(cust:cust; prod:prod ;v:v)
q)\t select sum(v) by cust, prod from a
116745
comparison J code
n=:13000000
cust=: _8[\ a. {~ (65+?(8*n)#26)
prod=: _8[\ a. {~ (65+?(8*n)#26)
v=: ?.n#100
agg=: 3 : 0
keys=:i.~ |: i.~ every (cust;prod)
c=.((~.keys) { cust)
p=.((~.keys) { prod)
s=.keys +//. v
c;p;s
)
NB. 3.57 seconds
6!:2 ‘r=.agg 0’
3.57139
({.@$) every r
13000000 13000000 13000000