Optimize Kdb group queries ('by' clause) for a speedy execution

vivek_shende · April 27, 2019, 7:34am

Hello,

I need to optimize Kdb group queries to reduce their total execution time. I am executing a group query using a ‘by’ clause over 4 columns. The group query itself runs over a subquery.

The subquery returns around 6 million records and runs in approximately 700 ms. The group query that runs on top of it reduces these records to around 3-4k, but it takes almost 4000 ms (4 seconds) to run, which is beyond the acceptable limit of the application. I need to optimize it so that the total query execution time is less than 1.5 seconds. Is there a performant alternative to the ‘by’ clause? Is there something I can do with the grouped columns that will improve the execution time?

jwbuitenhuis · April 27, 2019, 11:03am

Have a reproducible example? It’s probably related to the types you’re grouping on. The example below groups 6 million rows by four columns down to about 12K rows, yet runs in roughly 260 ms:

q)n:6000000;t:([]sym:n?`1;date:.z.d+n#1 2 3;mkt:n?`1;typ:n?`1;price:n?2.0)

q)count select by date,sym,mkt,typ from t

12288

q)\t select by date,sym,mkt,typ from t

261
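To illustrate the point about column types, here is a minimal sketch (not taken from the original poster's data) that times the same grouping on a symbol column and on the same keys stored as strings. The table names `ts` and `tc`, the column name `k`, and the row count are assumptions chosen for the illustration; actual timings will depend on the machine and kdb+ version.

```q
/ hypothetical test data: 1 million rows keyed by random 3-character symbols
n:1000000
ts:([]k:n?`3;price:n?2.0)

/ same data, but the key column holds strings (char lists) instead of symbols
tc:update k:string k from ts

/ time the identical aggregation on each table
\t select avg price by k from ts
\t select avg price by k from tc
```

If the string version is noticeably slower, casting the grouped columns to symbols (or another atomic type) before applying ‘by’ may be worth trying on the real query.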
