Question about the performance difference of two queries on HDB

carfield11 · January 22, 2016, 4:13pm

HI all

I have 2 very simple query which do essentially same work:

\t select avg x by sym, minute from t where date in dates,sym in syms
\t t1:select from t where date in dates; select avg x by sym, minute from t1 where sym in syms

The table is not really huge, there are about 20K syms with data every minutes within market hours, date, minute, sym, x are the only columns, the HDB is date partition and there is parted attribute at sym column.

syms is about 5K syms and data is across 20 days

The 2nd one run 5 times faster than the 1st one pretty consistently, I think the reason is that first one will do aggregation date by date so that 1st take more IO, but taking 5 times more is a bit surprising to me, will anyone have comment about the performance difference

Thanks

effbiae · January 22, 2016, 11:31pm

does changing the order of the where conditions help?

\t select avg x by sym, minute from t where sym in syms,date in dates

carfield1 · January 23, 2016, 12:44am

It thought it will loop through all date partitions? which is over few years…

david_demner · January 23, 2016, 12:54am

To: personal-kdbplus@googlegroups.com

Are you running kdb+ with slaves?

From: Carfield Yim

Sent: Friday, January 22, 2016 16:44

To: personal-kdbplus@googlegroups.com

Reply To: personal-kdbplus@googlegroups.com

Subject: Re: [personal kdb+] Question about the performance difference of two queries on HDB

It thought it will loop through all date partitions? which is over few years....

sohagan · January 27, 2016, 4:13am

Hey Carfield,

Could you answer David’s question? I am intrigued by your question. Unless of course you have figured it out?

Threads may very well be having an effect.

If you haven’t figured it out, are all the sym files in each partition parted? The meta will be just showing you the latest..so we should check that too.

Something like below would work

group date!{[x;y]exec attr sym from select sym from x where date=y}[t]each date

Cheers,

Sean

carfield11 · January 28, 2016, 1:33am

Sorry, missed this reply, no, it is just a normal KDB HDB process

carfield11 · January 28, 2016, 1:34am

Hi Sean, thanks, there is no threading involved, it is just a normal single threaded HDB process

sohagan · January 28, 2016, 2:33am

Thanks Carfied.

So we can tick threading off the list.

Lets continue to narrow down your issue.

Are there any differences in the attributes of sym across partitions?

What does count of ‘group date!{[x;y]exec attr sym from select sym from x where date=y}[t]each date’ give you?

What version of kdb+ are you running? (.z.k;.z.K)

Could you perhaps copy and paste your output from running the query multiple times?

\ts:10 select avg x by sym, minute from t where date in dates,sym in syms

???
\ts:10 t1:select from t where date in dates; select avg x by sym, minute from t1 where sym in syms
???

Topic		Views
Query Optimization Community Support kdb-and-q	6	November 9, 2015
Partition HDB by date vs sym Community Support kdb-and-q	4	June 21, 2019
slow performance of win32 version of KDB Community Support kdb-and-q	7	September 15, 2015
for hdb query, why is it fast when including where date>$date$ Community Support kdb-and-q	1	June 17, 2022
Query performance varies in same kdb session Community Support kdb-and-q	3	May 2, 2019

Question about the performance difference of two queries on HDB

Related topics