I have 2 very simple query which do essentially same work:
\t select avg x by sym, minute from t where date in dates,sym in syms
\t t1:select from t where date in dates; select avg x by sym, minute from t1 where sym in syms
The table is not really huge, there are about 20K syms with data every minutes within market hours, date, minute, sym, x are the only columns, the HDB is date partition and there is parted attribute at sym column.
syms is about 5K syms and data is across 20 days
The 2nd one run 5 times faster than the 1st one pretty consistently, I think the reason is that first one will do aggregation date by date so that 1st take more IO, but taking 5 times more is a bit surprising to me, will anyone have comment about the performance difference
Could you answer David’s question? I am intrigued by your question. Unless of course you have figured it out?
Threads may very well be having an effect.
If you haven’t figured it out, are all the sym files in each partition parted? The meta will be just showing you the latest..so we should check that too.
Something like below would work
group date!{[x;y]exec attr sym from select sym from x where date=y}[t]each date