How can I obtain the row count per partition in a partitioned or segmented table? How much data is read in the query? Does kdb go thru each directory, pick a fixed width column, and divide the file size by the column width?
Hi,
You can use .Q.pn which is populated using the the Q.cn operator.
http://code.kx.com/wiki/DotQ/DotQDotpn
It counts the last column in each partition for the table you have specified, and then .Q.pn holds these results in a dictionary format.
You can also run
select count i>0 by date from tab where date in [list_of_dates]
HTH,
Sean
If your data is partitioned by date, you can always just count by date.
select count i by date from table where date within startDate endDate
If you wanted to do it manually (for some reason), you can use .Q.par to get location of a partition based on the date, load it in memory and then do a count.
Hi Himanshu and Yan,
Note I wrote “count i>0”. I did this on purpose and a colleague recently found this.
If you don’t aggr the i, the “select count i by date where date within startDate endDate”, still scans all partitions, but only outputs the results from that which your where statement satisfies.
Comes from this line:
"Thus, a constraint on i alone would apply across all partitions "
If you have a huge number of partitions, you can greatly speed up your query by putting in count i>0 if you are only wanting counts for specific partitions.
Thanks,
Sean
thx. it worked ^_^