Hi,
I am using a standard splayed format for my trade data, with a directory for each date and each column stored as a separate file within it. I read from CSV files and store the data using the code below. I am using the 32-bit trial version on 64-bit Windows 7.
    readDat: {[x]
      tmp: read data from csv file(x)
      tmp: `sym`time`trdId xasc tmp;
      /trd: update `g#sym from trd;
      trade:: trd;
      .Q.dpft[`:/kdb/ndb; dt; `sym; `trade];
      .Q.gc[];
      };
    \t readDat each 50#dtlist
I have tried both with and without the `g#sym. The data typically has 1.5MM rows per date, and a select takes 0.5 to 1 second for a day. Is there a way to improve the times for either of the queries below?
    \t select from trade where date=x
    \t select from trade where date=x, sym=y
I have read the docs on segmentation, partitioning etc., but I am not sure whether anything there would help here.
Regards,
Date,sym,time is the fastest combination in the where clause.
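As a sketch of that ordering, assuming the database under /kdb/ndb from the original post, a time column stored as a q time, and placeholder values for the date, symbol and time window:

```q
\l /kdb/ndb
/ placeholder values; substitute a date and symbol that exist in the database
x:2013.01.02
y:`ABC
/ date first limits the query to one partition, sym second can use the parted attribute,
/ time last filters only the rows that remain
\t select from trade where date=x, sym=y, time within 09:30:00.000 10:00:00.000
```

Putting the time constraint before sym would make it scan the whole day's time column before the sym lookup narrows anything down.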
By far the slowest bit of the whole process is getting data off the disk. Experiment with compression.
So the basic setup is optimal. No sorting etc would help?
Also, does compression help get data off the disk faster or is it for space minimization only?
your sort and save look ok - .Q.dpft will put `p# on sym which is good.
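One way to confirm the attribute is there, assuming the database has been written to /kdb/ndb as in the original post:

```q
\l /kdb/ndb
/ the a (attribute) column of the result should show p against sym
meta trade
```

For a partitioned table, meta inspects a single partition rather than all of them, so this is a spot check, not a guarantee for every date.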
how many columns do you have? select only the columns you need.
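A quick way to see the difference, sketched with the column names from the original post and placeholder values for the date and symbol:

```q
\l /kdb/ndb
/ placeholder values; substitute a date and symbol that exist in the database
x:2013.01.02
y:`ABC
/ maps every column file in the partition
\t select from trade where date=x, sym=y
/ maps only sym (for the constraint), time and trdId
\t select time, trdId from trade where date=x, sym=y
```

Repeated timings of the same query will be flattered by the operating system's file cache, so compare first runs, or runs against different dates.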
\t select from trade where date=x
is just mapping all columns from that single date’s partition, and if this is slow, it’s likely either you have many many columns, or a slow disk (high seek time). Can try to mitigate with .Q.MAP which will keep all columns mapped, at the cost of using your address space - not advisable if using compression, and quite limited on 32bit.
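A minimal sketch of the .Q.MAP suggestion, assuming a kdb+ version that provides it, the database path from the original post, and a placeholder date:

```q
\l /kdb/ndb
/ keep all partitions' column files mapped instead of mapping them per query
.Q.MAP[]
/ placeholder date; substitute one that exists in the database
x:2013.01.02
\t select from trade where date=x
```

Mappings are set up at the time of the call, so .Q.MAP[] would need to be called again after reloading the database to pick up new partitions.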
Thanks, I’ll try compression and .Q.MAP independently and see if anything helps much.
> does compression help get data off the disk faster or is it for space minimization only
The idea is that if there’s less to get off the disk, it will take less time (if you’re not CPU-bound). In practice there’s a trade-off you’d have to measure: compression ratio versus the additional latency due to decompression.
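One way to try this, as a sketch: set default compression before re-running the loader from the original post. The parameters below (2^17-byte logical blocks, algorithm 2 for gzip, level 6) are an assumed starting point rather than a recommendation, and the date in the file path is a placeholder.

```q
/ default compression for files written from now on:
/ logical block size 2^17, algorithm 2 (gzip), compression level 6
.z.zd:17 2 6
/ existing partitions are not recompressed; rewrite them with the loader
readDat each 50#dtlist
/ compressed and uncompressed sizes for one column file (placeholder date)
-21!`:/kdb/ndb/2013.01.02/trade/time
```

Comparing \t for the same query against compressed and uncompressed copies of a partition shows which side of the trade-off a given disk and CPU fall on.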
maybe obvious - check that you don’t have other processes issuing requests to that same disk whilst you are trying to select from it…
The compression helped! Thanks v much to you both!