Hi all,
I’m planning on using 32-bit kdb+/q for a scientific application, namely reading, storing, and manipulating data from quartz crystal microbalance (QCM) experiments. I’m running the m32 and l32 distributions and am not planning on moving to w32.
Our current QCM controller writes its data to a .tsv log with 14 columns where only 3 columns are relevant (i.e. time, crystal frequency, and temperature).*
At the moment my code reads the data (~50-200 million rows) into memory and I do my analysis there. This will not be possible on the 32 bit distribution for longer experiments that routinely cause 'wsfull errors during analysis, and future setups will have 100x or more rows. Robustness is desired here and not loading my data into memory will allow the code to be run on what will likely be lower-end PCs.
I’ve read the “kdb+ for Mortals” and “kx/Cookbook” entries on .Q.fs and related partitioned table functions to avoid being spoon-fed, but I am unsure about its performance.
Why is .Q.fs so slow when writing to disk compared to just reading with 0: for my (currently small) data? Is it because I’m reading from and writing to the same drive? How might I overcome this to achieve better performance for reading these data files?
q) colnames:
times1l
s1ms2l
s2mfreq
s2fs1r
s2rs1th
s2thtemp
T2foo;q) \t .Q.fs[flip colnames!("sssssfffffffff";"\t")0:x}]
$“qcm/sample.tsv"1338q) \t .Q.fs[0N!flip colnames!(“sssssfffffffff”;”\t")0:x}]$"qcm/sample.tsv"2862q) \t .Q.fs[{
:t upsert flip colnames!(“sssssfffffffff”;“\t”)0:x}]$"qcm/sample.tsv"74494q) \t ("sssssfffffffff";"\t")0:
$"qcm/sample.tsv"1314`
Thank you in advance for reading. Any clarification or help is appreciated.
Here’s a sample of the format of the .tsv files for reference, the preamble and column headings are dropped with tail.**
~$ tail -n +3 qcm/sample.tsv3:11:46 PM None None None None 5986009.536 -3 0 0 0 0 121.1 03:11:46 PM None None None None 5986009.536 -3 0 0 0 0 121.1 03:11:46 PM None None None None 5986009.536 -3 0 0 0 0 121.1 03:11:46 PM None None None None 5986009.517 -3 0.08 0 0.00017 0 121.1 03:11:46 PM None None None None 5986009.508 -3 0.04 0 0 0 121.1 03:11:46 PM None None None None 5986009.503 -3 0.02 0 0 0 121.1 0
*There is no easy way to change this, and the plan is to develop a new QCM controller that will feed into kdb+tick to write directly to partitioned tables, so I would rather not attempt to clean up the data beforehand. For the moment, we will have to read in all 14 columns.
**The headers are not easy to call in q, so I just replace them with my own column name list later on.