Re: [personal kdb+] Read and save big csv file speed

at a quick glance I would say it is due to missing a trailing /

.Q.fs[{`:newfile/ 

/ indicates a splayed table on disk.

 

.Q.fs[{`:newfile/ 

Hm, adding / gives me a 'type error. I guess I have to create this splayed table first?

also need to enumerate the symbols

.Q.fs[{`:newfile/ upsert .Q.en[`:.;]flip `symbol`system`type`moment`id`action`price`volume`id_deal`price_deal`own`account!("SSSSSIFISFIS";",")0:x}]`:c:/work/orderlog.txt

.Q.fsn might have an impact too, with a bigger n.
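
To illustrate the n argument: .Q.fsn takes the chunk size in bytes as its third argument, and .Q.fs is the same call with a default of 131000. A self-contained toy sketch follows; the file name, the table t and the function onchunk are made up for illustration and are not from the original post.

```q
/ write a tiny CSV so the example stands alone (hypothetical file name)
`:fsn_demo.csv 0: ("a,1,1.5";"b,2,2.5";"c,3,3.5")
/ empty in-memory target table
t:([]s:`symbol$();i:`int$();f:`float$())
/ hypothetical chunk handler: parse one chunk of complete lines and append to t
onchunk:{`t insert flip `s`i`f!("SIF";",")0:x}
/ process the file in chunks of up to 1000000 bytes instead of the 131000 used by .Q.fs
.Q.fsn[onchunk;`:fsn_demo.csv;1000000]
count t
```

With a large file, a bigger chunk means fewer calls to the handler, at the cost of more memory per call.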

also consider writing to a different drive to that which you are reading from (unless you have ssd).

Thank you for answers.

Also I guess I better not read timestamps as string.

 

Took me 8 minutes to save 1.09 GB file like this. I wonder if it is an acceptable time for kdb. Same file took 3 minutes for HDF5.

You could naturally parallelize this if 8 min is not fast enough.

How fast can you query HDF5?
Did you save it down compressed?? Maybe your disks are slow so it would actually help with performance.

Cheers,
Attila

Yes, we haven't come to reading data yet; so far we are only comparing the time needed to store it.

 

No compression was enabled, I don’t understand how to do it yet.

The way you save it down now won't have any indices for fast access. As it is a log, I assume you could at least put a sorted attribute on time. It all depends on your queries.
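
A minimal sketch of what that might look like, assuming the table was splayed to `:newfile/ as earlier in the thread, that sym is loaded in the session, and that the moment column holds the log time in a sortable type (the thread does not confirm the final schema):

```q
/ assumption: `:newfile/ is the splayed table written by .Q.fs earlier in the thread
/ sort the table on disk by the time column; xasc also sets `s# on that column
`moment xasc `:newfile/
/ if the rows are already in time order, the attribute alone can be applied instead
@[`:newfile/;`moment;`s#]
```

Only one of the two statements is needed; the second fails if the column is not actually in ascending order.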

Compression is as easy as changing .z.zd
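
As a rough illustration of that setting: .z.zd takes three integers, the logical block size as a power of 2, the algorithm, and the compression level. The values and the file name below are only an example, not a recommendation for this data set.

```q
/ 128kB logical blocks (2 xexp 17), algorithm 2 (gzip), level 6
.z.zd:17 2 6
/ hypothetical test file; it has no extension, so the default compression applies
`:zdtest set til 1000000
/ show compression statistics for the file just written
-21!`:zdtest
/ reset to the default (no compression)
\x .z.zd
```

Column files of a splayed table have no extension either, so once .z.zd is set, the upsert calls in the thread would write them compressed.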

Cheers,
Attila

I expect part of the slowdown is that each time .Q.en is called it locks `:sym, reads `:sym, updates sym in memory, saves to `:sym again, then unlocks `:sym. If only one process is updating `:sym, this can be done much more efficiently by enumerating in memory and writing `:sym once at the end. The larger you make n in .Q.fsn, the fewer calls to .Q.en.

i.e.

q)`:sym?`symbol$(); / ensure sym exists

q)sym:get`:sym / load it into memory

q)k).Q.en2:{f@:&11h=@:'x f:!+x;@[x;f;`sym?]} / define .Q.en2 to work with sym in mem

q)t:([]s:`a`b`c) / define a test table

q)0N!.Q.en2[t] / enumerate it

+(,`s)!,`sym$`a`b`c
s
-
a
b
c

q)sym / in mem sym has been updated

`a`b`c

q)`:sym set sym / save to disk

`:sym

hence your code then becomes

q)`:sym?`symbol$();sym:get`:sym

q)k).Q.en2:{f@:&11h=@:'x f:!+x;@[x;f;`sym?]}

q).Q.fs[{`:newfile/ upsert .Q.en2 flip `symbol`system`type`moment`id`action`price`volume`id_deal`price_deal`own`account!("SSSSSIFISFIS";",")0:x}]`:c:/work/orderlog.txt

q)`:sym set sym

n.b. this is safe if only one process is updating `:sym. If you have multiple processes, you’d need another mechanism.

on 32-bit kdb+ you can easily run out of address space with file compression.
And if your disks can read faster than 300MB/s, using file compression will likely slow things down.

Thank you, I will try that.