Hello everybody,
Sometimes I’m dealing with large tickerplant log files, over 4-5 GB, so I cannot read them into memory completely using the free version of kdb+. Can I read a large log file in, say, two parts? Read the first part, process it, call .Q.gc, then read the second part and process it.
What workarounds do you use in such situations? Perhaps we can use a counter inside the upd function, e.g.:
/ Replay large log file x, saving down after every 100000 messages.
/ process[] is assumed to persist and empty the in-memory tables;
/ otherwise .Q.gc[] has nothing to reclaim.
cnt:0
upd:{[t;d] t insert d; cnt+::1; if[cnt>=100000; process[]; .Q.gc[]; cnt::0]}
-11!x    / x is a file symbol, e.g. `:tplog
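I know -11! also has documented variants that could help bound a replay: -11!(-2;f) returns the number of valid messages in the log (or, for a corrupt log, a 2-item list of the valid-message count and the byte length of the valid portion), and -11!(n;f) replays only the first n messages. There is no built-in way to start replay from the middle of a file, though. A minimal sketch (the log path is a placeholder):

f:`:tplog                 / file symbol of the tickerplant log (placeholder path)
upd:{[t;d] t insert d}    / handler invoked once per logged message
c:-11!(-2;f)              / count of valid messages; 2-item list if the log is corrupt
-11!(100000;f)            / replay only the first 100000 messages, then stop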
Or maybe there is a better solution? Please share your thoughts.
Best regards,
Yury Smykalov
Hi Yury
The AquaQ TorQ framework
http://code.kx.com/wsvn/code/contrib/aquaqanalytics/AquaQTorQ
has a tickerplant log script
http://code.kx.com/wsvn/code/contrib/aquaqanalytics/AquaQTorQ/src/code/processes/tickerlogreplay.q
This handles all of that out of the box, plus a bunch of other stuff: skipping messages, replaying data in chunks, manipulation before save-down, manipulation after the replay is complete, etc. The TorQ document, combined with the notes and usage information at the top of the log replay script, should explain how it works.
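As a rough illustration of the underlying idea (simplified, and not the actual TorQ code; skip, chunk and savedown here are placeholder names), skipping and chunked replay can both be driven from a single counter inside upd:

/ generic sketch of skip + chunked replay; savedown[] stands in for
/ your own persist-and-clear logic
skip:50000     / number of leading messages to ignore (placeholder value)
chunk:100000   / messages per chunk before saving down (placeholder value)
cnt:0
upd:{[t;d]
  cnt+::1;
  if[cnt<=skip; :()];                        / skip leading messages
  t insert d;
  if[0=(cnt-skip) mod chunk; savedown[]]}    / flush completed chunk
-11!`:tplog    / placeholder log path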
Thanks
Jonny
Hi Jonny!
Thanks for the information; this script looks like just what I was looking for!
Best regards,
Yury Smykalov