Best Practice to persist real time & high vol Data in KDB

hi,

I am new to KDB.

I am writing an application to record real-time tick data into KDB for later analysis.

I have come across several methods to persist data to disk in KDB+:
1. log file
   >> start KDB with the -l option and flush with \l
2. Manually use the save / load functions.

Since there are so many transactions / data per sec, which is the best practice?
Any conflict if I use both methods? Will I duplicate the data on disk?

Thanks,
Alex

using a splayed table with upsert would be an option.

https://code.kx.com/trac/wiki/Cookbook/SplayedTables

q)t:([]c0:`$();c1:`int$())
q)`:mydata/t/ set .Q.en[`:mydata] t
`:mydata/t/
q)`:mydata/t/ upsert (`:mydata/sym?`a;1i)
`:mydata/t/
q)`:mydata/t/ upsert (`:mydata/sym?`b;2i)
`:mydata/t/
q)\\

restart q

q)\l mydata
q)t
c0 c1
-----
a  1
b  2

Regards,

Junan

if you want to be able to query the table while you are loading, you can load it first.

q)t:([]c0:`$();c1:`int$())
q)`:mydata/t/ set .Q.en[`:mydata] t
`:mydata/t/
q)\l mydata
q)`:t/ upsert (`:sym?`a;1i)
`:t/
q)`:t/ upsert (`:sym?`b;2i)
`:t/
q)t
c0 c1
-----
a  1
b  2
q)\\

restart q,

q)\l mydata
q)t
c0 c1
-----
a  1
b  2

Nice thing about this approach is that you can use the splayed table to build your HDB.

https://code.kx.com/trac/wiki/KdbplusForMortals/partitioned_tables

Junan

don’t waste your time with upserts. it is slow like hell if it hits the disk.
use a log for transactions (the automatic one might work in certain situations)

or buy kdb+tick which was built for that purpose.

felix

Where does the log file go? Is it on disk? Kdb+tick holds all data in memory; it is fast, no doubt about it.

But what if you don’t have enough memory? It happens.

My experience with upsert shows the performance is impressive, and it is even better if you use a Fusion-io card.

Regards

Junan 


log the update messages to disk. you don’t need to store it all in memory.
you might need to split the log to be able to replay it when you need an optimised
form of storage.

what i’m saying is be creative with kdb and twist it to suit your needs.

here is a short example of manual transaction logging:

/ create a q binary file. you need to do that before you open it
q)`:log set ()
`:log
/ open the log
q)l:hopen `:log

/ just for fun, make our upd act as insert
q)upd:insert
/ the table
q)t:([]time:`time$();sym:`symbol$();price:`int$())

/ append to log
q)l enlist (`upd;`t;(.z.t;`a;12i))
3
/ close it and replay the log
q)hclose l
q)-11!`:log
1
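
A possible follow-up to the replay above, as a minimal sketch only: -11! also accepts a two-item list, which can help when a large log has to be checked or replayed in parts. The table, upd and log file name mirror the example above; the second logged row is made up for illustration.

```q
/ same setup as the example above (names are illustrative)
t:([]time:`time$();sym:`symbol$();price:`int$())
upd:insert
`:log set ()
l:hopen `:log
l enlist (`upd;`t;(.z.t;`a;12i))
l enlist (`upd;`t;(.z.t;`b;13i))   / hypothetical second update
hclose l

/ count the valid chunks in the log without executing any of them
-11!(-2;`:log)

/ replay only the first chunk instead of the whole log
-11!(1;`:log)
count t
```

Checking the chunk count first is a cheap way to see how much work a full replay would be before committing to it.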

felix


replaying the log is time consuming when you have high volume data. I personally prefer
a splayed table because it is more straightforward: you can query your data without reloading
everything into memory all at once.

actually you don’t have to do the splaying in real time. you can keep your real time data
in memory and do the splaying at the end of day, provided you have enough memory.
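
A minimal sketch of that end-of-day approach, assuming a hypothetical in-memory table called trade and a hypothetical database directory `:hdb (neither name comes from this thread). It uses .Q.dpft, which enumerates the symbol columns, sorts by the given field and writes the table as a splayed table under a date partition.

```q
/ hypothetical in-memory table that collects ticks during the day
trade:([]time:`time$();sym:`symbol$();price:`float$();size:`int$())
`trade insert (09:30:00.000 09:30:01.000;`a`b;10.5 20.25;100 200i)

/ end of day: write today's rows under hdb/<date>/trade/, sorted by sym
.Q.dpft[`:hdb;.z.D;`sym;`trade]

/ empty the in-memory table, ready for the next day
delete from `trade
```

Loading the hdb directory with \l afterwards should expose trade as a date-partitioned table, so the day's data can be queried without holding it all in memory.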

you can even do block compression on your splayed files if you have a newer version of q,
so it takes little space but gives you reasonably good performance.
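
One way this might look, as a sketch under assumptions: .z.zd sets default compression parameters for files written without an extension, which includes splayed column files. The parameter values and the table below are illustrative choices, not something recommended in this thread.

```q
/ assumed defaults: 2^17-byte logical blocks, algorithm 2 (gzip), level 6
.z.zd:17 2 6

/ small illustrative table, splayed to disk with the defaults above
t:([]c0:`a`b;c1:1 2i)
`:mydata/t/ set .Q.en[`:mydata] t

/ reads decompress transparently
select from get `:mydata/t/
```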

An example of how to persist high volume real time data.
http://kx.com/q/d/taq.htm

Junan

On Jan 15, 6:59 pm, junan duan <junan.d…> wrote:
> actually you don’t have to do the splaying in real time. you can keep you
> real time data
> in memory and do the splaying at the end of day, provide you have enough
> memory.

is this how kdb+tick actually works? I read in the interview of Arthur Whitney that this is the approach used:

"All day long all the hot stuff is in memory, and then
during the day it takes about two minutes to write the
whole thing down to disk and then flip to a new day and
start from scratch."

Yes. See http://kx.com/Products/kdb+tick.php