Best Practice to persist real time & high vol Data in KDB

Alex31 · January 13, 2011, 3:21pm

Hi,

I am new to KDB. I am writing an application to record real-time tick data into KDB for later analysis.

I have come across several methods to persist data to disk in KDB+:
1. Log file: start KDB with the -l option and flush with \l
2. Manually use the save / load functions.

Since there are so many transactions per second, which is the best practice? Is there any conflict if I use both methods? Will I duplicate the data on disk?

Thanks,
Alex

junan_duan · January 14, 2011, 1:01am

Using a splayed table with upsert would be an option.

https://code.kx.com/trac/wiki/Cookbook/SplayedTables

q)t:([]c0:`$();c1:`int$())
q)`:mydata/t/ set .Q.en[`:mydata] t
`:mydata/t/
q)`:mydata/t/ upsert (`:mydata/sym?`a;1)
`:mydata/t/
q)`:mydata/t/ upsert (`:mydata/sym?`b;2)
`:mydata/t/
q)\\

restart q

q)\l mydata
q)t
c0 c1
-----
a  1
b  2

Regards,

Junan

junan_duan · January 14, 2011, 3:11am

if you want to be able to query the table while you are loading, you can load it first.

q)t:([]c0:`$();c1:`int$())
q)`:mydata/t/ set .Q.en[`:mydata] t
`:mydata/t/
q)\l mydata
q)`:t/ upsert (`:sym?`a;1)
`:t/
q)`:t/ upsert (`:sym?`b;2)
`:t/
q)t
c0 c1
-----
a  1
b  2
q)\\

restart q,

q)\l mydata
q)t
c0 c1
-----
a  1
b  2

The nice thing about this approach is that you can use the splayed table to build your HDB.

https://code.kx.com/trac/wiki/KdbplusForMortals/partitioned_tables

Junan

felix1 · January 14, 2011, 9:08am

don’t waste your time with upserts. they are slow as hell if they hit the disk.
use a log for transactions (the automatic one might work in certain situations)

or buy kdb+tick, which was built for that purpose.

felix

junan_duan · January 14, 2011, 1:15pm

Where does the log file go? Is it on disk? Kdb+tick holds all data in memory; it is fast, no doubt about it.

But what if you don’t have enough memory? It happens.

My experience with upsert shows that the performance is impressive, and it is even better if you use a Fusion-io card.

Regards

Junan


felix1 · January 15, 2011, 9:31am

log the update messages to disk. you don’t need to store it all in memory.
you might need to split the log to be able to replay it when you need an optimised
form of storage.

what i’m saying is be creative with kdb and twist it to suit your needs.

here is a short example of manual transaction logging:

/ create a q binary file. you need to do that before opening it
q)`:log set ()
`:log
/ open the log
q)l:hopen `:log

/ just for fun, make our upd act as insert
q)upd:insert
/ the table
q)t:([]time:`time$();sym:`symbol$();price:`int$())

/ append to log
q)l enlist (`upd;`t;(.z.t;`a;12))
3
/ close it and replay the log
q)hclose l
q)-11!`:log
1

felix
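
The idea of replaying a log into "an optimised form of storage" can be sketched as follows: during replay, upd is redefined so that each logged message is appended to a splayed table on disk instead of being inserted into memory. This is only a minimal sketch under assumptions. The paths `:tplog` and `:db`, the table name trade, the column names and the sample values are hypothetical and do not come from the thread; the messages use the same (function;table;data) shape as the example above.

```q
/ hypothetical paths, chosen for illustration only
logfile:`:tplog
db:`:db

/ create an empty log file and append two update messages to it
logfile set ()
h:hopen logfile
h enlist (`upd;`trade;(09:30:00.000;`a;12i))
h enlist (`upd;`trade;(09:30:01.000;`b;13i))
hclose h

/ create an empty splayed table, enumerating symbols against db/sym
(` sv db,`trade`) set .Q.en[db] ([]time:`time$();sym:`symbol$();price:`int$())

/ during replay, turn each message into a one-row table and append it on disk
upd:{[t;x] (` sv db,t,`) upsert .Q.en[db] enlist `time`sym`price!x;}

/ -11! replays the log and returns the number of messages executed
n:-11!logfile
```

Saved as a script and loaded into q, this should leave a splayed table under db/trade/ that can be mapped with `\l db`, so the data never has to fit in memory all at once. Appending one row per message is the slow path felix warns about; buffering several messages and appending them in one call would be the obvious refinement.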


junan_duan · January 15, 2011, 6:59pm

Replaying the log is time consuming when you have high-volume data. I personally prefer
splayed tables because they are more straightforward: you can query your data without reloading
everything into memory all at once.

Actually, you don’t have to do the splaying in real time. You can keep your real-time data
in memory and do the splaying at the end of the day, provided you have enough memory.

You can even do block compression on your splayed files if you have a newer version of q,
so it takes little space but gives you reasonably good performance.
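
A rough sketch of the end-of-day approach, under assumptions: the database root `:hdb`, the table name trade, the date and the sample rows are all hypothetical. The commented-out `.z.zd` line shows how default compression parameters can be set in versions of q that support file compression; the values shown (logical block size 2^17, algorithm 2 for gzip, level 6) are illustrative and gzip needs zlib to be available on the machine.

```q
/ hypothetical in-memory table holding the day's data
trade:([]time:09:30:00.000 09:30:01.000;sym:`a`b;price:12 13i)

/ optional: uncomment to compress files written by set from here on
/ .z.zd:17 2 6

/ enumerate symbols against hdb/sym and splay the table into a date directory
`:hdb/2011.01.14/trade/ set .Q.en[`:hdb] trade

/ empty the in-memory table, ready for the next day
delete from `trade
```

Writing each day into its own date directory is what lets the splayed tables serve as partitions of an HDB, as mentioned earlier in the thread.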

An example of how to persist high volume real time data.
http://kx.com/q/d/taq.htm

Junan

K4_Monk · January 20, 2011, 2:13pm

On Jan 15, 6:59 pm, junan duan <junan.d…> wrote:
> actually you don’t have to do the splaying in real time. you can keep you
> real time data
> in memory and do the splaying at the end of day, provide you have enough
> memory.

Is this how kdb+tick actually works? I read in the interview of Arthur Whitney that this is the approach used:

"All day long all the hot stuff is in memory, and then during the day it takes about two minutes to write the whole thing down to disk and then flip to a new day and start from scratch."

f11 · January 20, 2011, 11:33pm

Yes. See http://kx.com/Products/kdb+tick.php
