Realtime data persistence

Hi all,

I am new to KDB and have a question about realtime data persistence.

I had a look at this thread, and it looks like a splayed table is a good way to persist data on disk.

https://groups.google.com/forum/#!topic/personal-kdbplus/8vI3iwD3X4c

I am working on a simulation exchange, for which I would like to persist 3 things:

  1. Real-time order book

  2. Trade execution

  3. incoming orders

Basically I have 2 concerns:

  1. A splayed table stores data on disk. If I have to save to the splayed table every time there is a trade or an incoming order, will writing to both the in-memory table and the splayed table introduce a huge overhead?

  2. A trade may cause inserts, updates and deletes in the order book in one go. How can I capture all of those changes to the splayed table in one go, instead of applying a change after every single insert/update/delete operation?

Thanks very much.

Cheers,

Juno

I’d recommend looking into kdb+tick, which forms the basis of kdb+ real-time data capture infrastructure.

More information can be found from:

(Reply to Afonso Juno Chan's post of Wednesday, March 7, 2018 at 11:48:09 AM UTC+8.)

Hi Afonso,

About your concerns:

It isn’t a good idea to write to a splayed table every time there is a trade or incoming order: disk I/O is much slower than writing to memory, so this adds a large overhead. Due to KDB’s single-threaded nature, anything that takes a bit of time will block the process, which can lead to a backlog of messages. It is much better to write the table down at set intervals, such as at end of day.
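A minimal sketch of that buffering approach, assuming a hypothetical `trade` schema, a hypothetical database directory `:db`, and an arbitrary 5-second flush interval (none of these come from the thread). Updates touch memory only; a timer appends the buffered rows to the splayed table and then empties the buffer. The append uses the amend form `.[file;();,;data]`, as the kdb+tick write-only example does; `upsert` to the same path is the other common spelling.

```q
/ in-memory buffer for incoming trades (hypothetical schema)
trade:([]time:`timespan$();sym:`symbol$();price:`float$();size:`long$())

/ hypothetical update handler: memory only, no disk I/O
upd:{[t;x] t insert x}

/ flush: enumerate symbols against `:db/sym, append the rows to `:db/<t>/, empty the buffer
flush:{[t] if[count value t; .[` sv `:db,t,`;();,;.Q.en[`:db] value t]; @[`.;t;0#]];}

/ run the flush every 5000 ms from the timer
.z.ts:{flush `trade}
\t 5000
```

Rows that arrive between two flushes exist only in memory, so the interval is a trade-off between disk overhead and how much data a crash could lose.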

I would also recommend looking at partitioned tables if you haven’t already. Grouping the records into partitions, for example by date, reduces what has to be loaded into memory.

In addition to kdb+tick, there are some open source KDB production frameworks on GitHub. These frameworks build on tick and add considerable functionality for managing real-time data.

If you are concerned about losing in-memory tables in case a process dies, kdb+tick keeps a tickerplant log file, so everything is still persisted to disk and can be recovered.
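The logging idea can be sketched without the full kdb+tick setup. This is only an illustration of the technique, not the tick source: the log path `:tplog`, the `pub` wrapper and the two-column `trade` table are hypothetical. Each message is appended to the log before it is applied in memory, and `-11!` replays the log by evaluating every logged message in order.

```q
trade:([]sym:`symbol$();price:`float$())
upd:{[t;x] t insert x}              / replay calls this for every logged message

logfile:`:tplog                     / hypothetical log path
.[logfile;();:;()]                  / initialise an empty log
h:hopen logfile                     / handle used for appending

/ hypothetical wrapper: log the message first, then apply it in memory
pub:{[t;x] h enlist(`upd;t;x); upd[t;x]}
pub[`trade;(`abc;10.5)]
pub[`trade;(`def;11.2)]

/ simulate a crash: drop the in-memory rows, then rebuild them from the log
hclose h
delete from `trade
-11!logfile
```

After the replay, `trade` should hold the two logged rows again.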

I hope this helps.

Julia Partridge

Since kdb has the following concepts:

  1. xasc on disk

  2. `:sym? on disk where sym file is auto updated.

Is there a way to do the same thing with a partitioned / segmented table?

Hi Science Student,

Saving down to a partition with a sorted sym field can be done with the .Q.dpft (save table) utility. It saves a simple table, splayed, to a specific partition of a database, sorted on a specified field with the parted attribute (p#) applied to it.

    - Table cannot be keyed.

Syntax:

.Q.dpft[directory;partition;`p#field;tablename]

.Q.dpft will also rearrange the columns so that the specified field is the second column of the table (the first being the virtual column determined by the partition, in this case date).

q)trade:([]sym:10?`a`b`c;time:.z.T+10*til 10;price:50f+10?50f;size:100*1+10?10)

q).Q.dpft[`:db;2007.07.23;`sym;`trade]

`trade

Successful execution will return the table name.

For more information on useful .Q utilities, check out:

https://code.kx.com/q/ref/dotq/

These utilities have been expanded and enriched in open source kdb+ frameworks which can be located on GitHub.

Useful points:

.Q.en enumerates all the necessary columns in a table against the sym file, rather than you having to do this manually with ‘?’ for each column, which makes enumeration easier to handle.

.Q.ens (coming in v3.6) will offer the same functionality as .Q.en while allowing the use of a custom sym file name.

.Q.dpft will actually use the .Q.en utility to complete the enumeration so this is then abstracted away from the user.

Hopefully this helps you.

Kind Regards,

Jordan Shaw

I am aware of .Q.dpft, but underneath it rewrites each entire column from the in-memory table to the splayed file on disk.

I am trying to see if there is something which will append to the file rather than overwrite it. Are there any internal functions which do that?

Hi,

You can append to a splayed table using upsert as follows:

q)tab:([]a:1+til 4;b:4?.Q.a)

q)`:splaytab/ set tab

`:splaytab/

q)`:splaytab/ upsert (19;"x")

`:splaytab/

q)tab2: get `:splaytab/

q)tab2

a b

----

1 o

2 j

3 s

4 r

19 x

This appends a new record to the table. You can see this by loading in the table again and checking for the new row.

If your table has a column of type symbol, you need to enumerate it when initially splaying, and again before upserting the new record:

q)tab:([]a:1+til 4;b:10+til 4;c:-4?`1)

q)`:splaytab/ set .Q.en[`:.] tab

`:splaytab/

q)`:splaytab/ upsert .Q.en[`:.] enlist `a`b`c!(11;20;`z)

`:splaytab/

q)tab1:get `:splaytab/

q)tab1

a b c

-------

1 10 h

2 11 m

3 12 o

4 13 p

11 20 z

kx.com has a very helpful segment on this, lifted from Q for Mortals:

http://code.kx.com/q4m3/14_Introduction_to_Kdb+/#1426-appending-to-a-splayed-table
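The same append can be pointed at a table inside a partition. A rough sketch, assuming the `trade` table was saved to `:db` for 2007.07.23 with .Q.dpft as earlier in the thread; the `new` rows are made up for illustration.

```q
/ hypothetical new rows, same columns as the trade table saved with .Q.dpft
new:([]sym:`a`c;time:2#.z.T;price:50.5 51.5;size:100 200)

/ path of the trade table inside the 2007.07.23 partition of `:db
p:.Q.par[`:db;2007.07.23;`trade]

/ enumerate symbols against `:db/sym, then append to `:db/2007.07.23/trade/
(` sv p,`) upsert .Q.en[`:db] new
```

The new rows land at the end of each column file, so the sym column is no longer guaranteed to be sorted. An on-disk xasc and reapplying p# would presumably be needed before relying on the parted attribute again.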

Hope this helps,

James

  1. Are you saying that upsert opens the column file in append mode and just puts the element at the end? (Splayed tables are stored as 1 file per column.)

  2. How do you append to a list stored in a file? I thought it always overwrote the file from scratch. Your thoughts?

Thanks.

  1. Read about upsert: "update in place".

  2. Either:

     a) use a table

     b) use a file handle
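A short sketch of option b), following the pattern shown in Q for Mortals; the file name `:numbers` is a placeholder. Applying an open file handle to data appends it to the file, and the amend form with `,` appends without an explicit handle.

```q
`:numbers set 10 20 30          / write a list to a file (hypothetical name)
h:hopen `:numbers               / open a handle to the existing file
h 42                            / append a single item
h 100 200                       / append several items
hclose h

.[`:numbers;();,;300 400]       / amend form: append without keeping a handle open
get `:numbers                   / read the list back
```

Reading the file back should show the original three items followed by the appended ones, so the file is extended rather than rewritten from the in-memory list.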