Hi Markus
It’s a tricky one. Technically you are at risk during the upsert as well, because kdb+ appends to each column sequentially and non-atomically, so a reader may see columns of different lengths. You shouldn't modify data that is being accessed (writing new partitions is ok because the database doesn't know about a new partition until it has been reloaded).
In general it’s a bad idea to have to rely on re-sorting large data sets to reapply attributes, usually because as the data set grows the "offline" time caused by the sort will increase. It's better practice to structure things so you don't have to: when you are upserting the new data, what makes it new? It's quite common for a structure that has to be updated like this to have the `p# on a temporal field such as date; then you don't have to re-sort, only set the attribute.
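For example (an in-memory sketch with a hypothetical table t, not your actual schema): if `p# is on the date column and new rows always arrive with dates on or after the current maximum, an upsert preserves the grouping, so you only need to reapply the attribute rather than re-sort:

```q
/ hypothetical table with `p# on the date column
t:([]date:`p#2014.07.28 2014.07.28 2014.07.29;sym:`a`b`a;px:3?100f)
/ new rows arrive with dates >= the current max
/ the upsert drops `p but the column is still grouped
`t upsert ([]date:2014.07.29 2014.07.30;sym:`b`a;px:2?100f)
update `p#date from `t   / reapply the attribute, no sort needed
```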
In terms of your sort, you can do it directly on disk then reset the attribute, which might be quicker, e.g.

@[;`sym;`p#] `sym xasc `:table
(you might want to test that first - I haven't!)
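A minimal way to test it safely is on a small scratch copy first (the /tmp path and trade table here are assumptions for illustration):

```q
/ build a small splayed table under /tmp to experiment on
`:/tmp/scratch/trade/ set .Q.en[`:/tmp/scratch] ([]sym:`b`a`a;price:3 1 2f)
`sym xasc `:/tmp/scratch/trade    / sort the splayed table in place on disk
@[`:/tmp/scratch/trade;`sym;`p#]  / then reapply the attribute on disk
select from `:/tmp/scratch/trade  / inspect: rows now grouped by sym
```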
Also, the csvguess script contains an optimised on-disk sorter, I think. However, neither of these will help your inconsistency problem. To solve that, you probably need to change your setup. Some suggestions:
- only do updates when no one is accessing the data (market closed, do it at the weekend, etc.)
- allow intraday updates (and take the risk of an upsert giving you an inconsistency), only re-do the sort and partition during downtime, and be prepared to have variable access times
- modify the processes so that they are either inaccessible to clients or do not attempt to access the data while it is being amended
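For the last option, one common pattern is a busy flag the writer raises while amending, which a gatekeeper function on the query processes checks before touching the data. The names below are assumptions, not a standard API, and propagating the flag between processes (e.g. via IPC) is elided:

```q
/ writer process: set before amending, clear after the reload completes
.db.busy:0b
/ client-facing wrapper on the query processes
safeQuery:{[qry] $[.db.busy;'"data reload in progress, retry";value qry]}
```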
Thanks
Jonny
AquaQ Analytics
On 30 Jul 2014, at 15:39, Markus Sieber <sieber@boerse-go.de> wrote:
Hi,
I have a database with many splayed tables that contain mostly static data.
Multiple kdb+ instances load this database and work with the data.
The tables have `p# applied on a symbol column and are sorted by date within each group.
But once in a while I have to insert data into some of the splayed tables.
For this I use another process that upserts data into the tables, which automatically removes the `p attribute. After finishing the inserts, the process sorts the entries and then reapplies `p#.
During the upsert, and especially while the data is being sorted, the database is in an inconsistent state. What is the best approach to provide stable data?
As far as I can see the upsert is not a problem; the removed attribute just slows down the other instances.
But sorting, applying `p# and saving the table with

.[`:table/;();:;] @[;`sym;`p#] `sym xasc select from `:table

seems more severe to me, as kdb+ writes the data column by column, so the columns not yet written don't match the new, already-written ones.
What is the best approach to have a consistent database?
Maybe writing the `p#-ed table into a temp directory first and then moving it over the real one?
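Something along those lines should work: a directory rename on the same filesystem is atomic, so readers see either the old or the new table, never a half-written one (and processes that have already mapped the old files keep reading them until they reload). A hedged sketch, assuming a splayed table at `:table:

```q
/ build the sorted, attributed table in memory
newt:@[;`sym;`p#] `sym xasc select from `:table
/ write it to a sibling directory, then swap via rename
`:table.new/ set newt
system "mv table table.old && mv table.new table"
/ readers must re-map (e.g. \l .) to pick up the new files
```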
Thank you,
Markus
–
Submitted via Google Groups