Hi Markus
It’s a tricky one. Technically you are at risk during the upsert as well, because kdb+ appends to each column sequentially and non-atomically, so a reader may see columns of different lengths. You shouldn't modify data that is being accessed (writing new partitions is ok because the database doesn't know about a new partition until it has been reloaded).
In general it’s a bad idea to have to rely on re-sorting large data sets to reapply attributes, usually because as the data set grows the "offline" time caused by the sort will increase. It's better practice to structure things so you don't have to: when you are upserting the new data, what makes it new? It's quite common for a structure that has to be updated like this to have the `p# on a temporal field such as date; then you don't have to re-sort, only set the attribute.
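For example (an in-memory sketch with a hypothetical table t, not your actual schema): if `p# is on the date column and new rows always arrive with dates on or after the current maximum, an upsert preserves the grouping, so you only need to reapply the attribute rather than re-sort:

```q
/ hypothetical table with `p# on the date column
t:([]date:`p#2014.07.28 2014.07.28 2014.07.29;sym:`a`b`a;px:3?100f)
/ new rows arrive with dates >= the current max
/ the upsert drops `p but the column is still grouped
`t upsert ([]date:2014.07.29 2014.07.30;sym:`b`a;px:2?100f)
update `p#date from `t   / reapply the attribute, no sort needed
```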
In terms of your sort, you can do it directly on disk then reset the attribute, which might be quicker, e.g.

@[;`sym;`p#] `sym xasc `:table
(you might want to test that first - I haven't!)
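A minimal way to test it safely is on a small scratch copy first (the /tmp path and trade table here are assumptions for illustration):

```q
/ build a small splayed table under /tmp to experiment on
`:/tmp/scratch/trade/ set .Q.en[`:/tmp/scratch] ([]sym:`b`a`a;price:3 1 2f)
`sym xasc `:/tmp/scratch/trade    / sort the splayed table in place on disk
@[`:/tmp/scratch/trade;`sym;`p#]  / then reapply the attribute on disk
select from `:/tmp/scratch/trade  / inspect: rows now grouped by sym
```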
Also, the csvguess script contains an optimised on-disk sorter, I think. However, neither of these will help your inconsistency problem. To solve that, you probably need to change your setup. Some suggestions:
- only do updates when no one is accessing the data (market closed, do it at the weekend, etc.)
- allow intraday updates (and take the risk of an upsert giving you an inconsistency), only re-do the sort and partition during downtime, and be prepared to have variable access times
- modify the processes so that they are either inaccessible to clients or do not attempt to access the data while it is being amended
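For the last option, one common pattern is a busy flag the writer raises while amending, which a gatekeeper function on the query processes checks before touching the data. The names below are assumptions, not a standard API, and propagating the flag between processes (e.g. via IPC) is elided:

```q
/ writer process: set before amending, clear after the reload completes
.db.busy:0b
/ client-facing wrapper on the query processes
safeQuery:{[qry] $[.db.busy;'"data reload in progress, retry";value qry]}
```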
Thanks
Jonny
AquaQ Analytics
On 30 Jul 2014, at 15:39, Markus Sieber <sieber@boerse-go.de> wrote:
Hi,
I have a database with many splayed tables that contain mostly static data.
Multiple kdb+ instances load this database and work with the data.
The tables have `p# applied on a symbol column and are sorted by date within each group.
But once in a while I have to insert data into some of the splayed tables.
For this I use another process that upserts data into the tables, which automatically removes the `p attribute. After finishing the inserts, the process sorts the entries and then reapplies `p#.
During the upsert, and especially while the data is being sorted, the database is in an inconsistent state. What is the best approach to provide stable data?
As far as I can see the upsert is not a problem; the removed attribute just slows down the other instances.
But sorting, applying `p# and saving the table with

.[`:table/;();:;] @[;`sym;`p#] `sym xasc select from `:table

seems more severe to me, as kdb+ writes the data column by column, so the columns not yet written don't match the new, already-written ones.
What is the best approach to have a consistent database?
Maybe writing the `p#-ed table into a temp directory first and then moving it over the real one?
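Something along those lines should work: a directory rename on the same filesystem is atomic, so readers see either the old or the new table, never a half-written one (and processes that have already mapped the old files keep reading them until they reload). A hedged sketch, assuming a splayed table at `:table:

```q
/ build the sorted, attributed table in memory
newt:@[;`sym;`p#] `sym xasc select from `:table
/ write it to a sibling directory, then swap via rename
`:table.new/ set newt
system "mv table table.old && mv table.new table"
/ readers must re-map (e.g. \l .) to pick up the new files
```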
Thank you,
Markus
–
Submitted via Google Groups