If I have just OHLCV data going back 30 years, is it still better to partition this by date? It seems like it would be a waste of file I/O with so many dates, and guessing it would be slower to query by symbol over the entire history. Would it be better if I partitioned this by symbol?
a db can not be partitioned on sym (unless you convert the values to integers). it must be an integer value such as date, month, year, int.
sounds like you might like to partition by month, but remember that each day you load new values, you’ll need to re-load the whole month. also note that your ‘where’ clause should always begin by filtering on month.
q)n:100000
q)t:date xasc ([]date:n?.z.D;sym:n?3;o:n?1f;h:n?1f;l:n?1f;c:n?1f;v:n?100)
q)g:t group “m”$tdate q){[p;t].Q.dpft[:/tmp/db;p;sym;ohlcv set t]}'[key g;value g];
q)\l /tmp/db
q)meta ohlcv
Hi Roni,
Assuming that you’re querying the entire data then partitioned by int (where ints map to syms) makes sense as the data is already sorted by sym and the number of files io operations is minimised. Can you elaborate on the number of syms present and the daily volume of records coming in?
Oh ok, thought I could partition by sym but partitioning by month will work. Thanks for this. I should have roughly 8k syms and one record per sym daily.
Whatever method of partitioning you’re going to settle on really depends on what sort of queries you’re likely to be running.
You obviously don’t want to be loading in data you aren’t actually interested in with regards to memory and speed of queries.
If your queries are focused within a particular window of time and you’re considering all syms then doing some sort of temporal partitioning makes sense.
If your main focus is for a particular sym then I’d partition by “symbol” (int) . If this is the case then you could do the following to partition by “sym” (with some modification for your own data set)
Create a sym-int map for your syms:
map:s!til count s:distinct tablesym`
Add a new column to your table then that contains the int value for the sym in that record:
update sym_int: map sym from table
Then save down the data to the separate int partitions:
{.Q.dpft[:hdb;x;sym_int;table]} each til count map`
As your data coming in is simply being appended to your daily partition you might want to create another database that’s partitioned by “sym” (int) and transfer the data over, depending on what sort of resources you have.