To: personal-kdbplus@googlegroups.com
it really depends on the exact queries, but
3GB is so small that you should keep it all in memory for fast queries
could keep it as one file or might splay it for persistence
but certainly would not partition it as seeking destroys your performance
keep sorted by time (date+hour) stored as an int
put a `g# on node
and all queries should be quite performant
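
a minimal sketch of that layout (the column names b1..b3 / f1..f3, the row count and the random data are made up here):

n:1000                                   / toy row count
t:([] hr:asc n?1000000i;                 / date+hour encoded as a sorted int
  node:`g#n?`node1`node2`node3;          / grouped attribute on node
  b1:n?100h; b2:n?100h; b3:n?100h;       / 16-bit (short) datapoints
  f1:n?100h; f2:n?100h; f3:n?100h)
/ a typical per-node-hour query: scan the sorted hr range, then group on node
select avg f1 by node from t where hr within 400000 600000i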
let us know how it goes
Cheers,
Attila
On 28 Dec 2011, at 06:12, Peter wrote:
> Thanks for the response.
>
> A sample strategy would be to go back historically over an indicative
> data source, like say temperature, and look for correlated node-hour
> pairs.
>
> Strategies often involve per-node-hour historical queries, e.g. for a
> given node-hour pair, say the hour ending 7 am at this node, over the
> last 600 days, please tell me..
>
> All that to say, sometimes one wishes to slice days, sometimes hours,
> sometimes nodes. Sometimes one wishes to compare the slice to other
> hours, days or nodes.
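
(as a concrete sketch of that kind of slice, against a hypothetical table t with date, hour-of-day hh and node columns; all names here are made up:)

/ last 600 days of the hour ending 07:00 at one node
select from t where node=`node42, hh=7i, date within (.z.d-600;.z.d)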
>
> I’ll post a followup with some results, although they may only be of
> interest to me. :)
>
> Peter
>
> On Dec 26, 10:29 pm, Aaron Davies <aaron.dav…> wrote:
>> i’d suggest partitioned by date, with node, time, and the six data
>> points being the columns (total of nine counting date).
>>
>> while there’s technically no limit (AFAIR) on the number of columns in
>> a table, very wide tables are awkward to work with and should be used
>> sparingly.
>>
>> what sort of queries are you likely to run? if you’ll be comparing many
>> nodes against each other, a wide schema is probably better; if you’ll be
>> comparing nodes against their own histories, a narrower one.
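
(a rough sketch of that narrow, date-partitioned layout; the column names, the toy node count and the db path are made up:)

nodes:`$"node",/:string til 5                 / 5 toy nodes instead of 5000
mk:{([] time:`minute$60*til 24; node:24#x;    / one row per node per hour
  b1:24?100h; b2:24?100h; b3:24?100h;         / the six datapoints as shorts
  f1:24?100h; f2:24?100h; f3:24?100h)}
t:raze mk each nodes
.Q.dpft[`:/tmp/nodedb;2011.12.26;`node;`t]    / write one date partition, `p# on node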
>>
>> On Monday, December 26, 2011, Peter <vesse…> wrote:
>>> Hi all,
>>
>>> I’m learning Q and KDB over my holiday break for a bit of fun. As a
>>> test case, I am loading up some hourly tick data from energy nodes.
>>> I’m not sure of the most kdb-ish way to store the data, so I thought
>>> I’d ask for advice.
>>
>>> Here are the specs on the data: there are 6 datapoints per node per
>>> hour. The datapoints are a ‘bid’ and a ‘final’ / ‘realtime’. There are
>>> roughly 5000 nodes. All of these could be stored in 16-bit ints pretty
>>> easily.
>>
>>> So, five years of data = 5000*24*5*365*6*16 bits = 3GB or so of raw
>>> data.
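
(spelling that estimate out in q, with 2 bytes per 16-bit datapoint:)

q)5000*24*365*5*6*2    / nodes * hours/day * days/yr * years * datapoints * bytes
2628000000             / ~2.6e9 bytes, i.e. 3GB or so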
>>
>>> What I’m trying to figure out is how to store this. One possibility
>>> would be one column per node (so 5000 cols), one row per hour, and a
>>> non-simple list of six data-items in each slot.
>>
>>> My understanding is that this doesn’t splay well? If that’s the case,
>>> I’m looking at either 30000 cols, or alternatively, 5000 cols and 6
>>> rows per timestamp.
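
(a toy sketch of the nested "one column per node" variant, with made-up names and sizes:)

/ each cell is a 6-item short list, so the node columns are nested, not simple vectors
wide:([] hr:til 4; node0:4 6#24?100h; node1:4 6#24?100h)
meta wide    / node0/node1 show a blank type: nested columns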
>>
>>> Are any of these three choices more suited to how Q and Kdb work? The
>>> data needs a good amount of munging to come over into this format, so
>>> I’d like to have a good gameplan first.
>>
>>> Thanks for the help.
>>
>> –
>> Aaron Davies
>> aaron.dav...@gmail.com