To: personal-kdbplus@googlegroups.com
it really depends on the exact queries, but
3GB is so small that you should keep it all in memory for fast queries
could keep it as one file or might splay it for persistence
but certainly would not partition it as seeking destroys your performance
keep sorted by time (date+hour) stored as an int
put a `g# on node
and all queries should be quite performant
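
a minimal sketch of that layout (the column names b1..b3 / f1..f3, the row count and the random data are made up here):

n:1000                                   / toy row count
t:([] hr:asc n?1000000i;                 / date+hour encoded as a sorted int
  node:`g#n?`node1`node2`node3;          / grouped attribute on node
  b1:n?100h; b2:n?100h; b3:n?100h;       / 16-bit (short) datapoints
  f1:n?100h; f2:n?100h; f3:n?100h)
/ a typical per-node-hour query: scan the sorted hr range, then group on node
select avg f1 by node from t where hr within 400000 600000i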
let us know how it goes
Cheers,
Attila
On 28 Dec 2011, at 06:12, Peter wrote:
> Thanks for the response.
>
> A sample strategy would be to go back historically over an indicative
> data source, like say temperature, and look for correlated node-hour
> pairs.
>
> Strategies often involve per-node-hour historical queries, e.g. for a
> given node-hour pair, say the hour ending 7 am at this node, over the
> last 600 days, please tell me..
>
> All that to say, sometimes one wishes to slice days, sometimes hours,
> sometimes nodes. Sometimes one wishes to compare the slice to other
> hours, days or nodes.
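
(as a concrete sketch of that kind of slice, against a hypothetical table t with date, hour-of-day hh and node columns; all names here are made up:)

/ last 600 days of the hour ending 07:00 at one node
select from t where node=`node42, hh=7i, date within (.z.d-600;.z.d)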
>
> I’ll post a followup with some results, although they may only be of
> interest to me. :)
>
> Peter
>
> On Dec 26, 10:29 pm, Aaron Davies <aaron.dav…> wrote:
>> i’d suggest partitioned by date, with node, time, and the six data
>> points being the columns (total of nine counting date).
>>
>> while there’s technically no limit (AFAIR) on the number of columns in
>> a table, very wide tables are awkward to work with and should be used
>> sparingly.
>>
>> what sort of queries are you likely to run? if you’ll be comparing many
>> nodes against each other, a wide schema is probably better; if you’ll be
>> comparing nodes against their own histories, a narrower one.
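
(a rough sketch of that narrow, date-partitioned layout; the column names, the toy node count and the db path are made up:)

nodes:`$"node",/:string til 5                 / 5 toy nodes instead of 5000
mk:{([] time:`minute$60*til 24; node:24#x;    / one row per node per hour
  b1:24?100h; b2:24?100h; b3:24?100h;         / the six datapoints as shorts
  f1:24?100h; f2:24?100h; f3:24?100h)}
t:raze mk each nodes
.Q.dpft[`:/tmp/nodedb;2011.12.26;`node;`t]    / write one date partition, `p# on node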
>>
>> On Monday, December 26, 2011, Peter <vesse…> wrote:
>>> Hi all,
>>
>>> I’m learning Q and KDB over my holiday break for a bit of fun. As a
>>> test case, I am loading up some hourly tick data from energy nodes.
>>> I’m not sure of the most kdb-ish way to store the data, so I thought
>>> I’d ask for advice.
>>
>>> Here are the specs on the data: there are 6 datapoints per node per
>>> hour. The datapoints are a ‘bid’ and a ‘final’ / ‘realtime’. There are
>>> roughly 5000 nodes. All of these could be stored in 16-bit ints pretty
>>> easily.
>>
>>> So, five years of data = 5000*24*5*365*6*16 bits = 3GB or so of raw
>>> data.
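
(spelling that estimate out in q, with 2 bytes per 16-bit datapoint:)

q)5000*24*365*5*6*2    / nodes * hours/day * days/yr * years * datapoints * bytes
2628000000             / ~2.6e9 bytes, i.e. 3GB or so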
>>
>>> What I’m trying to figure out is how to store this. One possibility
>>> would be one column per node (so 5000 cols), one row per hour, and a
>>> non-simple list of six data-items in each slot.
>>
>>> My understanding is that this doesn’t splay well? If that’s the case,
>>> I’m looking at either 30000 cols, or alternatively, 5000 cols and 6
>>> rows per timestamp.
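
(a toy sketch of the nested "one column per node" variant, with made-up names and sizes:)

/ each cell is a 6-item short list, so the node columns are nested, not simple vectors
wide:([] hr:til 4; node0:4 6#24?100h; node1:4 6#24?100h)
meta wide    / node0/node1 show a blank type: nested columns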
>>
>>> Are any of these three choices more suited to how Q and Kdb work? The
>>> data needs a good amount of munging to come over into this format, so
>>> I’d like to have a good gameplan first.
>>
>>> Thanks for the help.
>>
>> –
>> Aaron Davies
>> aaron.dav...@gmail.com