kdb+tick with schemaless events

Hello,

I am writing an event-driven application and I want to send all events to kdb for persistence and running real-time ad-hoc queries. Rather than hard-code the schema for all my events, which will change over time, I am thinking of sending a single table to my ticker plant:

([] time:`timespan$(); sym:`g#`symbol$(); eventData:())

where "sym" will be the event name and eventData can be any dict. Example table with two event types:

time                 sym eventData
-------------------------------------------------------------------------
0D11:14:57.333000000 e1  `xx`yy!1 2
0D11:14:57.333000000 e2  `aa`bb`cc!(5;0.3927524 0.5170911 0.5159796;`abc)
0D11:14:57.333000000 e1  `xx`yy!5 2

My questions are:

1. Is this strategy with kdb a terrible idea?

2. How should I serialize the eventData for EOD persistence? Just "-8!"? Any reason to use JSON instead?

3. Should I instead serialize the eventData BEFORE sending it to my ticker plant, so that I don't need to modify tick.q or r.q?

Thanks,
Josh
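P.S. For concreteness, sending one such event to the ticker plant might look something like this (assuming the standard tick.q .u.upd; the table name `events` is just a placeholder and h is an open handle to the ticker plant):

/ one-row batch sent as singleton column lists so insert treats them as columns
neg[h](".u.upd"; `events; (enlist .z.n; enlist `e1; enlist `xx`yy!1 2))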

On 28 April 2015 20:24 UTC, David Demner (AquaQ) wrote:

1. I think your performance will be pretty bad, especially if you have lots of events. This is especially true if you have longer hdb queries, because the eventData column can't be randomly accessed.

If each event type has the same schema, it may be better to split each one into a separate table (in your upd event); there is a small sketch of this after point 3 below. If your schema can change over time, have a look at dbmaint.q for HDB schema maintenance (or perhaps you won't need it, since kdb+ reads the schema from the latest partition in your hdb).

That being said, it's certainly possible if you're willing to pay the price.

2. I think JSON would just bloat it further for not much (if any) benefit. I don't think you need to serialize (just set the empty table, then upsert the results, possibly with .z.zd or manual compression); in fact, serialization might slow it down further.

3. I don't know much about tick.q or r.q, but it's likely pointless to serialize before sending: kdb is very clever about serializing where necessary.
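For what it's worth, a rough sketch of the splitting I mean in point 1 (the table and event names are made up, and it assumes every eventData dict for a given event type has the same keys):

q)events:([]time:3#0D11:14:57.333000000;sym:`e1`e2`e1;eventData:(`xx`yy!1 2;`aa`bb!3 4;`xx`yy!5 2))
q)/ flatten the e1 dicts into real columns alongside time
q)e1:(select time from events where sym=`e1),'exec eventData from events where sym=`e1
q)e1
time                 xx yy
--------------------------
0D11:14:57.333000000 1  2
0D11:14:57.333000000 5  2

A typed table like that splays and partitions as normal. On point 2, compression can be switched on for all subsequent writes with .z.zd, e.g. .z.zd:17 2 6 (128kB logical blocks, gzip, level 6).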


Thanks for the reply, David.

Regarding performance, I realize I will take a hit, but my thinking was that I will either only query a small time window / specific event type, or I would split out a specific event type to a standard schema table.

Maybe I'm misunderstanding you, but how would I save my events (nested dicts) to a hdb without serializing? For example, the following table won't save unless I serialize the data column:

q)t:([]time:3?0D; sym:til 3; data:3#enlist(1 2!(1 2;1 2)))
q)t
time                 sym data
--------------------------------------
0D05:44:29.828280061 0   1 2!(1 2;1 2)
0D03:37:10.269978940 1   1 2!(1 2;1 2)
0D03:45:41.618905216 2   1 2!(1 2;1 2)
q)`:/tmp/t/ set t
k){$[@x;.[x;();:;y];-19!((,y),x)]}
'type
q.q))\
q)`:/tmp/t/ set update -8!'data from t
`:/tmp/t/
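I assume that at query time I would then just deserialize each row again, something like:

q)/ -9! undoes the -8! applied per row
q)t2:get `:/tmp/t/
q)select time, sym, -9!'data from t2

Josh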
