RE: [personal kdb+] kdb+tick with schemaless events

From: "David Demner (AquaQ)" <david.demner>
To: personal-kdbplus@googlegroups.com
Date: Wed, 29 Apr 2015 06:54:29 -0700
Subject: RE: [personal kdb+] kdb+tick with schemaless events

Forgot to include an example of what the performance impact might be, even when selecting a smaller part of the table:


q)n:1000000;t:`sym xasc([]time:n?0D; sym:n?200; data:n#enlist(1 2!(1 2;1 2)))
q)`:t/ set 0#t;
q)`:t/ upsert t;
q)@[`:t/;`sym;`p#]
q)\ts:100 select time from t where sym=9 /cost of selecting one simple column for one sym
31 66144
q)\ts:100 select time,sym from t where sym=9 /minimal incremental cost to adding a second column
31 131680
q)\ts:100 select time,sym,data from t where sym=9 /massive incremental cost when adding the (not much larger on disk) complex column
55684 168389072


Ah, you're right. It reads the whole table into memory when selecting on the complex column.

I suppose this is a reason to just serialize the complex column manually with -8!'. I modified your benchmark and can now get efficient random access to all columns. I wonder why kdb doesn't do this automatically?

q)n:1000000;t:`sym xasc([]time:n?0D; sym:n?200; data:n#enlist(1 2!(1 2;1 2)))
q)`:t/ set 0#t;
q)`:t/ upsert t;
q)@[`:t/;`sym;`p#];
q)\l .
q)\ts:100 select time,sym from t where sym=9
9 131776
q)\ts:100 select time,sym,data from t where sym=9
53227 172583568

q)n:1000000;t:update -8!'data from `sym xasc([]time:n?0D; sym:n?200; data:n#enlist(1 2!(1 2;1 2)))
q)`:t/ set t;
q)\l .
q)\ts:100 select time,sym from t where sym=9
10 131776
q)\ts:100 select time,sym,-9!'data from t where sym=9
201 1682656
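Since -8! serializes any q object to a byte vector and -9! parses it back, a minimal round-trip sanity check on the example row (a sketch, not from the original benchmark) is:

q)d:1 2!(1 2;1 2)
q)d~-9!-8!d    / serialize then deserialize; an exact match returns 1b
1b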

29.04.2015, 16:57, “David Demner (AquaQ)” <david.demner>:
> Forgot to include an example of what the performance impact might be, even when selecting a smaller part of the table:
>
> q)n:1000000;t:`sym xasc([]time:n?0D; sym:n?200; data:n#enlist(1 2!(1 2;1 2)))
> q)`:t/ set 0#t;
> q)`:t/ upsert t;
> q)@[`:t/;`sym;`p#]
> q)\ts:100 select time from t where sym=9 /cost of selecting one simple column for one sym
> 31 66144
> q)\ts:100 select time,sym from t where sym=9 /minimal incremental cost to adding a second column
> 31 131680
> q)\ts:100 select time,sym,data from t where sym=9 /massive incremental cost when adding the (not much larger on disk) complex column
> 55684 168389072

From: "David Demner (AquaQ)" <david.demner>
X-Mailer: Microsoft Outlook 15.0

Hm, yeah… The difference is that the serialized column is a nested binary list (which kdb+ can access randomly) vs a complex object that kdb+ has to do a full column scan on.

q)n:1000000;t:`sym xasc([]time:n?0D; sym:n?200; data:n#enlist(1 2!(1 2;1 2)))
q)`:t/ set 0#t;
q)`:t/ upsert t;
q)@[`:t/;`sym;`p#]
t -> complex object
04/29/2015 02:13 PM 22 .d
04/29/2015 02:13 PM 73,000,008 data
04/29/2015 02:13 PM 8,007,800 sym
04/29/2015 02:13 PM 8,000,016 time
4 File(s) 89,007,846 bytes

q)n:1000000;t:`sym xasc([]time:n?0D; sym:n?200; data:-8!'n#enlist(1 2!(1 2;1 2)))
q)`:t2/ set 0#t;
q)`:t2/ upsert t;
q)@[`:t2/;`sym;`p#]
t2 -> nested list (note the data# file)
04/29/2015 02:13 PM 22 .d
04/29/2015 02:13 PM 8,000,016 data
04/29/2015 02:13 PM 81,000,000 data#
04/29/2015 02:13 PM 8,007,800 sym
04/29/2015 02:13 PM 8,000,016 time
5 File(s) 105,007,854 bytes
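One way to see this structural difference in memory (a quick sketch; the type codes are standard q): serializing a single row with -8! yields a simple byte vector, so -8!'data becomes a list of byte vectors, which kdb+ splays as a nested column — the data# file holds the concatenated bytes and data holds the per-row anchors.

q)type 1 2!(1 2;1 2)      / one row of the complex column: a dictionary
99h
q)type -8!1 2!(1 2;1 2)   / the same row serialized: a simple byte vector
4h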

> I wonder why kdb doesn’t do this automatically?
The serialized version consumes 16 more bytes per table row; maybe that's why kx wouldn't want to do it automatically? Other than that, though, the gains look pretty substantial.
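The 16 bytes/row figure can be checked directly from the two directory listings above (89,007,846 bytes for t vs 105,007,854 for t2, over one million rows):

q)105007854-89007846   / extra bytes on disk for 1,000,000 rows
16000008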

-----Original Message-----
From: personal-kdbplus@googlegroups.com [mailto:personal-kdbplus@googlegroups.com] On Behalf Of Josh Myzie
Sent: Wednesday, April 29, 2015 9:21 AM
To: personal-kdbplus@googlegroups.com
Subject: Re: [personal kdb+] kdb+tick with schemaless events

Ah, you're right. It reads the whole table into memory when selecting on the complex column.

I suppose this is a reason to just serialize the complex column manually with -8!'. I modified your benchmark and can now get efficient random access to all columns. I wonder why kdb doesn't do this automatically?

q)n:1000000;t:`sym xasc([]time:n?0D; sym:n?200; data:n#enlist(1 2!(1 2;1 2)))
q)`:t/ set 0#t;
q)`:t/ upsert t;
q)@[`:t/;`sym;`p#];
q)\l .
q)\ts:100 select time,sym from t where sym=9
9 131776
q)\ts:100 select time,sym,data from t where sym=9
53227 172583568

q)n:1000000;t:update -8!'data from `sym xasc([]time:n?0D; sym:n?200; data:n#enlist(1 2!(1 2;1 2)))
q)`:t/ set t;
q)\l .
q)\ts:100 select time,sym from t where sym=9
10 131776
q)\ts:100 select time,sym,-9!'data from t where sym=9
201 1682656

29.04.2015, 16:57, “David Demner (AquaQ)” <david.demner>:
> Forgot to include an example of what the performance impact might be, even when selecting a smaller part of the table:
>
> q)n:1000000;t:`sym xasc([]time:n?0D; sym:n?200; data:n#enlist(1 2!(1 2;1 2)))
> q)`:t/ set 0#t;
> q)`:t/ upsert t;
> q)@[`:t/;`sym;`p#]
> q)\ts:100 select time from t where sym=9 /cost of selecting one simple column for one sym
> 31 66144
> q)\ts:100 select time,sym from t where sym=9 /minimal incremental cost to adding a second column
> 31 131680
> q)\ts:100 select time,sym,data from t where sym=9 /massive incremental cost when adding the (not much larger on disk) complex column
> 55684 168389072