User-Agent: Workspace Webmail 5.14.0
Message-Id: <20150429065429.85f80dae80d1d2f2e266ec6278e6cbe8.368683ce6a.wbe@email07.europe.secureserver.net>
From: "David Demner (AquaQ)" <david.demner>
To: personal-kdbplus@googlegroups.com
Subject: RE: [personal kdb+] kdb+tick with schemaless events
Date: Wed, 29 Apr 2015 06:54:29 -0700
Mime-Version: 1.0
Ah, you're right. It reads the whole table into memory when selecting on the complex column.
I suppose this is a reason to just serialize the complex column manually with -8!'. I modified your benchmark and can now get efficient random access to all columns. I wonder why kdb doesn't do this automatically?
q)n:1000000;t:`sym xasc([]time:n?0D; sym:n?200; data:n#enlist(1 2!(1 2;1 2)))
q)`:t/ set 0#t;
q)`:t/ upsert t;
q)@[`:t/;`sym;`p#];
q)\l .
q)\ts:100 select time,sym from t where sym=9
9 131776
q)\ts:100 select time,sym,data from t where sym=9
53227 172583568
q)n:1000000;t:update -8!'data from `sym xasc([]time:n?0D; sym:n?200; data:n#enlist(1 2!(1 2;1 2)))
q)`:t/ set t;
q)\l .
q)\ts:100 select time,sym from t where sym=9
10 131776
q)\ts:100 select time,sym,-9!'data from t where sym=9
201 1682656
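A rough analogy of the -8!'/-9!' pattern above, in Python rather than q (pickle stands in for kdb+'s serializer; all names here are illustrative, not from the thread): each cell of the complex column is stored as an opaque byte blob up front, and only the rows a query selects are deserialized.

```python
import pickle

# Each "cell" of the complex column holds a dict of lists,
# loosely mirroring data:n#enlist(1 2!(1 2;1 2)) in the q example above.
rows = [{1: [1, 2], 2: [1, 2]} for _ in range(1000)]

# Like -8!' : serialize every cell, so the column is just a list of
# uniform byte vectors instead of arbitrary nested objects.
col = [pickle.dumps(r) for r in rows]

# Like -9!' applied only to the rows a query selects: decode on demand,
# leaving the other 998 blobs untouched.
selected = [pickle.loads(col[i]) for i in (3, 7)]
print(selected[0])  # {1: [1, 2], 2: [1, 2]}
```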
29.04.2015, 16:57, “David Demner (AquaQ)” <david.demner>:
> Forgot to include an example of what the performance impact might be, even when selecting a smaller part of the table:
>
> q)n:1000000;t:`sym xasc([]time:n?0D; sym:n?200; data:n#enlist(1 2!(1 2;1 2)))
> q)`:t/ set 0#t;
> q)`:t/ upsert t;
> q)@[`:t/;`sym;`p#]
> q)\ts:100 select time from t where sym=9 /cost of selecting one simple column for one sym
> 31 66144
> q)\ts:100 select time,sym from t where sym=9 /minimal incremental cost of adding a second column
> 31 131680
> q)\ts:100 select time,sym,data from t where sym=9 /massive incremental cost when adding the (not much larger on disk) complex column
> 55684 168389072
charset="utf-8"
X-Mailer: Microsoft Outlook 15.0
Thread-Index: AQFw4f2vUdJMynxoddcxdOeWTMBQEgGoV2+SnhZ7RfA=
Content-Language: en-us
Hm, yeah… The difference is that the serialized column is a nested binary list (which kdb+ can access randomly), whereas a complex object column forces kdb+ to do a full column scan.
q)n:1000000;t:`sym xasc([]time:n?0D; sym:n?200; data:n#enlist(1 2!(1 2;1 2)));
q)`:t/ set 0#t; `:t/ upsert t; @[`:t/;`sym;`p#]
t -> complex object
04/29/2015 02:13 PM 22 .d
04/29/2015 02:13 PM 73,000,008 data
04/29/2015 02:13 PM 8,007,800 sym
04/29/2015 02:13 PM 8,000,016 time
4 File(s) 89,007,846 bytes
q)n:1000000;t:`sym xasc([]time:n?0D; sym:n?200; data:-8!'n#enlist(1 2!(1 2;1 2)));
q)`:t2/ set 0#t; `:t2/ upsert t; @[`:t2/;`sym;`p#]
t2 -> nested list (note the data# file)
04/29/2015 02:13 PM 22 .d
04/29/2015 02:13 PM 8,000,016 data
04/29/2015 02:13 PM 81,000,000 data#
04/29/2015 02:13 PM 8,007,800 sym
04/29/2015 02:13 PM 8,000,016 time
5 File(s) 105,007,854 bytes
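A minimal sketch of why the nested-list layout above permits random row access (plain Python, not kdb+'s actual on-disk format; the file names and structure here are assumptions for illustration): one fixed-width offset entry per row pointing into a payload heap, loosely analogous to the data/data# file pair in the t2 listing.

```python
import io

blobs = [b"alpha", b"bravo!", b"charlie"]  # per-row serialized payloads

# Build the two "files": a payload heap (stand-in for data#) and a
# fixed-width per-row index (stand-in for data).
heap = io.BytesIO()
offsets = []  # one (start, length) entry per row, all the same width
pos = 0
for b in blobs:
    offsets.append((pos, len(b)))
    heap.write(b)
    pos += len(b)

def read_row(i):
    # O(1): one lookup in the fixed-width index, one seek into the heap.
    # No other row's payload is ever touched -- unlike a complex object
    # column, which requires scanning the whole column.
    start, length = offsets[i]
    heap.seek(start)
    return heap.read(length)

print(read_row(1))  # b'bravo!'
```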
> I wonder why kdb doesn’t do this automatically?
The serialized version consumes 16 more bytes per table row; maybe that's why kx wouldn't want to do it automatically? Other than that, though, the gains look pretty substantial.
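The 16-bytes-per-row figure can be checked against the directory listings above; a quick calculation (in Python, for convenience) using only the byte counts quoted in the thread:

```python
# File sizes taken from the directory listings earlier in the thread.
rows = 1_000_000
t_data = 73_000_008                # t/data   (complex object column)
t2_data = 8_000_016 + 81_000_000   # t2/data + t2/data# (serialized column)

overhead_per_row = (t2_data - t_data) / rows
print(overhead_per_row)  # ~16 extra bytes of storage per row
```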
-----Original Message-----
From: personal-kdbplus@googlegroups.com [mailto:personal-kdbplus@googlegroups.com] On Behalf Of Josh Myzie
Sent: Wednesday, April 29, 2015 9:21 AM
To: personal-kdbplus@googlegroups.com
Subject: Re: [personal kdb+] kdb+tick with schemaless events
Ah, you're right. It reads the whole table into memory when selecting on the complex column.
I suppose this is a reason to just serialize the complex column manually with -8!'. I modified your benchmark and can now get efficient random access to all columns. I wonder why kdb doesn't do this automatically?
q)n:1000000;t:`sym xasc([]time:n?0D; sym:n?200; data:n#enlist(1 2!(1 2;1 2)))
q)`:t/ set 0#t;
q)`:t/ upsert t;
q)@[`:t/;`sym;`p#];
q)\l .
q)\ts:100 select time,sym from t where sym=9
9 131776
q)\ts:100 select time,sym,data from t where sym=9
53227 172583568
q)n:1000000;t:update -8!'data from `sym xasc([]time:n?0D; sym:n?200; data:n#enlist(1 2!(1 2;1 2)))
q)`:t/ set t;
q)\l .
q)\ts:100 select time,sym from t where sym=9
10 131776
q)\ts:100 select time,sym,-9!'data from t where sym=9
201 1682656
29.04.2015, 16:57, “David Demner (AquaQ)” <david.demner>:
> Forgot to include an example of what the performance impact might be, even when selecting a smaller part of the table:
>
> q)n:1000000;t:`sym xasc([]time:n?0D; sym:n?200; data:n#enlist(1 2!(1 2;1 2)))
> q)`:t/ set 0#t;
> q)`:t/ upsert t;
> q)@[`:t/;`sym;`p#]
> q)\ts:100 select time from t where sym=9 /cost of selecting one simple column for one sym
> 31 66144
> q)\ts:100 select time,sym from t where sym=9 /minimal incremental cost of adding a second column
> 31 131680
> q)\ts:100 select time,sym,data from t where sym=9 /massive incremental cost when adding the (not much larger on disk) complex column
> 55684 168389072