Hi, long-time listener, first-time caller.
I’m setting up a trade and quote database, but I’m probably doing some bone-headed stuff.
I adapted the logic from this whitepaper to put equity trades and quotes for a single day into some tables:
- Segmented by first letter of the ticker
- Partitioned by date
- Splayed
- Sorted by ticker then time, `p# on ticker
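Roughly what the sort step looks like on a toy in-memory table. This is only a minimal sketch: the rows are made up, the `time` column name is an assumption, and the enumerate/splay/write-to-disk part from the whitepaper is left out.

```q
/ hypothetical toy data; ticker, px and qty follow the column names used below
trades:([]ticker:`XOM`AAPL`XOM`AAPL;time:09:30:02 09:30:01 09:30:00 09:30:03;px:93.6 112.2 93.5 112.3;qty:100 200 300 400)

/ sort by ticker, then by time within each ticker
trades:`ticker`time xasc trades

/ mark ticker as parted; this only succeeds because equal tickers are now contiguous
trades:update `p#ticker from trades

/ check the attribute that ended up on the column
a:attr trades`ticker
```

Here `a` should come back as the parted attribute, and the rows should be in ticker-then-time order.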
Not having any trouble with the trades table, works great (though I only have one day of data so far):
q)select vwap:qty wavg px by ticker from trades where ticker in `AAPL`XOM`BWLD`BUD`BEAV`TAP,tc in (0,108),date = 2014.12.24
ticker| vwap
------| --------
AAPL  | 112.2355
BEAV  | 58.70154
BUD   | 115.4508
BWLD  | 182.1343
TAP   | 76.29789
XOM   | 93.61004
But I can’t do anything with the quotes table without experiencing memory problems:
q)select high:max(ap) from quotes where ticker in `AAPL`XOM`BEAV`BWLD`TAP`BUD,date = 2014.12.24
k){$[~#D;p2[x;`:.]':y;(,/p2[x]'/':P[i](;)'y)@<,/y@:i:&0<#:'y:D{x@&x in y}\:y]}
'/kdb/segments/S/2014.12.24/quotes/ap: Cannot allocate memory
@
k){0!(?).@[x;0;p1[;y;z]]}[(`quotes;((in;`ticker;,`AAPL`XOM`BEAV`BWLD`TAP`BUD);(=;`date;2014.12.24));()!();(,0)!,(max;`ap))]'/':
((`:/kdb/segments/A;,2014.12.24);(`:/kdb/segments/B;,2014.12.24);(`:/kdb/segments/C;,2014.12.24);(`:/kdb/..
q.Q))
Or:
q)select high:max(ap),low:min(bp) by ticker from quotes where bp > 0
'wsfull
I start up with 26 slave threads, one per segment, but I think that’s the wrong approach; that would start the slaves in-process, and the memory limit is per-process. I think.
$ rlwrap q/l32/q -c 25 200 -s 26
The documentation says I can start the slaves as separate processes instead by supplying -26 rather than 26, but when I do that I get this:
q)select high:max(ap) from quotes where ticker in `AAPL`XOM`BEAV`BWLD`TAP`BUD,date = 2014.12.24
k){$[~#D;p2[x;`:.]':y;(,/p2[x]'/':P[i](;)'y)@<,/y@:i:&0<#:'y:D{x@&x in y}\:y]}
'.z.pd
@
k){0!(?).@[x;0;p1[;y;z]]}[(`quotes;((in;`ticker;,`AAPL`XOM`BEAV`BWLD`TAP`BUD);(=;`date;2014.12.24));()!();(,0)!,(max;`ap))]'/':
That .z.pd error brought me to this page which tells me that I have to set up the slaves myself, so I followed the instructions outlined in the “more comprehensive setup” snippet:
q).z.pd
{n:abs system"s";$[n=count handles;handles;[hclose each handles;:handles::`u#hopen each 20000+til n]]}
q).z.pc
{handles::`u#handles except x;}
q)handles
`u#`int$()
But then it tells me this:
q)select high:max(ap) from quotes where ticker in `AAPL`XOM`BEAV`BWLD`TAP`BUD,date = 2014.12.24
k){x'y}
'hop: Connection refused
@
<:'
20000 20001 20002 20003 20004 20005 20006 20007 20008 20009 20010 20011 20012 20013 20014 20015 20016 20017 20018 20019 20020 20021 20022 20023 20024 20025
So I’m guessing that doesn’t actually start the sub-processes, which is confirmed by the little nugget at the bottom of that reference page: “Note that the worker processes are not started automatically by kdb+.” Also, I don’t see the processes in ps.
I can start slave processes on the correct ports by adapting the code from here:
{value raze "\\/path/to/q/l32/q /path/to/slave.q -p ", string x," &"} each 20000 + til abs system"s"
Where slave.q just loads the database that contains the sym and par.txt for the trades and quotes tables:
\l /kdb/db
Had to add the & so every call doesn’t dump me into a sub-shell. (Why do I need that “raze”?)
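Poking at the raze question myself, here is a minimal sketch of what I think is going on. The literal port 20000 and the shortened "-p " prefix are only for illustration. q evaluates right to left, so string gets applied to the whole joined list (port, space, ampersand) item by item, and the result is a nested list until raze flattens it.

```q
/ string sees everything to its right: the list (20000;" ";"&")
nested:"-p ",string 20000," &"

/ nested is a general list (type 0h) of chars and strings, not one flat string
t:type nested

/ raze joins the items back into a single char vector
flat:raze nested

/ bracketing the argument limits string to the port number, so no raze is needed
direct:"-p ",string[20000]," &"
```

If that reading is right, `flat` and `direct` are the same string, and writing string[x] in the launcher would let me drop the raze.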
Now my quote queries work:
q)select high:max(ap),low:min(bp) by ticker,date from quotes where bp > 0,ticker in `AAPL`XOM`BEAV`BWLD`TAP`BUD,extime within (09:30:00;16:00:00)
ticker date      | high   low
-----------------| -------------
AAPL   2014.12.24| 117.88 91.06
BEAV   2014.12.24| 91     54.55
BUD    2014.12.24| 147.99 86.11
BWLD   2014.12.24| 233.7  135.86
TAP    2014.12.24| 89.66  59.63
XOM    2014.12.24| 109.27 67.31
It’s quite slow though. In top I can see each process doing stuff, when really only four segments have the data I want.
Is each slave searching through every segment? How do I tell a slave which segment it should search? Should I bother segmenting and make a bunch of independent databases instead?
I’m running on RHEL 6.2 using:
KDB+ 3.2 2015.01.05 Copyright (C) 1993-2015 Kx Systems
l32/ 32()core 258175MB NONEXPIRE