using sub-processes to work around 32-bit limit

dave_a_mills · January 12, 2015, 6:25pm

Hi, long-time listener, first-time caller.

I’m setting up a trade and quote database, but I’m probably doing some bone-headed stuff.

I adapted the logic from this whitepaper to put equity trades and quotes for a single day into some tables:

Segmented by first letter of the ticker
Partitioned by date
Splayed
Sorted by ticker then time, `p# on ticker

Not having any trouble with the trades table, works great (though I only have one day of data so far):

q)select vwap:qty wavg px by ticker from trades where ticker in `AAPL`XOM`BWLD`BUD`BEAV`TAP,tc in (0,108),date = 2014.12.24
ticker| vwap
------| --------
AAPL  | 112.2355
BEAV  | 58.70154
BUD   | 115.4508
BWLD  | 182.1343
TAP   | 76.29789
XOM   | 93.61004

But I can’t do anything with the quotes table without experiencing memory problems:

q)select high:max(ap) from quotes where ticker in `AAPL`XOM`BEAV`BWLD`TAP`BUD,date = 2014.12.24
k){$[~#D;p2[x;:.]':y;(,/p2[x]'/':P[i](;)'y)@<,/y@:i:&0<#:'y:D{x@&x in y}\:y]} '/kdb/segments/S/2014.12.24/quotes/ap: Cannot allocate memory @ k){0!(?).@[x;0;p1[;y;z]]}[(quotes;((in;ticker;,AAPLXOMBEAVBWLDTAPBUD);(=;date;2014.12.24));()!();(,0)!,(max;ap))]‘/’:
((:/kdb/segments/A;,2014.12.24);(:/kdb/segments/B;,2014.12.24);(:/kdb/segments/C;,2014.12.24);(:/kdb/..
q.Q))

Or:

q)select high:max(ap),low:min(bp) by ticker from quotes where bp > 0
wsfull

I start up with 26 slave threads, one per segment, but I think that’s the wrong approach; that would start the slaves in-process, and the memory limit is per-process. I think.

$ rlwrap q/l32/q -c 25 200 -s 26

The documentation says I can start the slaves as separate slave processes by supplying -26 instead of 26, but when I do that I get this:

q)select high:max(ap) from quotes where ticker in `AAPL`XOM`BEAV`BWLD`TAP`BUD,date = 2014.12.24
k){$[~#D;p2[x;:.]':y;(,/p2[x]'/':P[i](;)'y)@<,/y@:i:&0<#:'y:D{x@&x in y}\:y]} '.z.pd @ k){0!(?).@[x;0;p1[;y;z]]}[(quotes;((in;ticker;,AAPLXOMBEAVBWLDTAPBUD);(=;date;2014.12.24));()!();(,0)!,(max;ap))]‘/’:

That .z.pd error brought me to this page which tells me that I have to set up the slaves myself, so I followed the instructions outlined in the “more comprehensive setup” snippet:

q).z.pd
{n:abs system"s";$[n=count handles;handles;[hclose each handles;:handles::`u#hopen each 20000+til n]]}
q).z.pc
{handles::`u#handles except x;}
q)handles
`u#`int$()

But then it tells me this:

q)select high:max(ap) from quotes where ticker in `AAPL`XOM`BEAV`BWLD`TAP`BUD,date = 2014.12.24
k){x'y}
'hop: Connection refused
@
<:'
20000 20001 20002 20003 20004 20005 20006 20007 20008 20009 20010 20011 20012 20013 20014 20015 20016 20017 20018 20019 20020 20021 20022 20023 20024 20025

So I’m guessing that doesn’t actually start the sub-processes, which is confirmed by the little nugget at the bottom of that reference page: “Note that the worker processes are not started automatically by kdb+.” Also, I don’t see the processes in ps.

I can start slave processes on the correct ports by adapting the code from here:

{value raze "\\/path/to/q/l32/q /path/to/slave.q -p ", string x," &"} each 20000 + til abs system"s"

Where slave.q just loads the database that contains the sym and par.txt for the trades and quotes tables:

\l /kdb/db

Had to add the & so every call doesn’t dump me into a sub-shell. (Why do I need that “raze”?)
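
On the raze question, a minimal sketch of what appears to be going on, assuming q's usual right-to-left evaluation. The names x, parts, cmd and cmd2 are made up for illustration, and the command string is shortened to just the port part:

```q
/ q evaluates right to left, so string is applied to the whole list (x;" ";"&")
x:20000
parts:"-p ",string x," &"
/ parts is a mixed list: the three chars of "-p " followed by three separate strings
count parts
/ raze flattens that mixed list into a single string
cmd:raze parts
/ parenthesising (string x) keeps everything a flat string, so raze is not needed
cmd2:"-p ",(string x)," &"
cmd~cmd2
```

Without the raze (or the parentheses), value would be handed a nested list rather than a single command string.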

Now my quote queries work:

q)select high:max(ap),low:min(bp) by ticker,date from quotes where bp > 0,ticker in `AAPL`XOM`BEAV`BWLD`TAP`BUD,extime within (09:30:00;16:00:00)
ticker date      | high   low
-----------------| -------------
AAPL   2014.12.24| 117.88 91.06
BEAV   2014.12.24| 91     54.55
BUD    2014.12.24| 147.99 86.11
BWLD   2014.12.24| 233.7  135.86
TAP    2014.12.24| 89.66  59.63
XOM    2014.12.24| 109.27 67.31

It’s quite slow though. In top I can see each process doing stuff, when really only four segments have the data I want.

Is each slave searching through every segment? How do I tell a slave which segment it should search? Should I bother segmenting and make a bunch of independent databases instead?

I’m running on RHEL 6.2 using:

KDB+ 3.2 2015.01.05 Copyright (C) 1993-2015 Kx Systems
l32/ 32()core 258175MB NONEXPIRE

charlie1 · January 12, 2015, 6:43pm

try to constrain on the sym first (as it has the `p attr)

e.g.

select high:max ap,low:min bp by ticker,date from quotes where ticker in `AAPL`XOM`BEAV`BWLD`TAP`BUD,bp > 0,extime within 09:30:00 16:00:00
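
To illustrate the ordering point, here is a rough sketch on a made-up in-memory table. The row count, the random prices and the extra tickers are all assumptions for illustration, not the real data. Constraints in a where clause are applied left to right, and each later constraint only looks at the rows that survived the earlier ones, so putting the `p# column first lets q use the attribute before it touches bp at all. Both orderings return the same answer; only the amount of work should differ.

```q
/ hypothetical stand-in for the quotes table (not the real data)
n:100000
quotes:([]ticker:n?`AAPL`XOM`BUD`IBM`MSFT;bp:n?100f;ap:100+n?100f)
/ sort by ticker and apply the parted attribute, as in the on-disk layout
quotes:update `p#ticker from `ticker xasc quotes
/ ticker constraint first: bp>0 is only evaluated on the selected tickers
r1:select high:max ap,low:min bp by ticker from quotes where ticker in `AAPL`XOM`BUD,bp>0
/ bp>0 first: every row is scanned before the ticker filter is applied
r2:select high:max ap,low:min bp by ticker from quotes where bp>0,ticker in `AAPL`XOM`BUD
/ same result either way
r1~r2
```

On a partitioned table the usual advice is to put the date constraint first of all, then the `p# column, then everything else.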
