xasc gives wsfull - is there any workaround?

Hi All,

I’m running the 32-bit version:

KDB+ 3.3 2015.08.23 Copyright (C) 1993-2015 Kx Systems

l32/ 4()core 7893MB 

I’m using TorQ WDB and TickerLogReplay, and both crash with wsfull when sorting my data. I’m sorting a quote table of about 70M records.

The problem can be reproduced as follows. I don’t think kdb handles on-disk sorting any differently, otherwise I wouldn’t be getting wsfull.

q)\ts t:([]a:100000000?1.0;b:100000000?1.0)

1283 2147484192

q)`a xasc t

wsfull

Is there a workaround? How can I sort a table without loading the whole column into memory?

Thanks

Nick 

I’ve been looking into this a bit further, and I don’t quite understand why the kdb sort algorithm consumes several times the memory taken by the column itself.
E.g. my sym column in an enumerated splayed table takes 260MB.

splayedtable: `:/path/to/table

\ts asc value splayedtable`sym

1759 2147483936

\ts iasc value splayedtable`sym

728 1610744064

So it takes 2.1GB of memory to sort a 260MB vector of ints? It takes a bit less (1.6GB) to produce the sort indexes, but that is still far too much.

The `time column takes twice as many bytes and can’t be sorted at all.

\ts asc splayedtable`time

wsfull

Is there an in-place sort algorithm available?

Thanks

Nick

KDB doesn’t have any other built-in sorting mechanism. You need to define your own function for that.

For on-disk sorting, take a look at the following. They should give you some ideas to start with:

http://code.kx.com/wiki/Reference/xasc#Sorting_data_on_disk
http://code.kx.com/wiki/JB:KdbplusForMortals/splayed_tables#1.2.5.3_Sorting_by_a_Column_on_Disk
https://groups.google.com/forum/#!msg/personal-kdbplus/kho3unfJ9uc/7UdOBGKKigkJ

Please also check the note on that page about the issues that can occur during on-disk sorting.

Hi Rahul,

Thanks for your reply. I can’t see any pointers in the links. The code there is based on xasc or iasc, which are exactly the calls that fail for me. The last link confirms that there’s a problem with memory consumption; perhaps it’s just a trade-off for speed.

So what are my options?

  • Implement custom sort in q and/or k.

  • Break the table into a few parts with distinct keys, sort each part and then merge.

  • Use Kona to sort the table. I’ve never used Kona before and am not sure whether it has a built-in sort.

  • Implement a custom table sorter in C/C++ using the structures from the C interface provided by KX. I guess this will require some reverse engineering.

  • Buy a licence from KX.
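
As a rough sketch of the second option, assuming the sort key is numeric: split the rows into n parts by key range, sort each part with xasc, and hand the sorted parts over in ascending key order. Because the ranges don’t overlap, simply appending the parts one after another gives a sorted table, so no merge step is needed. The names sortInParts, f, t, c and n are made up for illustration. Note that this version still builds a part-number vector for the whole key column, so a real version would probably have to work through the key in chunks to stay within the 32-bit limit.

```q
/ Sketch only: sortInParts and its argument names are hypothetical.
/ Sort table t by numeric column c in n key-range parts; each sorted part
/ is passed to f in ascending key order (f could append it to a table on disk).
sortInParts:{[f;t;c;n]
  k:t c;                                        / sort key column
  lo:min k; w:(max[k]-lo)%n;                    / lower bound and width of each part
  b:$[w>0;(n-1)&floor(k-lo)%w;count[k]#0];      / part number per row, 0..n-1
  {[f;t;c;b;i] f c xasc t where b=i}[f;t;c;b] each til n}

/ small in-memory check: {x} just returns each part, raze joins them
t:([]a:1000?1.0;b:1000?1.0)
s:raze sortInParts[{x};t;`a;4]
all 0<=1_deltas s`a
```

With a callback that appends each part to a splayed table instead of {x}, only one part would need to be sorted in memory at a time.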

What would you choose and why?

Thanks!