Re: [personal kdb+] Working with large tables without loading it

you can either partition the data in some fashion,
or subselect from one column at a time.

e.g. from 100 rows, select first 5

q)`:t/ set (a:til 100;b:til 100)

`:t/

q)flip c!{x[z] y}[get`:t/;til 5]each c:get`:t/.d

a b
---
0 0
1 1
2 2
3 3
4 4

Thank you for your reply.
This method does not work for large tables.

Executing this 5 times (to get 500m rows):

`:c:/kdb/ upsert ([]a:til 100000000;b:til 100000000)

and the next operation got an error:

flip c!{x[z] y}[get`:c:/kdb/;til 5]each c:get`:c:/kdb/.d

I think I need something that doesn't use get.

Sure, this method works a column at a time, and if a single column is too large to map, then it won’t help you.
Can you partition your data so that a single splay does not exceed these 32-bit limits?

Or upgrade to 64-bit kdb+… ;-)

This table is already partitioned (it's one day's worth of data), but the problem is that I wrote a lot of raw data for debugging.

Now I have disabled the additional writing, but I can't clean up the existing data…

if the columns are:
- not compressed
- fixed-width types
- and have no attributes

you could consider taking the first 16 bytes (the header), cutting the remaining data into chunks, and appending each chunk to a copy of the header to create new files which you can then map as usual. There is no need to adjust the header, as kdb+ will figure out the vector length from the resulting file size.
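
A minimal sketch of that idea, assuming an uncompressed long column `:t/a (8 bytes per row) and writing just the first 5 rows to a new file `:a5 in the current directory:

q)hdr:read1(`:t/a;0;16)        / the 16-byte vector header
q)chunk:read1(`:t/a;16;5*8)    / first 5 longs, 8 bytes each
q)`:a5 1: hdr,chunk            / header + chunk = a new, smaller vector file
q)get`:a5                      / kdb+ infers the count from the file size
0 1 2 3 4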

Unfortunately the tables are compressed.
I will be more careful with that next time :)

ok, read1 with start,length may be another possibility - step through, splitting the data into new files that way. That should work with compressed files too.
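
For example, something like this (untested sketch; assumes a compressed long column `:t/a, split into 10m-row uncompressed files `:chunk0, `:chunk1, … in the current directory, with a row count that divides evenly):

q)hdr:read1(`:t/a;0;16)    / the header also comes back decompressed
q)n:10000000               / rows per new file
q){(hsym`$"chunk",string x) 1: hdr,read1(`:t/a;16+8*x*n;8*n)} each til 50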

I finally tested read1 and it helped me!

Thank you again.

At the beginning I was confused by read1, because it returns uncompressed bytes.

But in the end it greatly simplified my work.

The second thing I noticed is that the symbol column header size is 24 bytes instead of 16 bytes (so the first data chunk needs a 24-byte offset).
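
i.e. for a (hypothetical) symbol column `:t/s the header read and the data offsets would use 24 instead of 16:

q)hdr:read1(`:t/s;0;24)    / sym column: 24-byte header
q)/ each data chunk is then read starting from offset 24 rather than 16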