Re: [personal kdb+] Working with large tables without loading it

you can either partition the data in some fashion,
or subselect from one column at a time.

e.g. from 100 rows, select first 5

q)`:t/ set (a:til 100;b:til 100)

`:t/

q)flip c!{x[z] y}[get`:t/;til 5]each c:get`:t/.d

a b
---
0 0
1 1
2 2
3 3
4 4

Thank you for your reply.
This method does not work for large tables.

Executing this 5 times (to get 500m rows):

`:c:/kdb/ upsert ([]a:til 100000000;b:til 100000000)

and the next operation got an error:

flip c!{x[z] y}[get`:c:/kdb/;til 5]each c:get`:c:/kdb/.d

I think I need something that doesn't use get.

Sure, this method works a column at a time, and if a single column is too large to map, then it won’t help you.
Can you partition your data so that a single splay does not exceed these 32-bit limits?

Or upgrade to 64-bit kdb+… ;-)

This table is already partitioned (it's one day's worth of data), but the problem is that I wrote a lot of raw data for debugging.

Now I have disabled the additional writing, but I can't clean up the existing data…

if the columns are:
- not compressed
- fixed-width types
- and have no attributes

you could consider taking the first 16 bytes (the header), cutting the remaining data into chunks, and appending each chunk to a copy of the header to create new files which you can then map as usual. There is no need to adjust the header, as kdb+ will figure out the vector length from the resulting file size.
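
A minimal sketch of that idea, assuming an uncompressed long column `:t/a (8 bytes per row) and writing just the first 5 rows to a new file `:a5 in the current directory:

q)hdr:read1(`:t/a;0;16)        / the 16-byte vector header
q)chunk:read1(`:t/a;16;5*8)    / first 5 longs, 8 bytes each
q)`:a5 1: hdr,chunk            / header + chunk = a new, smaller vector file
q)get`:a5                      / kdb+ infers the count from the file size
0 1 2 3 4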

Unfortunately the tables are compressed.
I will be more careful with that next time :)

ok, read1 with start,length may be another possibility - step through, splitting the data into new files that way. That should work with compressed files too.
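
For example, something like this (untested sketch; assumes a compressed long column `:t/a, split into 10m-row uncompressed files `:chunk0, `:chunk1, … in the current directory, with a row count that divides evenly):

q)hdr:read1(`:t/a;0;16)    / the header also comes back decompressed
q)n:10000000               / rows per new file
q){(hsym`$"chunk",string x) 1: hdr,read1(`:t/a;16+8*x*n;8*n)} each til 50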

I finally tested read1 and it helped me!

Thank you again.

At the beginning I was confused by read1, because it returns uncompressed bytes.

But in the end it greatly simplified my work.

The second thing I noticed is that the symbol column header size is 24 bytes instead of 16 bytes (so the first data chunk needs a 24-byte offset).
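
i.e. for a (hypothetical) symbol column `:t/s the header read and the data offsets would use 24 instead of 16:

q)hdr:read1(`:t/s;0;24)    / sym column: 24-byte header
q)/ each data chunk is then read starting from offset 24 rather than 16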