How to load huge data in KDB

Hi,

I have a table that contains 50 million rows. If I query the table by name alone, it displays the data.

But if I add a where condition, it fails with a wsfull error and the process terminates.

Please find the query below:

select from mytable where name like "amzn" // This query returns more than 10 million rows, but it fails with a wsfull memory error.

I would like to load this table's data in chunks.

Please help me with this issue.

Thanks,

Ram

Hi Ram,

Is the table partitioned/splayed, or are you accessing it in memory? Are you using the 32-bit or 64-bit version of kdb+? The 32-bit version is more limited in how it handles large amounts of in-memory data: it can only use 4GB of RAM.

It may be that you have to filter the data further or select fewer columns if possible. Have you tried a different where clause? The like operator requires kdb+ to read all 50m records of that column into memory to do the search (50m long strings might exceed the 4GB of RAM). This is less likely but worth a try.
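For example, if name is actually a symbol column rather than a string column (an assumption - the meta would confirm it), an exact match is much cheaper than like because it avoids scanning every value character by character:

select from mytable where name=`amzn            / exact symbol match, can use any attribute on the column
select from mytable where name like "amzn"      / forces a full scan of the column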

Can you paste in the meta of the table? Otherwise I can send some sample code that will read chunks of the table at a time and stream-execute, to keep memory consumption low.
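In the meantime, this is the general idea - purely a sketch, assuming the table is date-partitioned and using made-up column names, so adjust it to your schema:

/ process one partition at a time so only a single date's worth of data is held in memory at once
chunk:{[d] select name, price, size from mytable where date=d, name like "amzn"}
res:raze chunk each date    / `date` holds the partition values once the database is loaded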

Thanks,

Aidan
 

Hi Aidan,

Thanks for the response. The table is partitioned and I am accessing it in memory. I am using the 64-bit version of kdb+. The table has 50+ columns and I need to select at least 30 of them. I am unable to share the metadata of the table. Could you please share the sample code, if possible, that uses the where clause, reads in chunks, and consumes very little memory.

Thanks,
Ram

Hi Ram,

Try querying for fewer columns. When you don't specify the columns you are interested in, kdb+ will read all columns into memory, causing the wsfull error:

select from mytable where name like "amzn"     -> select name from mytable where name like "amzn"
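If you do need a larger, known subset of columns, name them explicitly, or build the column list programmatically with a functional select (the column names below are just placeholders):

needed:`name`price`size                            / replace with the 30+ columns you actually need
?[mytable; enlist (like; `name; "amzn"); 0b; needed!needed]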

Thanks,
Connor

Hi Connor,

Thanks for your suggestion. However, I have a requirement to load at least 30 columns into memory. Please let me know if you have an alternative solution that consumes less memory.

Thanks,
Ram

Sorry to have you repeat yourself; I missed your earlier reply. What exactly is your requirement? Is this data needed in memory purely to speed up query time for another process? Reading in chunk by chunk is only an option if, when processing each chunk, you create a (smaller) subset of the data - it is not really an appropriate solution if your end goal is to have the entire 30-column dataset in memory.

Thanks,
Connor

Hi Ram,

If you are using the 64-bit version you should use the licensed (k4) list. You can subscribe here (with your company email address) - https://www.listbox.com/subscribe/?listname=k4
See this thread from Charles@kx - https://groups.google.com/forum/#!searchin/personal-kdbplus/licensed$20k4|sort:relevance/personal-kdbplus/5Ht9q8VWeEE/jt_KFplANAAJ

As for your query below: if the table is partitioned, you should provide the partition column as the first constraint in your where clause, before 'name like'.
see - http://code.kx.com/wiki/Reference/select#Quick_Performance_Tips
Do you really need to select from all partitions?
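For example, assuming a date-partitioned table (the dates here are just an illustration):

select from mytable where date within 2016.09.01 2016.09.30, name like "amzn"    / partition column constrained first, so only those partitions are read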

Why can you not share the metadata? It will just give column names, attributes, types of the columns etc. Nothing 'of-importance' per se. A simple copy and paste of 'meta mytable'?
What is your partition domain?
Can you post the banner - the first line of output when you start your q process?

Have you set a workspace limit in your process? Check with .Q.w[]`wmax
Is your q binary perhaps an alias to q with a workspace size (-w) set? Check with 'alias q' or even 'cat /proc/{pid}/cmdline', etc. (if on Linux)
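For example:

.Q.w[]`wmax    / the -w limit for this process; 0 means no limit has been set
.Q.w[]         / full memory breakdown: used, heap, peak, wmax, mmap, mphy, syms, symw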

Check for wsfull causes here:
http://code.kx.com/wiki/Errors
Have you enough swap space?
Are you compressing?
http://code.kx.com/wiki/Cookbook/FileCompression

Have a read of this thread:
https://groups.google.com/forum/#!topic/personal-kdbplus/ugZ79zjalB4

PS. 10 million rows is tiny :) - you should have no issue loading/selecting that data into memory, provided your hardware and other settings allow it… but to find out what's going on you are going to have to give more info (and on the k4 forum).

Cheers,
Sean

________________________________________
From: personal-kdbplus@googlegroups.com [personal-kdbplus@googlegroups.com] on behalf of Ram [venkat.pendam@gmail.com]
Sent: 04 October 2016 15:30
To: Kdb+ Personal Developers
Subject: [personal kdb+] Re: How to load huge data in KDB

Hi Aidan,

Thanks for the response.
The table is partitioned and I am accessing it in memory.
I am using 64 bit version of kdb.
The table has 50+ columns in it and at least I need to select 30+ columns.
I am unable to share the metadata of the table.
Could you please share the sample code if possible that can use the where clause and read it in chunks and consumes very less memory.

Thanks
Ram

