Partitioned Table Memory Usage

I’m seeing some weird behaviour when querying a partitioned table (partitioned by month).

This table has 414 columns, so each partition, if we select all columns, is about 25 GB. This is a special table (we don’t normally deal with this many columns).

rawData:select from myPartitionTable where month in 2022.11m

This query runs almost instantly, and when I check .Q.w[] it reports only about 2 GB used. This surprised me: since I didn’t select any specific columns, I thought it would load the entire partition into memory, so I was expecting about 25 GB of memory usage.
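
For reference, here is roughly what I’m doing to check the usage (a minimal sketch of the steps above):

/ baseline memory stats in bytes (used, heap, peak, wmax, mmap, ...)
.Q.w[]
/ pull the whole 2022.11 partition without naming any columns
rawData:select from myPartitionTable where month in 2022.11m
/ used reports only ~2 GB, nowhere near the 25 GB partition size
.Q.w[]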

However, I then tried to run this:

rawDataFiltered:select from rawData where date>2022.11.03

and ran out of memory. After playing around with this and increasing the memory limit, it seems that this query loads the entire partition into memory (my memory usage skyrockets to close to 25 GB). Is what I’m describing correct? How does kdb+ behave with regard to loading partitions into memory?
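
(For reference, the way I’ve been raising the limit is q’s -w command-line flag, which caps the workspace in MB; the value below is just illustrative:)

/ (shell) start the session with a ~30 GB cap: q /path/to/db -w 30000
/ then confirm it from inside q; wmax is the cap in bytes, 0 means unlimited
.Q.w[]`wmax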

There is some smart logic behind querying partitioned tables. If you filter on the partition column only, the data is not loaded into memory: the result is memory mapped, holding only references to the column files in that partition until you actually operate on them. That is why the first query is fast. The second query filters on an actual column in the table, so q has to read the date column to find the matching rows, and because you didn’t name any columns to keep, it must also materialise every other column for those rows.
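
If memory is the constraint, one workaround is to name only the columns you need: q then reads the date column to evaluate the filter and materialises just the named columns for the matching rows. A sketch, where col1 and col2 are hypothetical stand-ins for whichever of the 414 columns you actually want:

/ reads date (for the filter) plus only the named columns
rawDataFiltered:select col1,col2 from rawData where date>2022.11.03
/ or push both constraints into one query on the partitioned table
rawDataFiltered:select col1,col2 from myPartitionTable where month=2022.11m,date>2022.11.03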

Thank you! Another related question: after doing the initial query on the partition, when I do

100#rawData

That still worked (it was fast and did not increase memory usage). If we are not actually loading the partition into memory, shouldn’t this fail? Is there a way to partially load the columns in a partition?

This blog post has useful information:

Explore the methodology and benefits of kdb+ Memory Mapping | KX
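
In short: after the first query, rawData’s columns are memory mapped, so taking 100 rows only pages in the few pages of each column file backing those rows; nothing close to the full 25 GB is read. A small sketch of how you could observe this, continuing the session above:

before:.Q.w[]`used      / heap bytes in use before sampling
sample:100#rawData      / copies just 100 rows of each column into the heap
(.Q.w[]`used)-before    / small delta relative to the 25 GB on disk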