Many thanks, this looks like it might do what I want!
The [n] operator does not work with memory mapped partitioned tables, which persuaded me from trying it before.
But presumably I could just load the relevant columns into an in-memory table and then run the query on that.
My first attempt at doing so:
trips2:(passenger_count:(select passenger_count from trips);total_amount:(select total_amount from trips);trip_distance:(select trip_distance from trips))
Unfortunately this fills up all my memory, and all my swap, and then dies.
Same goes for this second attempt:
pc:select passenger_count from trips
ta:select total_amount from trips
td:select trip_distance from trips
trips2:(passenger_count:pc;trip_distance:td;total_amount:ta)
The original dataset contains 1.43 billion rows and a rough back of the envelope calculation gives me (4 + 1 + 4)byte x 1.43 x 10^9 = 12GiB, so I don’t really know where that extra memory is coming from.
The select queries on the individual columns work as expected, the memory increase only comes when I create the trips2 table.
Even this runs out of memory which makes me think I’m just making some noob mistake:
ta:select total_amount from trips
trips2:(total_amount:ta)
Just running this causes ~4GB in memory growth, which is about 3-4 times higher than I would have expected:
pc:select passenger_count from trips
About 6GB for this which is closer but also strange:
ta:select total_amount from trips
So I hypothesized that select might return data in some results data format that is not super efficient/suitable for creating tables.
Running type gives me:
q)type pc
98h
q)type ta
98h
I haven’t quite figured out what this means yet, but the original types were X and E so this is a bit surprising.
Anyway, it’s getting late so I will resume my investigation tomorrow if I can find the time.
Thanks all for the help so far.