Trouble With Huge CSVs

https://learninghub.kx.com/forums/topic/trouble-with-huge-csvs

Greetings All,

I’ve got a couple of 40 GB CSVs that I’m hoping to perform some joins on.

I do not know the column formats or headers, or whether the CSV even has headers.

I’m working with a good bit of memory: 256 GB accessible.

Loading the files into memory clearly doesn’t work; as expected, the program crashes.

So I made my way here (the loading-from-large-files page). I understand I’ll have to convert my CSVs to splayed tables, save those down, and then work from there instead of from the CSVs.

I’m able to see the rows inside the CSV with .Q.fs[0N!]`:file.csv, but I still don’t know the entirety of what’s inside.
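One way to peek at just the head of the file without loading it all is read0 with a (file; offset; length) triple, which reads only that byte range. A minimal sketch, assuming the file is `:file.csv:

peek:read0(`:file.csv;0;2000)        / first 2000 bytes as a char vector
-1 first "\n" vs peek;               / print the first line (a header row, if there is one)
count "," vs first "\n" vs peek      / field count on the first line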

I first try loading the whole file in one go, and obviously it’s too big and crashes the program. I try to insert the rows directly into a table on disk with

.Q.fs[{`:newfile upsert flip colnames!("DFFFFIS";",")0:x}]`:file.csv

and that crashes too.

Should I be chunking this and going from that angle or is there a better way to do this?

 

Yes, the 32-bit (w32) version has a limit on how much memory it can address; w64 does not have this restriction.
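You can confirm which build you’re on from inside q: .z.o reports the OS/architecture the binary was built for.

.z.o    / e.g. `w32 on 32-bit Windows, `w64 on 64-bit Windows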

 

You could also stream the data into an on-disk table:

.Q.fs[{`:trade/ upsert flip colnames!("**********";",")0:x}]`:filename
trade:get `:trade/
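One caveat, assuming the masked type string includes symbol columns ("S", as in your earlier "DFFFFIS"): a splayed table can’t hold plain symbol lists, so enumerate them with .Q.en first. A sketch, assuming the database root is the current directory `:.:

.Q.fs[{`:trade/ upsert .Q.en[`:.;] flip colnames!("DFFFFIS";",")0:x}]`:file.csv
trade:get `:trade/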

I’ve chunked with .Q.fs[{`trade insert flip colnames!("**********";",")0:x}]`:filename and it runs until it crashes.

Did some more research and thought it could be a garbage-collection issue, so I added a .Q.gc[] call, but that didn’t help me either.
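Something along these lines, with .Q.gc[] after every chunk:

.Q.fs[{`trade insert flip colnames!("**********";",")0:x; .Q.gc[]}]`:filename   / sketch: gc after each chunk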

Dumb question: is this because I’m using w32 instead of w64?