We have a setup where we need to run a set of calculations repeatedly on a very large dataset, varying some parameters each time.
The data is so large that we can only hold a subset of it in memory at any one time, so we have to process the chunks in sequence.
Because the data has to be loaded into memory on every pass, a full run takes a long time. Storage is already heavily optimized for speed, and the machines are huge, with terabytes of memory.
Our only options appear to be adding more memory, adding more huge machines, or finding further improvements to the storage system. All rather costly.
Are there other options, such as distributing kdb+ across a set of much smaller (albeit more numerous) machines, which in sum would have to do less loading and make the run faster overall? At this point the actual calculation is not time-critical at all compared to the total run time.
Anyone have experience with a similar situation? Thanks.
Just what I was looking at: https://github.com/jaeheum/qzmq - Jay Han has added q bindings for ZeroMQ (www.zeromq.org).
I have only been toying with this thought, but the idea looks promising.
Simply put, ZeroMQ acts as an in-memory store for messages, each message identified by a unique ID.
You could run multiple ZeroMQ instances and load different chunks of your *big* data into each (e.g. chunks 1-10 into ZeroMQ_Instance_1, chunks 11-20 into ZeroMQ_Instance_2, and so on). You would then need "consumer" programs to consume data from these queues and, at the end of the entire operation, synchronize and write back.
That way your data is distributed, the programs acting on the chunks work in parallel, and the ZeroMQ instances manage the message IDs for you.
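I haven't written this against qzmq's actual API, so here is a minimal sketch of the same fan-out pattern using q's built-in IPC instead; the ports, file paths, and the calc/loadAndCalc names are all made up for illustration:

    / --- worker side, e.g. started as: q -p 5001 ---
    calc:{sum x}                                    / placeholder for the real calculation
    loadAndCalc:{[id] calc get `$":/data/chunk",string id}  / load one chunk, compute

    / --- coordinator side ---
    workers:hopen each `::5001`::5002               / handles to the worker processes
    chunks:1+til 20                                 / chunk ids to process
    assign:workers chunks mod count workers         / round-robin chunk -> worker
    results:{x(`loadAndCalc;y)}'[assign;chunks]     / sync remote call per chunk
    / note: sync calls run one at a time; for a true parallel fan-out use
    / async sends, (neg x)(`loadAndCalc;y), and collect the replies afterwards

The shape would be the same with qzmq; the message queues would simply replace the direct handles.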
Another option is memcached, but I'm not sure of the q bindings for it.
Thank you Krishna for the plug :-) I will jot down a few more suggestions, with a HUGE caveat: there are usually subtle and surprising details about large data's shape, momentum, etc. that can greatly affect the computation, and I don't know what they are for HC's data. For example: how shardable is the data? Do the parameters for a calculation depend on the data, on parameters used in a previous calculation, on the processed output, etc.? I assume there are profiles of the current computation? Does the current computation take advantage of all cores? …
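(For that last question, a quick check in a running q session, where run[] stands in for the actual calculation:)

    \s            / secondary threads available to peach and parallel primitives
    \t:10 run[]   / total milliseconds for 10 executions; run[] is a placeholder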
Here’s an incomplete list of possible mechanisms currently available:
peach, IPC, C FFI, etc (see the sketch after this list)
*many* whitepapers from FD/kx
qzmq
mobile code (on top of IPC, qzmq, shared memory/storage, etc), assuming code weight << data weight (also sketched below)
memcached, as already mentioned by Krishna; I wonder what an acceptable subset of libmemcached's large API would look like?
“cloud solutions” (a mandatory consideration these days)
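To make the peach/IPC and mobile-code items concrete, a minimal sketch; the ports and file layout are assumptions, while -s and .z.pd are standard kdb+ features:

    / threaded peach: start q with secondary threads, e.g.  q -s 8
    processChunk:{[id] sum get `$":/data/chunk",string id}  / placeholder work
    res:processChunk peach 1+til 20                         / parallel over chunk ids

    / process-level peach: start q with  q -s -8  and point .z.pd
    / at a unique list of handles to worker processes (ports are assumptions)
    .z.pd:`u#hopen each `::5001`::5002
    res2:processChunk peach 1+til 20

    / "mobile code": ship the (small) function to the data rather than
    / the (large) data to the function
    h:first .z.pd
    r:h({sum x*x};til 10)    / the lambda executes remotely on the worker

With -s negative, peach distributes work over the handles in .z.pd instead of over local threads, so the same one-liner scales from one big box to many small ones.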
It must be nice to have machines with terabytes of memory! :-)