Hi guys,
I'm looking for some info on whether KDB+ supports aggregation/roll-up/group-by on non-time fields.
My use case: historic price data, which needs to be compared.
Most queries would operate on the last known price point of a particular ‘product’ and then roll up over various dimensions such as product category and/or brand.
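
To make this concrete, here is roughly the kind of q I imagine, with made-up table and column names (prices, product, category, brand are just placeholders, and I may well be missing the idiomatic way to do this):

/ toy version of the price table (all names are made up for illustration)
prices:([] time:.z.p+0D00:00:01*til 6;
  product:`p1`p2`p1`p3`p2`p1;
  category:`food`food`food`drink`food`food;
  brand:`b1`b2`b1`b3`b2`b1;
  price:1.0 2.0 1.5 3.0 2.5 1.7)

/ last known price point per product ('select by' keeps the last row per key)
latest:0!select by product from prices

/ roll-up of those latest prices over other dimensions
select avgPrice:avg price by category from latest
select avgPrice:avg price by category,brand from latest

So the question is mainly whether this two-step pattern (last row per product, then group by the other dimensions) is the right approach and fast enough at scale.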
I’ve looked at druid.io, which seems to be marketed for exactly this case, but a minimal production cluster appears to be pretty costly (YMMV). Moreover, it uses ZooKeeper, which I’ve had a lot of problems with in the past.
Instead, I was thinking that if I could do this with 32-bit KDB+, possibly splitting the load over multiple processes (which seems possible), I could get away with a much lighter setup. I’ve read somewhere that it is possible to keep only the latest price point per product in RAM (which is what most queries use) and serve the occasional historic query from disk/SSD. That should keep me within the 4 GB RAM limit per 32-bit process, times the number of processes.
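
Roughly, I picture something like the following: a keyed table in RAM holding only the newest row per product, the full history written out to a date-partitioned table on disk, and historic queries sent to a separate process over IPC. All names, paths and ports below are placeholders I made up, so please correct me if this isn't how it's normally done:

/ in-memory table keyed on product: one row per product, the latest price point
latest:([product:`symbol$()] time:`timestamp$(); category:`symbol$(); brand:`symbol$(); price:`float$())

/ tick-style update callback: upsert keeps only the newest row per product in RAM
upd:{[t;data] `latest upsert data}

/ the full history ('prices' here) would be accumulated elsewhere, e.g. in a
/ separate writer process, and flushed to a date-partitioned table at end of day
/ .Q.dpft[dir; partition; column to sort and apply the `p# attribute to; table name]
.Q.dpft[`:/data/hdb;.z.d;`product;`prices]

/ the occasional historic query could then go to a separate HDB process over IPC
h:hopen `::5010
h"select avg price by category from prices where date=2024.01.02"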
Lastly, it’s perhaps worth mentioning that the dimensions along which the data should be aggregated are known up front. I’m not sure if KDB+ has something like materialized views that could help here.
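
What I mean is something along these lines: since the dimensions are fixed, I could presumably keep small pre-aggregated keyed tables and refresh them from the in-memory ‘latest’ table, either after each update or on a timer. Again just a sketch with made-up names:

/ rough 'materialized view' emulation: a pre-aggregated keyed table per known dimension
byCategory:([category:`symbol$()] avgPrice:`float$(); n:`long$())

/ recompute the roll-up from the in-memory 'latest' table
refreshViews:{`byCategory set select avgPrice:avg price, n:count i by category from 0!latest;}

/ e.g. call refreshViews[] at the end of upd, or every couple of seconds on a timer
.z.ts:{refreshViews[]}
system"t 2000"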
Would this work? Any suggestions are more than welcome.
Best,
Geert-Jan