Has anyone ever tried to implement a dedicated key-value store in kdb+, something like LevelDB?
I have a situation where users wish to perform a lookup by an alphanumeric string, but I don’t know in advance which date partition contains the associated record. Clearly I need to avoid an exhaustive search across all date partitions. A lookup from string to date would help narrow the search.
I’ve tried using a keyed table, stored as a flat file, but it doesn’t scale in terms of memory. I could hold the past month’s worth in memory, and that would satisfy 90% of the queries, but I need something more general with constant lookup time. I’d also like to avoid having to introduce another technology.
Thanks for the ideas, Rian. Yes, the single anymap file would be too large, but I could try distributing the keys across a set of int partitions, grouping them in some way, perhaps using a hash. That would reduce the search space. Then I could split the partitions again if they get too big.
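A minimal sketch of the hash-bucketing idea, assuming md5 is an acceptable hash and that the bucket count n is a tuning parameter. The names bucket and ks are hypothetical, chosen only for illustration:

```q
/ map a string key to one of n int partitions
/ takes the first 4 bytes of the md5, zero-pads to 8 bytes so the long is non-negative
bucket:{[n;s] (0x0 sv 0x00000000,4#md5 s) mod n}

ks:("ORD0001";"ORD0002";"XJ9-77A")   / example keys (made up)
b:bucket[16] each ks                 / int partition for each key
group b                              / bucket -> indices of the keys that landed in it
```

One convenient property of taking the hash modulo n: if the bucket count is doubled to 2n, every key in bucket b moves to either b or b+n, so an oversized partition can be split without touching the others.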
Another idea was to have a Bloom or Cuckoo filter associated with each date partition, and to use it to determine whether a string is definitely not present in a partition so that the search can skip it. But it’s not a native feature, and I can’t find any examples of people doing that.
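Since there is no native Bloom filter, here is a rough sketch of what one per date partition might look like in plain q. Everything in it is an assumption: the function names (pos, bfadd, bfhas), the filter size m, the number of hash positions k, and the shortcut of slicing a single md5 into 4-byte chunks (which limits k to 4):

```q
/ k bit positions for string s in a filter of m bits
pos:{[m;k;s] (0x0 sv/: 0x00000000,/:k#4 cut md5 s) mod m}

/ set the k bits for s in boolean vector bf
bfadd:{[m;k;bf;s] @[bf;pos[m;k;s];:;1b]}

/ 0b means s is definitely absent; 1b means it may be present
bfhas:{[m;k;bf;s] all bf pos[m;k;s]}

m:1024; k:3
f1:bfadd[m;k]/[m#0b;("ORD0001";"ORD0002")]   / filter for one date (made-up keys)
f2:bfadd[m;k]/[m#0b;enlist "XJ9-77A"]        / filter for another date
filters:2024.01.02 2024.01.03!(f1;f2)

/ dates that may contain the key; only these partitions need searching
cand:where bfhas[m;k;;"XJ9-77A"] each filters
```

A filter never gives a false negative, so cand always includes the date that really holds the key. It can include extra dates (false positives), and the rate depends on m, k and the number of keys per partition.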
For the LevelDB option, I see that I can compile the C++ library into a shared library and then load that into my q process. With that approach I am just writing a wrapper library for the main “Get” and “Put” methods, so it should be relatively quick to test against and use as a benchmark.
This may not be for you, but it’s worth noting that a really simple way to improve performance is to persist a guid representation of the string together with the original string, for example hashguid:{0x0 sv md5 x}. This doesn’t help with regex-type queries, of course. If the topic is of interest, see https://dataintellect.com/blog/methods-for-storing-text-data-on-disk-in-kdb/
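To illustrate the hashguid idea, a small in-memory sketch of a guid-to-date lookup. The keys, dates and the name idx are made up for the example; only hashguid comes from the post above:

```q
hashguid:{0x0 sv md5 x}              / 16-byte md5 reinterpreted as a guid

ks:("ORD0001";"ORD0002";"XJ9-77A")   / example keys
ds:2024.01.02 2024.01.02 2024.01.03  / date partition holding each key

/ guid -> date; `u# marks the key as unique so lookups use a hash
idx:(`u#hashguid each ks)!ds

idx hashguid "XJ9-77A"               / 2024.01.03
```

Guids are fixed-width atoms, so they compare and index much more cheaply than variable-length strings. The same trick should carry over to disk: if a guid column is persisted next to the string column, a query can filter on equality against hashguid of the search string instead of comparing strings.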