Wondering if anyone has ever tried to implement a dedicated key-value store in kdb+, something like LevelDB?
I have a situation where users wish to perform a lookup by an alphanumeric string, but I don't know in advance which date partition contains the associated record. Clearly I need to avoid an exhaustive search across all date partitions. A lookup from string to date would help narrow the search.
I've tried using a keyed table stored as a flat file, but it doesn't scale in terms of memory. I could hold the past month's worth in memory, which would satisfy 90% of the queries, but I need something more general with constant lookup time. I'd also like to avoid introducing another technology.
A splayed table on disk with an attribute (such as unique, `u#`) on the lookup column would be worth testing, as splayed columns can be memory-mapped rather than having to be held entirely in memory:
https://code.kx.com/q/ref/set-attribute/#unique
Using `1:` to write an anymap file also creates a mappable object worth exploring:
https://code.kx.com/q/releases/ChangesIn3.6/#anymap
If a single splay/anymap would be too large, an int-partitioned database with a fixed number of partitions, where each string is assigned to a partition by hashing it into a fixed range, could be used.
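To make the hash-partitioning idea concrete, here is a minimal sketch in Python rather than q, assuming md5 as the stable hash and a fixed range of 16 partitions. `N_PARTITIONS`, `partition_of` and `PartitionedIndex` are hypothetical names, and the in-memory dicts stand in for what would be int partitions on disk. The point is that a lookup hashes the key once and reads exactly one partition, however many keys exist in total.

```python
import hashlib
from collections import defaultdict

N_PARTITIONS = 16  # hypothetical fixed range of int partitions


def partition_of(key: str, n: int = N_PARTITIONS) -> int:
    """Map a string key to a partition in [0, n) using a stable hash."""
    # md5 gives the same result in every process (unlike Python's built-in hash of str)
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


class PartitionedIndex:
    """In-memory stand-in for an int-partitioned store of key -> date."""

    def __init__(self, n: int = N_PARTITIONS):
        self.n = n
        self.parts = defaultdict(dict)  # partition id -> {key: date}

    def put(self, key: str, date: str) -> None:
        self.parts[partition_of(key, self.n)][key] = date

    def get(self, key: str):
        # Only the single partition the key hashes to is consulted
        return self.parts.get(partition_of(key, self.n), {}).get(key)
```

For example, after `idx = PartitionedIndex(); idx.put("ABC123", "2024.01.15")`, the call `idx.get("ABC123")` looks only in partition `partition_of("ABC123")`. The returned date then tells you which date partition to query for the full record.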
Thanks for the ideas, Rian. Yes, a single anymap file would be too large, but I could try distributing the keys across a set of int partitions, grouping them with a hash. That would reduce the search space, and I could split partitions again if they grow too big.
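One way to keep splitting cheap, sketched here in Python under the assumption that partitions are chosen by `hash mod n`, is to double the modulus when splitting. Because `h % (2n)` is always either `h % n` or `h % n + n`, the keys of old partition `p` move only to `p` or `p + n`, so each split reads a single existing partition. This is the property linear hashing relies on, although a real scheme would also have to track which partitions have already been split. `key_hash` and `split_partition` are hypothetical names.

```python
import hashlib


def key_hash(key: str) -> int:
    # Stable 64-bit hash taken from the first 8 bytes of the md5 digest
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def split_partition(keys, p: int, n: int):
    """Split old partition p (out of n) into the two partitions it maps to under 2n.

    Assumes every key in `keys` currently satisfies key_hash(key) % n == p.
    Since h % (2n) is either h % n or h % n + n, each key lands in p or p + n,
    and no other partition is touched.
    """
    children = {p: [], p + n: []}
    for k in keys:
        children[key_hash(k) % (2 * n)].append(k)
    return children
```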
Another idea is to associate a Bloom or cuckoo filter with each date partition and use it to determine whether a string is definitely not present in that partition, so the partition can be skipped. However, it's not a native feature, and I can't find any examples of people using that approach.
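A per-partition Bloom filter is small enough to sketch directly. The version below is an illustrative Python sketch, not a kdb+ feature. It assumes md5-based double hashing to derive the bit positions, and `BloomFilter`, `optimal_params` and `candidate_dates` are hypothetical names. A Bloom filter never gives false negatives, so any partition it rules out can safely be skipped. The sizing uses the standard formulas m = -n ln(p) / (ln 2)^2 bits and k = (m/n) ln 2 hash functions for n keys and false-positive rate p.

```python
import hashlib
import math


def optimal_params(n_items: int, fp_rate: float):
    """Standard Bloom filter sizing: bits m and hash count k."""
    m = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round(m / n_items * math.log(2)))
    return m, k


class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, m_bits: int, k_hashes: int):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit positions from the two 64-bit halves of md5
        d = hashlib.md5(key.encode("utf-8")).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:], "big") | 1  # force odd (a common double-hashing trick)
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))


def candidate_dates(filters: dict, key: str):
    """Return only the date partitions whose filter does not rule the key out."""
    return [d for d, f in sorted(filters.items()) if f.might_contain(key)]
```

With one filter per date partition, a lookup calls `candidate_dates` first and searches only the partitions it returns. For 1,000,000 keys per partition at a 1% false-positive rate, `optimal_params` gives about 9.6 million bits (roughly 1.2 MB) and 7 hash functions.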
For the LevelDB option, I see that I can compile the C++ library into a shared library and load it into my q process (via `2:`). With that approach I only need to write a thin wrapper around the main `Get` and `Put` methods, so it should be relatively quick to test and use as a benchmark.
May not be for you, but it's worth noting that a really simple way to improve performance is to persist a GUID representation of the string alongside the original string, e.g. `hashguid:{0x0 sv md5 x}`. This doesn't help with regex-type queries, of course. If the topic is of interest, see https://dataintellect.com/blog/methods-for-storing-text-data-on-disk-in-kdb/
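For readers outside q, the `hashguid` one-liner turns the 16-byte md5 digest of a string into a GUID, giving a fixed-width key that is cheaper to compare and index than a variable-length string. A rough Python analogue is sketched below, assuming ASCII input so that UTF-8 encoding matches q's char bytes. `hashguid` mirrors the q name, and the `rows`/`index` data is purely illustrative. Only exact matches work, and md5 collisions are extremely unlikely but not impossible.

```python
import hashlib
import uuid


def hashguid(s: str) -> uuid.UUID:
    """Python analogue of the q lambda {0x0 sv md5 x}: the md5 digest viewed as a GUID."""
    return uuid.UUID(bytes=hashlib.md5(s.encode("utf-8")).digest())


# Store the GUID alongside the original string; exact-match lookups then compare
# fixed-width 16-byte GUIDs instead of variable-length strings.
rows = [("ABC123", "2024.01.15"), ("XYZ789", "2024.02.03")]  # illustrative data
index = {hashguid(k): d for k, d in rows}
```

For example, `str(hashguid("abc"))` is `90015098-3cd2-4fb0-d696-3f7d28e17f72`, the hex of the well-known md5 test vector for "abc" laid out in GUID form.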