KDB instances using shared memory

Hi,

I tried asking this question on Stack Overflow. I am interested in having two (or more) KDB instances use a read-only table loaded into shared memory. My requirement is to load the data (table) once and cache it there. If I could put the table into read-only (immutable) mode, multiple readers could use the data without loading it from disk again.

Can this be done?

thanks

if you can store your data as a non-compressed splayed table on disk, e.g.

`:t/ set .Q.en[`:.;t]

and load it into each process with

sym:get`:sym

t:get`:t

the pages should be shared between the processes.

Thank you for your reply. If I understood correctly, your solution relies on OS page caching?

Is there true shared memory space where I can instruct KDB instances that table X is laid out at the offset Y
and should be treated as effectively read-only?

yes, it uses the page cache, and allows the same physical pages to be shared between processes. What are your concerns in using this method?

Regarding read-only - you can set kdb+ to be read-only using the -b command line option

http://code.kx.com/wiki/Reference/Cmdlineb

or implement an access control layer

http://code.kx.com/wiki/Cookbook/AuthenticationAndAccessControl

or revert the data (remap) if changed (monitor with .z.vs)

http://code.kx.com/wiki/Reference/dotzdotvs

hth,

Charlie

The OS page cache is implicit, and pages can be evicted by the OS depending on system utilization. I wanted to build an in-memory caching layer where multiple KDB "compute" instances share table(s) and reduce disk I/O because data is pre-loaded for them. I want full control of caching and I/O.

Charles, could we discuss my use case offline, if you're interested at all?

Thank you

Hi,

you should read this article: http://varnish-cache.org/docs/trunk/phk/notes.html

I think it might be interesting for you.

regards, Markus

you can do it, but you'll have to dust off your C and OS skills.

serialization/deserialization: http://code.kx.com/wiki/Cookbook/InterfacingWithC#Serialization.2FDeserialization

shared mem:

http://man7.org/linux/man-pages/man7/shm_overview.7.html

once you have the name of the shared segment and the data is mapped at the right address, pass it to the other process by name.

It looks like d9(b9(-1,x)) is a (or the only?) null-safe way of doing a deep copy of types 0, KS, XT, XD using the C API? (It is a neat way, albeit expensive.)

Speaking of the C API, there is no mention of an API for retrieving the number of bytes a K object takes up. TorQ's objsize function is useful from q and can be called via k(), but it would be more convenient to have a C function.

>retrieving number of bytes a K object takes up.

there would be a small recursive answer in q using count, type and a map of types to sizes. The only problem is that where lists share references to other objects, you will find that the total size of all objects sums to greater than the process memory usage. Once the q function is known, it can be translated to C - or just call k(0,"{my small size function}",r1(x)) if you're in a shared object.

but if you want to know how much memory an object uses under q's memory allocation scheme (http://code.kx.com/wiki/DotQ/DotQDotgc), each list's size will have to be rounded up to the next larger memory block.

As Jay mentioned, that function is already implemented here: 

https://github.com/AquaQAnalytics/TorQ/blob/master/code/common/memusage.q

It gets a bit involved and has to make some assumptions when dealing with complex objects and large tables containing nested lists. There's a write-up of it here: http://www.aquaq.co.uk/q/adventure-in-retrieving-memory-size-of-kdb-object/

It seems to work quite well - I've used it in production setups to measure object size.

Cheers

Ryan
