Pretty interesting article. I had one or two questions that I was wondering if anyone had some insights on and one additional observation that I think is relevant to leveraging the advice.
Questions: If the 1: method of writing anymaps reduces the copying overhead for Anymap, what's the reason that the set method is used? (Maybe just to avoid re-reading mapped data from disk?) And is there any way to externally observe which method an Anymap file has been written with?
Observation: Although mapping a splay in deferred mode doesn't immediately read the data, it is not totally accurate (at least on Linux) to say that the data is mapped and unmapped every time it is accessed. Rather, the mapping is created when the variable is assigned, and any reads deriving from the assigned variable will be against the same map. If the map is not disposed of, the VM subsystem will treat the mapping whatever way it decides to. For this reason, if you are using a large mappable file across a large number of operations, you should be very careful about when you actually create and delete the maps. What we found was that creating the mapping immediately before reading and using the 5-argument select was the best way to ensure we minimized our memory impact without adding too much additional overhead. There were extra map/unmap calls, but they were a marginal cost in our application, which was reading a few gigabytes of data at a time from a map that was a fair bit larger than 1TB.
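To make the map-late, unmap-early pattern concrete, here is a rough sketch using plain OS-level mmap from Python rather than q. It is only an analogy: the helper names (short_lived_map, read_window, make_demo_file) are made up for illustration, and it says nothing about how kdb+ manages its own maps internally.

```python
import mmap
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def short_lived_map(path):
    """Map `path` read-only and unmap it as soon as the block exits."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are only faulted in when touched.
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            yield m
        finally:
            # Explicitly drop the mapping instead of leaving it to the VM subsystem.
            m.close()


def read_window(path, offset, length):
    """Create the map right before the read, copy out one window, then unmap."""
    with short_lived_map(path) as m:
        return bytes(m[offset:offset + length])


def make_demo_file(size=4096):
    """Hypothetical helper: write a small file with a repeating 0..255 pattern."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(i % 256 for i in range(size)))
    return path


if __name__ == "__main__":
    demo = make_demo_file()
    try:
        print(read_window(demo, 256, 4))
    finally:
        os.remove(demo)
```

The trade-off is the same one described above: each call pays for an extra map/unmap, but nothing stays mapped between reads, so the resident footprint is bounded by the window you actually copy out.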