I’ve got a process that does some calculations on a timer and sends an updated table to another process. Its heap is more than 3x its used memory, even after manually triggering .Q.gc[].
Relevant entries from .Q.w[] (bytes):
used| 567774096
heap| 1946157056
peak| 2617245696
I’m using KDB+ 4.0 2021.04.26
Is memory fragmentation the only possible cause? How do I find which operation contributes to it most?
Are there any other cases in which kdb+ accumulates internal memory, or known bugs that lead to memory leaks?
As a first step, you could insert printouts of .Q.w between the individual operations in the query, even breaking expressions down into single operator invocations if necessary. Additionally, .Q.ts can be used to measure the time and space used by an operation: it reports the same figures as \ts but also returns the result (it is parameterized like . (dot), for multi-parameter apply).
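A minimal sketch of the .Q.ts approach, assuming a made-up table t and query standing in for one step of your timer calculation (neither is from the thread). Per the KX reference, .Q.ts returns a 2-item list: the \ts figures and the result. For a multi-argument function you would pass the arguments as a list, e.g. .Q.ts[{x+y};(1;2)].

```q
/ Hypothetical sample table standing in for the real data (not from the thread)
t:([]sym:1000000?`a`b`c;px:1000000?100f)
/ .Q.ts[f;args] applies f to args like . (dot) and returns (time-and-space;result)
r:.Q.ts[{select avg px by sym from x};enlist t]
0N!r 0          / time (ms) and space (bytes), as \ts would report
res:r 1         / the query result itself, available for the next step
```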
The previous suggestion of using .Q.w is a good start for isolating which parts of the calculation are memory-intensive and need a large heap allocation from the OS. Printing to standard out with 0N! after each line you expect to be memory-intensive will pinpoint where in your code it happens.
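Something like the following prints used, heap and peak after each step. memlog is a hypothetical helper name, and t, a and b are placeholder steps rather than your actual code:

```q
/ Hypothetical helper: print a label plus used, heap and peak (bytes) to stdout via 0N!
memlog:{[label] 0N!label,.Q.w[]`used`heap`peak}
/ Hypothetical steps standing in for the real timer calculation
t:([]sym:1000000?`a`b`c;px:1000000?100f)
memlog`start
a:select avg px by sym from t
memlog`afterSelect
b:update px2:px*2 from t
memlog`afterUpdate
```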
On the more under-the-hood side, this article by AquaQ is quite helpful for understanding what is going on. To summarise, with some additional points:
- kdb+ allocates memory in powers of two: a vector is placed in the smallest power-of-two block that can hold it, so a block can take up to roughly 2x the memory of the raw data.
- Memory fragmentation may also be an issue, depending on your aggregations (example here).
- A q process starts with a heap allocation that is already larger than the used space (you can see this by starting a q session and running .Q.w[] straight away). The heap won't shrink below this initial allocation from the OS.
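To put numbers on the power-of-two point, here is a small sketch that computes the block a simple vector would land in. The 16-byte vector header is an assumption about kdb+ internals, and nextpow2 and blockFor are hypothetical names:

```q
/ Smallest power of two >= n bytes (integer doubling avoids floating-point log rounding)
nextpow2:{[n] p:1; while[p<n; p*:2]; p}
/ Block needed for a simple vector, ASSUMING a 16-byte header plus count*width bytes of data
blockFor:{[cnt;width] nextpow2 16+cnt*width}
blockFor[1000000;8]    / 8000016 bytes needed -> 8388608-byte block
blockFor[524288;8]     / 4194320 bytes needed -> 8388608-byte block, almost 2x
/ In a fresh session, heap is expected to be larger than used already
.Q.w[]`used`heap
```

The second case shows the near-2x overhead: a vector just past a power of two takes the next block up.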
If you don't think these points together explain why heap is so much larger than used after calling .Q.gc[], I'd recommend invoking the timer function manually and investigating with .Q.w from there, as the heap does look rather large even given the above. This rules out garbage collection, or the timer function firing again while you investigate, making the .Q.w numbers misleading.
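A sketch of running the timer callback once by hand, assuming your timer logic lives in .z.ts (runTimerOnce is a hypothetical name). It pauses the timer with \t 0 so nothing fires mid-measurement, and restores the previous interval afterwards:

```q
/ Run the timer callback once by hand and report heap (bytes) around it
runTimerOnce:{[]
  old:system"t";
  system"t 0";                 / pause the timer so it cannot fire mid-investigation
  h0:.Q.w[]`heap;
  .z.ts .z.p;                  / one iteration of the timer callback
  h1:.Q.w[]`heap;
  freed:.Q.gc[];               / bytes returned to the OS
  h2:.Q.w[]`heap;
  system"t ",string old;       / restore the original interval
  `before`afterRun`freed`afterGC!(h0;h1;freed;h2)}
```

Call it as runTimerOnce[] and compare afterRun with afterGC.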
As you can see from my attempt to replicate your issue, my example releases the expected amount of memory back to the OS. Given the number of records you have and the relative size of the table afterwards, I think the issue you’re encountering is memory fragmentation caused by the data structure of position. As per my other reply, the reference on code.kx.com gives an example of this, stating that “nested data, e.g. columns of char vectors, or much grouping” will fragment memory heavily. Does this reflect your data?
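A small sketch of the kind of pattern that reference describes, with made-up data: a column of many small char vectors, half of which are then freed. You would expect used to drop after .Q.gc[] while heap drops by much less, because the surviving vectors are scattered across the same blocks. Exact numbers will vary by version and OS, so treat this as illustrative only:

```q
/ Hypothetical demo: many small char vectors (nested data), then free every other one
n:1000000
t:([]s:string til n)                 / nested column: one small char vector per row
0N!`built,.Q.w[]`used`heap
k:select from t where 0=i mod 2      / k keeps references to half of the char vectors
delete t from `.                     / frees the other half, scattered through the heap
0N!`gcFreed,.Q.gc[]
0N!`afterGC,.Q.w[]`used`heap
```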
To fix this, I’d suggest the approach from that reference: serialise, release, deserialise. Or, extending it to your case: serialise, release, deserialise, release, reassign via IPC, release. This keeps the memory footprint low and should help with the fragmentation, but you may still unavoidably have heap greater than used purely because of the data structure (though to a lesser extent than you’re seeing now).
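A sketch of serialise, release, deserialise applied to the position table, wrapped in a hypothetical compactPosition function. Note that the serialised copy temporarily needs extra memory of roughly the table’s size:

```q
/ Serialise, release, deserialise the global position table (name from the thread)
compactPosition:{[]
  s:-8!position;           / serialise into one contiguous byte vector
  delete position from `.; / drop the (possibly fragmented) original
  .Q.gc[];                 / return wholly free blocks to the OS
  `position set -9!s;      / deserialise into freshly allocated memory
  s:0;                     / drop the local reference to the byte vector
  .Q.gc[]}                 / release again; returns bytes freed
```

The extended version would follow this with the IPC reassignment and another .Q.gc[].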
If memory fragmentation isn’t the cause, can you give a bit more insight into the data structure of position? My attempt to replicate suggests this problem is data-specific.
It might also be worth checking whether the objects are smaller than 64MB:
“During that return of memory, q checks if the capacity of the object is ≥64MB. If it is and g is 1, the memory is returned immediately to the OS; otherwise, the memory is returned to the thread-local heap for reuse.”
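To check object sizes against that 64MB threshold, -22! (uncompressed serialised length) gives a reasonable approximation of each column’s footprint. colSizes and bigCols are hypothetical helper names, and the serialised length is only an approximation of the allocated block size:

```q
/ Approximate each column's size in bytes with -22! (uncompressed serialised length)
colSizes:{[t] {-22!x} each flip 0!t}
/ Columns whose approximate size is at or above 64MB (67108864 bytes)
bigCols:{[t] where 67108864<=colSizes t}
/ Immediate garbage-collection mode can be set with \g 1, or -g 1 at startup
```

For example colSizes position, or -22!position for the whole table.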
I wasn’t able to replicate the issue on my local machine running KDB+ 4.0 2020.07.15. My heap returned to the level it was at the start of the q session on release, as expected.
However, I was able to recreate the issue running KDB+ 4.0 Cloud Edition 2022.01.31.
So the issue seems to lie with QCE releasing memory back to the OS. I’ll follow up internally to see whether it’s a known issue and what can be done to minimise the heap used.
However, in my test I wasn’t able to recreate the case where re-assigning position via an IPC call leaves the heap high after running .Q.gc: the heap after re-assigning and running GC is the same as after the initial assignment and GC.
As a potential fix, can you try purging position from memory before your second assignment of it?
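A sketch of that fix, assuming h is an open handle to the process that serves position; refreshPosition is a hypothetical name:

```q
/ Hypothetical helper: delete the old position before fetching the new copy over IPC
refreshPosition:{[h]
  delete position from `.;   / release the old copy first so its memory can be reused
  `position set h"position"; / synchronous request on handle h, as in h"position"
  .Q.gc[]}                   / return wholly free blocks to the OS; returns bytes freed
```

For example h:hopen 5001; refreshPosition h, where the port is made up. One trade-off of deleting first is that position is undefined until the IPC call returns, and if that call fails the old copy is already gone.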
My table has 54 columns of various simple types, mainly floats, symbols, ints and timestamps. Each column is around 2MB in size.
I can reproduce it with your code by dropping n to 2000000, which makes the columns similar in size to my case. .Q.gc does not release the excess heap to the OS.
To replicate the issue, please copy the position table twice, as you did with the Cloud Edition. It’s the second copy that takes the memory and doesn’t release it. I’m not running the Cloud Edition but the Windows version.
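A client-side sketch of that two-copy pattern, assuming h is an open handle to a process holding position (for example one built with n:2000000 rows as above); twoCopies is a hypothetical name. It records used and heap after each fetch plus .Q.gc[] so the first and second copies can be compared:

```q
/ Hypothetical client-side repro: fetch position twice over IPC, recording used and
/ heap (bytes) after each fetch plus .Q.gc[]
twoCopies:{[h]
  s0:.Q.w[]`used`heap;
  `position set h"position"; .Q.gc[];
  s1:.Q.w[]`used`heap;
  `position set h"position"; .Q.gc[]; / old copy is freed only once the new one has arrived
  s2:.Q.w[]`used`heap;
  `start`firstCopy`secondCopy!(s0;s1;s2)}
```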
My theory is that the first copy creates the object in the first 64MB block. For the second invocation of h"position", q has to allocate a second block, and the assignment then repoints the columns from the first block to the second. But because the first block already holds other objects, it cannot be freed. When the process constantly updates this position table while serving other queries, this repeats over and over, slowly fragmenting memory in a way that looks like a memory leak.
Is it possible to control the minimum block size from the command line? Knowing that a process frequently creates “small” objects, could I start it with a 1MB minimum block size instead of 64MB?
Understood that the QCE version isn’t the issue. In my initial response I wasn’t able to replicate the issue with n:50000000: if you look at it, you’ll see I call position twice and the heap returns to normal.
For n:2000000, however, I do see the issue, so we’re on the same page now.
Regardless, did you try the fix I suggested in my latest response? It works for both QCE and q.
Note that if I delete position from the local namespace before reassigning it, the heap returns to normal after GC.
I think your theory is correct: the first IPC call allocates in the first block, and the second call needs a second block. The reason I didn’t see this in the n=50000000 case is that the memory allocated for data of that size was large enough to hold both the IPC result and what was already in memory, without allocating another block. For your data, or the n=2000000 case, the allocated memory was closer to the size of the object itself.
So deleting position from the local namespace before calling again reduces the process’s used memory enough that the existing allocation can hold the second assignment, and no second block is needed. Note that if you delete it immediately before the second assignment, this shouldn’t affect your code, since the reassignment would overwrite the variable anyway.