What is the kdb+ memory mechanism?

huangyuanfei14 · January 12, 2014, 12:22pm

I need to run calculations over 2 TB of data, but I have only 100 GB of memory. How can I work with data that is larger than memory? What is the kdb+ memory mechanism?

Thanks

wp1 · January 12, 2014, 12:30pm

Assuming you are dealing with time-series data, you should partition your data by date, month, or year.
Have a look at Kdb+ for Mortals: http://code.kx.com/wiki/JB:KdbplusForMortals/partitioned_tables

It covers the basics of on-disk databases quite extensively.
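
A minimal sketch of what date partitioning looks like in practice. The table `trade`, its columns, and the directory `db` are hypothetical names chosen for illustration; only `.Q.dpft` and `\l` are standard kdb+.

```q
/ Hypothetical in-memory table; names and values are illustrative only
trade:([] sym:`a`b`a`b; time:09:30:00.000 09:31:00.000 09:32:00.000 09:33:00.000; price:10 20 10.5 20.5)

/ Write it as the 2014.01.10 partition under the database root ./db.
/ .Q.dpft enumerates symbols, sorts on sym and applies the parted attribute.
.Q.dpft[`:db; 2014.01.10; `sym; `trade];

/ Load the database: columns are memory-mapped, not read into RAM up front
\l db

/ Constrain on date first so only the matching partition is touched
r:select from trade where date=2014.01.10, sym=`a
```

Because each date lives in its own directory and columns are mapped on demand, a query that names the dates it needs only pulls those partitions (and only the referenced columns) into memory.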

AE1 · January 15, 2014, 4:48pm

Hello,

I have a question related to the main question.

For example, I have a large amount of historical data that does not fit in memory.

hst:([obj_id:4020 4050 4050] change_date:2012.09.13 2012.09.12 2012.09.20; change:("10->20"; "22->33"; "33->55")) / and more data

select count distinct obj_id from hst

~10^7

count hst

~20^8

change_date interval: 5 years.

I need to be able to retrieve the list of changes for any object quickly. Clearly we cannot split the data by day or month, because then we would have to read every partition.

If we partition by object, there will be tons of small partition files, and I am not sure that is OK.

Is there any other solution, apart from keeping two duplicate databases, if I need both abilities: retrieving the history for one object and retrieving the history for a period?

Thank you,

Alexander.

pressjonny0 · January 20, 2014, 9:08am

Hi

If you are working with 32-bit kdb+, it is going to be very difficult to stay inside the 4GB memory limit without introducing some form of partitioning. I wouldn't think you could partition by object ID without exceeding file system limitations, but you could maybe partition by object ID group (e.g. objects 0 -> 10k live in one partition, 10k -> 20k live in another, etc.). Assuming there are 26 bn records and 10m object IDs, that would give approximately 2500 records per object. There might be a balance you could find where you could work sensibly with the data, maybe object ID groups of 10k IDs, which would give 1000 partitions with 25m rows in each, and you might be able to work with that without hitting the 4GB memory limit too often (assuming you have the data with `p# on object ID and you use that as your first lookup field every time).
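
A rough sketch of the ID-group idea using integer partitions. Everything named here is an assumption for illustration: the source table `raw`, the helper `grp`, the directory `hstdb`, and the group size of 10,000 taken from the figures above. The table is unkeyed because keyed tables cannot be splayed.

```q
/ Back-of-envelope sizing from the figures in the thread
rows:20 xexp 8          / about 2.56e10 rows
ids:1e7                 / distinct object IDs
perGroup:1e4            / assumed object IDs per partition
nparts:ids%perGroup     / number of partitions
rowsPerPart:rows%nparts / rows in each partition

/ Hypothetical sample data; column names follow the post above
raw:([] obj_id:4020 4050 4050 12345; change_date:2012.09.13 2012.09.12 2012.09.20 2013.01.05; change:("10->20";"22->33";"33->55";"1->2"))

/ Map an object ID to its group number
grp:{x div 10000}

/ Write each group as an integer partition under ./hstdb,
/ sorted and parted on obj_id
{[g] `hst set select from raw where g=grp obj_id; .Q.dpft[`:hstdb; g; `obj_id; `hst]} each distinct grp raw`obj_id;

\l hstdb

/ Name the partition first (virtual column int), then the object
r:select from hst where int=grp 4050, obj_id=4050
```

With this layout a per-object lookup touches one partition, while a query over a date range still has to visit every partition, so it addresses only one of the two access patterns asked about.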

If you have 64-bit kdb+, then you have lots more options. For example, you could save it as one large splayed table with a `p# on obj_id, but that might be difficult to manage in itself (hard to sort, etc.).
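
A small sketch of the single splayed table option, under the assumption that the data can be sorted once before it is written. The names `raw` and `splaydb` are hypothetical, and the table is unkeyed so that it can be splayed.

```q
/ Hypothetical sample data; column names follow the post above
raw:([] obj_id:4050 4020 4050; change_date:2012.09.12 2012.09.13 2012.09.20; change:("22->33";"10->20";"33->55"))

/ Sort so equal obj_id values are contiguous, apply the parted
/ attribute, and splay the result to ./splaydb/hst/
`:splaydb/hst/ set update `p#obj_id from `obj_id xasc raw;

/ Map the splayed table; columns are read from disk on demand
hst:get `:splaydb/hst

/ Per-object history: uses the parted attribute on obj_id
byObj:select from hst where obj_id=4050

/ Per-period history: works, but scans the whole change_date column
byPeriod:select from hst where change_date within 2012.09.01 2012.09.15
```

The trade-off is the one noted above: appending new rows breaks the sort order, so the table has to be re-sorted (or rewritten) to keep the attribute valid.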

Thanks

Jonny
