Morning all,

I’ve got a project that generates around a million rows of time series data per day. The data is all received in order, and typically contains simple integers and/or floating point numbers. At present, I use MySQL for storage, with a “live” table that holds the realtime data, and then a series of other tables that hold averaged data aggregated over various time periods (e.g. hourly, daily, weekly, etc.). MySQL is starting to creak under the load, and I’m keen to start looking at alternatives (particularly as I anticipate there will be hundreds of millions of new rows per day in the next few months).

I don’t need to store data at full granularity indefinitely - I’d be perfectly happy with a sliding scale, where we store full granularity for a short period (e.g. 1 day), all the way back to daily averages when looking over a 1+ year period. (Not dissimilar to what the round-robin databases in RRDtool provide.)

Does this sound like a reasonable use case for kdb+? Would anyone recommend any open source alternatives too? Or if I’m barking up the wrong tree entirely, please say so.

Thanks,
Sam
Hi Sam,
Yes, this is definitely the type of use case that kdb+ is well suited to.
With reasonable hardware, that volume of data should not be a problem. Also, your proposal of aggregating the data into various time buckets can be done pretty easily.
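To illustrate the bucketed aggregation, here is a minimal q sketch. The `trade` table and its `time`/`price` columns are hypothetical stand-ins for your data; the idiom is q's `xbar`, which rounds temporal values down to bucket boundaries:

```q
/ hypothetical in-memory table of time series data
trade:([] time:09:00:00.000+1000*til 600; price:600?100f)

/ average price in 1-minute buckets
select avgPrice:avg price by minute:1 xbar time.minute from trade

/ average price in hourly buckets (round minutes down to 60)
select avgPrice:avg price by hour:60 xbar time.minute from trade
```

The same pattern extends to daily and weekly rollups by grouping on `xbar` over dates, so each of your aggregate tables is essentially one query.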
Typically, the amount of data that you store historically will be limited by the amount of disk space you have. The amount of intraday data you can hold will be limited by the amount of physical memory you have.
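The usual pattern is to accumulate the day's data in memory and write it out as a date partition at end of day. A sketch, with an illustrative `trade` table (the column names are assumptions, not a prescription):

```q
/ hypothetical in-memory trade table built up during the day
trade:([] sym:`a`b`a`b; time:4?09:00:00.000; price:4?100f)

/ at end of day, save today's data as one date partition of an
/ on-disk database rooted at /db, parted by sym
.Q.dpft[`:/db; .z.D; `sym; `trade]
```

Historical queries then read only the partitions (dates) they need from disk, which is what keeps the disk-bound history and memory-bound intraday data cleanly separated.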
Regards,
Nathan