I’m fairly new to databases. My current system uses hdf5 and is really starting to hit its limit.
I have roughly 15 years of data for 1000 securities. Each security has ~100 fields of varying frequency:
Daily: Open, Close, Adjusted close, Volume, VWAP, …
Weekly: Short interest, Market cap, …
Monthly: Industry/Sector, Financial statement data, …
There will additionally be a large number of values that are calculated from this data (moving averages, ratios of fields, etc…).
The most common query will be for 30-50 fields re-sampled to a given frequency.
What should the tables look like? Is okay to have 100+ columns in a table or should I spread them out somehow?
it’s fine to have many columns - whatever naturally fits
maybe start by directly export/import csv’s
1bbe31a6-047e-4f98-b7f5-da91f05e9e15@googlegroups.com>To: Ryan Turner , personal-kdbplus@googlegroups.com
Hi Ryan,
It’s totally fine to have 100+ columns in a table. Column store means you only access the columns in a particular query, even if it is 30 or 40% of them.
You should start with all splayed tables (one per data frequency) since it doesn’t sound like much data for kdb.
Your monthly and weekly tables could be sparse daily and filled or aj to the desired frequency.? That would be more flexible than, say, forcing market cap to be weekly when really it changes daily and you might want to track it as such for some large cap stocks or something. Or maybe one company releases earnings annually so doesn’t fit the monthly scheme you’re assuming. You can easily convert between frequencies in kdb.
Good luck,
David
From: Ryan Turner
Sent: Tuesday, June 16, 2015 19:33
To: personal-kdbplus@googlegroups.com
Reply To: personal-kdbplus@googlegroups.com
Subject: [personal kdb+] Storing Price + Fundamental Data
|
I'm fairly new to databases. My current system uses hdf5 and is really starting to hit its limit.
I have roughly 15 years of data for 1000 securities. Each security has ~100 fields of varying frequency:
Daily: Open, Close, Adjusted close, Volume, VWAP, ...
Weekly: Short interest, Market cap, ...
Monthly: Industry/Sector, Financial statement data, ...
There will additionally be a large number of values that are calculated from this data (moving averages, ratios of fields, etc...).
The most common query will be for 30-50 fields re-sampled to a given frequency.
What should the tables look like? Is okay to have 100+ columns in a table or should I spread them out somehow?
--
Submitted via Google Groups