I would like to know the differences between SQL and kdb, including the underlying code, and why kdb is “better” than SQL for querying significantly large databases of time-series data, e.g. more than 1,000,000 rows.
I work with kdb, but I know almost nothing about SQL, so when I try to promote kdb/q to others as a good way to manage large amounts of data, I really can’t say why it is superior, even though I understand that it is.
Any figures in terms of speed of query/computation on large tables would be good. As I say, I have experience working with kdb, so I know what to expect when I run queries, but I have no idea how long the equivalent would take in SQL.
Thank you for each response. It appears from the links that kdb is faster and more concise at querying time-series databases. Yan, your link is interesting and really shows the large speed advantage over the other databases.
I’m interested in PostgreSQL, as my boss believes it to be superior, stating that “postgres is column oriented” and therefore as fast as kdb. But looking at the results from Yan, it appears much slower on each of the 4 simple queries, and extremely slow at time joins (the equivalent of kdb’s asof join). Any thoughts?
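For reference, this is roughly what I mean by a time join: a minimal q sketch with made-up trades and quotes tables, where each trade is matched with the prevailing quote at or before its time.

/ hypothetical toy tables for illustration only
trades:([] sym:`AAPL`AAPL`MSFT; time:09:30:01 09:30:05 09:30:03; price:101.0 101.5 50.2)
quotes:([] sym:`AAPL`AAPL`MSFT; time:09:30:00 09:30:04 09:30:02; bid:100.9 101.4 50.1)

/ asof join: for each trade, pick the most recent quote at or before the trade's time
aj[`sym`time; trades; quotes]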
regular postgres is not, afaik, a column-oriented db, though there appear to be column-store extensions for it
even then, i would expect it to suffer from the same problems as the column versions of the major commercial traditional dbs (presumably the “big3accel” lines in http://kparc.com/q4/readme.txt): traditional sql was designed for sets, not lists, so any concept of sequenced data and order-dependent querying has to be bolted on, and it can’t be optimized as well as in a system built from the ground up for time-series data
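to make that concrete, here’s a minimal sketch (toy table, made-up data) of order-dependent querying in q, where row order is meaningful and functions like deltas/prev use it directly:

/ hypothetical trade table; rows assumed to be stored in time order
trade:([] sym:`AAPL`AAPL`AAPL`MSFT`MSFT; time:09:30:00 09:30:01 09:30:02 09:30:00 09:30:01; price:101 101.5 101.2 50 50.1)

/ per-symbol price change, relying on the physical row order
update chg:deltas price by sym from trade

/ the rough sql equivalent needs a bolted-on window function, e.g.
/ price - LAG(price) OVER (PARTITION BY sym ORDER BY time)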
plus, you’re still stuck with using sql as the query language, and qsql is simply more powerful
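for example, here’s a one-liner (again a toy sketch with made-up data) that computes 5-minute vwap bars per symbol; the same thing in standard sql typically takes noticeably more ceremony:

/ hypothetical trade table with sizes
trade:([] sym:`AAPL`MSFT`AAPL`MSFT; time:09:31:00 09:32:00 09:36:00 09:37:00; price:101 50.1 101.5 50.2; size:200 100 300 150)

/ volume-weighted average price per symbol in 5-minute buckets
select vwap:size wavg price by sym, bar:5 xbar time.minute from trade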