kdb vs sql

Hi,

I would like to know the differences between sql and kdb, the underlying code, and why kdb is “better” than sql for querying significantly large databases of time series data, eg >1,000,000 rows.

I work with kdb, but i know almost nothing about sql, so when I try to promote kdb/q to others as a good way to go for managing large amounts of data, I really can’t say why it is superior, though I understand that it is.

Any figures in terms of speed of query/computation on large tables would be good. As I say I have experience working on kdb so know what to expect when I run queries, but I have no idea what the equivalent would take in sql.

thanks,

John.

Well, it’s not about speed, but it says why q is better than SQL:

http://www.timestored.com/b/kdb-qsql-query-vs-sql/

There’s some really good info/discussions about this on the k4 listbox starting 29th May 2014 with subject “comparing database software”………http://www.listbox.com/member/archive/1080/2014/05/search/Y29tcGFyaW5nIGRhdGFiYXNlIHNvZnR3YXJl/sort/time_rev/page/1/entry/23:24/20140529134732:4CF4F7FA-E759-11E3-9A4E-863963081F01/

 
(Not sure if that link will work for you - you also need to be a member of the k4 group)

Includes sample queries and sample timings, disk usage, memory usage etc across various products

Terry

http://kparc.com/q4/readme.txt

Hi,

Thank you for each response. It appears from the links, kdb is faster and more concise in querying time series databases. Yan, your link is interesting and indicates really the much faster speed advantage over the other database languages.

I’m interested in postgresql, as my boss believes it to be superior stating “postgres is column oriented” and therefore is as fast as kdb, but looking at the result’s from Yan it appears much slower at each of the 4 simple queries, and extremely slow at time joins (kdb’s equivalent asof join). Any thoughts?

-John

regular postgres is not, afaik, a column db, though there appear to be column extensions for it

even then, i would expect it to suffer from the same problems as the column versions of the major commercial traditional db’s (presumably the “big3accel” lines in http://kparc.com/q4/readme.txt) – traditional sql was designed for sets, not lists, so any concept of sequenced data and order-dependent querying has to be bolted on and can’t be optimized as well as in a system built from the ground up for time-series data


plus, you’re still stuck with using sql as the query language, and qsql is simply more powerful


for another example, see http://kx.com/q/e/tpcd.q and http://kx.com/q/e/tpcd.txt, an implementation of part of the old TPCD “data warehouse” benchmark in under a page; the corresponding part of the sql reference implementation http://www.tpc.org/tpcd/spec/tpcd_current.pdf is ~30x longer