Optimal way of doing X-minute bars over a multi-day span

Hello!

I’m new to this KDB world, so I’m looking for a little help trying to optimize/replace a query.

I’m developing a charting application that allows traders to look at data from several days or months, so I’m trying to hit the database in just one query to extract the data that I need.

For example, if I wanted to extract 60-minute bars for the whole month of October from the database I would use the following query:

{[start;end] dateminute xasc raze {dateminute xkey update date:x from select first price,max price,min price,last price,sum volume by 60 xbar time.minute from TABLE where date=x,ex=EXCHANGE,sym=SYMBOL} peach (start + til (end-start) + 1)}[2013.10.01;2013.10.31]

What are your thoughts on that query? Any tip for improving it?

Thanks,

Diego

It depends on your purpose but you might want to use 0D60 xbar time to maintain the time information.

Swap ex and sym in the order of the where clause. If you already use -s on the HDB, there’s no need for the peach - a "date in " does the same thing.

do you have parted attribute set on sym? If so that should probably be your first constraint after date. Also can use within.

{[start;end] select first price,max price,min price,last price,sum volume by date,60 xbar time.minute from TABLE where date within(start;end),sym=SYMBOL,ex=EXCHANGE}[2013.10.01;2013.10.31]

Hey guys, thanks a lot for the help, I’m using Charles’ suggestion and it seems to be faster than my original query.

About your other suggestions, since I am still kind of new about KDB, I have some questions for you:

  1. What does “0D60 xbar time” do?

  2. What does the option -s do? I ask the guy in charge of KDB in the office and he didn’t know.

  3. What does “parted attribute” stand for?

Thank you very much,

Diego

>1. What does “0D60 xbar time” do?

0D60 is shorthand for a timespan of 60 hours - better not to rely on exceeding 23 hours in that field. Huy probably meant 0D01, but you’d also have to cast the rhs to timespan for that to work. It floors to multiples of that interval. Used for bucketing.

q)0D01 xbar `timespan$05:02:01.123

0D05:00:00.000000000

>2. What does the option -s do?

It’s a command line option that specifies how many slave threads kdb+ should use for “parallel each” (peach) operations. kdb+ can use multithreading to speed up queries in hdbs in this mode.

>3. What does “parted attribute” stand for?

Attributes can be supg - sorted, unique, parted, grouped; each one can allow optimizations within certain operations. Parted means the unique elements are grouped together in the vector.

e.g. q)`p#1 1 5 5 3 3

Usually hdb data is sorted by sym, then time. This lends itself to having the `p attribute on sym.

code.kx.com is the best and most accurate reference for this info, along with Arthur’s original documents

http://kx.com/q/d/q.htm

http://kx.com/q/d/kdb+.htm

And Dennis Shasha’s

http://kx.com/q/d/primer.htm