Computing within table, looping each row

Hi,

I have a historical table that contains date,ticker and price.

Something like this:

date            ticker   price   


2009.04.06 XTZZ   9.265474

2009.04.07 XTZZ   10.65305

2009.04.08 XTZZ   11.73893

2009.04.09 XTZZ   11.63868

2009.04.10 XTZZ   11.48626

2009.04.13 XTZZ   12.57714

2009.04.14 XTZZ   14.28168

2009.04.15 XTZZ   15.07652

2009.04.16 XTZZ   13.20755

2009.04.17 XTZZ   13.40342

2009.04.20 XTZZ   13.79763

2009.04.21 XTZZ   14.54784

2009.04.22 XTZZ   14.91569

2009.04.23 XTZZ   14.53736

2009.04.24 XTZZ   14.53898

2009.04.27 XTZZ   15.57032

2009.04.28 XTZZ   16.87014

I would like to go through each row in the table and for each date,ticker want to get the date of max,min price in the past 15 days. # of days can be any, for now 15 days.

I have something like this:

t: update maxdate:({{[t;r] t:select from t where ticker=rticker,date within -15 0+rdate; exec first date from t where price=max price}'} t) from t

So I will have an output as:

date           ticker       price    maxdate   


2009.04.06 XTZZ   9.265474 2009.04.06

2009.04.07 XTZZ   10.65305 2009.04.07

2009.04.08 XTZZ   11.73893 2009.04.08

2009.04.09 XTZZ   11.63868 2009.04.08

2009.04.10 XTZZ   11.48626 2009.04.08

2009.04.13 XTZZ   12.57714 2009.04.13

2009.04.14 XTZZ   14.28168 2009.04.14

2009.04.15 XTZZ   15.07652 2009.04.15

2009.04.16 XTZZ   13.20755 2009.04.15

2009.04.17 XTZZ   13.40342 2009.04.15

2009.04.20 XTZZ   13.79763 2009.04.15

2009.04.21 XTZZ   14.54784 2009.04.15

2009.04.22 XTZZ   14.91569 2009.04.15

2009.04.23 XTZZ   14.53736 2009.04.15

2009.04.24 XTZZ   14.53898 2009.04.15

This works but looks like it’s taking very very long to compute for a table with about 2 mil rows.

Is there a better/faster way to compute this?

Thanks and appreciate your help!

Hi J,

Initial thoughts were just to use moving max (mmax), take the maxdate where the price=mmax and back filling the blanks with fills. However a simple 3-day max example shows the edge case where this doesn’t work…

q)edge:( date:2009.04.06+til 7; price:3 5 7 2 4 3 7)
q)update movmax:3 mmax price from edge</font><br class='""'> <font class='""' face='"Helvetica,' arial sans-serif>date price movmax</font><br class='""'> <font class='""' face='"Helvetica,' arial sans-serif>-----------------------</font><br class='""'> <font class='""' face='"Helvetica,' arial sans-serif>2009.04.06 3 3</font><br class='""'> <font class='""' face='"Helvetica,' arial sans-serif>2009.04.07 5 5</font><br class='""'> <font class='""' face='"Helvetica,' arial sans-serif>2009.04.08 7 7</font><br class='""'> <font class='""' face='"Helvetica,' arial sans-serif>2009.04.09 2 7</font><br class='""'> <font class='""' face='"Helvetica,' arial sans-serif>2009.04.10 4 7</font><br class='""'> <font class='""' face='"Helvetica,' arial sans-serif>2009.04.11 3 4</font><br class='""'> <font class='""' face='"Helvetica,' arial sans-serif>2009.04.12 7 7</font><br class='""'> <br class='""'> <font class='""' face='"Helvetica,' arial sans-serif>q)update maxdate:date from edge where price=movmax
date price movmax maxdate
----------------------------------
2009.04.06 3 3 2009.04.06
2009.04.07 5 5 2009.04.07
2009.04.08 7 7 2009.04.08
2009.04.09 2 7
2009.04.10 4 7
2009.04.11 3 4
2009.04.12 7 7 2009.04.12

q)update maxdate:fills maxdate from edge
date price movmax maxdate
----------------------------------
2009.04.06 3 3 2009.04.06
2009.04.07 5 5 2009.04.07
2009.04.08 7 7 2009.04.08
2009.04.09 2 7 2009.04.08
2009.04.10 4 7 2009.04.08
2009.04.11 3 4 2009.04.08
2009.04.12 7 7 2009.04.12

However the row 2009.04.11 3 4 2009.04.08 is incorrect - maxdate should be 2009.04.09

A modified version of mmax can be defined to overcome this…

Defining a triadic offsetindex taking n days, table l and a function (mmax or mmin):
offsetindex:{[n;l;f] {[i;s;m;o] @[i;where m=s xprev o;:;s]}[;;f[n;l];l]/[(count l)#0N;til n]}

offsetindex calculates the moving min/max, but returns a result in terms of the number of indices back to where that price occurred, for example:
q)offsetindex[5; 6 5 3 7 7 4 3 8 7 9 6; mmax]
0 1 2 0 1 2 3 0 1 0 1

Creating some sample data:
t: (( date:2009.04.06 + til 20000) cross ( ticker:AB))
update price:40000?10 from t</font><br class='""'> <font class='""' face='"Helvetica,' arial sans-serif> Then updating the maxdate column with the date, taken as that index minus the result from offsetindex:</font><br class='""'> <font class='""' face='"Helvetica,' arial sans-serif> update maxdate:date (til count ticker) - offsetindex[1000;price;mmax] by ticker from t</font><br class='""'> <font class='""' face='"Helvetica,' arial sans-serif> </font><br class='""'> <font class='""' face='"Helvetica,' arial sans-serif> </font><br class='""'> <font class='""' face='"Helvetica,' arial sans-serif> Applying to our sample data:</font><br class='""'> <font class='""' face='"Helvetica,' arial sans-serif> q)\t update maxdate:date (til count ticker) - offsetindex[15;price;mmax] by ticker from t</font><br class='""'> <font class='""' face='"Helvetica,' arial sans-serif> 15</font><br class='""'> <font class='""' face='"Helvetica,' arial sans-serif> </font><br class='""'> <font class='""' face='"Helvetica,' arial sans-serif> Benchmarking against your original function:</font><br class='""'> <font class='""' face='"Helvetica,' arial sans-serif> q)\t update maxdate:({{[t;r] t:select from t where ticker=rticker,date within -15 0+rdate; exec first date from t where price=max price}[x]'[x]} t) from t</font><br class='""'> <font class='""' face='"Helvetica,' arial sans-serif> 4665<br class='""'> <br class='""'> <br class='""'> <br class='""'> A few caveats to point out for offsetindex. The table must be contiguous because the function looks back 15 values, rather than 15 dates. If it must be 15 dates, then you must pad out the table with the missing dates.<br class='""'> <br class='""'> The offsetindex function must be used with +1 days more than your function to return the identical solution, based upon the inclusion or exclusion of the nth date:<br class='""'> <br class='""'> / your function is passed 15<br class='""'> q)t1:update maxdate:({{[t;r] t:select from t where ticker=rticker,date within -15 0+rdate; exec first date from t where price=max price}[x]'[x]} t) from t<br class='""'> <br class='""'> / offsetindex is passed 16<br class='""'> q)t2:update maxdate:date (til count ticker) - offsetindex[16;price;mmax] by ticker from t<br class='""'> <br class='""'> / Results are identical<br class='""'> q)t1~t2<br class='""'> 1b<br class='""'> <br class='""'> <br class='""'> <br class='""'> Let's take a look at how the performance of both functions scales with the 40,000 row sample data...<br class='""'> <br class='""'> For a 100 day lookback:<br class='""'> q)\t update maxdate:date (til count ticker) - offsetindex[100;price;mmax] by ticker from t<br class='""'> 68<br class='""'> q)\t update maxdate:({{[t;r] t:select from t where ticker=rticker,date within -100 0+rdate; exec first date from t where price=max price}[x]'[x]} t) from t<br class='""'> 4722<br class='""'> <br class='""'> For a 1000 day lookback:<br class='""'> q)q)\t update maxdate:date (til count ticker) - offsetindex[1000;price;mmax] by ticker from t<br class='""'> 657<br class='""'> q)q)\t update maxdate:({{[t;r] t:select from t where ticker=rticker,date within -1000 0+rdate; exec first date from t where price=max price}[x]'[x]} t) from t<br class='""'> 5144<br class='""'> <br class='""'> <br class='""'> And increasing to 80,000 row sample data...<br class='""'> <br class='""'> q)\t update maxdate:date (til count ticker) - offsetindex[15;price;mmax] by ticker from t<br class='""'> 28<br class='""'> <br class='""'> q)\t update maxdate:({{[t;r] t:select from t where ticker=rticker,date within -15 0+r`date; exec first date from t where price=max price}'} t) from t
17105


As you can see, the performance of offsetindex degrades with increasing values of lookback days, whereas it scales well with increasing table sizes.



Thanks,
Matthew
KDB+ Developer, AquaQ Analytics Ltd



Thank you very much Matthew! 

This is very helpful! Really appreciate your help!

Thank you,

J

In short, finding max price dates in a sliding window of 6 rows, this works:

q)select date,price,top:date w[{x?max x};6;price] from t

date       price    top

------------------------------

2009.04.06 9.265474 2009.04.06

2009.04.07 10.65305 2009.04.07

2009.04.08 11.73893 2009.04.08

2009.04.09 11.63868 2009.04.08

2009.04.10 11.48626 2009.04.08

2009.04.13 12.57714 2009.04.13

2009.04.14 14.28168 2009.04.14

2009.04.15 15.07652 2009.04.15

2009.04.16 13.20755 2009.04.15

2009.04.17 13.40342 2009.04.15

2009.04.20 13.79763 2009.04.15

2009.04.21 14.54784 2009.04.15

2009.04.22 14.91569 2009.04.15

2009.04.23 14.53736 2009.04.22

2009.04.24 14.53898 2009.04.22

2009.04.27 15.57032 2009.04.27

2009.04.28 16.87014 2009.04.28

Where w is:

q)w:{(til[count z]-m)+x each flip reverse prev[m:y-1;z]}

In long, there’s a nice tip about ‘sliding windows’ here:  http://code.kx.com/wiki/Cookbook/ProgrammingIdioms that defines a function fwv.  

fwv finds all the windows using prev iterate.  To find windows of size 3:

q)prev[2;x:3 5 7 2 4 3 7]
3 5 7 2 4 3 7
3 5 7 2 4 3
3 5 7 2 4

q)flip reverse prev[2;x]

3

3 5

3 5 7

5 7 2

7 2 4

2 4 3

4 3 7

q)max each flip prev[2;x]

3 5 7 7 7 4 7

q)(3 mmax x)~max each flip prev[2;x]

1b

To index:

q)w:{(til[count z]-m)+x each flip reverse prev[m:y-1;z]}

q)x w[{x?max x};3;x]

3 5 7 7 7 4 7

So w can index the date column as shown at the top of this mail.

Ta, Jack

Hi Jack,

My example uses the same approach with sliding window sizes, but prev matrix doesn’t scale well with window size when considering memory usage size. Comparing it here to my previously suggested function, using some sample data t:

q)t: (( date:2009.04.06 + til 20000) cross ( ticker:AB))
q)update price:40000?10 from `t

q)offsetindex:{[n;l;f] {[i;s;m;o] @[i;where m=s xprev o;:;s]}[;;f[n;l];l]/[(count l)#0N;til n]}
q)w:{(til[count z]-m)+x each flip reverse prev[m:y-1;z]}

For window size 10:
q)\ts update maxdate:date (til count ticker) - offsetindex[10;price;mmax] by ticker from t
10 2360256
q)\ts select date,price,top:date w[{x?max x};10;price] from t
24 10364048


For window size 1000:
q)\ts update maxdate:date (til count ticker) - offsetindex[1000;price;mmax] by ticker from t
650 2360256
q)\ts select date,price,top:date w[{x?max x};1000;price] from t
6927 851977232


And confirming both functions give the same result:
q)(update maxdate:date (til count ticker) - offsetindex[10;price;mmax] by ticker from t) ~ update maxdate:date w[{x?max x};10;price] by ticker from t
1b



You could also optimise your function slightly (although you still face the scaling problems with prev):

q)x:3 5 7 2 4 3 7
q)(max prev[2;x]) ~ max each flip prev[2;x]
1b
q)\ts:1000 max each flip prev[2;x]
4 1248
q)\ts:1000 max prev[2;x]
1 768

This will work for max, min and other built in functions, but for the general case, f each flip is still required.


Thanks,
Matthew
KDB+ Developer, AquaQ Analytics