Issue with Window Join

https://learninghub.kx.com/forums/topic/issue-with-window-join

Hi,

 

I intended to write this as well in my previous post but decided to do so in a new post here as it concerns the Practical Guidance - Joins notebook.

 

In the Practical Guidance - Joins notebook:

show t:(sym:3#`JPM;time:09:30:01 09:30:04 09:30:08;price:120 123 121) is:

sym time     price 
------------------ 
JPM 09:30:01 120   
JPM 09:30:04 123   
JPM 09:30:08 121

show q:(sym:10#`JPM;time:asc 09:30:00+10?8;ask:10?90+til 20;bid:10?90+til 20) is:

sym time     ask bid 
-------------------- 
JPM 09:30:00 102 107 
JPM 09:30:00 91  100 
JPM 09:30:01 93  93  
JPM 09:30:02 102 108 
JPM 09:30:03 91  93  
JPM 09:30:04 96  108 
JPM 09:30:05 102 106 
JPM 09:30:05 103 99  
JPM 09:30:07 97  101 
JPM 09:30:07 95  102

t[`time] is:

09:30:01 09:30:04 09:30:08

show windows: -2 1+: t[`time] is:

09:29:59 09:30:02 09:30:06 
09:30:02 09:30:05 09:30:09

wj[windows;symtime;t;(q;(max;ask);(min;bid))] is:

sym time     price ask bid 
-------------------------- 
JPM 09:30:01 120   102 93  
JPM 09:30:04 123   103 93  
JPM 09:30:08 121   103 99

 

I am unable to figure out why the result of the window join has 103 for ask and 99 for bid in the last row, instead of 97 for ask and 101 for bid.

My comprehension is as follows:

To determine both the ask and bid values in the last row, we need to look at the rows in q where the time values fall between 09:30:06 and 09:30:09 inclusive. Only the last two rows satisfy this condition.

We then determine the higher ask value (97) and the lower bid value (101) among these two rows.

 

This approach works for the first two rows in the result of the window join.

(For the first row, look at the rows in q where the time values fall between 09:29:59 and 09:30:02 inclusive. Only the first four rows satisfy this condition. Then determine the highest ask value (102) and the lowest bid value (93) among these four rows.

For the second row, look at the rows in q where the time values fall between 09:30:02 and 09:30:05 inclusive. Only the fourth to the eighth rows satisfy this condition. Then determine the highest ask value (103) and the lowest bid value (93) among these five rows.)

 

It is my first time learning about the window join and I am uncertain if my approach is right.

 

Once again, I really appreciate any assistance or clarification with this issue.

 

Thank you very much.

Hi , just to add to 's reply;

  • It's because wj uses the last-most quote out side of the window as well as quotes in the window (that last-most quote is called the "prevailing" quote in the notes). If this is not the desired result and you strictly want stuff in the window only, you use wj1
If I use your example to explain:
t:([]sym:3#`JPM;time:09:30:01 09:30:04 09:30:08;price:120 123 121) 
q:([]sym:10#`JPM;time:asc 09:30:00,09:30:00,09:30:01,09:30:02,09:30:03,09:30:04,09:30:05,09:30:05,09:30:07,09:30:07;ask:(102,91,93,102,91,96,102,103,97,95);bid:(107,100,93,108,93,108,106,99,101,102);rn:1+til 10) 
windows: -2 1+: t[`time] 
wj[windows;`sym`time;t;(q;(max;`ask);(min;`bid);((::);`rn))]
  • That last-most quote only is taken when there is no quote right at the exact window start time (you can see this in the second row of the result, we had a quote exactly at window start time 09:30:02, so the last-most quote outside the window isn't included in that case, for the third row it was because no quote exactly at 09:30:06)

Hi,

For wj, the prevailing quote on entry to the window is considered valid as quotes are a step function.

See this link for more details.

Window join | Reference | kdb+ and q documentation - Kdb+ and q documentation (kx.com)