A very tricky question for Gurus

Hi q-Gods, 

Please, help needed! I know what I need to do but I have no idea how to do it in q. I have given my absolute best and have spent 2 days trying, but nothing seem to work out. 

I have a table t with an empty RM (running minimum) column which needs to be filled with running minimum of col “px”

record id acn  px RM

0        1   1    15

1        2   1    20

2 3 1 10

3 4 1 11

4 3 0 10

5        5   1    13 

6 4 0 11

7        6   1    17  

..        …  ..    …

Each ID in the table can have 2 acn’s, either acn=1 (order submitted) or acn=0 (order cancelled). I am looking to run a running minimum in RM column. However, whenever an ID has acn=0, the the according price is no longer registered in the system. In this case, next minimum takes its place. Allow me to demonstrate with table t.

“Record” table is just an index of each row/record and is used purely this example, I do not have have this col in my table t. 

record id acn  px RM

0        1   1    15  15

1        2   1    20  15

2        3   1    10  10

3        4   1    11   10

Record 4 says: ID=3 has acn=0, which means this orders has been cancelled and is no longer in the system. Its px is ignored from here on. This means that the next-in-order-minimum is 11. Of course to find this next in order minimum, ‘mins’ has to be rerun on col px up until this point (but not including record 4), but this time without, or ignoring, record 2 where ID=3 is registered. Hence (continuing from record 4)

record id acn  px RM

4        3   0    10   11

5        5   1    13   11

Again, record 6 indicates that ID=4 has acn=0 and therefore is cancelled and its price is no longer registered in the system. Yet again, “mins” has to be run up til this point (but not inclusive record 6), but this time ignoring not only records where ID=3, but also where ID=4.

record id acn  px RM

6        4   0    11   13

7        6   1    17   13

This is the result I am after: 

record id acn  px RM

0        1   1    15  15

1        2   1    20  15

2        3   1    10  10

3        4   1    11   10

4        3   0    10   11

5        5   1    13   11

6        4   0    11   13

7        6   1    17   13

Absolutely no idea how to implement this in q. I would definitely give me hard time in other programming languages that I am pretty familiar already too…

As always, if you have a minute or two your time and effort is very ‘extremely’ appreciated.

Regards, 

VA.

One solution is to write a function which maintains the state of active orders after each row. You can do this with the scan adverb.

Example using your data:

/ create the example table

q)show t:flipidacn`px!(1 2 3 4 3 5 4 6;1 1 1 1 0 1 0 1; 15 20 10 11 10 13 11 17)

id acn px

---------

1 1 15

2 1 20

3 1 10

4 1 11

3 0 10

5 1 13

4 0 11

6 1 17

/ function to build active price state

/ currState param is dictionary with current state

/ nextEntry param is a dictionary with the changes to apply to state (ie. a row of the above table)

/ for a given ‘nextEntry’, function will check acn. If 0, it will remove the id from the currState and return it. otherwise will add/update the entry with the given price (px)

/ note the scan adverb () at the end of the function - this is important

q)buildActivePricesState:{[currState;nextEntry] ?[0=nextEntryacn;enlist[nextEntryid] _ currState;currState,enlist[nextEntryid]!enlist nextEntry[px]] }</font>

/ Example output on your data

/ each line represents state of active prices after applying the matching row in the table ‘t’ (added some comments on RHS of output)

/ note, initial state is given as ()!() .. and scan operator will iterate over each row in ‘t’

q) buildActivePricesState[()!();t]

(,1)!,15

1 2!15 20

1 2 3!15 20 10

1 2 3 4!15 20 10 11

1 2 4!15 20 11 / removed id=3, as the acn is 0

1 2 4 5!15 20 11 13

1 2 5!15 20 13 / removed id= 4, as the acn is 0

1 2 5 6!15 20 13 17

/ Adding running minimum (RM) to your table

/ you can then use this function in a few different ways to get minimum active price -

q) update RM:min each buildActivePricesState[()!();t] from t

id acn px RM

------------

1 1 15 15

2 1 20 15

3 1 10 10

4 1 11 10

3 0 10 11

5 1 13 11

4 0 11 13

6 1 17 13


    / Another way of adding running minimum (RM) to your table

/ same as above, just anoter way to supply the ‘table’ argument

q) update RM:min each buildActivePricesState[()!();( id;acn;px)] from t

id acn px RM

------------

1 1 15 15

2 1 20 15

3 1 10 10

4 1 11 10

3 0 10 11

5 1 13 11

4 0 11 13

6 1 17 13

One additional point - you do not need to bind the scan adverb () to the function when it is defined - it can be defined as a normal function, and you can use the adverb form later (this is recommended approach, as the original func is more easily reused):

/ notice no trailing ''

q)buildActivePricesState:{[currState;nextEntry] ?[0=nextEntryacn;enlist[nextEntryid] _ currState;currState,enlist[nextEntryid]!enlist nextEntry[px]] }

  / the scan operator can be bound to the function when it is used -

q)buildActivePricesState[()!();t]

(,1)!,15

1 2!15 20

1 2 3!15 20 10

1 2 3 4!15 20 10 11

1 2 4!15 20 11

1 2 4 5!15 20 11 13

1 2 5!15 20 13

1 2 5 6!15 20 13 17

Regards,

Mohammad Noor

Hi Mohammad, 
Thank you very much for your help. This is work of art. I will need to some time to digest it to understand the syntax entirely. Thank you once again. 

Krishna, thank you too, but your code doesn’t seem to be doing what I am after. Thank you for your time as well!!! 

Hey,

Just another way


q)update rm:{[x;y;z;a] $[z;y&x;a]}[px[0];px;acn;prev px] from t

id acn px rm

------------

1 1 15 15

2 1 20 15

3 1 10 10

4 1 11 10

3 0 10 11

5 1 13 11

4 0 11 13

6 1 17 13


Thanks,
Krishan

mohammad’s idea to use scan was spot on.  i would only add that replacing inactive prices with infinity makes the logic a bit easier:

q)update  RM:min each @[()!();id;:;?[1=acn;px;0W]] from t
id acn px RM

1  1   15 15
2  1   20 15
3  1   10 10
4  1   11 10
3  0   10 11
5  1   13 11
4  0   11 13
6  1   17 13

Newbie question, any info highly appreciated - could someone explain both the scans, please? 

update rm:{[x;y;z;a] $[z;y&x;a]}[px[0];px;acn;prev px] from t - specifically, what is px[0] in each iteration?

and 

update  RM:min each @[()!();id;:;?[1=acn;px;0W]] from t - is this a case of functional amend or is @ being used as a indexer?

Just one sentence explaining the scans in plain English would do, 

thank you, 

jj. 

I will explain the second query and you should be able to figure out the first one.

So @[()!();id;:;?[1=acn;px;0W]]  is using the functional form of amend with scan, in the first iteration, empty dictionary is being passed as first parameter and it simply iterates over rest of the parameters using result of previous iteration as the first parameter (dictionary of id’s in this case), can be explained from below


q)select @[()!();id;:;?[acn=1;px;0w]] from t

x
-----------------------------
(,1)!,15f
1 2!15 20f
1 2 3!15 20 10f
1 2 3 4!15 20 10 11f
1 2 3 4!15 20 0w 11
1 2 3 4 5!15 20 0w 11 13
1 2 3 4 5!15 20 0w 0w 13
1 2 3 4 5 6!15 20 0w 0w 13 17

On Friday, February 24, 2017 at 3:04:06 PM UTC, Codelocker wrote:

Newbie question, any info highly appreciated - could someone explain both the scans, please? 

update rm:{[x;y;z;a] $[z;y&x;a]}[px[0];px;acn;prev px] from t - specifically, what is px[0] in each iteration?

and 

update  RM:min each @[()!();id;:;?[1=acn;px;0W]] from t - is this a case of functional amend or is @ being used as a indexer?

Just one sentence explaining the scans in plain English would do, 

thank you, 

jj. 

Thank you very much, Ajay. 

Hi,

I just joined this community,

thinking of another way

update RM:raze mins each (0,where acn=0) _ fills ?[acn=0;0n;px] from t

Not sure if the previous replies about replacing with inf works
example:

q)t

id acn px


1  1   1

2  1   20

3  1   10

4  1   11

3  0   10

5  1   13

4  0   11

6  1   17

q)update  RM:min each @[()!();id;:;?[1=acn;px;0W]] from t

id acn px RM


1  1   1  1

2  1   20 1

3  1   10 1

4  1   11 1

3  0   10 1

5  1   13 1

4  0   11 1

6  1   17 1

using my method

q)update RM:raze mins each (0,where acn=0) _ fills ?[acn=0;0n;px] from t

id acn px RM


1  1   1  1

2  1   20 1

3  1   10 1

4  1   11 1

3  0   10 11

5  1   13 11

4  0   11 13

6  1   17 13

Also, what should be the behaviour if there are consecutive acc=0?

q)t

id acn px


1  1   15

2  1   20

3  1   10

4  1   11

3  0   10

5  0   13

4  0   11

6  1   17

q)update  RM:min each @[()!();id;:;?[1=acn;px;0W]] from t

id acn px RM


1  1   15 15

2  1   20 15

3  1   10 10

4  1   11 10

3  0   10 11

5  0   13 11

4  0   11 15???

6  1   17 15

q)update RM:raze mins each (0,where acn=0) _ fills ?[acn=0;0n;px] from t

id acn px RM


1  1   15 15

2  1   20 15

3  1   10 10

4  1   11 10

3  0   10 11

5  0   13 11

4  0   11 11

6  1   17 11

Hi 

I don’t think any solution that doesn’t carry forward the full state will work.  The example only has one state change level (i.e. it only ever has to look back 1 price).  If there is > 1 change then they don’t work, e.g. change the px at id 1 to 12:

q) t:flipidacn`px!(1 2 3 4 3 5 4 6;1 1 1 1 0 1 0 1; 12 20 10 11 10 13 11 17)                                                                                                                                                                   

q)show t                                                                                                                                                                                                                                         

id acn px


1  1   12

2  1   20

3  1   10

4  1   11

3  0   10

5  1   13

4  0   11

6  1   17

q)buildActivePricesState[()!();t]                                                                                                                                                                                                                

(,1)!,12

1 2!12 20

1 2 3!12 20 10

1 2 3 4!12 20 10 11

1 2 4!12 20 11

1 2 4 5!12 20 11 13

1 2 5!12 20 13

1 2 5 6!12 20 13 17

// this is still correct

q)min each buildActivePricesState[()!();t]                                                                                                                                                                                                       

12 12 10 10 11 11 12 12

// this isn’t

q)update RM:raze mins each (0,where acn=0) _ fills ?[acn=0;0n;px] from t                                                                                                                                                                         

id acn px RM


1  1   12 12

2  1   20 12

3  1   10 10

4  1   11 10

3  0   10 11

5  1   13 11

4  0   11 13

6  1   17 13

I think there are only a couple of ways to handle these types of problem:

  1. track the full state (as Mohammad’s buildActivePriceState)

  2. iterate through the table and run consecutive queries over the previous n rows

These approaches are basically the same, except (1) will likely be higher memory usage (tracking the intermediate state), (2) will be higher compute time (as you are essentially building the “state to now” from fresh for every row).  There might be other approaches, one might be to use a version of over to iterate across the whole table until input=output, but I think that may be very expensive. 

If you are happy to use a global variable, and you really only care about the min value rather than the whole state at each time point, you can use a global variable to track the state to increase speed/reduce memory.  You can also do it with a local but the time performance isn’t good. 

// currState is global

q)buildActivePricesStateGlobalInner:{[nextEntry] $[0=nextEntryacn; @[.;currState;_;nextEntryid];currState,::enlist[nextEntryid]!enlist nextEntry[px]]; min currState}each                                                                  

// set currState to empty, build list

q)buildActivePricesStateGlobal                                                                                                                                                                                                                   

{currState::()!(); buildActivePricesStateGlobalInner x}

q)buildActivePricesStateGlobal[t]                                                                                                                                                                                                                

12 12 10 10 11 11 12 12

// timings

q)\ts buildActivePricesStateGlobal[100000#t]                                                                                                                                                                                                     

285 6318976

q)\ts min each buildActivePricesState[()!();100000#t]                                                                                                                                                                                            

364 26595136

q)(min each buildActivePricesState[()!();100000#t]) ~ buildActivePricesStateGlobal[100000#t]                                                                                                                                                     

1b

Thanks 

Jonny

Actually, can do a bit better performance wise if we shift things around a bit and use Nick’s speedup: 

p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 11.0px Menlo}span.s1 {font-variant-ligatures: no-common-ligatures}

q)buildActivePricesStateGlobalInner

{[id;px] currState[id]:px; min currState}

q)buildActivePricesStateGlobal

{currState::()!(); x:update px:0W from x where acn=0; buildActivePricesStateGlobalInner’[xid;xpx]}

q)\ts buildActivePricesStateGlobal[100000#t]

106 6318912

q)\ts min each buildActivePricesState[()!();100000#t]

355 26595136

q)buildActivePricesStateGlobal[100000#t] ~ min each buildActivePricesState[()!();100000#t]

1b

On Monday, February 27, 2017 at 9:09:00 AM UTC, Jonny Press wrote:

Hi 

I don’t think any solution that doesn’t carry forward the full state will work.  The example only has one state change level (i.e. it only ever has to look back 1 price).  If there is > 1 change then they don’t work, e.g. change the px at id 1 to 12:

q) t:flipidacn`px!(1 2 3 4 3 5 4 6;1 1 1 1 0 1 0 1; 12 20 10 11 10 13 11 17)                                                                                                                                                                   

q)show t                                                                                                                                                                                                                                         

id acn px


1  1   12

2  1   20

3  1   10

4  1   11

3  0   10

5  1   13

4  0   11

6  1   17

q)buildActivePricesState[()!();t]

(,1)!,12

1 2!12 20

1 2 3!12 20 10

1 2 3 4!12 20 10 11

1 2 4!12 20 11

1 2 4 5!12 20 11 13

1 2 5!12 20 13

1 2 5 6!12 20 13 17

// this is still correct

q)min each buildActivePricesState[()!();t]

12 12 10 10 11 11 12 12

// this isn’t

q)update RM:raze mins each (0,where acn=0) _ fills ?[acn=0;0n;px] from t                                                                                                                                                                         

id acn px RM


1  1   12 12

2  1   20 12

3  1   10 10

4  1   11 10

3  0   10 11

5  1   13 11

4  0   11 13

6  1   17 13

I think there are only a couple of ways to handle these types of problem:

  1. track the full state (as Mohammad’s buildActivePriceState)

  2. iterate through the table and run consecutive queries over the previous n rows

These approaches are basically the same, except (1) will likely be higher memory usage (tracking the intermediate state), (2) will be higher compute time (as you are essentially building the “state to now” from fresh for every row).  There might be other approaches, one might be to use a version of over to iterate across the whole table until input=output, but I think that may be very expensive. 

If you are happy to use a global variable, and you really only care about the min value rather than the whole state at each time point, you can use a global variable to track the state to increase speed/reduce memory.  You can also do it with a local but the time performance isn’t good. 

// currState is global

q)buildActivePricesStateGlobalInner:{[nextEntry] $[0=nextEntryacn; @[.;currState;_;nextEntryid];currState,::enlist[nextEntryid]!enlist nextEntry[px]]; min currState}each

// set currState to empty, build list

q)buildActivePricesStateGlobal

{currState::()!(); buildActivePricesStateGlobalInner x}

q)buildActivePricesStateGlobal[t]

12 12 10 10 11 11 12 12

// timings

q)\ts buildActivePricesStateGlobal[100000#t]

285 6318976

q)\ts min each buildActivePricesState[()!();100000#t]

364 26595136

q)(min each buildActivePricesState[()!();100000#t]) ~ buildActivePricesStateGlobal[100000#t]

1b

Thanks 

Jonny

one more solution:

t:flipidacn`px!(1 2 3 4 3 5 4 6;1 1 1 1 0 1 0 1; 15 20 10 11 10 13 11 17)

k)update rmin:(min ‘+ {@[(#t)#0N;x[0]+!x 1;:;x 2]}’+. exec i,it:(-i-#i )^it-i,px from .q.lj[t]select it:*i by id from t where acn=0) from t

Thank you very much, this is a great improvement! But I am still running into performance problems. I have  a table that is count t = 9392779, and it is taking just too long time to compute. I do not mind waiting, but not hours and hours. Would anyone suggest a way around this? 

 

Performance depends a lot on the id column. How many distinct ids/how many rows on average for each id are there in your 9 million length table? 

Cheers 

Ryan 

It might depend on the activity in your real data. Can you try running the first version I sent and see if it is faster? The optimization with the infinity value may slow it down on large datasets as when there is a large number of unique orders the current state dictionary will be very large, so the min operation will be slow. So if instead you remove dead orders (first version) then it might be quicker. 

(I think the code for version 1 can be improved a bit too) 

Sent from my iPhone

As requested, 
In this table I have 

count idList = 4451102

avg idList count = 2.11 but there are some ids that appear more than that. For example: 

desc idList

id       | idCount

---------| -------

136804121| 286

136561734| 262

142266716| 191

137702903| 186

146275665| 186

149983273| 186

135935582| 183

150624717| 161

136623003| 158

149669428| 157

134259987| 152

145771313| 151

147540408| 151

146652097| 147

139549185| 139

138536519| 136

133120715| 134

134270994| 134

138845988| 128

147137301| 127

151356267| 127

134362547| 123

136693027| 123

142536322| 122

136051457| 121

137050076| 120

136997343| 118

141489703| 117

143748919| 110

151214633| 110

133080972| 109

151515926| 108

136288633| 106

135735451| 105

148774185| 105

145408522| 104

146586761| 104

..

Could you be a little more specific please. There have been a lot of replies. Please let me know what you mean by the first version? 

Ok sure.  Try this one:

q)f                                                                                                                                                                             

{currState::()!(); {[iszero;id;px] $[iszero; @[.;currState;_;id];currState[id]:px];min currState}'[0=x`acn;x`id;x`px]}

q)t                                                                                                                                                                             

id acn px


1  1   12

2  1   20

3  1   10

4  1   11

3  0   10

5  1   13

4  0   11

6  1   17

q)\ts f[100000#t]                                                                                                                                                               

151 6449952

I think this version will be better given the number of unique orders you have 

improved version:

t:flipidacn`px!(1 2 3 4 3 5 4 6;1 1 1 1 0 1 0 1; 15 20 10 11 10 13 11 17)

s:()!(); t,'(RM:{$[1=yacn;[s[yid]:ypx;min x,ypx];[s _:y`id;min value s]]}[0N;t])

fast enough?