Custom cumulative functions (like sums, avgs, prds, etc.) in q

Hello people,

I have come across a problem where I have to calculate percentiles of a given set of numbers cumulatively. I need to arrange the numbers in a table chronologically and then calculate percentiles incrementally for each row. Example below:

Date        Score  50th_percentile
2018.03.28  64     64
2018.04.30  49     56.5
2018.05.31  82     64
2018.06.30  40     56.5
2018.07.31  88     64
2018.08.31  77     70.5
2018.09.30  30     64
2018.10.31  17     56.5
2018.11.30  23     49
2018.12.31  12     44.5

As you can see, the table is sorted ascending by date. I have defined a custom function to calculate the nth percentile of a list of numbers. I need to call this function cumulatively as I pass over each row of the table, as shown above. This is exactly how functions like avgs, sums and prds work, except that here I want to define my own function and call it cumulatively. I haven't found a way to achieve this. It would be really helpful if anyone knows how to do it.

Cheers!

Regards,
Vivek Shende

Just type avgs, sums, prds, etc. in the console and you'll see their definitions.
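As a rough illustration of what those definitions show: sums and prds are simply + and * combined with the scan iterator (\), and the same iterator accepts any binary function, including a user-defined lambda. The sample values below are made up for the sketch, and the definition printed for avgs is left out because it differs between q versions.

```q
q)sums                 / the built-in is + with scan
+\
q)prds                 / likewise * with scan
*\
q)(+\)1 2 3 4
1 3 6 10
q){x+y}\[1 2 3 4]      / a custom lambda with scan behaves the same way
1 3 6 10
q){x+y}\[100;1 2 3 4]  / optional first argument: starting value of the accumulator
101 103 106 110
```

In each step x is the result of the previous step and y is the next item of the list, which is the hook a custom cumulative function can build on.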

Hi Vivek

For your specific example something like the following would be needed:

q)a:([]a:1 2 3 4 5 6 7 8;b:64 49 82 40 88 77 30 17)
q)f:{[x;y] (med l; l:x[1],y) }
q)update pc:f\[();b][;0] from a
a b  pc
---------
1 64 64
2 49 56.5
3 82 64
4 40 56.5
5 88 64
6 77 70.5
7 30 64
8 17 56.5

The function f is a slightly extended form of what prds and sums do, in that it carries state from one iteration to the next. This can be seen clearly by applying the function directly to the column values:

q)f\[();64 49 82 40 88 77]
64f  ,64
56.5 64 49
64f  64 49 82
56.5 64 49 82 40
64f  64 49 82 40 88
70.5 64 49 82 40 88 77

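Here each iteration returns the median together with the list accumulated so far. The issue with this approach is that memory usage grows quickly, because scan (\) keeps every intermediate list. The method can be improved by using over (/) in place of scan (\), so that only the final state is kept: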

q)n:10000
q)t:([]a:n?10;b:n?10000)
q)a1:update pc:f\[();b][;0] from t
q)a2:update pc:g/[();b][1] from t
q)a2~a1
1b
q)\ts update pc:f\[();b][;0] from t
887 595962688
q)\ts update pc:g/[();b][1] from t
880 656192
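The definition of g is not shown in the post. The sketch below is only a guess at a function that would fit the call g/[();b][1]: it assumes the state holds the values seen so far in its first slot and the medians collected so far in its second. The first two lines illustrate the difference the timings rely on: scan (\) returns every intermediate state, while over (/) returns only the last one.

```q
q){x,y}\[();1 2 3]     / scan keeps every intermediate list
,1
1 2
1 2 3
q){x,y}/[();1 2 3]     / over keeps only the final one
1 2 3
q)g:{[x;y] l:x[0],y; (l;x[1],med l)}   / hypothetical g: state is (values so far; medians so far)
q)g/[();64 49 82 40 88 77][1]          / item 1 of the final state: the medians
64 56.5 64 56.5 64 70.5
```

With this shape only one growing list and one list of medians exist at any time, which would be consistent with the much smaller memory figure reported by \ts.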

To iterate over your list you would need either the over adverb or recursion. Your requirement looks similar to sums or prds, but it is not exactly the same. The difference is that sums and prds only need the result of the previous iteration in the next one, which makes them relatively easy to implement with '/', whereas in your case you also need to maintain the state of the input list.

For example, to calculate the cumulative 50th percentile, one solution is simply to pass the list state along in each iteration and use the 'med' function, as mentioned in the other answer.

Or you could define your own function to calculate the percentile. Below is one example which runs faster than the med approach.

q)ins:{i:1+x bin y;#[i;x],y,_[i;x]}  / insert y into the sorted list x, keeping it sorted

q)cperc50:{first {(x[0],avg s@z;s:ins[x 1;y])}/[(();());x;floor 0.5*0 1+/:til count x]}  / cumulative 50th percentile


q)l:64 49 82 40 88 77 30 17 23 12
q)cperc50 l
64 56.5 64 56.5 64 70.5 64 56.5 49 44.5
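A small sketch of the two building blocks, using made-up sample values: bin finds where a new value belongs in an already sorted list, and the floor expression produces, for each prefix length, the pair of positions whose average is the median of the sorted prefix.

```q
q)ins:{i:1+x bin y;#[i;x],y,_[i;x]}
q)10 20 30 bin 25         / index of the last item that is <= 25
1
q)ins[10 20 30;25]        / so 25 is spliced in right after position 1
10 20 25 30
q)floor 0.5*0 1+/:til 4   / median positions for sorted prefixes of length 1 to 4
0 0
0 1
1 1
1 2
```

For an odd-length prefix both positions coincide, so the average is the middle item itself; for an even-length prefix it is the mean of the two middle items. Keeping the list sorted as it grows avoids re-sorting on every step, which is presumably where the speed-up over med comes from.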


q)n:10000
q)t:([]a:n?10;b:n?10000)
q)\ts update pc:cperc50 b from t
94 992592


Thanks for the reply!

I tried doing that, but it didn’t help much. I couldn’t figure out from the syntax and deduce how to apply it for my case.

Hello Jamie,

Thank you very much! The solution worked for me. Being new to kdb+/q, I have been scratching my head over this problem since last week. You saved me a lot of time.

However, I didn’t fully understand the solution, especially the snippets below:

f:{[x;y] (med l; l:x[1],y)}
f\[();b][;0]

Could you please elaborate on how the function definition enables it to maintain state across iterations? Also, when calling the function, why are you passing an empty list and 0 as arguments?
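One way to read the two snippets, offered as a sketch rather than the original poster's own explanation: q evaluates the items of a list literal right to left, so l:x[1],y runs first and med l second. x is whatever f returned on the previous step, a pair (median; list so far), so x[1] recovers the list and y is appended to it. The () is the starting value of x. The [;0] is not an argument to f; it is an index applied to the result of the scan, taking item 0 (the median) from every pair. The name r below is introduced only for this illustration.

```q
q)f:{[x;y] (med l; l:x[1],y)}
q)f[();64]                 / first step: () is the initial state, so l is just 64
64f
,64
q)f[(64f;enlist 64);49]    / second step: x is the pair returned by the first step
56.5
64 49
q)r:f\[();64 49 82]        / scan collects one (median; list) pair per input value
q)r[;0]                    / item 0 of each pair: the medians
64 56.5 64
q)r[;1]                    / item 1 of each pair: the growing lists
,64
64 49
64 49 82
```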