Say I have a table with three boolean flags (a,b,c) and a numericalvalue (d). I want to calculate the average value of d for allpossible filtrations of a,b,c.For example:a, b, c (all true)a, b, c in 01b (a and b true, don’t filter on c)a in 01b, b in 01b, c in 01b (no filters)I’ve come up with a working solution, but it seems like there should bea more elegant solution. Any ideas?Thanks,Leonq)t:( a:100?01b;b:100?01b;c:100?01b;d:100?1f)q)ta b c d ----------------1 0 0 0.9064903 0 0 1 0.060302871 0 0 0.8431435 1 1 1 0.06367737..q)f:enlist each ((),0b;(),1b;01b)q)raze {exec enlist a
bc
d!(x,avg d) from t where a in x 0,b in x 1,c in x 2} each raze f ,/:: raze f ,/:: fa b c d ---------------------,0b ,0b ,0b 0.5507895,0b ,0b ,1b 0.4045915,0b ,0b 01b 0.4862904,0b ,1b ,0b 0.4302817,0b ,1b ,1b 0.650895 ,0b ,1b 01b 0.5353356,0b 01b ,0b 0.5066033,0b 01b ,1b 0.5031129,0b 01b 01b 0.5050167,1b ,0b ,0b 0.5314342,1b ,0b ,1b 0.385823 ,1b ,0b 01b 0.4620955,1b ,1b ,0b 0.5273166,1b ,1b ,1b 0.5052581,1b ,1b 01b 0.5162874,1b 01b ,0b 0.5292859,1b 01b ,1b 0.4509694,1b 01b 01b 0.490997801b ,0b ,0b 0.543692601b ,0b ,1b 0.3970841..
{[x;y;z]exec a:x,b:y,c:z,avg d from t where a in x,b in y,c in z}.’ f cross f cross f
should do the trick
Cheers,
Sean
To: personal-kdbplus@googlegroups.comX-Mailer: Apple Mail (2.1878.6)You could use a functional form of select avg d by a,b,c from tand pass in the columns that you want to group on Firstly set the filters to be shorts (as that allows null values)update short$a,
short$b, short$c from
tThen do:q)(uj/){0!eval(?;x;();y!y,:();(enlistd)!enlist(avg;
d))}[t] each (a
bc;
ab;
ac;
a;b;
bc;
c) a b c d ---------------0 0 0 0.42974190 0 1 0.57539930 1 0 0.48701920 1 1 0.57239091 0 0 0.44328621 0 1 0.47975221 1 0 0.50520671 1 1 0.50109270 0 0.50257060 1 0.52528931 0 0.45433651 1 0.50376680 0 0.46639940 1 0.57362161 0 0.46564641 1 0.48853950 0.51658851 0.4729894..JonnyAquaQ AnalyticsOn 21 Jul 2014, at 16:49, Leon Baum wrote:> > Say I have a table with three boolean flags (a,b,c) and a numerical> value (d). I want to calculate the average value of d for all> possible filtrations of a,b,c.> > For example:> a=1b, b=1b, c=1b (all true)> a=1b, b=1b, c in 01b (a and b true, don’t filter on c)> a in 01b, b in 01b, c in 01b (no filters)> > I’ve come up with a working solution, but it seems like there should be> a more elegant solution. Any ideas?> > Thanks,> Leon> > q)t:( a:100?01b;b:100?01b;c:100?01b;d:100?1f)> q)t> a b c d > ----------------> 1 0 0 0.9064903 > 0 0 1 0.06030287> 1 0 0 0.8431435 > 1 1 1 0.06367737> ..> q)f:enlist each ((),0b;(),1b;01b)> q)raze {exec enlist a
bc
d!(x,avg d) from t where a in x 0,b in x 1,c in x 2} each raze f ,/:: raze f ,/:: f> a b c d > ---------------------> ,0b ,0b ,0b 0.5507895> ,0b ,0b ,1b 0.4045915> ,0b ,0b 01b 0.4862904> ,0b ,1b ,0b 0.4302817> ,0b ,1b ,1b 0.650895 > ,0b ,1b 01b 0.5353356> ,0b 01b ,0b 0.5066033> ,0b 01b ,1b 0.5031129> ,0b 01b 01b 0.5050167> ,1b ,0b ,0b 0.5314342> ,1b ,0b ,1b 0.385823 > ,1b ,0b 01b 0.4620955> ,1b ,1b ,0b 0.5273166> ,1b ,1b ,1b 0.5052581> ,1b ,1b 01b 0.5162874> ,1b 01b ,0b 0.5292859> ,1b 01b ,1b 0.4509694> ,1b 01b 01b 0.4909978> 01b ,0b ,0b 0.5436926> 01b ,0b ,1b 0.3970841> ..> > – >
Submitted via Google Groups
Following on from Jonny’s solution, you can also use aggregation to speed up the calculation over large data sets.
Assuming you have converted your boolean’s to shorts:
g:{0!eval(?;x;();y!y,:();x
d!((sum;x);(sum;
d)))}
select a,b,c,d%x from ((uj/)g[aggr] each (a
b;a
c;b
c;a;
b;`c)) uj aggr:0!select count i, sum d by a,b,c from t
This gives me a calculation time 2-4 times lower than Jonny’s solution.
Steven Hutchinson,
AquaQ Analytics
I have a few points I think are worthy noting:
You have to consider the readability in each…whichever way you like yourself I suppose.
Your solution does not consider the 27th case - no filters applied, i.e. avg d of the whole tab.
Steven your solution is indeed nice and fast, but comes at the cost of double the memory.
Lets take an example of 1m records.
q)n:1000000
q)t:(a:n?01b;b:n?01b;c:n?01b;d:n?1f)
q)f:enlist each ((),0b;(),1b;01b)
q)\ts o:raze {exec enlist a
bc
d!(x,avg d) from t where a in x 0,b in x 1,c in x 2} each raze f ,/:: raze f ,/:: f
1063 25174000
q)\ts m:{[x;y;z]exec a:x,b:y,c:z,avg d from t where a in x,b in y,c in z}.’ f cross f cross f
919 25170160
q)\ts jt:update short$a,
short$b, `short$c from t
7 6292224
q)\ts j:(uj/){0!eval(?;x;();y!y,:();(enlistd)!enlist(avg;
d))}[jt] each (a
bc;
ab;
ac;
a;b;
bc;
c)
1511 50333744
q)g:{0!eval(?;x;();y!y,:();x
d!((sum;x);(sum;
d)))}
q)\ts s:select a,b,c,d%x from ((uj/)g[aggr] each (a
b;a
c;b
c;a;
b;`c)) uj aggr:0!select count i, sum d by a,b,c from jt
380 50333776
q)o~m
1b
Thanks,
Sean
My mistake on the lack of the all null option, I should have added an empty list () to account for that case.
There is a memory penalty doing things this way,
however I thought it a useful alternative given the increased speed.
A possibly tidier version with all 27 conditions:
g:{0!eval(?;x;();y!y,:();x
d!((sum;x);(sum;
d)))}
select a,b,c,avgvals:d%x from {y uj g[x;z]}[aggr]/[aggr:0!update “h”$a,“h”$b,“h”$c from select count i, sum d by a,b,c from t;(`a`b;`a`c;`b`c;`a;`b;`c;())]
In this case I’ve also moved the conversion to short to after the aggregation for further speed improvement. For a table of 1000000 elements as measured by \ts:
My earlier method: 300 50332336
The newer version: 90 50332336
Also interesting to note how this method scales computationally for a larger number of conditions. For example, with 6 conditions:
q)n:1000000
q)t:(a:n?01b;b:n?01b;c:n?01b;d:n?01b;e:n?01b;f:n?01b;q:n?1f)
q)g6:{0!eval(?;x;();y!y,:();x
q!((sum;x);(sum;
q)))}
q)patterns:-1 _ except[;`] each `a`b`c`d`e`f(cross/) -1,/:til 6
q)\ts select a,b,c,d,e,f,avgval:q%x from {y uj g6[x;z]}[aggr]/[aggr:0!update “h”$a,“h”$b,“h”$c,“h”$d,“h”$e,“h”$f from select count i, sum q by a,b,c,d,e,f from t;patterns]
158 75500368
q)\ts {[u;v;w;x;y;z]exec a:u,b:v,c:w,d:x,e:y,f:z,avg q from t where a in u,b in v,c in w,d in x, e in y,f in z}.’ f cross f cross f cross f cross f cross f
24784 25303936
nice stuff
couple of quick notes
q)raze f ,/:: raze f ,/:: f
could be
q)2(f cross)/f
doing ? by x,y,z with x,y,z being booleans is way faster as Steven just noted himself as well
q)\ts select count i,sum d by a,b,c from jt
146 50332816
q)\ts select count i,sum d by a,b,c from t
18 50332816
so lets do the conversion in g instead
also could do the division right in there too
and no need for eval
q)g:{0!?[x;();y!(short$;)each y;(enlist
d)!enlist(%;(sum;d);(sum;
x))]}
another slightly faster way to generate the power set
q)p:{where each neg#'0b vs/:til prd x#2}
q)patterns~-1_a
bc
de
f p 6
1b
need the 1_ as you don?t include select all as an option
but including it actually simplifies the code
(the extra calculation is neglible, simple is almost always better)
and we can do the first aggregation with a functional select too
q)\ts a:(uj/)g[?[t;();b!b;x
d!((count;i);(sum;
d))]] each b reverse p count b:a
b`c
18 50334368
Cheers,
Attila
a different way
t: (a:1000000?0b;b:1000000?0b;c:1000000?0b;d:1000000?1f)
m:0!select sum[d],cnt:count[d] by a,b,c from t;
f:enlist[1b;0b;enlist[01b]];
a:p!(sum each m[d] ind) % sum each m[
cnt] ind:where each flip[m[`a`b`c]] (111b~in’):/: p:f cross f cross f;