Key-dependent aggregation

lefevrol · November 30, 2009, 5:52pm

Say you have these two keyed tables: t: ([s:(,/)3#'abc;d:(,/)3#enlist 1 2 3]v:til 9) r: ([s:abc]c:.25 .5 .75)and that you want to sum the v’s grouped by s but weightedby the appropriate coefficient c from r, i.e., something like select {[c;v] c*sum v} by s from t"(yes, in this example I could multiply after the fact but thereal-world problem is more complex and I need c upfront).I have a couple of solutions that involve joining t and r orderivatives thereof first but they look wasteful, data-wise,and worse the code looks like something I won’t understandanymore in 2 weeks, if not sooner. This, on other hand, isnot too bad: select {r[x;`e]*sum y}.(first s;v) by s from tbut, again, the real problem is more complex and I’d like tosimplify even more. Is there a really elegant piece of syntaxI have missed?Thanks

f11 · November 30, 2009, 6:55pm

Hello,Perhaps a foreign-key chase might do the trick if I understand youcorrectly:select sum v*(r(s:s))[c] by s from tRegards,Fintan.On Nov 30, 12:52?pm, olivier <lefev...> wrote:> Say you have these two keyed tables:>> ? t: ([s:(,/)3#'abc;d:(,/)3#enlist 1 2 3]v:til 9)> ? r: ([s:abc]c:.25 .5 .75)>> and that you want to sum the v's grouped by s but weighted> by the appropriate coefficient c from r, i.e., something like>> ? select {[c;v] c*sum v} <what to put here> by s from t">> (yes, in this example I could multiply after the fact but the> real-world problem is more complex and I need c upfront).> I have a couple of solutions that involve joining t and r or> derivatives thereof first but they look wasteful, data-wise,> and worse the code looks like something I won't understand> anymore in 2 weeks, if not sooner. This, on other hand, is> not too bad:>> ? select {r[x;e]*sum y}.(first s;v) by s from t>> but, again, the real problem is more complex and I’d like to> simplify even more. Is there a really elegant piece of syntax> I have missed?>> Thanks</lefev…>

Attila · November 30, 2009, 7:02pm

Fintan is right, you are looking for this> select sum v*(r(s:s))[c] by s from tit is usually written as this:select sum v*r[([]s);c]by s from tbecause of not having the projection it is a bit fasterq)\t do[10000;select sum v*r[(s)]c by s from t]109q)\t do[10000;select sum v*r[([]s);c] by s from t]95And btw the join is not that bad itselfq)\t do[10000;select sum v*c by s from t lj r]160Regards, AttilaOn 30 Nov 2009, at 18:55, fintanq wrote:> Hello,> > Perhaps a foreign-key chase might do the trick if I understand you> correctly:> > select sum v*(r(s:s))[c] by s from t\> \> Regards,\> \> Fintan.\> \> On Nov 30, 12:52 pm, olivier <lefev...> wrote:>> Say you have these two keyed tables:>> >> t: ([s:(,/)3#'abc;d:(,/)3#enlist 1 2 3]v:til 9)>> r: ([s:abc]c:.25 .5 .75)>> >> and that you want to sum the v's grouped by s but weighted>> by the appropriate coefficient c from r, i.e., something like>> >> select {[c;v] c*sum v} <what to put here> by s from t">> >> (yes, in this example I could multiply after the fact but the>> real-world problem is more complex and I need c upfront).>> I have a couple of solutions that involve joining t and r or>> derivatives thereof first but they look wasteful, data-wise,>> and worse the code looks like something I won't understand>> anymore in 2 weeks, if not sooner. This, on other hand, is>> not too bad:>> >> select {r[x;e]*sum y}.(first s;v) by s from t>> >> but, again, the real problem is more complex and I’d like to>> simplify even more. Is there a really elegant piece of syntax>> I have missed?>> >> Thanks> > –> >

Submitted via Google Groups</lefev…>

lefevrol · November 30, 2009, 8:27pm

Thanks to both of you but, as I said, in the real problem theexpressionis much complicated and thus I really need to pass c as an argumenttothe aggregating function. Like my {[c;v] c*sum v} dummy the real-worldverb takes a scalar argument and a vector one but it is much toocomplicated to be inlined as in your solution.About the join: it may be fast but it must cause terrible memory blow-up:in the real problem there are many v columns in t, each one with amatchinge column in r, and there are thousands of d’s for each s value.

lefevrol · November 30, 2009, 8:35pm

Also, in your solution it seems to me that you aremultiplying every v by its c instead of the whole sum,i.e., you are using a vector c, albeit a constant one.In this example this works because (sum c*v) (c*sum v) but in the real problem c really functionsas a scalar and the aggregating function will bombif passed a vector of c’s.

Attila · November 30, 2009, 8:40pm

sure, you can have 1#sOn 30 Nov 2009, at 20:35, olivier wrote:> Also, in your solution it seems to me that you are> multiplying every v by its c instead of the whole sum,> i.e., you are using a vector c, albeit a constant one.> In this example this works because (sum c*v) => (c*sum v) but in the real problem c really functions> as a scalar and the aggregating function will bomb> if passed a vector of c’s.> > –> > Submitted via Google Groups

lefevrol · November 30, 2009, 8:48pm

OK, so this works, too: select {x*sum y}.(first r[(s);c];v) by s from tbut is functionally equivalent to my first posted solution and Ifind it harder to read: the r[([]s);c] syntax is really quite magicalif you haven’t seen it before.

Topic		Views
Error with sum Community Support kdb-and-q	2	July 31, 2022
pass key and value of "by" to function Community Support kdb-and-q	1	October 28, 2013
Named column equi-joins Community Support kdb-and-q	2	July 28, 2015
WindowJoin Community Support kdb-and-q	1	February 8, 2021
Right Crosstab Anyone? Community Support kdb-and-q	11	March 5, 2019

Key-dependent aggregation

Related topics