Hi all,
Just wondering whether it is possible to sort rows by columns?
q)a:( num:6 2 3 1 4 5;tt:x
xx
yy
y;pp:1.1 2.2 3.3 4.4 5.5 6.6)
q)a
num tt pp
6 x 1.1
2 x 2.2
3 x 3.3
1 y 4.4
4 y 5.5
5 y 6.6
I want to sort the num column by tt, so the desire result is this:
num tt pp
6 x 1.1
3 x 3.3
2 x 2.2
5 y 6.6
4 y 5.5
1 y 4.4
I can only think of these solutions:
q)(num xdesc select from a where tt=
x),num xdesc select from a where tt=
y
q)(select[>num] from a where tt=x),select[\>num] from a where tt=
y
Any more efficient ways to do this?
Attila
May 23, 2013, 10:09am
2
Hi,
kdb+'s sort is stable by design, so
tt xasc
num xdesc a
Cheers,
? Attila
? Attila
Awesome, just what I needed (nice and efficient!)
What if I want to put conditions on sorting, say if tt=x then sort desc, if tt=
y then sort asc:
q)(select[>num] from a where tt=x),select[\<num] from a where tt=
y
num tt pp
6 x 1.1
3 x 3.3
2 x 2.2
1 y 4.4
4 y 5.5
5 y 6.6
other options
q)update neg num from tt
num xasc update neg num from a
q){x i iasc x[y]i:idesc x z}[a;tt;
num]
tt xasc
num xdesc a
seems a little wasteful sorting the whole table twice
and modified for your latest question
q)update abs num from tt
num xasc update neg num from a where tt=`x
thanks, sure these are very interesting
however i would argue that
elegance, conceptual simplicity and generality matter too
and they matter more (especially as the first cut) than pure performance
and that is not so easy to predict to begin with
q)a:10000000#a
i can explain xasc/xdesc easily to anyone
works with no matter of column-types
and one could even have multiple columns in all combinations
(or even more than a pair of relations but that is probably rare)
q)\ts tt xasc
num xdesc a
515 805307360
this trick only works with numbers, but also not that much faster on this toy example
q)\ts update neg num from tt
num xasc update neg num from a
490 603981232
this works with all kind of columns and faster too, but again harder to explain
and of course only works with two columns (could be modified easily with # though)
q)\ts {x i iasc x[y]i:idesc x z}[a;tt;
num]
425 603980416
of course performance penalty for the general solution with a real table with more columns would be more pronounced
Cheers,
Attila
Thanks Charles for you solution. I was needing something similar for n number of columns
Please let me know if following implementation is correct or not. Or I can improve the performance more.
Function:
;GenericSort:{[tbl;colNames;colOrders]
;len:-1 + count colNames
;indexes:til count tbl
;while[len>=0;$[`ds=colOrders len;indexes:indexes idesc tbl[colNames len] indexes;indexes:indexes iasc tbl[colNames len] indexes];len-:1]
;tbl indexes
}
Examples
;a:( num:1 2 1 2 1 2;tt:x
xx
yy
y;pp:1 1 2 4 4 5)
;colNames:num
tt`pp
;colOrders:as
ds`as
;GenericSort[a;colNames;colOrders]
;colNames:num
tt`pp
;colOrders:as
ds`ds
;GenericSort[a;colNames;colOrders]
For simplicity and readability, just use the pattern that Attilla advised, i.e.
tt xasc
num xdesc a
but for fun here’s another way
gsort:{x value{(::;i value[y]i:value x)}/[flip((),y;x(),z)]}
q)gsort[a;(iasc;idesc;iasc);`pp`tt`num]
num tt pp
1 y 4
1 x 1
1 x 2
2 y 4
2 y 5
2 x 1
oops, don’t want value y.
gsort:{x value{(::;i y[0]y[1]i:value x)}/[flip((),y;x(),z)]}
a tip - if you find you’re reaching out for a while loop, it’s probably a mistake. It’s more idiomatic in kdb+ to use adverbs, but if for whatever reason that’s unsuitable, then try to rework as a do loop - so it is certain to complete. e.g. your fn reworked as
q)GenericSort:{[t;colNames;colOrders]n:count colNames;i:til count t;do[m:n;m-:1;i:i((iasc;idesc)`ds=colOrders m)t[colNames m]i];t i}
adverbs remove a lot of the ‘-1’ errors in loops.
Thanks Charles for your suggestion. I will use it for next code changes.
I tried the solution provided by you. And i am gaining performance benefit. Though still i have to understand what is happening in that code :)
I cannot use the solution provided by Attila because even for less milliseconds makes a difference in overall performance.
Attila
May 15, 2014, 9:06am
11
{x{x y z x}/[::;(),y;x(),z]}
is a bit even faster
as to explanation you can read Stevan’s article
http://archive.vector.org.uk/art10500340
(point 6, look for msort)
but for reasonably long and not too wide tables
the naive approach is just fine
Attila
Thanks Attila for the link. It is helpful.