Sort row by column

Hi all,

Just wondering whether it is possible to sort rows by columns?

q)a:(num:6 2 3 1 4 5;tt:xxxyyy;pp:1.1 2.2 3.3 4.4 5.5 6.6)

q)a

num tt pp


6   x  1.1

2   x  2.2

3   x  3.3

1   y  4.4

4   y  5.5

5   y  6.6

I want to sort the num column by tt, so the desire result is this:

num tt pp


6   x  1.1

3   x  3.3

2   x  2.2

5   y  6.6

4   y  5.5

1   y  4.4

I can only think of these solutions:

q)(num xdesc select from a where tt=x),num xdesc select from a where tt=y

q)(select[>num] from a where tt=x),select[\>num] from a where tt=y

Any more efficient ways to do this?

Hi,

kdb+'s sort is stable by design, so

tt xascnum xdesc a

Cheers,

? Attila

? Attila

Awesome, just what I needed (nice and efficient!)

What if I want to put conditions on sorting, say if tt=x then sort desc, if tt=y then sort asc:

q)(select[>num] from a where tt=x),select[\<num] from a where tt=y

num tt pp


6   x  1.1

3   x  3.3

2   x  2.2

1   y  4.4

4   y  5.5

5   y  6.6

other options

q)update neg num from ttnum xasc update neg num from a
q){x i iasc x[y]i:idesc x z}[a;tt;num]

tt xascnum xdesc a

seems a little wasteful sorting the whole table twice

and modified for your latest question

q)update abs num from ttnum xasc update neg num from a where tt=`x

thanks, sure these are very interesting

however i would argue that

elegance, conceptual simplicity and generality matter too

and they matter more (especially as the first cut) than pure performance

and that is not so easy to predict to begin with

q)a:10000000#a

i can explain xasc/xdesc easily to anyone

works with no matter of column-types

and one could even have multiple columns in all combinations

(or even more than a pair of relations but that is probably rare)

q)\ts tt xascnum xdesc a

515 805307360

this trick only works with numbers, but also not that much faster on this toy example

q)\ts update neg num from ttnum xasc update neg num from a

490 603981232

this works with all kind of columns and faster too, but again harder to explain

and of course only works with two columns (could be modified easily with # though)

q)\ts {x i iasc x[y]i:idesc x z}[a;tt;num]

425 603980416

of course performance penalty for the general solution with a real table with more columns would be more pronounced

Cheers,

  Attila

Thanks Charles for you solution. I was needing something similar for n number of columns

Please let me know if following implementation is correct or not. Or I can improve the performance more.

Function:

;GenericSort:{[tbl;colNames;colOrders]

    ;len:-1 + count colNames

    ;indexes:til count tbl

    ;while[len>=0;$[`ds=colOrders len;indexes:indexes idesc tbl[colNames len] indexes;indexes:indexes iasc tbl[colNames len] indexes];len-:1]

    ;tbl indexes

}

Examples

;a:(num:1 2 1 2 1 2;tt:xxxyyy;pp:1 1 2 4 4 5)

;colNames:numtt`pp

;colOrders:asds`as

;GenericSort[a;colNames;colOrders]

;colNames:numtt`pp

;colOrders:asds`ds

;GenericSort[a;colNames;colOrders]

For simplicity and readability, just use the pattern that Attilla advised, i.e.

tt xascnum xdesc a

but for fun here’s another way

gsort:{x value{(::;i value[y]i:value x)}/[flip((),y;x(),z)]}

q)gsort[a;(iasc;idesc;iasc);`pp`tt`num]

num tt pp


1   y  4 

1   x  1 

1   x  2 

2   y  4 

2   y  5 

2   x  1 

oops, don’t want value y.

gsort:{x value{(::;i y[0]y[1]i:value x)}/[flip((),y;x(),z)]}

a tip - if you find you’re reaching out for a while loop, it’s probably a mistake. It’s more idiomatic in kdb+ to use adverbs, but if for whatever reason that’s unsuitable, then try to rework as a do loop - so it is certain to complete. e.g. your fn reworked as

q)GenericSort:{[t;colNames;colOrders]n:count colNames;i:til count t;do[m:n;m-:1;i:i((iasc;idesc)`ds=colOrders m)t[colNames m]i];t i}

adverbs remove a lot of the ‘-1’ errors in loops.

Thanks Charles for your suggestion. I will use it for next code changes. 
I tried the solution provided by you. And i am gaining performance benefit. Though still i have to understand what is happening in that code :)

I cannot use the solution provided by Attila because even for less milliseconds makes a difference in overall performance.

{x{x y z x}/[::;(),y;x(),z]}

is a bit even faster

as to explanation you can read Stevan’s article

http://archive.vector.org.uk/art10500340

(point 6, look for msort)

but for reasonably long and not too wide tables

the naive approach is just fine

  Attila

Thanks Attila for the link. It is helpful.