Cloning all rows of a table for each set of selected columns from

Hello,

I want to clone all rows from a Kdb table for each set of selected columns from the same table. I couldn’t find the best way to describe what I want to achieve. The example below should clarify what I want to do:

q)tab:([]a:JonSamwellDavosMelisandre;x1:11 12 13 14;x2:21 22 23 24;x3:31 32 33 34)q)taba x1 x2 x3-----------------------Jon 11 21 31Samwell 12 22 32Davos 13 23 33Melisandre 14 24 34q)tab2: (select a,x:x1 from tab),(select a,x:x2 from tab),(select a,x:x3 from tab)q)tab2a x---------------Jon 11Samwell 12Davos 13Melisandre 14Jon 21Samwell 22Davos 23Melisandre 24Jon 31Samwell 32Davos 33Melisandre 34

I need a shortcut function to convert tab to tab2. I have used joins to achieve this; but this solution is not scalable as I might get more columns x4, x5 etc.that I would I need to add to ‘x’ and querying table again and again in each join clause will also affect performance.

The operation you are describing sounds like unpivoting a table, the following function will achieve that for your example. To make it more generic you will need to make any columns that need to be ungrouped the values in your keyed table.

q)ungroup{enlist[`x]!enlist value x}each 1!tab

a x

-------------

Jon 11

Jon 21

Jon 31

Samwell 12

Samwell 22

Samwell 32

Davos 13

Davos 23

Davos 33

Melisandre 14

Melisandre 24

Melisandre 34

Hi Vivek,

One way you could go about doing it that should be reasonably more scalable is through the use of a functional select. For the sake of clarity I’ve increased both the number of rows and  columns in order to show the difference in performance between the methods. There may be faster ways than this to achieve what you’re looking for but this is a cleaner/more scalable version of the code to begin with,

q)N:100000

q)5#tab:(a:N?`1;x1:N?5;x2:N?10;x3:N?5;x4:N?10;x5:N?10;x6:N?5;x7:N?10)

a x1 x2 x3 x4 x5 x6 x7

-----------------------------

l 0 0 2 7 1 0 4

l 2 4 3 6 2 1 5

d 3 7 2 2 1 3 1

b 2 4 0 5 1 3 9

o 4 6 2 3 2 2 5

q)\t t1:raze{?[x;();0b;(y;x)!(y;z)]}[tab;a;]each x1x2x3x4x5x6`x7

101

q)\t t2:(select a,x:x1 from tab),(select a,x:x2 from tab),(select a,x:x3 from tab),(select a,x:x4 from tab),(select a,x:x5 from tab),(select a,x:x6 from tab),(select a,x:x7 from tab)

279

q)t1~t2

1b


The documentation for functional select statements in q can be found at https://code.kx.com/v2/basics/funsql/


I should note here that Thomas’ answer will also work (all required rows will be in the table) but it’s formatted differently and less performant on larger datasets. i.e for the same example


q)\t ungroup{enlist[`x]!enlist value x}each 1!tab

2795

I’d be interested to see any faster implementations. Hope this helps somewhat.

Kind regards,

Conor

ungroup - looks good , but as an additional solution you can you this (if you need to save an order)

a:raze {tab `a} each 1_cols tab;

x: raze {tab x} each 1_cols tab;

(a;x)

??, 15 ???. 2019 ?. ? 01:00, <thomas.smyth@aquaq.co.uk>:

The operation you are describing sounds like unpivoting a table, the following function will achieve that for your example. To make it more generic you will need to make any columns that need to be ungrouped the values in your keyed table.

q)ungroup{enlist[`x]!enlist value x}each 1!tab

a x

-------------

Jon 11

Jon 21

Jon 31

Samwell 12

Samwell 22

Samwell 32

Davos 13

Davos 23

Davos 33

Melisandre 14

Melisandre 24

Melisandre 34

Thank you Thomas! It worked for me. It is an uphill task to get your way around in Kdb/q if you are from a procedural/OOP programming background!

One easier-to-understand way for achieving this (without all the cryptic functional forms) is to view table columns as lists themselves:

q)ungroup select a,x:flip(x1;x2;x3)from taba x-------------Jon 11Jon 21Jon 31Samwell 12Samwell 22Samwell 32Davos 13Davos 23Davos 33Melisandre 14Melisandre 24Melisandre 34

Thanks Igor for the alternate approach!

Hello Conor,

This solution seems more performant! I am have modified Thomas’s solution to have an indicator of the column a particular row comes from, like below:


q)tab:(a:JonSamwellDavosMelisandre;x1:11 12 13 14;x2:21 22 23 24;x3:31 32 33 34)
q
)tab
a x1 x2 x3
-----------------------
Jon 11 21 31
Samwell 12 22 32
Davos 13 23 33
Melisandre 14 24 34

q
)tab2:ungroup {xidx!(value x;til 3)}each 1!tab
q
)tab2
a x idx
---------------------

Jon 11 0
Jon 21 1
Jon 31 2
Samwell 12 0
Samwell 22 1
Samwell 32 2
Davos 13 0
Davos 23 1
Davos 33 2
Melisandre 14 0
Melisandre 24 1
Melisandre 34 2


It basically tells me that (11 12 13 14) came from a different column than (21 22 23 24). Is this something I can achieve by modifying your solution?

Thanks!

This indeed is easier to understand! 

Similar to what I replied to Conor, can I modify this solution to add some sort of indicator of the column a row corresponds to?  like below:

q)tab:([]a:JonSamwellDavosMelisandre;x1:11 12 13 14;x2:21 22 23 24;x3:31 32 33 34)q)taba x1 x2 x3-----------------------Jon 11 21 31Samwell 12 22 32Davos 13 23 33Melisandre 14 24 34q)tab2:ungroup {xidx!(value x;til 3)}each 1!tabq)tab2a x idx---------------------Jon 11 0Jon 21 1Jon 31 2Samwell 12 0Samwell 22 1Samwell 32 2Davos 13 0Davos 23 1Davos 33 2Melisandre 14 0Melisandre 24 1Melisandre 34 2

Yes that’s a reasonable modification to make although the solution will appear less readable

q)raze{[x;y;z;k]?[x;();0b;(y;x;idx)!(y;z;k)]}[tab;`a;;]'[n;til count n:1_cols tab]

q)t1

a x idx

-----------------

Jon 11 0

Samwell 12 0

Davos 13 0

Melisandre 14 0

Jon 21 1

Samwell 22 1

Davos 23 1

Melisandre 24 1

Jon 31 2

Samwell 32 2

Davos 33 2

Melisandre 34 2


This is using the fact that we operate right to left (to name the columns you want to apply to xx1 etc. as n) and the each-both operator to project the column names (1_cols tab) and the unique indices you wanted into the variables z and k.


Hopefully this helps

Another performant approach using take (#) which retains the order so you don’t require an indicator column:

raze{ax xcol y#x}[tab]each(`a,'1_cols tab)

Terry

this can be performed with a generalized unpivot function:

unpivot:{ungroup (x!(key;value)@:) each y}

you supply the desired key and value column names as the x parameter and a table who’s key defines the columns to repeat as the y parameter.  the values in the key column retain the column names from which they came.

q)unpivot[kv] 1!tab
a          k  v

Jon        x1 11
Jon        x2 21
Jon        x3 31
Samwell    x1 12
Samwell    x2 22
Samwell    x3 32
Davos      x1 13
Davos      x2 23
Davos      x3 33
Melisandre x1 14
Melisandre x2 24
Melisandre x3 34

> q)tab:(a:JonSamwellDavosMelisandre;x1:11 12 13 14;x2:21 22 23 24;x3:31 32 33 34)

I have shortest and most performant solution ;)

k)(a:(#v)#t`a;x:v:,/t’1_!+t:tab)

also, it uses fact that # function replicates list until required length.

or, equivalent:  q)(a:(count v)#t`a;x:v:raze t 1_cols t:tab)

Code golf, anyone?

Thank you Nick! This seems shorter. Can I replace column names with index? i.e. 0 1 2 in place of x1, x2, x3?
 

Hello Sanny,

I checked this and found it easier to understand. Thank you! 

Can I add indicator for the column to which a record corresponds to, like I posted earlier? Thank you! 

Thanks! It indeed helped. I realized later that I would need multiple key columns (like column a) instead of just one. I did below modification to the query to make it work:

q)tab:([]a:JonSamwellDavosMelisandre;b:SnowTarlySeaworthUnknown;x1:11 12 13 14;x2:21 22 23 24;x3:31 32 33 34)q)taba b x1 x2 x3
------------------------------------

Jon Snow 11 21 31

Samwell Tarly 12 22 32

Davos Seaworth 13 23 33

Melisandre Unknown 14 24 34

q)tab2:raze{[x;y;z;k] ?[x;();0b;(y;x;idx)!(y;z;k)]}[tab;`a;;]'[n;til count n:1_cols tab]
q)tab2

a x idx

-----------------------------
Jon Snow 0
Samwell Tarly 0
Davos Seaworth 0
Melisandre Unknown 0
Jon 11 1
Samwell 12 1
Davos 13 1
Melisandre 14 1
Jon 21 2
Samwell 22 2
Davos 23 2
Melisandre 24 2
Jon 31 3
Samwell 32 3
Davos 33 3
Melisandre 34 3

q)tab3:raze{[x;y;z;k] ?[x;();0b;(y,(`x;`idx))!(y,(z;k))]}[tab;`a`b;;]'[n;til count n:2_cols tab]
q)tab3
a b x idx
-----------------------------------
Jon Snow 11 0
Samwell Tarly 12 0
Davos Seaworth 13 0
Melisandre Unknown 14 0
Jon Snow 21 1
Samwell Tarly 22 1
Davos Seaworth 23 1
Melisandre Unknown 24 1
Jon Snow 31 2
Samwell Tarly 32 2
Davos Seaworth 33 2
Melisandre Unknown 34 2

unpivot works here too, you just specify a 2 column primary key.  and you can map the columns to integers afterwards.

q)update (2_cols tab)?idx from unpivot[idxx] 2!tab
a          b        idx x

Jon        Snow     0   11
Jon        Snow     1   21
Jon        Snow     2   31
Samwell    Tarly    0   12
Samwell    Tarly    1   22
Samwell    Tarly    2   32
Davos      Seaworth 0   13
Davos      Seaworth 1   23
Davos      Seaworth 2   33
Melisandre Unknown  0   14
Melisandre Unknown  1   24
Melisandre Unknown  2   34