Pick the first row from a table and update a particular value

Hi all,

I am new to q, I have a question about updating a keyed table.

I have a keyed table with OrderID as key, and with some columns (e.g. price, quantity, time)

The table can be huge, containing at least 100,000 rows.

I am designing an operation to pick the first row and update the column (e.g. time)

I found that if use ‘first’ to get the first row, and update the row value as below:

r: first t;

r[`time]: .z.N;

the value in row is updated. but the row in table t is not updated.

I have to explicitly index the table in order to update the table:

t[r[orderID];time]: .z.N;

which this could be slow as the table size is big. I tried update statement and it’s even slower.

May I know is there any way for me to update r so that the change is committed to table t directly?

Thanks very much.

Cheers,

Juno

How are you writing your update statement?

Have you had a look at the virtual column ‘i’ - the row identifier?

More info here - http://code.kx.com/wiki/JB:QforMortals/queries_q_sql#The_select_Phrase

You can use this column in several ways for this update statement, here are two methods:

q)t:([orderID:10?10]price:10?10;qty:10?10;time:10?.z.p)

q)update time:.z.p from t where i=0

q)update time:.z.p from t where i=first i

HTH,

Sean

Hi Afonso,

Below is a function that uses ‘dot apply’ that can be applied to any keyed table.

f:{[x;y;z;d].[x;(y;z);:;d]}

It takes the keyed table as its first argument, the index and field as its second and third, and the value you want to apply as the fourth.

y and z are evaluated as the second argument of the apply statement because it’s reading them as:

‘apply to z at index y’

So in the example below it will apply to field ‘time’ at the index of the table’s first order_id

This is faster than running an update statement on your table, especially if you don’t want values updated in state:

q)\t:100 f[tab;exec first order_id from tab;`time;.z.N]

204

q)\t:100 update a:.z.N from tab where i=0

406

q)\t:1000 f[tab;exec first order_id from tab;`time;.z.N]

1860

q)\t:1000 update a:.z.N from tab where i=0

3229

https://code.kx.com/wiki/Reference/DotSymbol

Hope this helps,

James Burrows

Hi Sean and James,

Thanks to both of your suggestions. I tried both of them and found the performance is very similar to direct indexing(i.e. t[r[orderID];time]: .z.N;)

All of them beats normal update statement.

So it looks like there is nothing I could do to improve the performance by simply updating the entry directly after picking ‘first t’;

Thanks very much.

Cheers,

Juno

Thanks Sean and James,

I tried both ways and found that they are of similar performance as direct indexing (i.e. t[r[orderID];time]: .z.N;). All of them beats ordinary update statement.

So, unfortunately it looks like there is no way to improve the performance further on updating the entry after calling the ‘first t’

Cheers,

Juno

How are you actually running your code?

running ‘first’ on your/a keyed table will not output the key column, so you can’t actually index into orderID as per your example

Either way here are some showings (w32 3.6t tasked to 1 core)…

q)n:1000000

// cost to create sample table

q)\ts:5 t:([orderID:neg[n]?100000000]price:n?1000.;qty:n?1000000;time:n?.z.p)

406 67110768

// update

q)\ts:5 t:([orderID:neg[n]?100000000]price:n?1000.;qty:n?1000000;time:n?.z.p); update time:.z.p from `t where i=0

407 67111440

// dot amend

q)f:{[x;y;z;d].[x;(y;z);:;d]}

q)\ts:5 t:([orderID:neg[n]?100000000]price:n?1000.;qty:n?1000000;time:n?.z.p); f[t;exec first orderID from t;time;.z.p]

416 67111520

// direct assignment by use of exec 

q)\ts:5 t:([orderID:neg[n]?100000000]price:n?1000.;qty:n?1000000;time:n?.z.p); t[first exec orderID from t;`time]:.z.p

407 67111488

// direct assignment by use of key (likely what you are trying to do in your example)

q)\ts:5 t:([orderID:neg[n]?100000000]price:n?1000.;qty:n?1000000;time:n?.z.p); t[first key t;`time]:.z.p

407 67111376

// reverse lookup 

q)\ts:5 t:([orderID:neg[n]?100000000]price:n?1000.;qty:n?1000000;time:n?.z.p); t[t?first t;`time]:.z.p

437 67111456

Hi Sean,

Yes first doesn’t return the key, so I do either of below

  1. (first key t)+ (first t)

  2. exec all columns[0] from t

both have comparable performance

Cheers,

Juno

Indeed the problem I am facing is, I am able to pick the first row easily and read the data from the row.
the row returned is a reference.

I was expecting I can modify 1 column value (e.g. time), yet unfortunately once I start to modify the reference it then becomes a copy of the reference.

I could only have to reference the original table by its key/virtual index/where clause/dot operator and modify the value, which this operation is much heavier compared to modifying the returned reference.

If there exists a way that I could modify the value of the reference directly, then it would be perfect.

because such operation is heavy. Say if the table has 100,000 rows and I have to do 100,000 times of update. Each time pick the first row to update a column (assume the first row index changes after a column is modified)

Under such scenario, updating 100K times requires 10 seconds. If I only modified the returned reference(which turns out to be a copy), it takes less than 1 second.

Which triggers my curiosity to look for possibility to make it at 1 second.

It may be of no luck, just share my thoughts here.

Thanks very much.

Cheers,

Juno

Hi Juno

If your problem is actually part of the larger problem of updating the column at all or even many of the rows, consider a vector solution: take the whole column, amend what is to be changed, replace the column.

Best

Stephen




Stephen Taylor | Librarian | Kx | +44 7713 400852 | stephen@kx.com

you have not provided the table’s schema.  have you placed the the `u attribute on the orderID column. this should improve performance overall by changing the search algorithm from linear to hash lookup. 

update u#orderID from t

also, instead of updating a copy of the table, you can amend in place:

.[t;(id;time);:;tm]

or upsert a new dictionary:

t upsert orderID`time!(id;tm)