Adding Columns Calculated on Grouping Value

Hi al

Struggling to do something elegantly in q - Given a table such as the following:

q)t:(timestamp:10?.z.P;sym:10?IBMGOOG;bid:10?2.;ask:10?2.)

q)t

timestamp                     sym  bid        ask      


2000.10.20D22:21:31.475859324 IBM  1.80615    0.824634 

2003.09.24D14:46:15.547312064 IBM  1.550058   1.975569 

2008.09.22D02:28:51.659587360 GOOG 0.7739636  0.7734706

2011.12.03D09:26:32.196052096 IBM  1.264823   1.453562 

2001.12.09D04:18:46.408592112 GOOG 0.8657069  0.8093092

2006.05.04D22:51:25.395450560 IBM  0.4861672  1.671013 

2009.04.19D08:22:33.286842304 IBM  0.7835905  1.285474 

2005.01.04D13:19:53.380033792 GOOG 0.04902672 1.166052 

2004.06.12D11:19:19.618774224 GOOG 0.3382087  0.2849871

2000.09.14D11:06:33.485095772 IBM  0.7882164  1.829976 

If I wanted to add a column to this table containing say a log-return calc based off the mid price ((bid+ask)/2), obviously I cant just do something like:

q)select lr:deltas log (0.5*(bid+ask)) from t

lr        


0.2741348 

0.2927764 

-0.8234601

0.5634396 

-0.4842151

0.2529791 

-0.0417053

-0.5322874

-0.6677038

1.435379  

As the return calcs need to be grouped by symbol if symbols are interspersed in the table. In R I would treat sym as a factor and perform calculations grouped by sym, however not quite sure how I can do this in q. Ive tried:

q)select lr:deltas log (0.5*(bid+ask)) by sym from t

sym | lr                                                           

----| -------------------------------------------------------------

GOOG| -0.256549 0.07922455 -0.3210136 -0.6677038                   

IBM | 0.2741348 0.2927764 -0.2600205 -0.231236 -0.0417053 0.2353877

or

q)logret:{deltas log x}

q)select (logret;0.5*(bid+ask)) fby sym from t

q)select (logret;0.5*(bid+ask)) fby sym from t

sym       


0.2741348 

0.2927764 

-0.256549 

-0.2600205

0.07922455

-0.231236 

-0.0417053

-0.3210136

-0.6677038

0.2353877 

I guess I have two issues :

  • grouping these calcs across syms and transforming to another column;

  • as the log-diffs will have cardinality N-1 for each group of N observations per sym - either dropping the first observation or prefixing zero at the start of the log-returns series

Obviously I could split the table into multiple tables by sym and perform the calcs separately, but it would be lovely if I had a nice way in q to do this in-place across the entire table.

Cheers

– Rory

q)update ret:0n-':log .5*bid+ask by sym from t

timestamp                     sym  bid       ask       ret


2005.02.09D15:25:31.631783776 GOOG 0.986367  0.7855048

2001.01.22D06:46:11.490555628 GOOG 1.157041  1.034182  0.2124233

2012.03.23D20:46:02.028508928 IBM  0.1677772 1.031959

2003.08.19D08:22:50.660563776 GOOG 0.3919814 0.8133284 -0.5977231

2003.02.14D12:36:17.469642560 IBM  0.751276  0.3561677 -0.08004754

2001.12.19D23:42:13.133055448 GOOG 1.22749   0.6035445 0.4181446

2002.01.17D04:52:18.947410616 IBM  1.058962  1.570066  0.8645597

2012.10.08D21:40:09.037292800 GOOG 1.38322   1.069419  0.2922833

2009.03.11D19:56:32.002397632 IBM  0.4593231 1.422343  -0.3344563

2012.04.28D05:22:10.989846080 GOOG 1.383906  0.823194  -0.1054851

Cheers,

  Attila

Beautiful - I didnt know about eachprior before. Thank you very much!