How to delete the first record by group?

bigbug · March 19, 2012, 10:13am

Hi all,

I am trying to figure out how to delete the first record of each group and need some help.

I have a table “stk_data”, shown below, that contains the OHLCV records of all stocks:

sym      date       open  high  low   close volume     openint
-----------------------------------------------------------------
SH600809 2012.03.19 71.6  71.6  68.71 69.27 1.517e+004 1.058e+008
SH600809 2012.03.16 69.46 71.58 69.31 71.02 1.3e+004   9.214e+007
SH600809 2012.03.15 68.01 69.69 68.01 69.35 9751       6.736e+007
…

I transform it into the “rt_tb” table below, which includes the return rate:

rt_tb:select date,close,ud:deltas close, rt:100*((deltas close)%(close-deltas close)) by sym from `sym xasc `date xasc stk_data

sym     | date ..
--------| -----------------------------------------------------------------------------------------------------------..
SH000001| 2008.01.02 2008.01.03 2008.01.04 2008.01.07 2008.01.08 2008.01.09 2008.01.10 2008.01.11 2008.01.14 2008.01…
SH000002| 2008.01.02 2008.01.03 2008.01.04 2008.01.07 2008.01.08 2008.01.09 2008.01.10 2008.01.11 2008.01.14 2008.01…
SH000003| 2008.01.02 2008.01.03 2008.01.04 2008.01.07 2008.01.08 2008.01.09 2008.01.10 2008.01.11 2008.01.14 2008.01…

Obviously, the “rt” field of the first record in each group is meaningless (it is 0w because there is no previous close price). So how can I rewrite the “rt_tb” script to get rid of the meaningless record?

Thanks,
Halley

P_Bukowinski1 · March 19, 2012, 11:00am

Try this:

rt:1_100*(… from `sym`date xasc stk_data

cheers
Patryk
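The one-liner above elides most of the query. A minimal sketch of how it might be spelled out in full, assuming a small made-up table with only the sym, date and close columns (the sample values and the names flat, s and trimmed are hypothetical, not from the thread):

```q
/ hypothetical sample data: only the columns the query needs
stk_data:([]sym:`A`A`A`B`B;date:2012.03.15 2012.03.16 2012.03.19 2012.03.15 2012.03.16;close:68 69 70 10 11f)

/ sort by sym and date, compute the return per sym,
/ then drop item 0 of every column so the lists in a group stay aligned
rt_tb:select date:1 _ date,close:1 _ close,rt:1 _ 100*(deltas close)%close-deltas close by sym from `sym`date xasc stk_data

/ flatten back to one row per sym and date
flat:ungroup rt_tb

/ alternative on a flat table: compute rt per sym, then delete each sym's first row
s:update rt:100*(deltas close)%close-deltas close by sym from `sym`date xasc stk_data
trimmed:delete from s where i=(first;i) fby sym
```

Both variants should leave three rows here (two for A, one for B). The grouped form keeps the nested layout of the original rt_tb, while the fby form is closer to the literal "delete the first record by group".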

bigbug · March 21, 2012, 12:28am

Thanks,

P_Bukowinski1 · March 21, 2012, 9:48pm

next is a bad choice here: it adds a null (0N) at the end of the list, is two times slower, and uses twice as much space.

q)a:100000?234.
q)a
196.2136 42.35917 97.59831 63.39222 71.42954 3.930051 160.196 106.1094 189.64..
q)\ts do[10000;1_a]
9515 1048768j
q)\ts do[10000;next a]
19937 2097344j
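A tiny illustration of the difference measured above, using a made-up four-item list (the names a, b and c are only for this sketch):

```q
a:1 2 3 4f
b:1_a      / drop: removes the first item, so b has 3 items
c:next a   / shift: every item is replaced by its successor, the last one becomes a null
count b    / 3
count c    / 4
```

So 1_ shortens the list, while next keeps the original length and leaves a trailing null that would still have to be filtered out.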

cheers,

Patryk

2012/3/21 Ajay <rathore.ajay@gmail.com>

Another way

… rt: next 100*((deltas close)%(close- deltas close))…

P_Bukowinski1 · March 22, 2012, 12:45pm

It’s not surprising that a parallelized algorithm is faster; done properly, it can be even faster.

Please provide fair benchmarks as well.
By the way, the size reported for the parallel run is not accurate.

8core Xeon

q)\ts do[1000; 1!(select sym,date from tNew) ,' flip (enlist `rt)! enlist rRate peach exec close from select date,close by sym from `sym`date xasc t]
5509 971104j

q)\ts do[1000;select sym,date,rt:.Q.fc[{100*1_'d%x-d:deltas each x}]close from select date,close by sym from `sym`date xasc t]
3670 970864j
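To see what the lambda passed to .Q.fc does, here is a small sketch on a hand-made nested list of closes (the sample values and the name closes are assumptions for illustration; rRate is written as in the query above):

```q
/ one list of closes per sym, as produced by "select close by sym"
closes:(68 69 70f;10 11f)

/ deltas per group, divide by the previous close, drop the first item of each group
rRate:{100*1_'d%x-d:deltas each x}

direct:rRate closes            / computed in the main thread
chunked:.Q.fc[rRate] closes    / same function applied to chunks and joined back together
```

.Q.fc only changes how the work is split, so both results should match. Without secondary threads (q started without -s) it should simply apply the function to the whole list.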

I’m pretty sure this can be improved further…
P

sent from droid

Rohit_Tripathi · March 22, 2012, 3:25pm

Try this in k instead.

P_Bukowinski1 · March 22, 2012, 6:59pm

your most recent code took
3766

q)\t do[1000;select date,rt:1_rt by sym from select sym,date, rt:{100*y%x-y}[close;deltas close] from `sym`date xasc t]
2279
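A sketch of why this version works, on a made-up five-row table (sample values and the names s, r and perGroup are hypothetical). The inner select runs deltas once over the whole sorted table, so the first row of every sym is computed against the wrong predecessor; the outer select then drops exactly those rows.

```q
t:([]sym:`A`A`A`B`B;date:2012.03.15 2012.03.16 2012.03.19 2012.03.15 2012.03.16;close:68 69 70 10 11f)

/ one pass of deltas over the whole sorted table; the first row of each sym is contaminated
s:select sym,date,rt:{100*y%x-y}[close;deltas close] from `sym`date xasc t

/ dropping item 0 per sym removes exactly the contaminated values
/ (date is trimmed too in this sketch so both columns stay the same length)
r:select date:1_date,rt:1_rt by sym from s

/ reference: deltas computed separately inside each group
perGroup:select rt:1 _ 100*(deltas close)%close-deltas close by sym from `sym`date xasc t
```

Note that the query as posted keeps the full date list per sym, so date is one item longer than rt in every group.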

both best,
please don’t put that on a GPU; pick up Rohit’s challenge instead.

P like Patryk

sent from limbo ;-)

bigbug · March 22, 2012, 11:52pm

Wow, I learned a lot from you guys.

On Mar 22, 2012 2:47 PM, “Ajay” <rathore.a…> wrote:

> Definitely can be improved
>
> rRate: {100*1_'d%x-d:deltas each x}
>
> q)\ts do[1000; select sym,date,rt: raze rRate peach 100 cut close from select date,close by sym from `sym`date xasc t]
> 5780 839552j
>
> q)\ts do[1000;select sym,date,rt:.Q.fc[{100*1_'d%x-d:deltas each x}]close from select date,close by sym from `sym`date xasc t]
> 5859 839504j
>
> .Q.fc always cuts into two halves which might not be that efficient
>
> P
>
> sent from IPhone

On Mar 22, 2012 11:13 AM, “Ajay” <rathore.a…> wrote:

> I shouldn’t have been so lazy in analysing; next does seem to use more space from the outset.
>
> Here is another approach, using peach and slaves to compute the return rate for each group in parallel, which can further improve the performance and looks memory efficient too (I am running with 2 slaves on a 2 core machine)
>
> q) t:([]sym:10000?`3;date:10000?.z.d;close:10000?200f)
>
> q) tNew: select date,close by sym from `sym`date xasc t
>
> q) rRate:{100*1_((deltas x)%(x-deltas x))}
>
> q)\ts do[1000; 1!(select sym,date from tNew) ,' flip (enlist `rt)! enlist rRate peach exec close from tNew]
> 6021 153568j
>
> q)\ts do[1000; select date, rt: 1 _ 100*((deltas close)%(close-deltas close)) by sym from `sym`date xasc t]
> 11989 794704j
>
> Since we are talking about performance and memory, it takes half the time and much less memory.
>
> Cheers-
> Ajay

P_Bukowinski1 · March 23, 2012, 9:02am

last one is a cheat which is actually 0.5 times slower.

{1_x} peach by…

please check your code before you post…

I won’t post my next algo until you find it on your own.

bartosz_kaliszu · March 23, 2012, 3:40pm

Off topic, but…

I have to say that I also received some message with that peach call. See below. But when looking through groups.google interface that post is missing. And this is the one that is not working…

And you two should calm down! Who is going to write the same query in k at last?! ;)

P_Bukowinski1 · March 23, 2012, 6:25pm

ok, here’s one:

k) rates:{ +{`sym`date`rt !(,:!x),+.:+x } (y i j;100*d%z-d:-':z:z i j)@: 1_'=x i j:<x i:<y}.

\t do[1000;rates t `sym`date`close]
2217

slightly faster than old select which took
2679

It looks like there is no benefit from running deltas in parallel, maybe because it needs to keep track of previous values (if it is not implemented with two vectors). It is worth putting in more effort to utilize more cores.

Also, I am not sure why it takes a little more space; maybe the second index…

Cheers,
Patryk

P_Bukowinski1 · March 24, 2012, 9:58am

There was a typo in the previous one, but here is a faster and shorter one:

k) rates:{+{`sym`date`rt!(!z;.:x z;.:(100*d%y-d:-':y)1_'z)}[y i j;z i j;=x i j:<x i:<y]}.
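The indexing part of the k function may be easier to follow in q. A minimal sketch of my reading of it, on made-up unsorted data (the vectors sym and date and the names g and keep are assumptions for illustration): sort by date, then stably by sym, group the sorted syms, and drop the first position of every group.

```q
sym:`B`A`B`A`A
date:2012.03.16 2012.03.15 2012.03.15 2012.03.16 2012.03.19

i:iasc date       / k: i:<y     order by date
j:iasc sym i      / k: j:<x i   stable sort by sym, so dates stay ordered within a sym
g:group sym i j   / k: =x i j   sym -> positions in the fully sorted order
keep:1_'g         / k: 1_'...   drop the first position of every group
```

Here sym i j comes out as A A A B B, so g maps A to positions 0 1 2 and B to 3 4, and keep holds 1 2 for A and 4 for B. The k function then indexes the dates and the computed rates with those positions.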

Running on old celeron :( , so can’t parallelize it… anyone?

Cheers,

Patryk

