How to delete the first record by group?

Hi all,

I am trying to figure out how to delete the first record of each group. Need help.

I have a table “stk_data”, shown below, that contains the OHLCV records of all stocks:

sym      date       open  high  low   close volume     openint
-----------------------------------------------------------------
SH600809 2012.03.19 71.6  71.6  68.71 69.27 1.517e+004 1.058e+008
SH600809 2012.03.16 69.46 71.58 69.31 71.02 1.3e+004   9.214e+007
SH600809 2012.03.15 68.01 69.69 68.01 69.35 9751       6.736e+007
…

I transform it to include the return rate, which gives the “rt_tb” table below:

rt_tb:select date,close,ud:deltas close, rt:100*((deltas close)%(close-deltas close)) by sym from `sym xasc `date xasc stk_data

sym     | date ..
--------| -----------------------------------------------------------------..
SH000001| 2008.01.02 2008.01.03 2008.01.04 2008.01.07 2008.01.08 2008.01.09 2008.01.10 2008.01.11 2008.01.14 2008.01…
SH000002| 2008.01.02 2008.01.03 2008.01.04 2008.01.07 2008.01.08 2008.01.09 2008.01.10 2008.01.11 2008.01.14 2008.01…
SH000003| 2008.01.02 2008.01.03 2008.01.04 2008.01.07 2008.01.08 2008.01.09 2008.01.10 2008.01.11 2008.01.14 2008.01…

Obviously, the “rt” field of the first record in each group is meaningless (it is 0w because there is no previous close price). So how can I rewrite the “rt_tb” script to get rid of that meaningless record?

Thanks,
Halley

Try this:

rt:1_100*(… from `sym`date xasc stk_data

cheers
Patryk
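A minimal sketch of the suggestion above, on a made-up table (the table `t`, its values, and the names `r0` and `r1` are hypothetical, not from the thread). Inside `select … by sym` each column is a per-group list, so `1_` drops the first element of every group. Dropping the first `date` as well is an assumption made here so that the two columns stay the same length.

```q
/ hypothetical sample data: two symbols, three closes each
t:([]sym:`A`A`A`B`B`B;date:2012.03.15 2012.03.16 2012.03.19 2012.03.15 2012.03.16 2012.03.19;close:100 110 121 50 55 44f)

/ without the drop: the first rt of each group is 0w (division by zero)
r0:select date,rt:100*(deltas close)%close-deltas close by sym from `sym`date xasc t

/ with 1_: drop the first date and the first rt of each group
r1:select date:1_date,rt:1_100*(deltas close)%close-deltas close by sym from `sym`date xasc t
```

With this data, `exec rt from r1` should hold only the two real returns per symbol, while `exec rt from r0` still starts each group with 0w.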

Thanks,

next is a bad choice here:

it adds a null (0n for a float list) at the end of the list,

it is two times slower,

it uses twice as much space.

q)a:100000?234.
q)a
196.2136 42.35917 97.59831 63.39222 71.42954 3.930051 160.196 106.1094 189.64..
q)\ts do[10000;1_a]
9515 1048768j
q)\ts do[10000;next a]
19937 2097344j

cheers,

Patryk
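The difference between the two operators can be seen on a tiny list (the list `a` below is a hypothetical stand-in for the benchmark vector above): `1_` returns a shorter list, while `next` keeps the length and pads the end with a null.

```q
a:1 2 3 4f
d:1_a      / 2 3 4f   : three items, the first one is removed
n:next a   / 2 3 4 0n : still four items, shifted left, float null appended
```

So inside a grouped query `next` would swap the meaningless first value for a null last value instead of removing a record.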

2012/3/21 Ajay <rathore.ajay@gmail.com>

Another way

… rt: next 100*((deltas close)%(close- deltas close))…

It’s not surprising that a parallelized algorithm is faster; done properly, it can be even faster.

Please provide fair benchmarks as well.
By the way, the size reported for the parallel version is not accurate.

8-core Xeon

q)\ts do[1000; 1!(select sym,date from tNew) ,' flip (enlist `rt)! enlist rRate peach exec close from select date,close by sym from `sym`date xasc t]
5509 971104j

q)\ts do[1000;select sym,date,rt:.Q.fc[{100*1_'d%x-d:deltas each x}]close from select date,close by sym from `sym`date xasc t]
3670 970864j

I’m pretty sure this can be improved further…
P

sent from droid

try this in k instead

your most recent code took
3766

q)\t do[1000;select date,rt:1_rt by sym from select sym,date, rt:{100*y%x-y}[close;deltas close] from `sym`date xasc t]
2279

both best,
please don’t put that on gpu, pick Rohit challenge instead.

P like Patryk

sent from limbo ;-)
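The faster query above appears to work because `deltas` is computed once over the whole sorted table rather than per group: only the first value of each group is affected by the neighbouring symbol, and that value is dropped anyway. A small sketch, assuming a hypothetical table `t` that is already sorted by sym and date (the names `g` and `w` are also made up for illustration):

```q
/ hypothetical sample data, already sorted by sym then date
t:([]sym:`A`A`A`B`B`B;date:2012.03.15 2012.03.16 2012.03.19 2012.03.15 2012.03.16 2012.03.19;close:100 110 121 50 55 44f)

/ per-group computation, as in the earlier by-sym queries
g:select rt:1_100*(deltas close)%close-deltas close by sym from t

/ whole-table computation, then drop the first rt of each group
w:select rt:1_rt by sym from select sym,rt:{100*y%x-y}[close;deltas close] from t

/ both approaches give the same keyed table on this data
same:g~w
```

The first rt of group B in the inner select is computed against the last close of group A, which is why it has to be removed before the result is used.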

Wow, I learned a lot from you guys.

On Mar 22, 2012 2:47 PM, “Ajay” <rathore.a…> wrote:

> Definitely can be improved
>
> rRate: {100*1_'d%x-d:deltas each x}
>
> q)\ts do[1000; select sym,date,rt: raze rRate peach 100 cut close from select date,close by sym from `sym`date xasc t]
> 5780 839552j
>
> q)\ts do[1000;select sym,date,rt:.Q.fc[{100*1_'d%x-d:deltas each x}]close from select date,close by sym from `sym`date xasc t]
> 5859 839504j
>
> .Q.fc always cuts into two halves which might not be that efficient
>
> P
>
> sent from IPhone

On Mar 22, 2012 11:13 AM, “Ajay” <rathore.a…> wrote:

> I shouldn’t have been so lazy in analysing; next does seem to use more space at the outset.
>
> Here is another approach using peach and slaves for computing the return rate for each group in parallel, which can further improve the performance and looks memory efficient too (I am running with 2 slaves on a 2 core machine):
>
> q) t:([]sym:10000?`3;date:10000?.z.d;close:10000?200f)
>
> q) tNew: select date,close by sym from `sym`date xasc t
>
> q) rRate:{100*1_((deltas x)%(x-deltas x))}
>
> q)\ts do[1000; 1!(select sym,date from tNew) ,' flip (enlist `rt)! enlist rRate peach exec close from tNew]
> 6021 153568j
>
> q)\ts do[1000; select date, rt: 1 _ 100*((deltas close)%(close-deltas close)) by sym from `sym`date xasc t]
> 11989 794704j
>
> Since we are talking about performance and memory, it takes half the time and much less memory.
>
> Cheers-
> Ajay

The last one is a cheat, and it is actually 0.5 times slower.

{1_x} peach by…

Please check your code before you post…

I won’t post my next algo until you find it on your own.

Off topic, but…

I have to say that I also received some message with that peach call. See below. But when looking through groups.google interface that post is missing. And this is the one that is not working…

And you two should calm down! Who is going to write the same query in k at last?! ;)

ok, here’s one:

k) rates:{ +{`sym`date`rt !(,:!x),+.:+x } (y i j;100*d%z-d:-':z:z i j)@: 1_'=x i j:<x i:<y}.

\t do[1000;rates t `sym`date`close]
2217

slightly faster than old select which took
2679

Looks like there is no benefit from running deltas in parallel, maybe because it needs to keep track of previous values (if not implemented with two vectors). It is worth putting in more effort to utilize more cores.

Also not sure why it takes a little bit more space; maybe the second index…

Cheers,
Patryk

There was a typo in the previous one, but here is a faster and shorter one:

k) rates:{+{`sym`date`rt!(!z;.:x z;.:(100*d%y-d:-':y)1_'z)}[y i j;z i j;=x i j:<x i:<y]}.

Running on old celeron :( , so can’t parallelize it… anyone?

Cheers,

Patryk
