aj on a large data set

Ben_R · May 29, 2019, 10:06pm

running an aj on a very large data set. command keeps failing. Is there a better/more efficient way to aj? Thx.

Nick10 · May 30, 2019, 3:33am

when you say ‘failing’ does that mean it throws an exception? please provide an example

do you mean it returns the wrong results? ensure your data is sorted by time (within equal sym partitions)

do you mean it runs for a very long time?

ensure that the second table (third argument) has a `p or `g attribute on the column corresponding to the first join key (the first column named in the first argument).

assuming you have trades in a table t, and quotes in a table q, the typical aj command would be:

t:aj[`sym`time;t] q

in case you lost your `p attribute on q:

t:aj[`sym`time;t] update `p#sym from q
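
note that `p# signals 'u-fail if equal sym values are not contiguous, so it can help to sort q before re-applying the attribute. a minimal sketch on toy tables (the column names and values here are made up for illustration, not taken from the original question):

```q
/ toy data, made up for illustration
t:([] sym:`a`b`a; time:09:30:02 09:30:03 09:30:07; price:10.5 20.5 10.7)
q:([] sym:`b`a`a`b; time:09:30:01 09:30:00 09:30:05 09:30:06; bid:20 10 11 21f)

attr q`sym                  / ` means no attribute is set
q:`sym`time xasc q          / group each sym together, time ascending within sym
q:update `p#sym from q      / safe now that equal syms are contiguous
attr q`sym                  / `p

r:aj[`sym`time;t;q]         / each trade picks up the last quote at or before its time
```

meta q also shows the attribute of every column (in the a column), which is a quick way to check a table before running the join.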

quoting from “Q Tips”

Joining with a table that is lacking a p or g attribute on the first exact match column will lead to
extremely slow joins. In the best case, this will take minutes if not hours to complete, in the worst case it
will crash your machine.

…

Again, as we discussed in Section 8.2 [97], applying proper attributes is critical for good join performance.
While kdb+'s optimizations can typically compensate for missing attributes when using lj, pj,
ij and uj, both aj and wj are unforgiving.

carfield1 · June 10, 2019, 1:18pm

a simple solution is just to divide it into smaller groups, like

{ aj[`time;select from trade where date=x,sym=y;select from quote where date=x,sym=y] } ./: dates cross syms
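
the expression above returns one table per date/sym pair, so the result is a list of tables; raze can stitch them back into a single table. a small sketch of that, using toy in-memory tables as stand-ins for trade and quote (the values, and the way dates and syms are derived, are assumptions for illustration):

```q
/ toy in-memory stand-ins for trade and quote; values are made up
trade:([] date:2019.05.28 2019.05.28 2019.05.29 2019.05.29; sym:`a`b`a`b;
  time:09:30:02 09:30:03 09:30:02 09:30:04; price:10.5 20.5 10.6 20.6)
quote:([] date:2019.05.28 2019.05.28 2019.05.29 2019.05.29; sym:`a`b`a`b;
  time:09:30:00 09:30:01 09:30:01 09:30:03; bid:10 20 11 21f)

/ join one date/sym slice at a time, so each aj only sees a small table
f:{aj[`time;select from trade where date=x,sym=y;select from quote where date=x,sym=y]}

dates:exec distinct date from trade
syms:exec distinct sym from trade

/ f ./: applies f to each (date;sym) pair; raze merges the per-slice results
res:raze f ./: dates cross syms
```

because each slice already holds a single date and sym, only time is needed as the join column. this trades one big join for many small ones, so peak memory should be lower, at the cost of running the selects once per pair.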
