I am trying to implement a function that loads a large amount of data, say 1 billion rows, from a remote KDB+ instance into my local one in batches. Here is what I have so far:
\d .importer
fetch_data:{[table_names;batch_size;data_date]
 / (0) Open remote connection (global handle)
 h::hopen `$":redacted:redacted:redacted";
 / (1) Set arguments; bsize and dday are globals so the inner lambdas can read them
 tabs:(),table_names;   / ensure a list even when a single symbol is passed
 bsize::batch_size;
 dday::data_date;
 / (2) Initialise each local table as an empty list; the first upsert defines the schema
 {.[x;();:;()]} each tabs;
 / (3) For each remote table:
 / (3.1) get the row count for the requested date
 / (3.2) split that range into batch-sized (start;end) index pairs (see the worked example after the script)
 / (3.3) pull each chunk with a functional select and upsert it into the local table
 {[tab]
  tcount:h({[t;d] count select from t where date=d};tab;dday);
  splits:$[tcount>bsize;
   [batch:tcount div bsize;
    ((0;bsize-1)+/:bsize*til batch),enlist(batch*bsize;tcount-1)];
   enlist(0;tcount-1)];
  {[t;dday;split]
   t upsert h({[t;rng;dday] ?[t;((=;`date;dday);(within;`i;rng));0b;()]};t;split;dday)
  }[tab;dday] each splits;
 } each tabs;
 }
fetch_today:fetch_data[;;.z.d];                 / note: .z.d is evaluated once, when the script loads
fetch_quotes_today:fetch_data[`trades;;.z.d];
fetch_quotes:fetch_data[`trades;;];
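For reference, the step (3.2) split logic can be run standalone; with small, hypothetical values (tcount:25, bsize:10) it produces these (start;end) index pairs:
q)bsize:10; tcount:25; batch:tcount div bsize
q)((0;bsize-1)+/:bsize*til batch),enlist(batch*bsize;tcount-1)
0  9
10 19
20 24
(When tcount is an exact multiple of bsize, the final pair has start>end, so that last chunk simply selects nothing.)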
Then I simply load the script and call, let’s say,
.importer.fetch_quotes_today[1000000]
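In full, the session is roughly the following (assuming the script above is saved as importer.q):
q)\l importer.q
q).importer.fetch_quotes_today[1000000]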
This works fine for up to 40-50 million rows, but once the row count exceeds 100M the execution takes forever. Now that I have about a billion rows, is there a more efficient way to transfer this much data directly from one KDB+ instance to another?
Regards!