I am trying to implement a function that loads a large amount of data, say 1 billion rows, from a remote KDB+ instance into my local one in batches. Here is what I have so far:
\d .importer
fetch_data:{[table_names;batch_size;data_date]
 / (0) Open remote connection (global handle)
 h::hopen `$":redacted:redacted:redacted";
 / (1) Set arguments; bsize and dday are globals so the inner lambdas can read them
 tabs:(),table_names;   / ensure a list even when a single symbol is passed
 bsize::batch_size;
 dday::data_date;
 / (2) Initialise each local table as an empty list; the first upsert defines the schema
 {.[x;();:;()]} each tabs;
 / (3) For each remote table:
 / (3.1) get the row count for the requested date
 / (3.2) split that range into batch-sized (start;end) index pairs (see the worked example after the script)
 / (3.3) pull each chunk with a functional select and upsert it into the local table
 {[tab]
  tcount:h({[t;d] count select from t where date=d};tab;dday);
  splits:$[tcount>bsize;
   [batch:tcount div bsize;
    ((0;bsize-1)+/:bsize*til batch),enlist(batch*bsize;tcount-1)];
   enlist(0;tcount-1)];
  {[t;dday;split]
   t upsert h({[t;rng;dday] ?[t;((=;`date;dday);(within;`i;rng));0b;()]};t;split;dday)
  }[tab;dday] each splits;
 } each tabs;
 }
fetch_today:fetch_data[;;.z.d];                 / note: .z.d is evaluated once, when the script loads
fetch_quotes_today:fetch_data[`trades;;.z.d];
fetch_quotes:fetch_data[`trades;;];
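For reference, the step (3.2) split logic can be run standalone; with small, hypothetical values (tcount:25, bsize:10) it produces these (start;end) index pairs:
q)bsize:10; tcount:25; batch:tcount div bsize
q)((0;bsize-1)+/:bsize*til batch),enlist(batch*bsize;tcount-1)
0  9
10 19
20 24
(When tcount is an exact multiple of bsize, the final pair has start>end, so that last chunk simply selects nothing.)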
Then I simply load the script and call, let’s say,
.importer.fetch_quotes_today[1000000]
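In full, the session is roughly the following (assuming the script above is saved as importer.q):
q)\l importer.q
q).importer.fetch_quotes_today[1000000]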
This works fine for up to 40-50 million rows, but once the row count exceeds 100M the execution takes forever. Now that I have about a billion rows, is there a more efficient way to transfer this much data directly from one KDB+ instance to another?
Regards!