C# performance

Hi all.

Can I get better bulk-insert performance via C#?

I used this code and reached about 1.6M rows/sec:

    Random rnd = new Random();
    string[] syms = new string[] { "abc", "def", "ghi", "jki" };
    c c = new c("localhost", 5001);          // connect to kdb+ on localhost:5001
    int len = 1000000;

    // build 1m rows of test data as parallel column vectors
    object[] x = new object[4];
    TimeSpan[] time = new TimeSpan[len];
    string[] sym = new string[len];
    double[] price = new double[len];
    int[] size = new int[len];
    for (int i = 0; i < len; i++)
    {
        time[i] = DateTime.Now.TimeOfDay;
        sym[i] = syms[rnd.Next(0, syms.Length)];
        price[i] = (double)rnd.Next(0, 200);
        size[i] = 100 * rnd.Next(1, 10);
    }

    // bind the columns in table order: time, sym, price, size
    x[0] = time;
    x[1] = sym;
    x[2] = price;
    x[3] = size;

    DateTime timeStart = DateTime.Now;
    c.k("mytrade:([] time:(); sym:(); price:(); size:())");   // create empty table
    c.k("insert", "mytrade", x);                               // sync bulk insert
    DateTime timeEnd = DateTime.Now;
    Console.WriteLine("timeInsert:\t" + (timeEnd - timeStart).TotalSeconds + "\tPerformance: " + len / (timeEnd - timeStart).TotalSeconds);
    Console.ReadLine();
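
A side note on the measurement: DateTime.Now has roughly 15 ms resolution, which is coarse for sub-second runs like these; System.Diagnostics.Stopwatch is a higher-resolution alternative, e.g.:

    // same timing as above, but with Stopwatch for finer resolution
    var sw = System.Diagnostics.Stopwatch.StartNew();
    c.k("insert", "mytrade", x);
    sw.Stop();
    Console.WriteLine("timeInsert:\t" + sw.Elapsed.TotalSeconds
        + "\tPerformance: " + len / sw.Elapsed.TotalSeconds);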

Added:

Using .ks instead of .k achieved 2.2M/s.
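
For reference, a minimal sketch of the .ks variant, assuming the standard KX c.cs API where ks writes the message without waiting for a response; the trailing sync no-op used as a barrier is an illustrative pattern, not required by the API:

    // ks is fire-and-forget: it returns as soon as the message is written
    c.ks("insert", "mytrade", x);
    // before stopping the timer, force one sync roundtrip so the server
    // has drained the queued async messages
    c.k("0");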

Thanks Vadim for providing the benchmark. It was helpful, as we are doing the same exercise using c.k.

I've not tried this, but they claim this API is faster than c.cs from KX:
http://www.devnet.de/exxeleron/qsharp

Thank you for the link.
Right out of the box it looks very good in async mode (3x faster)!
If the data is sent in parts, it is also about 2x faster.
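
A hedged sketch of what "sent in parts" can look like with the benchmark above (the chunk size is illustrative, not a tuned value):

    int chunk = 100000;                       // illustrative chunk size
    for (int off = 0; off < len; off += chunk)
    {
        int n = Math.Min(chunk, len - off);
        TimeSpan[] t = new TimeSpan[n]; Array.Copy(time, off, t, 0, n);
        string[] s = new string[n];     Array.Copy(sym, off, s, 0, n);
        double[] p = new double[n];     Array.Copy(price, off, p, 0, n);
        int[] z = new int[n];           Array.Copy(size, off, z, 0, n);
        // async insert per chunk lets client serialization and server work overlap
        c.ks("insert", "mytrade", new object[] { t, s, p, z });
    }
    c.k("0");                                 // sync barrier before the timer stops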

I was curious about reading performance rather than writing, so I made a small reading test with the default C# interface (c.cs) and the implementation from Exxeleron (qSharp).

First I made a DB with 50m rows:

":G:/BinData/KDB3/2010.01.01/t/ upsert .Q.en[:.;( ti:09:30:00.0 +50000000?06:00:00.0 ;a:100 +50000000?100f ; ap:50000000?50;b:101 +50000000?100f ; ab:50000000?50; id: 50000000?10)]

And next I use this code:

    q.Sync(@"\l G:/BinData/KDB3");

    DateTime start = DateTime.Now;
    q.Sync("table: select from t where date=2010.01.01, i>=10000000, i<11000000");
    Console.WriteLine("Select time: " + (DateTime.Now - start));

    start = DateTime.Now;
    object table1 = q.Sync("select from table");
    Console.WriteLine("Transfer time[Qsharp]: " + (DateTime.Now - start));

    start = DateTime.Now;
    object table2 = c.k("select from table");
    Console.WriteLine("Transfer time[c.k]: " + (DateTime.Now - start));

    start = DateTime.Now;
    object table3 = q.Sync("select from t where date=2010.01.01, i>=10000000, i<11000000");
    Console.WriteLine("Select+Transfer time[Qsharp]: " + (DateTime.Now - start));

    start = DateTime.Now;
    object table4 = c.k("select from t where date=2010.01.01, i>=10000000, i<11000000");
    Console.WriteLine("Select+Transfer time[c.k]: " + (DateTime.Now - start));

Result:

Select time: 00:00:00.3330191

Transfer time[Qsharp]: 00:00:01.2770731

Transfer time[c.k]: 00:00:00.6580376

Select+Transfer time[Qsharp]: 00:00:01.5900910

Select+Transfer time[c.k]: 00:00:00.9750558

Selecting 1m rows from a 50m-row partition and transferring them to the client, the best I get is about 1m rows/sec, which does not look great: compared with sending data to kdb+ it is more than 2x slower, whereas databases usually read about 2x faster than they write.

Where am I wrong?

This doesn't answer your question, but I think it's interesting to note anyway:

=, in and within are optimized, so you may see an improvement in the select with

select from t where date=2010.01.01, i within(10000000,11000000-1)

And if you are repeatedly selecting from this table to iterate over the whole thing, with nothing going on in between, just map that date once, then select the indices from it, then unmap it. E.g.

mappedT:select from t where date=2010.01.01;

select from mappedT where i within(10000000,11000000-1)

…

select from mappedT where i within(49000000,50000000-1)

delete mappedT from`.

Maybe also post to the Exxeleron Google group about their API? (They're welcome to reply here if they like.)

My example will cause confusion if these literals are replaced with vars. Note the ; in place of ,
i within(10000000,11000000-1)

with vars

i within(start;end-1)

Thank you for the hint; switching to within greatly improved the selection time!

Select time: 00:00:00.0130008

Transfer time[Qsharp]: 00:00:01.2790731

Transfer time[c.k]: 00:00:00.6640380

Select+Transfer time[Qsharp]: 00:00:01.2790732

Select+Transfer time[c.k]: 00:00:00.6560375

PS: qSharp's transfer time can be optimized by changing the chunk size; it then almost equals the default C# interface.

So I think the bottleneck is somewhere else; maybe it would be good to transfer the data in parallel.
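
To illustrate the parallel idea, a hedged sketch using one connection per task (mappedT comes from the advice above; the connection count and ranges are illustrative, not tuned values):

    int chunks = 4, rows = 1000000, per = rows / chunks;   // illustrative split
    var tasks = new System.Threading.Tasks.Task<object>[chunks];
    for (int n = 0; n < chunks; n++)
    {
        int lo = 10000000 + n * per, hi = lo + per - 1;
        tasks[n] = System.Threading.Tasks.Task.Run(() =>
        {
            c conn = new c("localhost", 5001);             // one connection per task
            try { return conn.k("select from mappedT where i within(" + lo + ";" + hi + ")"); }
            finally { conn.Close(); }
        });
    }
    System.Threading.Tasks.Task.WaitAll(tasks);            // chunk n is tasks[n].Result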

Thank you for testing our interface and notifying us about the performance you see.

We are looking into whether and how we can improve performance in the use case you described.

Best Regards

Maciej Lach

Maybe it is of interest: as I said above, after chunk-size optimization the max I got is about 1.6M rows/sec (60 ms for 100k records).

We have released qSharp 2.0.2, which provides performance improvements in the use case you described, as well as in other cases where large chunks of data are retrieved from kdb+.

The current qSharp version provides up to a 20% performance boost (in comparison with c.cs) while retrieving this data sample:

sample: ([] ti:09:30:00.0+1000000?06:00:00.0; a:100+1000000?100f; ap:1000000?50; b:101+1000000?100f; ab:1000000?50; id:1000000?10)

Release is available at: https://github.com/exxeleron/qSharp/releases/tag/qSharp-2.0.2

Best regards

Maciej Lach

Thank you!
I agree performance has increased; it is now better with the biggest chunks:

Select time: 00:00:00.0120007

Transfer time[Qsharp]: 00:00:00.4160238

Transfer time[c.k]: 00:00:00.5430311

Select+Transfer time[Qsharp]: 00:00:00.4190239

Select+Transfer time[c.k]: 00:00:00.5460313

I also tried adding unsafe code paths and got these results:

Select time: 00:00:00.0160009

Transfer time[Qsharp]: 00:00:00.2430139

Transfer time[c.k]: 00:00:00.5430310

Select+Transfer time[Qsharp]: 00:00:00.2530145

Select+Transfer time[c.k]: 00:00:00.5550318
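
For context, a minimal sketch of the kind of unsafe path meant here: bulk-copying a run of values out of the receive buffer instead of converting element by element (buffer layout and byte order are assumed already handled; this is illustrative, not qSharp's actual code):

    // one bulk copy replaces dst.Length BitConverter.ToDouble calls
    static unsafe void ReadDoubles(byte[] buf, int offset, double[] dst)
    {
        fixed (double* d = dst)
        {
            System.Runtime.InteropServices.Marshal.Copy(
                buf, offset, (System.IntPtr)d, dst.Length * sizeof(double));
        }
    }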

Additionally, I found that it is better to join date+time on the kdb+ server and then parse it on the client side directly into DateTime (I modified Timestamp to output DateTime).

Using a separate DateTime for the date and TimeSpan for the time is not good for C#.
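
A minimal sketch of that client-side parse, assuming the joined value arrives as a kdb+ timestamp, i.e. a long count of nanoseconds since the kdb+ epoch 2000.01.01:

    static readonly DateTime KdbEpoch = new DateTime(2000, 1, 1);

    // kdb+ timestamps are nanoseconds since 2000.01.01; DateTime ticks are 100 ns,
    // so one divide and one AddTicks replaces the separate date+time handling
    static DateTime FromKdbTimestamp(long nanos)
    {
        return KdbEpoch.AddTicks(nanos / 100);
    }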

PS: I can upload the modifications if anybody is interested.

An additional small question: can I avoid using

    update Symbol:`int$(`sym?Symbol) from q

if I already have the sym file and an enumerated Symbol?

It is needed for sending over the network, because parsing string symbols (versus integer values) significantly slows everything down on the client (C#) side.

If Symbol is already enumerated, then you can just do `int$Symbol
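
A hedged C# sketch of the client side of that approach (the server-side names sym and q come from the posts above; everything else is illustrative):

    // one-off: fetch the enumeration domain (c.cs returns a symbol vector as string[])
    string[] symList = (string[])c.k("sym");
    // per query: ship the column as ints and decode locally,
    // avoiding per-row string parsing on the wire
    int[] codes = (int[])c.k("exec `int$Symbol from q");
    string[] decoded = new string[codes.Length];
    for (int i = 0; i < codes.Length; i++)
        decoded[i] = symList[codes[i]];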

Thanks!