slow performance of win32 version of KDB

carfield11 · September 7, 2015, 12:04pm

Hi, I like to test some simple thing on windows version of 32bit KDB, I inputted a table with 99722 row, with date, sym and other 15 columns of numeric data. However, if I just take a simple average of that 15 column, it take more than 3 minutes to return, will anyone have any clue about why it is that slow? How can I start to analysis this performance issue?

LamHinYan · September 7, 2015, 1:42pm

post your code plz. i can run mine 1k times in 1.7 sec

n:99722;

tab:(

date:n#.z.D;

sym:n#`AAA;

f1:1.0*til n; f2:1.0*til n; f3:1.0*til n; f4:1.0*til n; f5:1.0*til n;

f6:1.0*til n; f7:1.0*til n; f8:1.0*til n; f9:1.0*til n; f10:1.0*til n;

f11:1.0*til n; f12:1.0*til n; f13:1.0*til n; f14:1.0*til n; f15:1.0*til n);

\ts:1000 select

avg f1,avg f2,avg f3,avg f4,avg f5,

avg f6,avg f7,avg f8,avg f9,avg f10,

avg f11,avg f12,avg f13,avg f14,avg f15

from tab

q) \ts:1000 select

avg f1,avg f2,avg f3,avg f4,avg f5,

avg f6,avg f7,avg f8,avg f9,avg f10,

avg f11,avg f12,avg f13,avg f14,avg f15

from tab

1754j, 4304j

carfield11 · September 14, 2015, 9:15am

Sorry, just see your reply, I just check with this simple query

select avg c1, avg c2, avg c3, avg c4 , … by sym from tab where date in dates, sym in syms

dates is a list of date in the partition table, I put only 10 days of data , with total 99722 rows.

syms in a list of symbol in the table, which is about 5000 symbols

LamHinYan · September 14, 2015, 9:30am

What does “meta tab” show?

carfield11 · September 15, 2015, 8:54am

Hi Yan, it look like this

| c | t | f | a |
| date | d | | |
| sym | s | | p |
| c1 | f | | |
| c2 | f | | |
| c3 | f | | |
| c4 | f | | |
| c5 | j | | |
| c6 | i | | |
| c7 | e | | |
| c7 | e | | |
| c9 | j | | |
| c10 | f | | |
| c11 | f | | |
| c12 | f | | |
| c13 | f | | |
| c14 | j | | |
| c15 | f | | |
| c16 | f | | |
| c17 | f | | |
| | | | |

On Monday, September 14, 2015 at 4:30:17 PM UTC+8, Yan Yan wrote:

What does “meta tab” show?

quintanar4011 · September 15, 2015, 9:22am

Hi Carfield,

p# - partition attribute should be used for tables with tens of millions of rows. I see that your table probably has just ~ 2rows per each sym, p# performance will be awful in this case.

I’d suggest to switch to `g# - group attribute. Or use sort attribute if appropriate.

WBR, Andrey Kozyrev.

???, 15 ??? 2015 ?., 10:54:07 UTC+3 ??? Carfield Yim ???:

Hi Yan, it look like this

| c | t | f | a |
| date | d | | |
| sym | s | | p |
| c1 | f | | |
| c2 | f | | |
| c3 | f | | |
| c4 | f | | |
| c5 | j | | |
| c6 | i | | |
| c7 | e | | |
| c7 | e | | |
| c9 | j | | |
| c10 | f | | |
| c11 | f | | |
| c12 | f | | |
| c13 | f | | |
| c14 | j | | |
| c15 | f | | |
| c16 | f | | |
| c17 | f | | |
| | | | |

On Monday, September 14, 2015 at 4:30:17 PM UTC+8, Yan Yan wrote:

What does “meta tab” show?

LamHinYan · September 15, 2015, 9:30am

I have changed the 2nd c7 to c8. The avg query took 870 ms only. Carfield asked for less than 100k rows.

/ simple avg of partitioned db

dbDir:`:/tmp/hdb;

system "cd ",1_string dbDir;

genDay:{

data:(

date:x#y;

sym:{`$“sym”,“0”^-4$string x} each til x;

c1:“f”$til x;

c2:“f”$til x;

c3:“f”$til x;

c4:“f”$til x;

c5:“j”$til x;

c6:“i”$til x;

c7:“e”$til x;

c8:“e”$til x;

c9:“j”$til x;

c10:“f”$til x;

c11:“f”$til x;

c12:“f”$til x;

c13:“f”$til x;

c14:“j”$til x;

c15:“f”$til x;

c16:“f”$til x;

c17:“f”$til x);

update p#sym from sym xasc data};

writeDay:{

x:update p#sym from sym xasc x;

tabDir: sv dbDir, ($string x[`date][0]), `tab, `;

show tabDir;

tabDir set .Q.en[dbDir] x;};

writeDay each genDay[10000;] each 2015.01.01+til 10;

system "l ",1_string dbDir;

\ts select

avg c1, avg c2, avg c3, avg c4, avg c5,

avg c6, avg c7, avg c8, avg c9, avg c10,

avg c11, avg c12, avg c13, avg c14, avg c15,

avg c16, avg c17

by sym

from tab

where (date in 2015.01.01+til 10),(sym in 5000#sym)

/ 870j, 21835264j

LamHinYan · September 15, 2015, 9:39am

An O(n^2) algorithm on 100k rows will probably take 3 minutes.

Topic		Views
slow aggregate multiple columns - 10x slower than J Community Support kdb-and-q	7	March 27, 2015
Question about the performance difference of two queries on HDB Community Support kdb-and-q	7	January 28, 2016
kdb+ intro question Community Support kdb-and-q	2	June 28, 2014
historic data storage and retrieval Community Support kdb-and-q	7	October 13, 2013
Re: [personal kdb+] Read and save big csv file speed Community Support kdb-and-q	11	April 11, 2013

slow performance of win32 version of KDB

Related topics