Hi,
I am wondering why multi-threaded decompression (using peach) appears to be slower than using a single thread. Is there some sort of IPC between threads that’s causing extra overhead? What is the right way to decompress in parallel?
Thanks in advance,
Victor
c:\q\w32>q -s 3
KDB+ 3.3 2016.03.14 Copyright (C) 1993-2016 Kx Systems
Welcome to kdb+ 32bit edition
For support please see http://groups.google.com/d/forum/personal-kdbplus
Tutorials can be found at http://code.kx.com/wiki/Tutorials
To exit, type \
To remove this startup msg, edit q.q
q).z.zd:(16;2;6)
q)(@[`:c:/db/t;;:;].')((`sym;10000000?(`$')"abcde");(`time;.z.Z - 10000000?1000f);(`price;100 - (10000000?20f) - 10))
`:c:/db/t`:c:/db/t`:c:/db/t
q)\t @[`:c:/db/t;] each `time`sym`price
468
q)\t @[`:c:/db/t;] peach `time`sym`price
982
I realize the motivation may not be clear from the example above, so I put together some use cases below. From testing, it looks like select doesn't scan compressed columns in parallel - either for filtering or for retrieval - and one can outperform select by using peach for each task on its own. However, the gains don't combine when one tries to parallelize both filtering and retrieval together. I imagine there is probably some setting or library I'm not aware of for optimizing queries on compressed tables, given how long compression has been part of kdb+, so if anyone has any suggestions, I'd really appreciate it.
c:\q\w32>q -s 3
KDB+ 3.3 2016.03.14 Copyright (C) 1993-2016 Kx Systems
w32/ 4()core 4095MB NONEXPIRE
q).z.zd:(16;2;6)
q)`:c:/db/t/ set .Q.en[`:c:/db/] ([]sym:10000000?(`$')"abcde";time:.z.Z - 10000000?1000f;price:100 - (10000000?20f) - 10)
`:c:/db/t/
q)\l c:/db
Retrieving columns in parallel using peach outperforms simple select.
q)\t select from t where sym in `a`b`c
1560
q)\t {[t;s] flip c!{x[z] y}[t;exec i from t where sym in s] peach c:cols t}[t;`a`b`c]
1170
Filtering rows across multiple columns using peach also outperforms standard select.
q)\t exec i from t where (sym in `a`b`c) and (time > 2015.01.01) or price > 100
1669
q)\t {x inter y union z} . {eval parse"exec i from t where ",x} peach ("sym in `a`b`c";"time > 2015.01.01";"price > 100")
1357
However, it is actually slower when you try to do both in parallel.
q)\t select from t where (sym in `a`b`c) and (time > 2015.01.01) or price > 100
1747
q)\t {[t;s] flip c!{x[z] y}[t;s] peach c:cols t}[t] {x inter y union z} . {eval parse"exec i from t where ",x} peach ("sym in `a`b`c";"time > 2015.01.01";"price > 100")
2090
kdb+ uses thread-local heaps, and data passed back from slave threads to the main thread during peach is serialized. In general, the problems suited to peach are those that incur a high computation cost while returning small results.
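To make that trade-off concrete, here is a sketch only, reusing the column files written to `:c:/db/t in the first session; max is just an arbitrary cheap aggregate chosen to shrink what each thread returns, and actual timings will depend on your machine and -s setting.

/ small result per task: heavy decompression work, but only an atom comes back per thread
\t {max @[`:c:/db/t;x]} each `time`price
\t {max @[`:c:/db/t;x]} peach `time`price

/ large result per task: each full decompressed column is serialized back to the main thread
\t @[`:c:/db/t;] each `time`sym`price
\t @[`:c:/db/t;] peach `time`sym`price

The read/decompression cost is similar in both pairs; what changes is how much data each secondary thread has to serialize back.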
kdb+ has built-in support for using multiple threads for the … in queries such as
select … by s from t where s in S
where s has a p or g attr. In addition, it multithreads queries across partitions.
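For instance, a sketch only, assuming the enumerated splayed t from the second session above and that modifying the on-disk sym column is acceptable (the g# amend and the reload here are illustrative, not from the original post):

/ apply a grouped attribute to the on-disk, enumerated sym column, then remap
@[`:c:/db/t;`sym;`g#]
\l c:/db

/ aggregation by s where s in S, with a g (or p) attr on s - the query shape described above
select avg price, last time by sym from t where sym in `a`b`c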
If you're not doing any aggregation, or you don't have a g or p attr on s, then it's possible you'll find manually crafted explicit approaches that perform better.
Thanks, Charles. So if I understand you correctly, it uses multiple threads when the query spans multiple partitions, or performs aggregations over partitioned or indexed columns, but does not differentiate between compressed and uncompressed columns?