Interaction between peach and other optimisations

https://learninghub.kx.com/forums/topic/interaction-between-peach-and-other-optimisations

I understand there are various parallel optimisations that happen under the hood when running with some number of secondary threads, e.g. summing across multiple partitions. How do these interact with peach?

 

For example:

disk0/hdb/par.txt → disk1/hdb/partitions, disk2/hdb/partitions
disk1/hdb/partitions/1-3-5
disk2/hdb/partitions/2-4-6

If I ran a query such as (with trade as a placeholder table name)

select sum price by sym from trade where int within (1;4)

and I had two secondary threads available, thread #1 would retrieve data from partitions 1, 3 on disk 1, and thread #2 would retrieve data from partitions 2, 4 on disk 2 to maximise I/O throughput.

But if my queries were wrapped in peach, would this still be possible, given peach would be using all available threads, e.g.

{x[]} peach ( {select sum price by sym from trade where int within (1;4)}; {select sum price by sym from trade where int within (5;6)} )

 

So are there situations when using peach can reduce performance? Thank you

The parallelism can only go one layer deep.

i.e. these two statements end up taking the same execution path. In the first one, the inner peach can only run like an each, as it is already running inside a secondary thread:

data:8#enlist til 1000000
\ts {{neg x} peach x} peach data
553 1968
\ts {{neg x} each x} peach data
551 1936
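As a quick sanity check (a minimal sketch on a smaller list, not part of the timings above), the nested and non-nested forms return identical results, so the choice between them is purely about performance:

```q
/ smaller stand-in for the data used in the timings
data:8#enlist til 1000

/ inner peach nested inside an outer peach
r1:{{neg x} peach x} peach data

/ inner each inside an outer peach
r2:{{neg x} each x} peach data

/ plain vector operation, no explicit iteration
r3:neg data

/ all three forms agree
(r1~r2) and r2~r3
```

The last line should evaluate to 1b whether or not the process was started with secondary threads.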

For queries, map-reduce will still be used to reduce the memory load of your nested queries when they run inside a `peach`, even though the sub-parts are then not run in parallel.

https://code.kx.com/q4m3/14_Introduction_to_Kdb%2B/#1437-map-reduce
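To illustrate the idea only (this is not what kdb+ does internally, and `parts` is a made-up in-memory stand-in for partitions on disk), a sum can be map-reduced by taking a partial sum per partition and then summing the partials, so the full data never has to be held in memory at once:

```q
/ three hypothetical partitions, each holding a price vector
parts:(10 20 30f;40 50f;60 70 80 90f)

/ map: each partition is reduced to a partial sum independently
/ (with secondary threads, peach can run these in parallel)
partials:sum peach parts

/ reduce: the partial sums are combined into the final answer
total:sum partials

/ same result as summing the fully materialised data in one go
total~sum raze parts
```

Here `partials` is 60 90 300f and `total` is 450f. The map step is the part that can be spread over threads; if it runs serially, the memory benefit of working one partition at a time still applies.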

Where you choose to put your peach can be important and change the performance of your execution.

My example actually runs better without peach, because the overhead of passing data around outweighs the cost of neg, which is a simple operation:

\ts {{neg x} each x} each data
348 91498576

.Q.fc exists to help in these cases:

\ts {.Q.fc[{neg x};x]} each data
19 67110432

https://code.kx.com/q/ref/dotq/#fc-parallel-on-cut
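Roughly speaking, .Q.fc cuts the vector into chunks, applies the function to each chunk in parallel and joins the results back together. A hand-rolled sketch of the same idea is below; the chunking rule (one chunk per secondary thread) is my assumption for illustration and may not match the internal implementation exactly:

```q
vec:til 1000000

/ number of chunks: one per secondary thread, at least one
n:1|system"s"

/ hand-rolled version: cut vec into n roughly equal chunks,
/ apply the function to each chunk in parallel, then rejoin
chunks:(ceiling count[vec]%n) cut vec
manual:raze {neg x} peach chunks

/ .Q.fc does the cutting and rejoining for you
auto:.Q.fc[{neg x};vec]

/ both agree with the plain vector operation
(manual~auto) and auto~neg vec
```

The point is that each thread receives one large chunk rather than a million single items, which keeps the cost of passing data between threads small relative to the work done.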

And in fact, since `neg` has native multithreading and operates on vectors and vectors of vectors, it is best left on its own:
\ts neg each data
5 67109216
\ts neg data
5 67109104
neg data
This example is of course extreme, but it does show that each use-case deserves some thought about where to iterate and where to place `peach`.

I guess a more succinct version of my question is “what happens to native parallelisations when running queries inside an instance of peach?”

Many thanks for the reply and examples.

 

"in fact, since `neg` has native multithreading and operates on vectors and vectors of vectors, it is best left on its own"
This is what I was keen to understand, and it's useful to know that there are cases when you may be better off without peach.

kdb+ 4.1 has been released with some interesting improvements for peach, which change some of my answers, as nesting is now supported:

https://code.kx.com/q//releases/ChangesIn4.1/#peachparallel-processing-enhancements