Timing/Memory surprise on 3 versions of <Conditional function applied to lists>

JP14 · October 12, 2021, 4:55am

Hi,

A bit puzzled here… Say I have a table with 2 columns: the first holds variable-length lists and the second a boolean flag:

t:([] a:{x?50} each 2+100000?100; b:100000?0b)

I want to apply a specific function to each list found in t`a, depending on the boolean t`b. In this example, let’s simply use <first> and <last> as the dependent functions.

Below are 3 possible syntax versions and their time/memory consumption. I would have expected the <scan> version to beat the 2 others by a substantial factor, at least on timing (with +/- impact on memory). The reality seems to be the opposite:

Cond$ with Each

q)\ts {$[x`b; last x`a; first x`a]} each t
40 3697968

Cond? with Each within function

q)\ts {?[x`b; last each x`a; first each x`a]}t
16 4746672

Scan

q)\ts ({?[y`b; last; first] y`a}\)[::;t]
62 12746416

Is there an issue with my wording of the scan version (4x slower, 2.5x memory hungry), or is the second one the actual optimal solution? As usual, your insight is highly appreciated. Thx.

rocuinneagain1 · October 12, 2021, 10:57am

This is not really a suitable application of scan. Scan is an accumulator: it is useful when each calculation depends on the result of the previous one. That carries an overhead, because the computation runs item by item and each result must be passed back in to the next step. The fact that you never use the variable ‘x’ inside the scan is an indication that it is not the best fit here. This blog has some visualisations which aim to show how scan functions internally.
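As a rough illustration of that difference, here is a minimal sketch on made-up data (the names running and squares are only for this example). With scan, each step receives the previous result as ‘x’; with each, every item is handled independently:

```q
/ scan: x is the result of the previous step, y is the current item
running:{x+y}\[0;1 2 3 4]      / running sums: 1 3 6 10

/ each: no dependency between items, so nothing is carried forward
squares:{x*x} each 1 2 3 4     / 1 4 9 16
```

If the lambda never reads ‘x’, as in the scan version from the question, the accumulator is being carried around for nothing.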

Your second version is fastest as it operates on ‘x`b’ as a vector rather than inside ‘each’.

One other possible variation is shown below:

q)\ts {?[x`b; last each x`a; first each x`a]}t
16 4746672
q)\ts {((first;last) x`b)@'x`a}t
7 4746640

Its goal is to avoid calculating both ‘last each’ and ‘first each’ for every row.

Instead it uses each both (') to apply first or last after it is known which function is needed.
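A quick sanity check, sketched on a small hypothetical table s (not the 100000-row t from the question), suggests the three formulations return the same values, so the differences are in timing and memory only:

```q
/ small made-up sample with the same shape as t
s:([] a:(1 2 3;4 5;6 7 8 9); b:101b)

r1:{$[x`b; last x`a; first x`a]} each s          / row-by-row conditional
r2:{?[x`b; last each x`a; first each x`a]} s     / vector conditional, computes both branches
r3:{((first;last) x`b)@'x`a} s                   / pick the function per row, then apply with each-both

/ b is 1 0 1, so the expected picks are last, first, last: 3 4 9
same:(r1~r2) and r2~r3
```

Here (first;last) x`b indexes the two-item function list with the boolean column, giving one function per row before anything is applied.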

JP14 · October 12, 2021, 4:04pm

Thank you for both the explanation and the more efficient <each both> alternative… Best.
