Strange discrepancy in performance [contains AOC 1 spoiler]

Hello all!

Ran into a strange performance difference between the following variants of the same code when trying to code golf [the first AOC challenge](“https://community.kx.com/t5/Advent-of-Code-2021/AOC-Day-1-Sonar-Sweep/gpm-p/11352/highlight/true#M2” "“Laura’s”) today:

 

d:100+10000000?1000 \t -1+sum 0>‘:d // ~50ms \t -1+(+/)0>’:d // ~50ms \t -1+/ 0>‘:d // ~1050ms (20 times slower?!) \t -1+0+/ 0>’:d // ~1050ms

 

The only thing I can think of is that for some reason the boolean list is being coerced to long when I supply a seed to plus over, but I’m not sure why this would ever be the case. Surely the accumulator under the hood is a long in either case, so I’m not sure why the seed wouldn’t just set an initial value for this, rather than cast the entire righthand list. It’s also stated in [the documentation](“https://code.kx.com/q/ref/accumulators/#over” "“Accumulators”) that:

  • “If the value is a  binary function with a known identity element  I, and the derived function is applied as a unary, the result is I.” 
  • “0 is I for +”

Which make me even more curious as to why 0+/L acts any different from (+/)L

I’d love to know more about what’s happening under the hood here as initially I thought sum was just implemented in pure c & some optimisation step was converting (+/) to sum, but this discrepancy disappears when not acting on a boolean list.  

You’ll see similar behavior with other lists (not just booleans) when you supply an initial value that is of a different type to the items of the list e.g.

q)bi`j set’5000000#'(1b;1i;1); q)\t 0i+/b 850 q)\t 0i+/i 2 q)\t 0i+/j 122 q)\t 0+/b 877 q)\t 0+/i 879 q)\t 0+/j 4

Rather than a type change at each intermediate result - better to do at the start

q)\t 0+/“j”$b 13

 

Sorry, should’ve been more clear & said not acting on a list of different type. My question is why would the interpreter do that cast at each intermediate under the hood? Where should I go for documentation on the inner workings?

Because of implicit type promotion - https://code.kx.com/q4m3/4_Operators/#44-basic-arithmetic-

So something like  0+/111b has the below intermediate operations

  • 0 + 1b = 1 (type promotion to long)
  • 1 + 1b = 2 (type promotion to long)
  • 2 + 1b = 3 (type promotion to long)
  • return 3

Maybe it’s better illustrated by the below

q)ij set’5000000#'(0i;0); q)\t 0i+'i 863 q)\t 0i+'j 1233 q)\t 0+'j 200 q)\t 0+'i 986

However, in an atomic operation, the performance difference is barely noticeable

q)\t 0i+i 4 q)\t 0i+j 8

 

I’m not sure that’s exactly what’s causing the huge slowdown, since 0b+1b=1i I’d expect type conversation for every intermediate operation after that in the implicit seed case. Since (+/) returns an int & booleans are, well, boolean. Why aren’t the following equivalent in terms of performance:

\t (+/)0>‘:d // 50 \t (+/)6h$0>’:d // 70 (surely should be faster to cast all in one go?) \t (+/)6h$‘0>’:d // 800 (definately slower to explicitly cast each) \t 0i+/0>':d // 1000 (why?!)

What type is the explicit zero in 1 that lets it be so fast and even if it’s something special, after the first addition, isn’t it still working with an int accumulator?

And on the latter, since I’ve supplied a seed with the type that the first solution why is it still so bad? 

sum (+/) has special handling if/when type promotion is necessary
x+/y will go to the general case when there is a type mismatch between x and items of y

this may be fixed in future version