Thanks for the suggestions, but I should have clarified two points earlier:
The actual calculation is path-dependent and non-trivial, so the solution should work for any arbitrary function of the current state and an element of the iterated vector.
I want to avoid updating globals because ultimately I want to be able to run this calculation many times in parallel with peach, and AFAIK it's not possible to use thread-local globals directly from q.
I think your timings don't include the creation of the variable when you make it a global.
That makes it hard to see what you're really comparing and trying to optimize.
Using a global will reduce memory use, but I don't think there is much difference in speed.
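To compare like with like, something along these lines might help. It is only a rough sketch: `v`, `f`, `runGlobal` and `runLocal` are made-up names, and `f` is a trivial stand-in for your real path-dependent update. The point is that the global is created inside the timed call, so both versions pay for their own setup.

```q
/ hypothetical stand-ins: v is the iterated vector, f the state update
v:til 1000000
f:{[state;x] state+x}

/ global version: the global acc is created (reset) inside the timed call
runGlobal:{[v] acc::0; {acc::f[acc;x]} each v; acc}
\t runGlobal v

/ local version: the state is threaded through over (/), no global involved
runLocal:{[v] f/[0;v]}
\t runLocal v
```

Both functions should return the same final state, so you can check `runGlobal[v]~runLocal[v]` before trusting the timings. The second form also keeps all state local to the call, which is what you would need for peach.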