https://leetcode.com/problems/maximum-subarray/
Given an integer array
nums
, find the subarray which has the largest sum and return its sum.
The performance of q solutions to the [Maximum Subarray problem](“https://en.wikipedia.org/wiki/Maximum_subarray_problem” ““Wikipedia””) appeared on StackOverflow recently.
[In q/kdb+, can I speed up Kadane’s algorithm for the maximum subarray problem?](“https://stackoverflow.com/questions/74508773/” ““StackOverflow””)
Brute force
A brute-force solution is unlikely to be efficient but will be useful for testing faster candidates.
Ours generates indices for all the subarrays, calculates the corresponding sums and picks the largest.
q)x:-21-34-121-54q)1+tcx:tilcountx123456789q)tcx+neg[tcx]_\:tileach1+tcx:tilcountx (,0;01;012;0123;01234;012345;0123456;01234567;012345678) (,1;12;123;1234;12345;123456;1234567;12345678) (,2;23;234;2345;23456;234567;2345678) (,3;34;345;3456;34567;345678) (,4;45;456;4567;45678) (,5;56;567;5678) (,6;67;678) (,7;78),,8q)razextcx+neg[tcx]_\:tileach1+tcx:tilcountx,-2-21-21-3-21-34-21-34-1-21-34-12-21-34-121-21-34-121-5-21-34-121-54,11-31-341-34-11-34-121-34-1211-34-121-51-34-121-54,-3-34-34-1-34-12-34-121..q)mcs0:{maxsumeachrazextcx+neg[tcx]_\:tileach1+tcx:tilcountx} q)mcs0x6
Kadanes algorithm
The OP on StackOverflow offered ((|[0])(+)::)\[0f;x]
as an implementation of Kadanes algorithm and wondered if it could run as fast on a 9M list as Python+numba does. For this he would need at least a 4× improvement.
Your eye will immediately be drawn to the 0f
. Whats a float doing in this? And youre right. The expression runs 2× faster with the seed set to integer 0; and fractionally faster in a simpler form.
q)M9:-10+9000000?20q)\tsmax((|[0]) (+)::)\[0f;M9]4367412436160q)\tsmax((|[0]) (+)::)\[0;M9]2425412436160q)\tsmax0(0|+)\M92181412436096q)mcs1:max((0|+)\)@ / Kadane
Pause here to notice the very terse composition of 0|
and +
, equivalent to but slightly faster than the lambda {0|x+y}
.
q)\ts0(0|+)\M92316412436000q)\ts0{0|x+y}\M92051412436096
This implementation of Kadane fails on an edge case: when all the numbers are negative. Like a sundial that marks only the sunny hours, anytime the max-so-far falls below zero, it gets set to 0. If no number is positive the result is wrongly given as zero.
q)0(0|+)\neg1+til500000
A more exact implementation of Kadane would track not only the max-since-last-zero, but also the max-seen-so-far in case all the numbers are negative. But of course that is nothing but the max
of the numbers. If the expression above returns all 0s, then there are no positive numbers, and we can safely return the maximum instead.
q)mcs1a:{$[r:max0(0|+)\x;r;maxx]} q)mcs1aeach(M9;neg1+til5)354-1
This will do nicely if all-negative is a rare edge case. But what if it were often so?
q)M9n:neg1+absM9q)\tsmcs1aM92224412435728q)\tsmcs1aM9n2561412435728
We could handle this by modifying the scan to track both the max-since-last zero and also the max-number-seen. At the end, if the latter is negative, thats the result we want.
q)mcs2:{{(y;z)z<0}.(0,x00) {(0|y+x0;x[1]|0|y+x0; y|x2)}/x} q)\tsmcs2M97793592q)\tsmcs2M9n7865592
Now the two arguments require the same time and space, but execution takes 3× more time than mcs1a
over 600× less space.
Vector primitives and overcomputing
The implicit iteration of the vector primitives is so fast that it is often more efficient to overcompute than to parse and execute an explicit iteration. Consider the simple case of determining the range of a vector.
q)\ts:10(minM9;maxM9)83960q)\ts:10(min;max)@\:M9821024q)\ts:1000{(y&x0;y|x1)}/M9441582192
Vector vector vector
Finally we abandon Kadanes algorithm for a vector solution from Nathan Swann.
q)mcs3:{maxs-mins0^prevs:sumsx} q)mcs3neg1+til5 / handles edge case-1q)\tsmcs1M92209412435616q)\tsmcs3M984536871104
Here all the iteration is implicit and we see a 50× speed-up from the OPs solution, well more than the 4× improvement needed to beat Python+Numba.
Conclusions
- A little tweaking won 2× on the OP expression.
- Switching from two primitive aggregations through a 9M list to a single explicit iteration increased execution time 3×. (But massively saved memory.)
- Deserting Kadane and divide-and-conquer algorithms for a pure vector solution won a 50× improvement, significantly beating the reported execution by Python+Numba.
Archived at https://github.com/qbists/studyq