LeetCode #53: Maximum Subarray

SJT1 · November 22, 2022, 7:42pm

https://leetcode.com/problems/maximum-subarray/

Given an integer array nums, find the subarray which has the largest sum and return its sum.

The performance of q solutions to the Maximum Subarray problem (Wikipedia) appeared on StackOverflow recently.

StackOverflow: In q/kdb+, can I speed up Kadane's algorithm for the maximum subarray problem?

Brute force

A brute-force solution is unlikely to be efficient but will be useful for testing faster candidates.

Ours generates indices for all the subarrays, calculates the corresponding sums and picks the largest.

```q
q)x:-2 1 -3 4 -1 2 1 -5 4
q)1+tcx:til count x
1 2 3 4 5 6 7 8 9
q)tcx+neg[tcx]_\:til each 1+tcx:til count x
(,0;0 1;0 1 2;0 1 2 3;0 1 2 3 4;0 1 2 3 4 5;0 1 2 3 4 5 6;0 1 2 3 4 5 6 7;0 1 2 3 4 5 6 7 8)
(,1;1 2;1 2 3;1 2 3 4;1 2 3 4 5;1 2 3 4 5 6;1 2 3 4 5 6 7;1 2 3 4 5 6 7 8)
(,2;2 3;2 3 4;2 3 4 5;2 3 4 5 6;2 3 4 5 6 7;2 3 4 5 6 7 8)
(,3;3 4;3 4 5;3 4 5 6;3 4 5 6 7;3 4 5 6 7 8)
(,4;4 5;4 5 6;4 5 6 7;4 5 6 7 8)
(,5;5 6;5 6 7;5 6 7 8)
(,6;6 7;6 7 8)
(,7;7 8)
,,8
q)raze x tcx+neg[tcx]_\:til each 1+tcx:til count x
,-2
-2 1
-2 1 -3
-2 1 -3 4
-2 1 -3 4 -1
-2 1 -3 4 -1 2
-2 1 -3 4 -1 2 1
-2 1 -3 4 -1 2 1 -5
-2 1 -3 4 -1 2 1 -5 4
,1
1 -3
1 -3 4
1 -3 4 -1
1 -3 4 -1 2
1 -3 4 -1 2 1
1 -3 4 -1 2 1 -5
1 -3 4 -1 2 1 -5 4
,-3
-3 4
-3 4 -1
-3 4 -1 2
-3 4 -1 2 1
..
q)mcs0:{max sum each raze x tcx+neg[tcx]_\:til each 1+tcx:til count x}
q)mcs0 x
6
```
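For an oracle outside q, here is a minimal Python sketch of the same brute force: enumerate every contiguous subarray, sum each one, and take the largest. The names subarrays and max_subarray_brute are illustrative, not from the original post, and this is plain CPython for checking results, not a performance comparison with q or Numba.

```python
def subarrays(nums):
    # Every contiguous subarray, ordered by start index and then by length,
    # which is the same order as the q listing above.
    n = len(nums)
    return [nums[i:j + 1] for i in range(n) for j in range(i, n)]


def max_subarray_brute(nums):
    # O(n^2) subarrays, each summed from scratch, as mcs0 does: O(n^3) overall.
    if not nums:
        raise ValueError("nums must be non-empty")
    return max(sum(s) for s in subarrays(nums))


x = [-2, 1, -3, 4, -1, 2, 1, -5, 4]
print(len(subarrays(x)))      # 45
print(max_subarray_brute(x))  # 6
```

A list of 9 items has 9×10/2 = 45 subarrays, which is why the q listing above gets truncated.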

Kadane's algorithm

The OP on StackOverflow offered ((|[0])(+)::)\[0f;x] as an implementation of Kadane's algorithm and wondered if it could run as fast on a 9M list as Python+Numba does. For this he would need at least a 4× improvement.

Your eye will immediately be drawn to the 0f. What's a float doing in this? And you're right: the expression runs about 2× faster with the seed set to integer 0, and fractionally faster again in a simpler form.

```q
q)M9:-10+9000000?20
q)\ts max((|[0]) (+)::)\[0f;M9]
4367 412436160
q)\ts max((|[0]) (+)::)\[0;M9]
2425 412436160
q)\ts max 0(0|+)\M9
2181 412436096
q)mcs1:max((0|+)\)@ / Kadane
```

Pause here to notice the very terse composition of 0| and +, equivalent to the lambda {0|x+y} and about as fast; in the run below, the lambda actually came out slightly ahead.

```q
q)\ts 0(0|+)\M9
2316 412436000
q)\ts 0{0|x+y}\M9
2051 412436096
```
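To pin down what the seeded scan 0(0|+)\x computes, here is a hypothetical Python equivalent. itertools.accumulate with initial=0 (Python 3.8+) plays the role of the seeded scan, and the seed is dropped from the output so there is one result per input item, as in q. The function names are illustrative.

```python
from itertools import accumulate


def clamped_scan(nums):
    # Running sum that is reset to 0 whenever it would go negative (q: 0(0|+)\x).
    # accumulate(..., initial=0) yields the seed first, so it is sliced off.
    return list(accumulate(nums, lambda acc, v: max(0, acc + v), initial=0))[1:]


def kadane_clamped(nums):
    # Analogue of max 0(0|+)\x: the highest value the clamped running sum reaches.
    return max(clamped_scan(nums))


x = [-2, 1, -3, 4, -1, 2, 1, -5, 4]
print(clamped_scan(x))    # [0, 1, 0, 4, 3, 5, 6, 1, 5]
print(kadane_clamped(x))  # 6
```

Run on an all-negative input such as [-1, -2, -3, -4, -5], kadane_clamped returns 0, which reproduces the flaw in the q version.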

This implementation of Kadane fails on an edge case: when all the numbers are negative. Like a sundial that marks only the sunny hours, it resets the running sum to 0 any time that sum falls below zero. If every number is negative, the result is wrongly given as zero.

```q
q)0(0|+)\neg 1+til 5
0 0 0 0 0
```

A more exact implementation of Kadane would track not only the max-since-last-zero, but also the max-seen-so-far in case all the numbers are negative. But of course that is nothing but the max of the numbers. If the expression above returns all 0s, then there are no positive numbers, and we can safely return the maximum instead.

```q
q)mcs1a:{$[r:max 0(0|+)\x;r;max x]}
q)mcs1a each (M9;neg 1+til 5)
354 -1
```
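The fallback in mcs1a can be sketched in Python as follows, assuming the same clamped scan as a helper (redefined here so the block stands alone; both names are illustrative). The clamped running value at each position is the larger of 0 and the best sum ending there, so its maximum is 0 exactly when no subarray has a positive sum. In that case the best subarray is the single largest item.

```python
from itertools import accumulate


def clamped_scan(nums):
    # Running sum reset to 0 whenever it would go negative (q: 0(0|+)\x).
    return list(accumulate(nums, lambda acc, v: max(0, acc + v), initial=0))[1:]


def max_subarray_with_fallback(nums):
    # Analogue of mcs1a: if the clamped scan never rises above 0, no subarray
    # has a positive sum, so fall back to the largest single item.
    r = max(clamped_scan(nums))
    return r if r else max(nums)


print(max_subarray_with_fallback([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6
print(max_subarray_with_fallback([-1, -2, -3, -4, -5]))             # -1
```

As in q, the fallback costs a second pass only when the first pass returns 0.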

This will do nicely if all-negative is a rare edge case. But what if it were often so?

```q
q)M9n:neg 1+abs M9
q)\ts mcs1a M9
2224 412435728
q)\ts mcs1a M9n
2561 412435728
```

We could handle this by modifying the scan to track both the max-since-last-zero and the max-number-seen. At the end, if the latter is negative, that's the result we want.

```q
q)mcs2:{{(y;z)z<0}.(0,x 0 0){(0|y+x 0;x[1]|0|y+x 0;y|x 2)}/x}
q)\ts mcs2 M9
7793 592
q)\ts mcs2 M9n
7865 592
```
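A hedged Python rendering of the same single-pass idea uses functools.reduce with a three-part state: the clamped running sum, the best clamped sum so far, and the largest number seen. The seed (0, nums[0], nums[0]) mirrors 0,x 0 0 in mcs2; the function name max_subarray_one_pass is hypothetical.

```python
from functools import reduce


def max_subarray_one_pass(nums):
    # State: (clamped running sum, best clamped sum so far, largest item seen).
    def step(state, v):
        running, best, largest = state
        running = max(0, running + v)
        return running, max(best, running), max(largest, v)

    if not nums:
        raise ValueError("nums must be non-empty")
    _, best, largest = reduce(step, nums, (0, nums[0], nums[0]))
    # If even the largest item is negative, it is the answer (all-negative case).
    return largest if largest < 0 else best


print(max_subarray_one_pass([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6
print(max_subarray_one_pass([-1, -2, -3, -4, -5]))             # -1
```

Only the three-item state is kept between steps, which is the memory saving mcs2 buys at the price of an explicit iteration.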

Now the two arguments take the same time and space, but execution takes over 3× as long as mcs1a, in roughly 700,000× less space (592 bytes against about 412MB).

Vector primitives and overcomputing

The implicit iteration of the vector primitives is so fast that it is often more efficient to overcompute than to parse and execute an explicit iteration. Consider the simple case of determining the range of a vector.

```q
q)\ts:10 (min M9;max M9)
83 960
q)\ts:10 (min;max)@\:M9
82 1024
q)\ts:10 0 0{(y&x 0;y|x 1)}/M9
44158 2192
```
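The same contrast can be sketched in Python, with illustrative function names. In CPython, min and max loop in C, while reduce calls a Python lambda for every item, so the two-pass version is usually the faster one there too; no timings are claimed here, only that both compute the same range.

```python
from functools import reduce


def range_two_passes(v):
    # Two aggregations, each a full pass over v (like (min;max)@\:v).
    return min(v), max(v)


def range_one_fold(v):
    # One explicit pass carrying (lowest, highest), like a {(y&x 0;y|x 1)}/ fold.
    # Seeded from the first item so the result is correct for any non-empty v.
    return reduce(lambda acc, y: (min(y, acc[0]), max(y, acc[1])), v, (v[0], v[0]))


sample = [3, -1, 7, 0, -4, 2]
print(range_two_passes(sample))  # (-4, 7)
print(range_one_fold(sample))    # (-4, 7)
```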

Vector vector vector

Finally we abandon Kadane's algorithm for a vector solution from Nathan Swann.

```q
q)mcs3:{max s-mins 0^prev s:sums x}
q)mcs3 neg 1+til 5 / handles edge case
-1
q)\ts mcs1 M9
2209 412435616
q)\ts mcs3 M9
84 536871104
```
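A minimal Python transcription of mcs3 may help in reading it right to left. itertools.accumulate stands in for both sums and mins, and prepending 0 to the prefix sums shifted by one plays the role of 0^prev s. The function name max_subarray_prefix is illustrative.

```python
from itertools import accumulate


def max_subarray_prefix(nums):
    if not nums:
        raise ValueError("nums must be non-empty")
    s = list(accumulate(nums))              # q: s:sums x
    before = [0] + s[:-1]                   # q: 0^prev s (sum of the items before each position)
    lowest = list(accumulate(before, min))  # q: mins 0^prev s
    # Best subarray ending at j is s[j] minus the lowest earlier prefix sum.
    return max(a - b for a, b in zip(s, lowest))  # q: max s-mins ...


print(max_subarray_prefix([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6
print(max_subarray_prefix([-1, -2, -3, -4, -5]))             # -1
```

Because the earlier prefix sum is always taken from strictly before position j, every candidate subarray has at least one item, which is why the all-negative case needs no special handling.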

Here all the iteration is implicit and we see a 50× speed-up from the OP's solution, far more than the 4× improvement needed to beat Python+Numba.

Conclusions

A little tweaking won 2× on the OP expression.
Switching from two primitive aggregations through a 9M list to a single explicit iteration increased execution time 3×. (But massively saved memory.)
Deserting Kadane and divide-and-conquer algorithms for a pure vector solution won a 50× improvement, significantly beating the reported Python+Numba execution time.

Archived at https://github.com/qbists/studyq

LeahS1 · November 23, 2022, 10:23am

Super content @SJT

Thanks for sharing with the KX Community!

NathanSwann1 · November 23, 2022, 10:32am

It looks like mcs3 is only 25x faster than mcs1, still a massive improvement though. For those curious, the vector solution itself is a vectorization of Ulf Grenander's own solution to the problem from back in 1977.

SJT1 · November 23, 2022, 11:25am

That's right: the other 2× was low-hanging fruit from using an int zero!

gabiteodoru1 · November 23, 2022, 11:49am

Hey Nathan! Thanks again for your brilliant solution to my Q problem! I’ve been wondering ever since how you came up with such a creative solution – it strikes me as needing some background knowledge in a field of math I’m unfamiliar with yet – an algebra of running sums, mins, maxes, and compositions of such functions (reminds me of stuff like the Legendre transform and the convex duals). Could I ask you what your background is, and what this branch of mathematics is called? I’d love to read up more on it. I was able to prove that the two algorithms are equivalent, but I did so at the scalar level; I’d love to come up with a proof that does it at the series level, using something similar to how vectors / matrices are used for linear algebra and multivariate calculus; or how lag operators are used in time series; do you know of any such proofs? Many thanks!

NathanSwann1 · November 23, 2022, 1:40pm

Background-wise, I'm a pure mathematician: I mostly studied Class/Set Theory along with Abstract Algebra, but I have spent a fair amount of time on programming languages akin to Q. I'm not familiar with any particular field that covers these sorts of operations specifically, and I have no particular proof that both algorithms are equivalent.

Instead I solved it from the start again, using the observation that the sum of a given range i to j is the same as the sum up to j minus the sum up to i. The proof of validity of the algorithm should be fairly direct from there, but it should look like a classical series proof that you would see in Set Theory or in the fundamental proofs of calculus.

Hope that helps
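Written out with prefix sums, the observation gives mcs3 directly. This is a sketch of the reasoning in our own notation (S for prefix sums), not a proof from the thread.

```latex
\begin{aligned}
S_0 &= 0, \qquad S_j = \sum_{k=1}^{j} x_k \quad (1 \le j \le n) \\
\sum_{k=i+1}^{j} x_k &= S_j - S_i \qquad (0 \le i < j \le n) \\
\max_{0 \le i < j \le n} \left( S_j - S_i \right) &= \max_{1 \le j \le n} \left( S_j - \min_{0 \le i < j} S_i \right)
\end{aligned}
```

In mcs3, sums x is S_1 to S_n, 0^prev s is S_0 to S_{n-1}, and mins of that gives the inner minimum for each j, so max s-mins 0^prev s evaluates the right-hand side.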
