DP: Subset Sum and Partition Problems

Finding the right combination of items to meet a precise total is a challenge that appears everywhere, from financial portfolio optimization and resource allocation to data packet scheduling. At its core, this is the subset sum problem, a classic puzzle in computer science and operations research. While inherently difficult, it can be tackled efficiently for practical cases using a powerful technique: dynamic programming (DP). Understanding how to solve subset sum with DP not only provides a tool for many applications but also offers a profound lesson in the nature of algorithmic complexity, bridging the gap between intractable problems and practical solutions.

Defining the Core Problems

Let's formalize the two interconnected problems at the heart of this discussion. You are given a collection of $n$ integers (duplicates are allowed), typically represented as an array nums, and a target integer T.

The subset sum problem asks a deceptively simple question: Does there exist a subset of nums whose elements sum to exactly the target T? The answer is purely Boolean—yes or no. For example, given nums = [3, 34, 4, 12, 5, 2] and T = 9, the answer is "yes" because the subset [4, 5] sums to 9.
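
To make the definition concrete, here is a minimal brute-force sketch that simply tries subsets until one hits the target. The function name subset_sum_brute_force is an illustrative choice, not something defined in the text, and the approach is exponential in the number of elements, so it is only meant for tiny inputs.

```python
from itertools import combinations


def subset_sum_brute_force(nums, target):
    """Return one subset (as a list) that sums to target, or None if none exists."""
    # Try every subset size r, then every combination of r elements.
    for r in range(len(nums) + 1):
        for combo in combinations(nums, r):
            if sum(combo) == target:
                return list(combo)
    return None


print(subset_sum_brute_force([3, 34, 4, 12, 5, 2], 9))  # prints [4, 5]
```

Because it may enumerate all $2^{n}$ subsets, this version serves as a reference for checking faster solutions rather than as a practical method.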

A special and very practical case of subset sum is the partition problem. Here, you ask: Can the set nums be partitioned into two subsets such that the sums of the two subsets are equal? This is equivalent to asking if there exists a subset that sums to exactly half of the total sum of all elements. If the total sum $S$ is odd, the answer is immediately "no," as you cannot split an odd integer into two equal halves. For nums = [1, 5, 11, 5], the total sum is 22. Half is 11, and since the subset [11] exists, the answer is "yes," with the partition being [11] and [1, 5, 5].

Both problems are NP-complete. No known algorithm solves every instance in time polynomial in the input size (like $O(n^{2})$ or $O(n^{3})$), and the best known exact algorithms take exponential time in the worst case. However, this doesn't mean they are always intractable. Their difficulty depends on the magnitude of the numbers involved, not just the count, which leads us to a powerful pseudo-polynomial solution.

The Dynamic Programming Solution for Subset Sum

The breakthrough in solving subset sum for many real-world inputs comes from dynamic programming. Instead of trying all $2^{n}$ possible subsets, we build a table that systematically answers smaller, overlapping subproblems. The classic approach uses a Boolean DP table.

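Define a 2D table dp with dimensions $(n + 1) \times (T + 1)$. This formulation assumes the elements of nums and the target T are non-negative integers, so that every sum j from 0 to T can serve as a column index. The cell dp[i][j] will represent the answer to the question: "Using only the first i elements of the array (from index 0 to i-1), is it possible to form a subset that sums exactly to j?"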

We initialize the table as follows:

dp[0][0] = true: With zero elements, you can only achieve a sum of zero.
dp[0][j] = false for all $j > 0$: With zero elements, you cannot achieve any positive sum.

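The recurrence relation that fills the table is intuitive: $\text{dp}[i][j] = \text{dp}[i-1][j] \;\text{OR}\; \text{dp}[i-1][j - \text{nums}[i-1]]$, where the second term applies only when $j - \text{nums}[i-1] \geq 0$. When $j < \text{nums}[i-1]$, the element cannot fit and $\text{dp}[i][j] = \text{dp}[i-1][j]$.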

Let's break this down. For the i-th element (value = nums[i-1]) and a target sum j, you have two choices:

Exclude the element: The sum j must be achievable using only the previous i-1 elements. This is represented by dp[i-1][j].
Include the element: If you include nums[i-1], then the remaining sum j - nums[i-1] must be achievable using the previous i-1 elements. This is represented by dp[i-1][j - nums[i-1]].

If either path is true, then dp[i][j] is true. The final answer to the subset sum problem is found in the cell dp[n][T].
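
The table definition, initialization, and recurrence above can be sketched in Python as follows. This is one possible rendering under the assumption that nums contains non-negative integers and target is non-negative; the function name subset_sum is chosen here for illustration.

```python
def subset_sum(nums, target):
    """Return True if some subset of nums sums exactly to target.

    Assumes nums holds non-negative integers and target >= 0.
    """
    n = len(nums)
    # dp[i][j]: can the first i elements form the sum j?
    dp = [[False] * (target + 1) for _ in range(n + 1)]
    dp[0][0] = True  # with zero elements, only the sum 0 is reachable

    for i in range(1, n + 1):
        value = nums[i - 1]  # the i-th element lives at array index i-1
        for j in range(target + 1):
            # Exclude nums[i-1]: inherit the answer from the previous row.
            dp[i][j] = dp[i - 1][j]
            # Include nums[i-1]: allowed only when it fits within j.
            if j >= value and dp[i - 1][j - value]:
                dp[i][j] = True

    return dp[n][target]


print(subset_sum([3, 34, 4, 12, 5, 2], 9))   # prints True
print(subset_sum([3, 34, 4, 12, 5, 2], 30))  # prints False
```

Column 0 of every row becomes true through the exclude branch, which matches the fact that the empty subset always sums to zero.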

The time and space complexity of this algorithm is $O(n \times T)$. This is called pseudo-polynomial time because it is polynomial in the numeric value of T rather than in the number of bits needed to write T down. If T is very large (e.g., exponential in $n$), this algorithm becomes slow. However, for many practical cases where the target and numbers are reasonably bounded, it is efficient.

Reconstructing the Subset

The Boolean DP table tells you if a subset exists, but often you need to know which elements form that subset. You can reconstruct the subset by tracing back through the filled DP table.

Starting from the final cell dp[n][T], you work backwards:

If dp[i][j] is true because dp[i-1][j] was true (the exclude case), you simply move up to row i-1 with the same sum j. The current element nums[i-1] is not part of the solution.
If dp[i][j] is true because dp[i-1][j - nums[i-1]] was true (the include case), you add nums[i-1] to your solution subset. Then, you move up to row i-1 and set the new target sum to j - nums[i-1].
You repeat this process until you reach dp[0][0].

This backward walk is guaranteed to find one valid subset because it follows the logical decisions that made the final answer true. It runs in $O (n)$ additional time.
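
A minimal sketch of this backward walk is shown below. It rebuilds the table first so the example is self-contained, and it assumes non-negative integers; the name find_subset is hypothetical. It uses the convention of testing the exclude cell first, so when both predecessors are true it skips the element.

```python
def find_subset(nums, target):
    """Return one subset of nums summing to target, or None if none exists.

    Assumes nums holds non-negative integers and target >= 0.
    """
    n = len(nums)
    dp = [[False] * (target + 1) for _ in range(n + 1)]
    dp[0][0] = True
    for i in range(1, n + 1):
        value = nums[i - 1]
        for j in range(target + 1):
            dp[i][j] = dp[i - 1][j] or (j >= value and dp[i - 1][j - value])

    if not dp[n][target]:
        return None

    subset = []
    j = target
    # Walk from row n up to row 1; dp[i][j] is true at every step.
    for i in range(n, 0, -1):
        if dp[i - 1][j]:
            continue  # exclude case: sum j is reachable without nums[i-1]
        # Otherwise the include case must hold, so nums[i-1] is in the subset.
        subset.append(nums[i - 1])
        j -= nums[i - 1]

    subset.reverse()  # restore original array order
    return subset


print(find_subset([3, 34, 4, 12, 5, 2], 9))  # prints [4, 5]
```

Testing the include cell first would be equally valid and may return a different subset when several exist.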

Solving the Partition Problem via Subset Sum

Given your understanding of the subset sum DP solution, the partition problem becomes a straightforward application. As noted, partitioning an array into two equal-sum subsets is equivalent to finding a subset that sums to total_sum / 2.

Therefore, the algorithm is:

Calculate the total sum $S$ of all elements in nums.
If $S$ is odd, return false immediately.
Otherwise, set the target T = S / 2.
Run the exact subset sum DP algorithm described above to check if any subset sums to T.
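
The four steps above can be sketched as a single function. This is an illustrative version assuming non-negative integers; the name can_partition is an assumption, and the table-filling loop is the same subset sum recurrence inlined so the example runs on its own.

```python
def can_partition(nums):
    """Return True if nums can be split into two subsets with equal sums.

    Assumes nums holds non-negative integers.
    """
    total = sum(nums)
    if total % 2 == 1:
        return False  # an odd total cannot be split into two equal halves
    target = total // 2

    # Subset sum DP with the target set to half of the total.
    n = len(nums)
    dp = [[False] * (target + 1) for _ in range(n + 1)]
    dp[0][0] = True
    for i in range(1, n + 1):
        value = nums[i - 1]
        for j in range(target + 1):
            dp[i][j] = dp[i - 1][j] or (j >= value and dp[i - 1][j - value])
    return dp[n][target]


print(can_partition([1, 5, 11, 5]))  # prints True
print(can_partition([1, 2, 3, 5]))   # prints False (total 11 is odd)
```

Once a subset summing to the target is found, the remaining elements automatically form the other half, so no second search is needed.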

The time complexity for solving partition this way is $O(n \times S)$, which is pseudo-polynomial in the total sum $S$. For typical inputs with moderately sized values, this is a practical exact method for this NP-complete problem.

Common Pitfalls

Misindexing the DP Table: A frequent off-by-one error arises from confusing the element index i in the DP table with the array index. Remember, dp[i][j] considers the first i elements, which correspond to nums[0] through nums[i-1]. Always be meticulous when writing the recurrence relation dp[i][j] = dp[i-1][j] OR dp[i-1][j - nums[i-1]].

Confusing Pseudo-Polynomial with Polynomial: It's crucial to understand why $O(nT)$ is pseudo-polynomial. The size of the input T is the number of bits needed to encode it, roughly $\log_2 T$. An algorithm that is polynomial in the value of T (like $O(T)$ or $O(nT)$) is actually exponential in that input size. Do not mistake this for a true polynomial-time algorithm like $O(n^{2})$, which would work quickly even if T were astronomically large.
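
A short worked calculation may help here. Write $b$ for the number of bits in T (a symbol introduced only for this illustration). Then:

$$
2^{b-1} \le T < 2^{b} \quad\Longrightarrow\quad n \cdot 2^{b-1} \le nT < n \cdot 2^{b}
$$

For example, with $n = 100$, a target of $T = 10^{6}$ (20 bits) needs about $10^{8}$ table cells, while $T = 10^{12}$ (40 bits) needs about $10^{14}$. Doubling the length of T's encoding multiplied the work by a factor of $10^{6}$, which is the exponential dependence on input size.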

Incorrect Subset Reconstruction: When tracing back to find the subset, a common mistake is to test the predecessor cells carelessly, for example by reading dp[i-1][j - nums[i-1]] without first confirming that j - nums[i-1] is non-negative. A simple, safe convention is to check dp[i-1][j] first: if it is true, skip the element; if it is false, the inclusion path must have been taken. When both predecessor cells are true, either path is valid, and your choice only determines which subset you reconstruct.

Summary

The subset sum problem (checking for a subset that sums to a target T) and the partition problem (checking for an equal-sum split) are foundational NP-complete problems with wide practical relevance.
A dynamic programming approach using a Boolean table provides a pseudo-polynomial time solution in $O(nT)$ or $O(nS)$, which is efficient for many practical instances where the target sum is not excessively large.
The DP table dp[i][j] stores whether a sum j is achievable using the first i elements, built using a recurrence relation based on including or excluding the current element.
You can reconstruct the actual subset by tracing backward from the final DP table cell, following the logical inclusion/exclusion decisions that led to a true result.
The partition problem is solved by checking if a subset sums to half of the total sum, directly leveraging the subset sum DP algorithm as a subroutine.
