# The Subset Sum Problem

So the FireChat app has been gaining in popularity because of the Hong Kong protests. And today I decided to take a look at what it was all about. When I stumbled on its "We're Hiring" page, I saw them post a subset sum problem, which they'd like applicants to solve. It is as follows:

The 2010 Census puts populations of 26 largest US metro areas at 18897109, 12828837, 9461105, 6371773, 5965343, 5946800, 5582170, 5564635, 5268860, 4552402, 4335391, 4296250, 4224851, 4192887, 3439809, 3279833, 3095313, 2812896, 2783243, 2710489, 2543482, 2356285, 2226009, 2149127, 2142508, and 2134411.

Can you find a subset of these areas where a total of exactly 100,000,000 people live, assuming the census estimates are exactly right? Provide the answer and code or reasoning used.

It seemed interesting, and so I had a go at it.

I first added all the numbers together and got a figure of 129161818. Since the higher the number, the more iterations it'll take for the program to complete, I instead used 29161818, which is the total minus the goal. The results would give a list of numbers, which when removed from the original 26 numbers, would give the desired result.

I could easily have used 100000000 and it would still work, but it will take a while longer, often using more resources than is allowed, especially with online compilers.

And here is my finished product:

```
using System;
using System.Collections.Generic;
public class Test
{
private static int[] oriPop;
private static List<int> pop;
private static int goal;
private static List<int> subsets;
private static List<String> subsetsList;
static Test() {
goal = 29161818;
subsets = new List<int>();
subsetsList = new List<String>();
oriPop = new int[26] {18897109, 12828837, 9461105, 6371773, 5965343, 5946800, 5582170, 5564635, 5268860, 4552402, 4335391, 4296250, 4224851, 4192887, 3439809, 3279833, 3095313, 2812896, 2783243, 2710489, 2543482, 2356285, 2226009, 2149127, 2142508, 2134411};
pop = new List<int>();
foreach (int number in oriPop) {
if(number <= goal) {
pop.Add(number);
}
}
pop.Sort();
pop.Reverse();
}
private static int iterateSum(int index, int counter) {
if(index + counter < pop.Count) {
return (pop[index] + pop[index + counter]);
}
return Int32.MaxValue;
}
private static bool findMatch(int goal, int index) {
// If the number is greater than the goal, return false
if(pop[index] > goal) {
return false;
}
// If the number is the goal, then add the number to the subset and return true;
if(pop[index] == goal) {
subsets.Add(pop[index]);
return true;
}
// If the number is not the goal:
int sum = 0;
int counter = 1;
// Iterates through every number smaller than the current one
while (counter < (pop.Count - index))
{
sum = iterateSum(index, counter);
// If the two numbers add up to the goal, add the two numbers to subset and returns true
if (sum == goal) {
subsets.Add(pop[index]);
subsets.Add(pop[(index + counter)]);
return true;
}
// If the two numbers add up to something smaller than the goal
if (sum < goal) {
// Iterate through numbers smaller than the goal in the list
for(int i = index + 1; i < pop.Count; i++) {
// If that number is smaller than the difference, and
// if when iterating through downstream combinations, one comination matches, it will be added to the list
if(pop[i] <= (goal - sum) && findMatch((goal - sum), i)) {
subsets.Add(pop[index]);
subsets.Add(pop[(index + counter)]);
return true;
}
}
}
counter++;
};
return false;
}
public static void Main()
{
for(int i = 0; i < pop.Count; i++) {
findMatch(goal, i);
}
// At this point, all the matches are stored in subsets, but there could be more than one combination, But since related groups would be grouped together in the list (since they are added successively one after each other)
int sum = 0;
String tempString = String.Empty;
List<int> tempSubset = new List<int>();
// We can go through the subsets list and cumulatively add the items one after another until the goal is reached, and print those sets of numbers. And then process the remaining list
foreach(int number in subsets) {
tempSubset.Add(number);
sum += number;
if(sum == goal) {
tempSubset.ForEach(item => tempString += item.ToString() + ", ");
subsetsList.Add(tempString);
tempSubset.Clear();
tempString = String.Empty;
sum = 0;
}
}
subsetsList.ForEach(item => Console.WriteLine(item));
}
}
```

And the results gave:

```
2356285, 2149127, 2783243, 2710489, 4192887, 3439809, 5965343, 5564635,
2149127, 2142508, 3279833, 2710489, 4552402, 4192887, 5582170, 4552402,
2134411, 2142508, 2134411, 2149127, 2134411, 2356285, 2226009, 2812896, 2543482, 4335391, 4192887
```

I later did a search, and figured out I wasn't the only one to be interested in it. And since others have posted their answers, I might as well post mine.

Of note is this answer by Steven Shelby, written in Ruby.

```
#!/usr/bin/ruby
populations = [18897109, 12828837, 9461105, 6371773, 5965343, 5946800, 5582170, 5564635, 5268860, 4552402, 4335391, 4296250, 4224851, 4192887, 3439809, 3279833, 3095313, 2812896, 2783243, 2710489, 2543482, 2356285, 2226009, 2149127, 2142508, 2134411]
target = 100000000
# sum up the total of the elements of array
def total(arr)
arr.inject { |total, n| total + n }
end
def find_sol(arr, target)
if target > 0
arr.each do |el|
if el == target
return [el]
elsif el < target
sub_arr = arr[(arr.index(el)+1)..-1]
ans = find_sol(sub_arr, target - el)
if ans.length > 0
return [el] + ans
end
end
end
end
[]
end
# sort populations descending
answer = find_sol(populations.sort { |x,y| y <=> x }, target)
puts answer.to_s
puts "Total: " + total(answer).to_s
```

This is much shorter than my answer, and runs much faster. I am not familiar with Ruby, but looking at its structure, our answers probably used similar concepts.

Another answer which I thought was poor because it traverses through all possible combinations, while cutting off at 'non-promising' points. It claims the optimized program would take 2.5 minutes to run, which is much longer than necessary.

Here are some more links to answers by other programmers:

- jkjpub using Go
- Scott Ross at Tactile Media with a 'stupid' (quoting himself) answer
- Terry Judd with a similar approach
- Kay C Lan thinks he can do a little better than Terry and Scott, and used my optimization of finding out which cities add up to 29161818
- Mick Collins and Geoff Canyon-4 and their solution

If you'd like to read more about the problem, have a look at this wiki entry and this Code Golf challenge.