The World Is a Knapsack Problem and We're Just Living in It
Understanding one of computer science's most classic problems.
In computer science, there are problems that taunt and mock you: the brain mutilating vortex of a difficult recurrence, a sprawling parallel algorithm, anything that almost but doesn't quite work out, generally. The salt in the wound that is a nuked brain is often that the problem is sub-optimally easy to solve. That is, an answer exists right there in front of you and it will quickly and simply return a correct answer—but not the most correct answer. Another algorithm can do better, and that algorithm is a hellscape.
To see this, and to see why it's interesting and highly relevant, we need to understand one of the foundational problems of computer science: the knapsack problem. It's easy enough to explain and the odds are pretty good that you've encountered it in everyday life. It goes like this. You're given some container (a knapsack, but really whatever) and it has a certain capacity. The goal is to fill the knapsack with a set of objects with the greatest total value while still adhering to the capacity limits of the knapsack.
How can you put the most valuable set of objects into the pack without it being too heavy? It's easy enough to sort out something like the situation below just by thinking through it, but the problem very quickly becomes complex as more items are added and as the container's capacity increases. In fact, it becomes impossibly complex—hopeless. The problem is what's known as NP-hard, which means that it is not the case than an algorithm exists that can solve it both correctly and quickly on all possible cases.
(A "fast" problem can be completed, in the worst possible case, in polynomial time, which means that given some number of inputs n, it won't take more time than nk, where k is any constant number.)
The knapsack problem is more generally considered as a resource-allocation problem and this is where we see its application pretty much everywhere in the world. How do you spend your paycheck to get the most value (or utility) for your earnings? What is the fewest number of airplanes that can be used to service some set number of routes? How many people of different talents and corresponding salaries can your department hire to do the most work at the least cost? This is like the very fabric of organized existence.
The algorithms you use to solve this problem are probably crappy—best guesses, right? And best guesses will work pretty good most of the time (or at least hide their failures well enough), but, otherwise, there's a legion of computer science grads out there waiting to optimize the fuck out of your knapsack, or at least come pretty close given finite computing resources.
This is where we get to the algorithmic taunt I mentioned above. The knapsack problem is easy, even really easy, using algorithms known as greedy algorithms. "Their name essentially gives their description: do the thing that looks best right now, and repeat until nothing looks good anymore or you're forced to stop," explains Jeremy Kun, a mathematics PhD student at the University of Illinois, on his (excellent) blog Math ∩ Programming.
Greed isn't good, but it's usually OK.
And that's really what it is: choose like you only get one choice. In the knapsack problem illustrated above, our first choice would be to take the object with the best cost-to-weight ratio. Our second choice would be the next best and so on until the knapsack has reached its limit.
A variation of this is the coin change problem, which is where we're supposed to come up with a given value, some number of cents, using a set of coins of different denominations. The goal is to use as few coins as possible. So, given the US money system, if I were to come up with, say, $.90, I would start with the next highest denomination, a 50-cent piece. With that coin on the table, now we have to make $.40, so drop a quarter. $.15, a dime. And so on.
This is pretty slick, I think. And, moreover, it would seem to be correct. The coins I selected above are indeed the optimal solution, the fewest number of coins possible to make $.90 (again, given US currency). It's also very fast, with the worst case topping out at n operations, where n is the value to be constructed using coins. (So, for $1.00, the worst case would take about 100 operations.)
But here's the catch, the cruel taunt I told you about before. The greedy algorithm doesn't given an optimal solution, not always. For the coin problem, it does only given what's called a "canonical coin system," which mostly just means that the coin system consists of a limited set of denominations, as in any IRL currency. This feature doesn't generalize, unfortunately, and given an arbitrary set of coin values (where the number of coin denominations is large and unknown) we're back where we started: with NP-hard complexity.
Fortunately, we don't always need the very most optimal solution to the knapsack problem/coin problem. In this case, the greedy algorithm acts as an approximation algorithm, which is just what it sounds like: a good enough solution. And you already know this intuitively because you come up with good enough solutions all of the time, probably based on informal greedy logic.
Greed isn't good, but it's usually OK.