On tackling NP-hard problems

Notes by Richard O'Keefe

There are NP-hard problems which are of such practical importance that we can't just ignore them and hope they will go away.

In such a case, we have some alternatives:

  1. Solve a related problem instead (TeX page breaking)
  2. Use an approximate method which gets good but not optimal results
  3. Use a probabilistic method which may or may not hit an optimum
  4. Use an exact method which has O(2^n) cost in the worst case, but seems to work well in practice. Presumably the problems we actually try to solve are constrained in some way that we don't yet know how to characterise, which means that they are not that hard.

Next lecture we shall look at clustering, which doesn't even try to find the most parsimonious tree.

Here are several examples of putting up with a bad worst case.

  1. People use the quick-sort algorithm despite the fact that it not only isn't particularly quick (the original article in the Computer Journal, 1962, made it clear that quick sort was less efficient than merge sort), it also has O(n^2) worst case cost. Your data structures and algorithms textbook almost certainly had an ingenious argument which purported to show that the probability of the worst case happening is so close to 0 that you can forget about it. Unfortunately, actual occurrences of the worst case have been reported often enough that we know this theory is unrealistic. Quick-sort's worst case does occur often enough in practice that you probably should use merge sort, but it is rare enough that you can get away with quick sort most of the time. (A small sketch of the worst case appears after this list.)
  2. There is a technique of considerable practical importance called linear programming. A linear program is an optimisation problem
    minimise   cost . vars
    subject to A . vars = b
               vars all >= 0
    

    That is, minimise (or maximise) a linear combination of some variables, subject to a set of linear equalities and inequalities on those variables, where all the variables are non-negative. For several decades, people happily used a method called the Simplex Method to solve these problems. In effect, the Simplex Method chooses a subset of the variables to be the "basis", solves the equations for that basis, and moves from basis to basis along the edges of a simplex. The fact that there are exponentially many subsets of a given size is why the method has, in principle, exponential worst case cost. But in practice, the method took polynomial time on real problems. We got away with it. In 1984 Karmarkar published his famous algorithm showing that these problems could be solved in polynomial time, and we now have reasonably efficient algorithms with good worst case cost. In practice, the Simplex Method is still a good thing to try, and because Bell Labs patented Karmarkar's method, there was a period where you had to use an exponential cost method, despite knowing a polynomial time one existed, because you weren't allowed to use the faster one! (A tiny worked example appears after this list.)

  3. The classical NP-complete problem is SAT. An atom is a propositional variable (it can be true or false). A literal is either an atom or an atom with a NOT in front of it. A clause is a set of literals: it is true if at least one of its literals is true, that is, if one of its un-negated atoms is true or one of its negated atoms is false. A SAT problem is a set of clauses, where we want an assignment of true/false to the variables which makes all of the clauses true at the same time.

    In general, the cost of finding a solution is O(2^n). I once worked with van Gelder, who was in a team of students who managed to get it down to about O(1.02^m), but that is still exponential time. (A brute-force search is sketched after this list.)

    In practice, people often use something called Binary Decision Diagrams. Basically, a BDD is just a graph where the leaves are false and true and the internal nodes are "if (some variable) then (left branch) else (right branch)". An Ordered BDD is one where no variable occurs more than once on a path to a leaf, and a Reduced OBDD is one where each subgraph is represented at most once. There are nice algorithms for making ROBDDs, and given the ROBDD for a Boolean formula you can just read off an assignment by walking up from the 'true' leaf. As far as anyone knows, any algorithm for solving Boolean satisfiability problems must have exponential worst case, so ROBDDs must have exponential worst case. But they work well enough in practice to be used for problems with thousands of variables. (Provided that you are willing to put up with them occasionally bogging down. A small ROBDD sketch appears after this list.)

    Another technique is a randomised one called GSAT. Repeatedly pick a random assignment and hill-climb: each hill climbing step finds out how many more clauses would be true after flipping each variable, and greedily flips the variable that makes the biggest improvement. If hill climbing finds a solution, stop with that solution. If hill climbing stops, unable to find any improvement, go back and start with another random assignment. This approach too works well enough in practice to be useful for problems with thousands of variables. (An improved version of GSAT is called WalkSAT. Look it up. A GSAT sketch appears after this list.)

  4. A variant of linear programming is called Integer Programming. The only change in Integer Programming is that the variables must take on integer values. Surprisingly, that makes it a much harder problem (it is NP-hard). One traditional technique for solving Integer Programming problems is branch-and-bound: solve the linear programming relaxation, and if some variable comes out fractional, branch into two subproblems that force that variable down or up, pruning any subproblem whose relaxation cannot beat the best integer solution found so far. (A sketch appears below.)
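
To make these examples concrete, here are some small sketches in Python. None of this code is from the original notes: the encodings, sample data, and the use of SciPy below are my own assumptions, and each sketch illustrates the idea rather than being a definitive implementation.

First, quick-sort's worst case (example 1). With the naive choice of the first element as pivot, already-sorted input gives maximally unbalanced partitions and quadratic cost, while merge sort does O(n log n) work on every input:

    def quicksort(xs):
        # Naive quick-sort: first element as pivot; O(n^2) on already-sorted input.
        if len(xs) <= 1:
            return xs
        pivot, rest = xs[0], xs[1:]
        return (quicksort([x for x in rest if x < pivot])
                + [pivot]
                + quicksort([x for x in rest if x >= pivot]))

    def mergesort(xs):
        # Merge sort: O(n log n) comparisons in every case.
        if len(xs) <= 1:
            return xs
        mid = len(xs) // 2
        left, right = mergesort(xs[:mid]), mergesort(xs[mid:])
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        return merged + left[i:] + right[j:]

    data = list(range(500))    # already sorted: the naive quick-sort recurses ~n deep
    print(quicksort(data) == mergesort(data) == sorted(data))   # True

On randomly ordered data the same code behaves well; the point is only that the bad case is an ordinary input, not an exotic one.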
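
For the linear program in example 2, a tiny instance can be solved with scipy.optimize.linprog (SciPy and the sample numbers are my assumptions; recent SciPy versions solve this with the HiGHS solvers by default):

    from scipy.optimize import linprog

    cost = [1.0, 2.0, 0.0]                  # minimise cost . vars
    A_eq = [[1.0,  1.0, 1.0],               # subject to A . vars = b
            [1.0, -1.0, 0.0]]
    b_eq = [10.0, 2.0]

    # bounds=(0, None) encodes vars >= 0 for every variable.
    result = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    print(result.x, result.fun)             # optimum at vars = [2, 0, 8], cost 2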
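
For SAT (example 3), the O(2^n) behaviour of exhaustive search is easy to see. The encoding below is my own: a literal is a positive or negative integer, a clause is a set of literals, and a problem is a list of clauses:

    from itertools import product

    def clause_true(clause, assignment):
        # A clause is true if some un-negated atom is true or some negated atom is false.
        return any(assignment[abs(lit)] == (lit > 0) for lit in clause)

    def brute_force_sat(clauses, n):
        # Try all 2^n assignments of the variables 1..n.
        for values in product([False, True], repeat=n):
            assignment = {v + 1: values[v] for v in range(n)}
            if all(clause_true(c, assignment) for c in clauses):
                return assignment
        return None

    # (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
    print(brute_force_sat([{1, -2}, {2, 3}, {-1, -3}], n=3))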
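
A Reduced Ordered BDD for the same kind of problem can be built by Shannon cofactoring with a unique table, so that each (variable, low, high) node is stored once and no node has identical branches. This is a compact sketch of the idea, not the notes' (or any production) algorithm; it also shows reading off a satisfying assignment by following a path to the 'true' leaf:

    FALSE, TRUE = "0", "1"

    def restrict(clauses, var, value):
        # Simplify the CNF after fixing variable `var` to `value`.
        out = []
        for clause in clauses:
            if (var if value else -var) in clause:       # clause satisfied: drop it
                continue
            out.append(frozenset(l for l in clause if abs(l) != var))
        return out

    def build(clauses, var, unique):
        # Return an ROBDD node (var, low, high) for the CNF `clauses`.
        if any(len(c) == 0 for c in clauses):
            return FALSE
        if not clauses:
            return TRUE
        low  = build(restrict(clauses, var, False), var + 1, unique)
        high = build(restrict(clauses, var, True),  var + 1, unique)
        if low == high:                                  # reduction: drop redundant test
            return low
        return unique.setdefault((var, low, high), (var, low, high))

    def any_model(node, path=()):
        # Walk to the TRUE leaf; variables not on the path are "don't care".
        if node == TRUE:
            return dict(path)
        if node == FALSE:
            return None
        var, low, high = node
        return (any_model(high, path + ((var, True),))
                or any_model(low,  path + ((var, False),)))

    clauses = [frozenset(c) for c in ([1, -2], [2, 3], [-1, -3])]
    print(any_model(build(clauses, 1, {})))

The unique table is what shares identical sub-diagrams; that sharing is why ROBDDs sometimes stay small even for large formulas.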
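
GSAT as described in example 3 is also only a few lines; the restart and flip budgets below are my own arbitrary choices:

    import random

    def satisfied(clauses, a):
        # Number of clauses made true by assignment `a`.
        return sum(any(a[abs(l)] == (l > 0) for l in c) for c in clauses)

    def gsat(clauses, n, max_restarts=50, max_flips=1000):
        for _ in range(max_restarts):
            a = {v: random.choice([False, True]) for v in range(1, n + 1)}
            for _ in range(max_flips):
                score = satisfied(clauses, a)
                if score == len(clauses):
                    return a                             # every clause is true
                best_var, best_score = None, score
                for v in range(1, n + 1):                # try flipping each variable
                    a[v] = not a[v]
                    s = satisfied(clauses, a)
                    a[v] = not a[v]
                    if s > best_score:
                        best_var, best_score = v, s
                if best_var is None:                     # no flip improves: restart
                    break
                a[best_var] = not a[best_var]            # greedy: biggest improvement
        return None

    print(gsat([{1, -2}, {2, 3}, {-1, -3}], n=3))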
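
Finally, branch-and-bound for a small pure-integer programme (example 4): solve the LP relaxation, branch on a fractional variable, and prune branches whose relaxation cannot beat the incumbent. Again, SciPy and the sample data are my assumptions, not part of the notes:

    import math
    from scipy.optimize import linprog

    def branch_and_bound(c, A_ub, b_ub, bounds, best=(math.inf, None)):
        # Minimise c . x subject to A_ub . x <= b_ub, the given bounds, x integer.
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
        if not res.success or res.fun >= best[0]:
            return best                     # infeasible, or bound cannot beat incumbent
        frac = [i for i, v in enumerate(res.x) if abs(v - round(v)) > 1e-6]
        if not frac:
            return (res.fun, res.x)         # all-integer relaxation: new incumbent
        i, v = frac[0], res.x[frac[0]]
        lo, hi = bounds[i]                  # branch: x_i <= floor(v) or x_i >= ceil(v)
        down = bounds[:i] + [(lo, math.floor(v))] + bounds[i + 1:]
        up   = bounds[:i] + [(math.ceil(v), hi)] + bounds[i + 1:]
        best = branch_and_bound(c, A_ub, b_ub, down, best)
        best = branch_and_bound(c, A_ub, b_ub, up, best)
        return best

    # maximise 5x + 4y subject to 6x + 4y <= 24, x + 2y <= 6, x, y >= 0 integer
    c = [-5.0, -4.0]                        # linprog minimises, so negate the objective
    print(branch_and_bound(c, [[6.0, 4.0], [1.0, 2.0]], [24.0, 6.0],
                           [(0, None), (0, None)]))

This should report the integer optimum x = 4, y = 0 with objective value 20, found after pruning the branch where y is forced up to 2 or more.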
