OSdata tech blog
Does P = NP or P ≠ NP?
Does P = NP or P ≠ NP?
The most important unsolved problem in both mathematics and computer science is the question of whether P = NP or P ≠ NP.
P is polynomial and NP is non-deterministic polynomial (not nonpolynomial, as often mistakeningly believed.
There are numerous problems that take polynomial time to check the accuracy of the answer, but there is no known polynomial method for solving the problem in the first place. The question is whether this is an inherent asymetry or that we simply havent yet been clever enough to discover the solution.
When presented with an example, most people immediately think that there is an easy answer. The typical problems are easy to solve in the simple cases. The difficult to solve versions (which often occur commonly in industry and science) are not immediately obvious. And the pure mathematical answer regards a general purpose solution that works the same for all cases, including the worst possible case.
This blog entry will attempt to demonstrate why this actually is a very difficult question to answer.
Clay Institute Challenge
The Clay Mathematics Institute of M.I.T. is offering a million dollar reward to anyone who solves this P vs Np Problem. Note that the rules for the contest require that your answer be published in a well-recognized, peer-reviewed mathematics journal and that two years pass without your proof being discredited.
Here is their explanation of the problem:
Suppose that you are organizing housing accommodations for a group of four hundred university students. Space is limited and only one hundred of the students will receive places in the dormitory. To complicate matters, the Dean has provided you with a list of pairs of incompatible students, and requested that no pair from this list appear in your final choice. This is an example of what computer scientists call an NP-problem, since it is easy to check if a given choice of one hundred students proposed by a coworker is satisfactory (i.e., no pair taken from your coworkers list also appears on the list from the Deans office), however the task of generating such a list from scratch seems to be so hard as to be completely impractical. Indeed, the total number of ways of choosing one hundred students from the four hundred applicants is greater than the number of atoms in the known universe! Thus no future civilization could ever hope to build a supercomputer capable of solving the problem by brute force; that is, by checking every possible combination of 100 students. However, this apparent difficulty may only reflect the lack of ingenuity of your programmer. In fact, one of the outstanding problems in computer science is determining whether questions exist whose answer can be quickly checked, but which require an impossibly long time to solve by any direct procedure. Problems like the one listed above certainly seem to be of this kind, but so far no one has managed to prove that any of them really are so hard as they appear, i.e., that there really is no feasible way to generate an answer with the help of a computer. Stephen Cook and Leonid Levin formulated the P (i.e., easy to find) versus NP (i.e., easy to check) problem independently in 1971.
- Official Rules for the Millenium Problem
- Official Problem Description
- Lecture by Vijaya Ramachandran
The first thing to note is that the description is incredibly sloppy. The Clay Institute intended to make the problem accessible to the general public and therefore avoided the precision of mathematics on the asssumption that most of the public would easily get lost trying to read the mathematics and most of the public would find a mathematical presentation to be boring.
It is particularly unclear how many students per room. Many online discussions assume all students in a single 100-person room, 50 two-person rooms, or 25 four-person rooms. The number is not actually specified in the Clay discussion.
A journalist with the UK Telegraph incorrectly stated To help understand the issue, the Clay mathematical Institute gives an example in calculating how to accomodate 400 students in 100 university rooms.
Some people jump into non-mathematical solutions, such as offering counseling so that any grouping will become compatible. These solutions dont addres the underlying mathmatical question and really miss the whole point of the exercise. This is not a problem about housing students, this is a problem about making selections.
Many people immediately state that you simply pick a student at random, then try adding students until you have a compatible room, then repeat until you have all of the rooms filled.
This method will work fine in the typical real-world case where each student is likely to have a short list of dislikes. The tough problem is when the lists are dense (as may occur with problems of warehousing and transporting goods).
The next common approach is trial and error (the brute force solution), where you try combinations and backtrack on failure until you eventually reach an answer.
The problem is that you can quickly run into more possible combinations tested than there are particles in the visible universe. The worst case can take you more time than this universe is expected to last.
As a general rule of thumb, if your proposed algorithm involves any back-tracking, you have failed. The back-tracking will become non-polynomial with just moderate numbers.
If you are feeling mathematical, I suggest that you skip the rest of my blog and read Stating P=NP Without Turing Machines and Tutorial on Computational Complexity by Craig A. Tovey.
Tovey summarizes the issue as In complexity theory, we make the following sweeping generalizations: If an algorithm runs in polynomial time, it is fast; otherwise it is slow. Fast is good; slow is bad. A problem that we can solve by a fast algorithm is easy; a problem that we cant is hard.
The following diagram is open source under Apache Software License 2.0:
Just so everyone is on the same page, when mathematicians write a number followed by an exclamation point, they are indicating factorial.
2! is two times one, or two. 2! = 2*1 = 2
3! is three time two times one, or six. 3! = 3*2*1 = 6
4! is four times three time two times one, or twenty-four. 4! = 4*3*2*1 = 24
5! is five times four times three time two times one. 5! = 5*4*3*2*1 = 120 (hundreds)
6! is six times five times four times three time two times one. 6! = 6*5*4*3*2*1 = 720
7! is seven times six times five times four times three time two times one. 7! = 7*6*5*4*3*2*1 = 5,040 (thousands)
8! is eight times seven times six times five times four times three time two times one. 8! = 8*7*6*5*4*3*2*1 = 40,320
9! is nine times eight times seven times six times five times four times three time two times one. 9! = 9*8*7*6*5*4*3*2*1 = 362,880
10! is ten times nine times eight times seven times six times five times four times three time two times one. 10! = 10*9*8*7*6*5*4*3*2*1 = 3,628,800 (millions)
11! is eleven ten times nine times eight times seven times six times five times four times three time two times one. 11! = 11*10*9*8*7*6*5*4*3*2*1 = 39,916,800
12! is twelve times eleven ten times nine times eight times seven times six times five times four times three time two times one. 12! = 12*11*10*9*8*7*6*5*4*3*2*1 = 479,001,600
13! is thirteen times twelve times eleven ten times nine times eight times seven times six times five times four times three time two times one. 13! = 13*12*11*10*9*8*7*6*5*4*3*2*1 = 6,227,020,800 (billions)
You see how fast we got to billions. The numbers become ridiculous at a faster and faster pace.
brute force dom room solutions
The number of unique subsets (combinations) of 100 students from a list of 400 is:
400! / (100! (400-100)!)
This number is greater than 70! and the estimates of the number of subatomic particles in the observable universe is between 40! and 70!.
So, lets take a look at a few simple brute force versions of the dorm room problem.
easy going approach
A common approach suggested is to start with the most easy going students and use trial and error to find aceptable matches. Well, as mentioned before, any backtracking algorithm is a failure for a factorial combination problem.
A common suggested solution for the 50 rooms of two students each is to fill the rooms with the 50 most easy going students, then fill in each room with the next most easy going student. There is an (incorrect) assumption that this will always work.
Someone from Baden-Württemberg presented the following pathological counter-example:
A and X are very much unfriendly. N and M are very friendly (can be paired with anyone).
If you try a nice-guys first ranking, then you end up looking at the two most hated last and they are incompatible.
Another problem is hate loops. There is no direct conflict, but there are a series of hates that create a loop.
Another uncredited example from the internet:
CL, FL, EK, AB, AD
JK, JL, GH, EG, DJ
AF, DK, BH, FI, AG
BG, DL, CK, AH, AI
GL, AK, IJ, BD, IK
FH, EH, FG, DG, GI
BF, DF, CH, AE, CJ
Pick four that are mutually compatible.
You can solve this problem by hand.
Try to come up with a general algoithm that will solve this problem and any similar problem (any list of 12 items and any version of incompatiblity pairs). If your general purpose algorithm involves any backtracking, it is NP and will not scale.
When we mention scale, it is because many of these problems are easily solved by trial and error in a reasonable amount of time for small sets of data. But as soon as the number of combiantions grows, the problem becomes too big to fit into any classical computer.