The Traveling Salesperson Problem

danso · on May 3, 2015

It's amazing to me that someone of Norvig's stature still finds time and joy in tackling "common" CS questions and not just writing his own code, but going the extra mile by doing it in a presentation format (ipython notebook) that's designed to be reader and replication-friendly...nevermind creating a fleshed out narrative of the process.

A list of his notebooks here: http://norvig.com/ipython/

His explanation of the Cheryl's birthday brain teaser, which was posted here last week: http://nbviewer.ipython.org/url/norvig.com/ipython/Cheryl.ip...

edit: One of my favorite bits: Norvig even takes time to describe his thought process behind what most people would consider the most basic object-oriented design: how to construct a simple Point class, and then explains how that will affect his design and implementation of subsequent functions and routines: http://nbviewer.ipython.org/url/norvig.com/ipython/TSPv3.ipy...

j2kun · on May 3, 2015

Here's a more or less complete list of references for the known approximation and inapproximability bounds for TSP and its special subproblems.

http://cstheory.stackexchange.com/questions/9241/approximati...

One thing to note in particular is that general TSP has no non trivial approximation guarantee. Norvig's article might not stress this strongly enough, the guarantees he describes really depend on the triangle inequality. At least, I think the gap between what's possible for these two versions is deep enough to mention.

Another important note is that the TSP subproblem Norvig attacks, Euclidean TSP, has much better theoretical guarantees than a 2-approximation. In fact, the problem has what's called a polynomial time approximation scheme (for a fixed dimension) allowing one to efficiently compute an approximation that is arbitrarily close to optimal. The runtime of the algorithm is something like n (log(n))^c where c depends on both the accuracy and the dimension.

Xcelerate · on May 3, 2015

It's amazing to see all of this described in such detail. Very nice presentation!

Also, I'm going to mildly hijack this thread with a question since I know people familiar with this kind of problem will be looking at the comments ;)

The TSP is essentially trying to find the correct permutation of an ordered list of cities (up to a cycle) that minimizes the total distance traveled.

I have a somewhat similar problem. Given a set of vectors (in R^3), I am trying to find a way to generate a permutationally invariant representation of these vectors that is also rotationally invariant. The rotational invariance is easily solved by constructing the Gram matrix of the set of vectors (each element M_ij in the matrix is the inner product between vector x_i and vector x_j). Of course, the Gram matrix is overcomplete now (redundant), so I take the Cholesky decomposition to get a unique representation that has 3 less degrees of freedom (all the rotational degrees) than the original set.

For permutational invariance though, I'm having a heck of a time. I've been extensively searching the academic literature on this topic, and maybe I'm just not using the right search terms, but I can't find anything. Recently I've been reading about symmetric groups and trying to figure out if there's some kind of linear basis that I can construct its elements out of, but I'm not having much luck. Anyone have any ideas?

If that problem is too hard, consider this simpler problem: Given a metric between two sets of vectors (of the same order), I want to find the permutation of the second set that minimizes this metric. So basically, min(|| A - P A' P' ||), where ||X|| denotes the metric, and P is a permutation matrix. I feel like there is some kind of relation between this question and the TSP, but I can't quite figure out what.

psb217 · on May 3, 2015

This is just the problem of forming automorphism-invariant encodings of a graph, but extended to permit weighted graphs.

First, given the Gram matrix for your vectors, interpret its entries (read from left-to-right and top-to-bottom) as defining a cumulative sum.

Next, order your vectors such that the values in the order-induced cumulative sum are maximal for as many of the partial sums as possible, compared with the cumulative sums induced by any other possible ordering. This produces the same "canonical ordering" and hence the same "canonical Gram matrix" for any set of vectors that have the same collection of pair-wise relationships, as measured by dot-products.

For graphs with binary adjacency matrices this can be stated more simply as sorting the vertices such that the resulting adjacency matrix, when read l-to-r and t-to-b, produces the largest binary number. This encoding uniquely identifies the automorphism group of the encoded graph (i.e. all isomorphic graphs produce the same encoding).

If you were willing to represent your pair-wise dot-products u sing fixed-precision binary representations, then you could just use the "matrix as binary number" approach directly. Though, each matrix entry would now contribute multiple bits to the binary number, rather than just one.

Note that this approach is not efficient. But, your problem subsumes standard graph isomorphism, so a polynomial-time solution would be noteworthy. All of what I've said makes no assumption about the vectors' dimensions. There's probably a more efficient approach for vectors constrained to a relatively low-dimensional space.

tribe · on May 3, 2015

I know it is computationally a bit of a pain, but if you have points v_1,v_2,v_3 does the Jordan Canonical Form of the matrix [v_1 v_2 v_3] not satisfy your requirements? I believe that the original matrix is similar to both its permutations and rotations, so the JCF should be invariant. My linear algebra is a bit rusty, so I might be wrong

mck- · on May 3, 2015

I'm surprised that there is no mention of Tabu Search or other metaheuristics (like simmulated annealing, as pointed out in this thread). In my research, I discovered it to be one of the best algorithms for this class of problems (determinisitic, non-black-boxy, flexible, can incorporate expert knowledge).

Also worth mentioning is that while the TSP is interesting from an algorithms class perspective, a problem that occurs far more often in real-life is the Vehicle Routing Problem.

The Vehicle Routing Problem (with time-windows, types, multiple vehicles, multi-depot, capacities, etc) is a much more complex problem compared to the TSP, and I would argue that pretty much all the algorithms mentioned in the article would fail to transfer.

Here's an open-source library written in Common Lisp that implements the Tabu Search approach [1] -- shameless plug, but we also build an API for it [2]

[1] https://github.com/mck-/Open-VRP

[2] https://www.routific.com/developers

alfiedotwtf · on May 3, 2015

I'm wanting to dabble with metaheuristics. You mention VRP... what are other "hello world" problems for testing different algorithms? Thanks!

graycat · on May 3, 2015

For the set of real numbers R, positive integer n, and the Euclidean norm on R^n, there is the old R. Karp paper where he shows amazing results for: Find a minimum spanning tree and do a depth-first traversal but deleting arcs that have already been traversed and going directly to the city not yet visited that the depth first traversal would visit next.

So, just code up one of the at least two famous polynomial and amazingly fast algorithms for minimum spanning tree then write the traversal code.

For the Euclidean case, likely tough to beat, all things considered, in practice.

Why not stop there?

(1) In some cases want to be quite close to optimality or have actual optimality.

(2) In some cases, the Euclidean norm need not hold even roughly, e.g., job shop scheduling with setup times.

brador · on May 3, 2015

> There are more sophisticated approximate algorithms that can handle hundreds of thousands of cities and come within 0.01% or better of the shortest possible tour.

Anyone have more details on this more sophisticated algorithm? Even just a name to google would be great.

_delirium · on May 4, 2015

The earliest one to be proposed (1958) is called "2-opt": https://en.wikipedia.org/wiki/2-opt

The Lin-Kernighan heuristic is a related heuristic developed later, which works well in practice on many problems. Here is an implementation (freely available source, but not freely licensed): http://www.akira.ruc.dk/~keld/research/LKH/

JadeNB · on May 3, 2015

j2kun gives a link below: https://news.ycombinator.com/item?id=9481527 .

octatoan · on May 3, 2015

I like how Norvig isn't a Real Programmer. His way of describing the problem before starting to work on it is refreshing.

aikah · on May 3, 2015

> I like how Norvig isn't a Real Programmer

I'm not sure what you mean by "Real Programmer". if one write programs for a living one is a programmer. Reminds me that french joke about the "real hunter" and the "fake hunter".

jeorgun · on May 3, 2015

I think they mean it in the "real programmers use FORTRAN" [1], not the literal, sense.

[1] http://www.catb.org/jargon/html/R/Real-Programmer.html

yen223 · on May 3, 2015

What's the joke?

userbinator · on May 4, 2015

I don't remember the algorithm exactly and Google isn't yielding any useful results, but I seem to recall that if all the distances are integers (or if you make all the distances integers via scaling/rounding) there is a much faster algorithm that relies on "stepping upward" and considering increasingly larger distances. All I remember is that it's very similar to the same principle that makes radix sort sub-linearithmic.

Anyone know what I'm talking about?

j2kun · on May 4, 2015

They would certainly have to be bounded integers, right?

userbinator · on May 4, 2015

Yes, I believe it is something like O(n) where n is the sum of all the distances.

It's probably related to the "pseudopolynomial time subset-sum" algorithm.

Animats · on May 3, 2015

The classic practical algorithm comes from Bell Labs work in the 1960s. Connect the points in some random order. Then pick two links and cut there, leaving three connected chains. Try all 12 possibilities for reconnecting those three chains, and keep the shortest. Repeat until no improvement is observed for a while.

This was one of the first useful nondeterministic computer algorithms.

jessaustin · on May 3, 2015

Try all 12 possibilities for reconnecting those three chains...

I'm only coming up with 8 possibilities. Pick one chain and designate it Aa. That may be followed by:

    Bb, Cc
    Bb, cC
    bB, Cc
    bB, cC
    Cc, Bb
    Cc, bB
    cC, Bb
    cC, bB

If distance depends on direction, multiply by 2 (for aA in addition to Aa), but that still isn't 12.

Animats · on May 3, 2015

Aa can be in the middle.

jessaustin · on May 3, 2015

Aa Bb Cc is identical to Cc Aa Bb (or bB aA cC). That's why we fixed one chain, to simplify the rest of the calculation.

EDIT: on reading TFA I find Norvig does essentially the same thing: "So let's arbitrarily say that all tours must start with the first city in the set of cities. We'll just pull the first city out, and then tack it back on to all the permutations of the rest of the cities." Interestingly he doesn't also eliminate the Aa-aA redundancy, because "First, it would mean we can never handle maps where the distance from A to B is different from B to A. Second, it would complicate the code (if only by a line or two) while not saving much run time."

Animats · on May 3, 2015

OK, non-standard problem definition. Sorry. Usually it's an interconnection problem without specified ends.

idunning · on May 3, 2015

Great stuff from Norvig, as per usual. I find it strange that the next step after enumeration is a heuristic approach, something that comes up a lot in "solve a TSP" posts - there are many ways to solve TSP to provable optimality that perform really well on non-pathological instances, and can often be stopped at any point to get both a probably near optimal solution and, even better, a bound on how good the best solution could be. I would argue that the implementation cost is outweighed by the possibility of usually getting optimal solutions and by having confidence in how far off you are (no chance of producing a terrible solution).

plg · on May 3, 2015

what about global optimization methods, e.g. simulated annealing?

jcoffland · on May 3, 2015

I too was disapointed SA wasn't even mentioned. I've gotten good results with it in the past. It's ability to get out of local minima make it quite powerful. Still, it's an excellent article.

fogleman · on May 3, 2015

Simulated Annealing is my go-to optimization algorithm. I'm using at work to solve a vehicle scheduling problem.

msellout · on May 4, 2015

I believe it was, despite not calling the name out explicitly. Are you suggesting that simulated annealing does not belong in the categories of "repetition strategy," "alteration strategy", nor "ensemble strategy"?

Simulated annealing is essentially hill-climbing with random restarts with the tweak of a changing climb-distance. Depending on how you feel like categorizing that, it's either repetition, alteration, or both (and therefore ensemble).

apricot · on May 3, 2015

Just the right mix of theory and practice to make the data structures class I had as an undergraduate pale in comparison. Thank you Mr. Norvig, and please keep them coming.

avyfain · on May 3, 2015

A while back someone posted a cool simulated annealing post of this[0]. The app seems to be down, but the code is there in case anyone wants to take a look.

[0] http://toddwschneider.com/posts/traveling-salesman-with-simu...

vacri · on May 4, 2015

To me, a 1089-point 'Travelling Salesperson Problem' is finding the sales staffer who is willing to do that in one run... :)

yodsanklai · on May 3, 2015

Is it the non sexist version of the "Traveling Salesman problem"?

pervycreeper · on May 3, 2015

[flagged]

dethstar · on May 3, 2015

Really? Are you like that with every word? Have you kept yourself from reading master pieces of literature because you thought "oh, well I think they should've gone with X word instead"?

Houshalter · on May 4, 2015

I once tried to read some long essay where the author had replaced all instances of "he"/"she" with some weird made up gender neutral pronoun "xe" or something like it. It wasn't explained anywhere and I was very confused for a few minutes.

It felt really weird and unnatural and detracted from the main point. Testified to the fact that years later I have no memory of what the essay was about but still remember this.

frankerz · on May 4, 2015

It's quite sad that some people's first reaction to everything is to assume that the other carries an agenda and there's a need to be defensive.

Person A thinks that everyone else is a sexist person with an agenda. Person B who doesn't agree with person A thinks everyone else is a radical feminist with an agenda. It's a never ending cycle of stupidity.

/rant