
The Travelling Thief Problem

Introduction

There are two quintessential problems that often pop up in an algorithms course, especially when the topic of the day is combinatorial optimisation. These are:

  • Travelling salesman problem (TSP): Given a weighted graph \( G \) with \( n \) nodes such that each edge \( (u,v) \) has a nonnegative cost \( c \), find a minimum-cost Hamiltonian cycle of \( G \).
  • The 0-1 knapsack problem: Given a set of \( n \) unique items, each with a weight \( w \) and value \( v \), and a knapsack with weight capacity \( W \), find a subset of these items that maximises the total value without exceeding the capacity of the knapsack.

The TSP is fairly intuitive: “Given a bunch of interconnected cities, what is the shortest route that visits each city exactly once?” A path through a graph that visits each vertex exactly once and finishes on the same vertex where it started is called a Hamiltonian cycle. In the context of the TSP, you’ll also likely encounter the term tour, which is a closed walk (a sequence of vertices and edges that starts and ends at the same vertex) with no repeated edges. Every Hamiltonian cycle is therefore a tour, but not every tour is a Hamiltonian cycle.

The 0-1 knapsack problem is also intuitive, and if you’re a gamer you’ve likely already been subconsciously aware of it. Any player of an RPG is constantly trying to maximise the value of their limited-capacity inventory, and the way we tend to do that is to prioritise low-weight, high-value items (such as gemstones) over bulky, worthless items.

Both the TSP and the 0-1 knapsack problem are interesting from the perspective of computational complexity, as both are NP-hard.

NP or not NP

In complexity theory it is common to assign problems to classes that more or less have the same complexity. This complexity is typically measured in the amount of resources needed to solve the problem, whether that be space (as in memory) or time (how long a program must run in order to solve the problem). Time complexity is usually the go-to, and one of the main classes is P, which stands for polynomial time. That is, a problem in P can be solved with a running time bounded by some polynomial in the size of the input. A trivial example is multiplying two \( n \times n \) matrices; the most basic algorithm here takes \( \mathcal{O}(n^3) \) time, hence it is polynomial.
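To make the matrix example concrete, here is a minimal sketch of that most basic algorithm: three nested loops over \( n \), giving the \( \mathcal{O}(n^3) \) running time mentioned above (the function name and list-of-lists representation are just illustrative choices).

```python
def mat_mul(a, b):
    """Naive O(n^3) multiplication of two n x n matrices (lists of lists)."""
    n = len(a)
    result = [[0] * n for _ in range(n)]
    for i in range(n):          # each of the three loops runs n times,
        for j in range(n):      # hence the cubic running time
            for k in range(n):
                result[i][j] += a[i][k] * b[k][j]
    return result

# Multiplying by the identity returns the original matrix.
print(mat_mul([[1, 2], [3, 4]], [[1, 0], [0, 1]]))  # [[1, 2], [3, 4]]
```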

Now let’s discuss the class NP. First of all, the class NP does not mean not-polynomial (my algorithms lecturer exclaimed that anyone who so much as suggests that will automatically fail the unit). To understand the difference between P and NP let’s introduce Turing machines. Invented by none other than Alan Turing, the father of theoretical computer science, Turing machines are the highest automata on the Chomsky hierarchy, and can be thought of as the ultimate mathematical model of abstract computation. Anything that a computer can do can be modelled on a Turing machine (Church-Turing thesis). Automata theory is a fascinating subject in its own right, but for now we only need to distinguish between deterministic Turing machines and nondeterministic Turing machines. In general, a deterministic automaton is one where every state has a single transition to some other state (e.g. a light switch in the ‘OFF’ position has one transition to the ‘ON’ position), thus a given initial state will always lead to the same output state. A nondeterministic automaton is different; each state may have multiple transitions to other states, thus it cannot be completely determined in advance what the final state will be – the same initial input may lead to any of several final states. That’s just the basics; for more I recommend further reading into the subjects of formal languages and automata theory.

Perhaps now you have a feel for what the N in NP stands for. NP is nondeterministic polynomial time; any problem in NP can be solved by a nondeterministic Turing machine in polynomial time. This also means that the solution of any NP problem can be verified by a deterministic Turing machine in polynomial time. One of the classic problems in theoretical computer science is whether P=NP; if so, problems verifiable in polynomial time can also be solved (deterministically) in polynomial time.

NP-hard refers to the class of problems that are at least as hard as the hardest problems in NP. It just so happens that no polynomial-time algorithms are known for NP-hard problems (perhaps the source of the common misconception over what the name stands for). If a polynomial-time algorithm is found for any one NP-hard problem, it yields a polynomial-time algorithm for every problem in NP, proving P=NP. Thus NP-hard problems are of key interest to computer scientists.

The Travelling Thief

The Travelling Thief Problem (TTP) is a relatively new problem that was designed to better reflect real-world scenarios (although the problem is given from the point of view of a thief, it can just as easily apply to a delivery van). For a more complete description, see this initial paper that proposed the problem, as well as this paper. The problem is stated as follows:

There is a set of \( n \) cities, and a set of \( M \) items located within the various cities, such that each city has a unique set of items. The cities form a complete graph, with the distance between any pair of cities \( (i, j) \) given by \( d_{ij} \). Each item \( k \), located in city \( i \), has a potential profit \( p_{ik} \) and a weight \( w_{ik} \). The thief must visit each of these cities exactly once, and the total weight of the collected items must not exceed the thief’s maximum capacity \( C \). The thief must pay a flat renting rate \( R \) for each unit of time spent travelling. Initially, the thief travels at some starting speed \( v_{\text{max}} \). This speed is reduced in proportion to the cumulative weight of added items, down to a minimum speed of \( v_{\text{min}} \) at maximum capacity. Given these constraints, the goal is to find both a tour \( \Pi \) and picking plan \( P \) that maximise the net profit: the total profit of the collected items minus the total rent paid.

Here the tour is simply the list of cities to visit in order, and the picking plan is a list of items to collect from each city. The key aspect that distinguishes the TTP from simply applying the 0-1 knapsack problem to a graph is that the more weight the thief carries, the slower they travel, and each unit of time spent travelling costs money (ultimately reducing the profit). Let’s now briefly discuss three methods that can be used to solve the TTP.
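The weight-speed coupling described above can be made concrete by evaluating a candidate solution. The following is a rough sketch (the function name, data layout, and the linear speed model \( v = v_{\text{max}} - \frac{w}{C}(v_{\text{max}} - v_{\text{min}}) \) follow the problem statement above; the toy instance is invented for illustration):

```python
def ttp_objective(tour, picking, d, profit, weight, C, R, v_max, v_min):
    """Evaluate a TTP solution: total item profit minus rent for travel time.

    tour    -- ordering of cities (the return to the start is implied)
    picking -- picking[city] lists the item indices taken in that city
    d       -- d[i][j] distance matrix; profit/weight are per-item lists
    """
    total_profit = 0.0
    w = 0.0      # cumulative weight of items picked so far
    time = 0.0
    n = len(tour)
    for idx, city in enumerate(tour):
        for item in picking.get(city, []):
            total_profit += profit[item]
            w += weight[item]
        # speed drops linearly from v_max (empty) to v_min (at capacity C)
        v = v_max - (w / C) * (v_max - v_min)
        nxt = tour[(idx + 1) % n]   # wrap around to close the tour
        time += d[city][nxt] / v
    return total_profit - R * time

# Toy instance: 3 equidistant cities, one item (profit 10, weight 2) in city 1.
d = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(ttp_objective([0, 1, 2], {1: [0]}, d, [10], [2],
                    C=4, R=1, v_max=2, v_min=1))
```

Note how picking the item slows the remaining two legs of the tour, so the rent paid depends on where along the tour the weight is collected.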

Dynamic Programming

Dynamic programming first emerged in the 1950s as a method of mathematical optimisation (it is so named because the powers-that-be at the time disliked the terms “mathematical” and “research”, but Bellman needed funding somehow). It is characterised by two key concepts:

  • Split a problem into recursive sub-problems
  • Store the solutions to these sub-problems in a table, then combine them to find the solution to the original problem

A great example of dynamic programming is Project Euler’s Problems 18 and 67 (finding the maximum path sum down a triangle). As with all Project Euler problems there are countless tutorials and code snippets on the web, but the core idea is to process the triangle from the bottom up using a recursive formula.
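A minimal sketch of that bottom-up idea: starting from the second-last row, each entry absorbs the better of its two children, so by the time we reach the apex it holds the answer. The triangle below is the small example given in Problem 18.

```python
def max_path_sum(triangle):
    """Bottom-up DP for the maximum path sum (Project Euler 18/67).

    triangle is a list of rows; row r has r + 1 entries. Each pass merges
    a row with the best totals computed for the row beneath it.
    """
    best = list(triangle[-1])   # start with the bottom row as-is
    for row in reversed(triangle[:-1]):
        # entry i may step down to child i or child i + 1
        best = [val + max(best[i], best[i + 1]) for i, val in enumerate(row)]
    return best[0]

# Problem 18's worked example: the route 3 -> 7 -> 4 -> 9 sums to 23.
print(max_path_sum([[3], [7, 4], [2, 4, 6], [8, 5, 9, 3]]))  # 23
```

This runs in time linear in the number of triangle entries, whereas enumerating all \( 2^{n-1} \) root-to-base paths (the brute-force approach Problem 67 is designed to defeat) is exponential.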

The 0-1 knapsack problem is typically solved using dynamic programming, and the TTP is no different. The recursive sub-problems are simply the intermediate tours. Necessarily, the profit for the entire tour is related to the overall profit of the tour up to the second-last city plus whatever items are chosen in the final city. And so the recursion goes on; the profit in the second-last city depends on the overall profit up to the third-last city, plus whatever items are chosen in the second-last city.
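For reference, here is the standard dynamic-programming solution to the plain 0-1 knapsack sub-problem (a textbook sketch, not TTP-specific): a table indexed by remaining capacity, updated one item at a time.

```python
def knapsack(values, weights, capacity):
    """Classic 0-1 knapsack DP: dp[c] = best value achievable with capacity c."""
    dp = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        # iterate capacities downwards so each item is counted at most once
        for c in range(capacity, w - 1, -1):
            dp[c] = max(dp[c], dp[c - w] + v)
    return dp[capacity]

# Best subset is the last two items: 100 + 120 = 220 at weight exactly 50.
print(knapsack([60, 100, 120], [10, 20, 30], 50))  # 220
```

The TTP version must additionally thread the current weight through the recursion, since the weight carried out of each city determines the travel time (and hence the rent) on every remaining leg.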

B&B

B&B search, short for branch and bound search, is a type of tree search that happens to be well suited to solving combinatorial optimisation problems. The idea is to find the maximum profit by enumerating over the entire search space (i.e. the entire domain of feasible solutions). This is not done all at once, however. The search space is (recursively) partitioned into subsets (this process is the branching). Instead of looking at every single state in the tree, the algorithm keeps track of bounds in order to prune parts of the tree (i.e. ignore parts of the tree that cannot possibly lead to an optimal solution). A classic example to elucidate the concept of pruning is α–β pruning, which is used to improve the performance of the minimax algorithm for two-player, perfect-information, zero-sum games (like Chess and Go) by ignoring options that the opponent (assuming they are playing optimally) would never choose. Branch and bound is one of many methods used to solve the TSP, where the distance (in particular, the estimated remaining distance) is used to bound the search. This approach can also be used to solve the TTP, for the distance \( d \) directly impacts the transport costs and speed of the thief.
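To illustrate branching and bounding on the knapsack side of the problem (a simplified sketch, not a full TTP solver), the snippet below branches on take/skip decisions and bounds each subtree with the fractional relaxation, an optimistic estimate that no completion of the branch can beat:

```python
def knapsack_bb(values, weights, capacity):
    """Branch and bound on the 0-1 knapsack, illustrating the pruning idea."""
    # sort by value density so the fractional bound is valid and tight
    items = sorted(zip(values, weights), key=lambda t: t[0] / t[1], reverse=True)

    def bound(i, value, room):
        # optimistic estimate: fill the remaining room fractionally
        for v, w in items[i:]:
            if w <= room:
                value, room = value + v, room - w
            else:
                return value + v * room / w
        return value

    best = 0
    def branch(i, value, room):
        nonlocal best
        best = max(best, value)
        if i == len(items) or bound(i, value, room) <= best:
            return  # prune: this subtree cannot beat the incumbent
        v, w = items[i]
        if w <= room:
            branch(i + 1, value + v, room - w)  # branch 1: take item i
        branch(i + 1, value, room)              # branch 2: skip item i

    branch(0, 0, capacity)
    return best

print(knapsack_bb([60, 100, 120], [10, 20, 30], 50))  # 220
```

In a TTP solver the bound would also fold in the estimated remaining travel cost, since carrying more weight makes every remaining leg of the tour more expensive.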

Constraint Programming

Constraint programming is a significantly more mathematical and generalised method compared to the previous two. There are no strict definitions, but with this approach we explicitly declare a set of constraints in advance, and the solution is found by finding feasible solutions that meet each of the constraints. Constraint programming is thus often used to solve constraint satisfaction problems (or CSPs). The oft-quoted, quintessential CSP (at least in textbooks) is the 8-queens problem, while a more familiar example is Sudoku. There are at least four constraints on the TTP; the most obvious of which is (1) the total profit of the stolen items (which must be maximised). Others include (2) the fact that each city is visited once (so the search cannot include duplicates), (3) the calculation of the total item weight (which is recursively related to the weight of the items already picked) and (4) the capacity of the knapsack.
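A minimal sketch of the constraint-satisfaction style, using the n-queens problem mentioned above (in practice one would use a dedicated CP solver; plain backtracking is used here just to show constraints being checked as the search extends a partial assignment):

```python
def queens(n=8):
    """Backtracking search for the n-queens CSP: place one queen per column
    so that no two queens share a row or a diagonal (the constraints)."""
    solutions = []

    def place(rows):
        col = len(rows)          # next column to fill
        if col == n:
            solutions.append(tuple(rows))
            return
        for row in range(n):
            # check the constraints against every queen already placed
            if all(row != r and abs(row - r) != col - c
                   for c, r in enumerate(rows)):
                place(rows + [row])

    place([])
    return solutions

print(len(queens(8)))  # 92 distinct solutions to the 8-queens problem
```

The TTP constraints listed above would be declared the same way: the visit-each-city-once and knapsack-capacity constraints prune infeasible assignments, while the net profit is the objective the solver maximises over the remaining feasible solutions.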

Conclusion

The Travelling Thief Problem is a merger of two common combinatorial optimisation problems: the Travelling Salesman Problem and the 0-1 Knapsack Problem. This new problem can be tackled by incorporating methods used to solve each of these parent problems, namely dynamic programming, branch-and-bound search and even constraint programming.


Further Reading