Kuhn's "Hungarian" Assignment Algorithm, Sans Dual Variables

Eric Bach
CS 787, Fall 2018

Let (a_ij) be an n x n matrix. Our goal is to determine a permutation sigma
that minimizes

    cost := sum_i a_{i, sigma(i)}.

Without loss of generality, all the a_ij are non-negative. In graph theory
terms, we want to choose, among the n! perfect matchings of the weighted
complete bipartite graph K_{n,n}, one of smallest total weight. In matrix
terms, we want to select positions for n non-attacking rooks and minimize
the sum of the numbers at those positions.

The first idea behind the algorithm is that adding or subtracting a constant
to any row or column does not change the identity of the optimal matching.
So, if we start with the matrix

    3 1 2
    1 5 9
    2 6 5

we can subtract the minimum element from each row to get

    2 0 1
    0 4 8
    0 4 3

Note that the first two columns have zeroes, but the last does not. So we
can subtract the minimum element from the last column to obtain

    2 0 0
    0 4 7
    0 4 2

If we can make a perfect matching out of the 0's, we are done. If not, we
must use another idea. This is Konig's theorem: in any bipartite graph, the
number of edges in a maximum matching equals the minimum number of vertices
needed to cover all the edges of the graph. On the matrix, the vertex cover
is given by choosing rows and columns (called "lines"). For the above array,
the largest matching of 0's we can make has size two, so two lines suffice
to cover. These lines (the first row and the first column) are as indicated:

      |
    --2-0-0--
      0 4 7
      0 4 2

We must now process more than one line at a time. Let delta be the minimum
of the uncovered entries; here, delta = 2. What we do is:

    Add delta/2 to covered lines (rows or columns).
    Subtract delta/2 from uncovered lines.

This is called a "pivot step". The effect of this is to subtract delta from
uncovered entries, add delta to doubly covered entries, and (since the two
adjustments of delta/2 cancel) leave singly covered entries unchanged.
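The reduction and pivot steps above can be sketched in Python. This is an
illustrative sketch, not part of the notes; the helper names
(reduce_rows_cols, pivot) are mine.

```python
def reduce_rows_cols(A):
    """Subtract each row's minimum, then each column's minimum."""
    A = [row[:] for row in A]
    n = len(A)
    for i in range(n):
        m = min(A[i])
        A[i] = [x - m for x in A[i]]
    for j in range(n):
        m = min(A[i][j] for i in range(n))
        for i in range(n):
            A[i][j] -= m
    return A

def pivot(A, covered_rows, covered_cols):
    """One pivot step: with delta = the minimum uncovered entry,
    subtract delta from uncovered entries and add delta to doubly
    covered entries (singly covered entries are unchanged)."""
    n = len(A)
    delta = min(A[i][j] for i in range(n) for j in range(n)
                if i not in covered_rows and j not in covered_cols)
    B = [row[:] for row in A]
    for i in range(n):
        for j in range(n):
            if i not in covered_rows and j not in covered_cols:
                B[i][j] -= delta
            elif i in covered_rows and j in covered_cols:
                B[i][j] += delta
    return B
```

On the running example, reduce_rows_cols recovers the reduced matrix above,
and pivoting with the first row and first column covered reproduces the
matrix computed next in the notes.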
After pivoting, the new matrix is

    4 0 0
    0 2 5
    0 2 0

There is now a perfect matching of 0's, indicated by *'s in the matrix:

    4 0* 0
    0* 2 5
    0  2 0*

In terms of the original costs, this is

    3 1* 2
    1* 5 9
    2  6 5*

so the optimal assignment has cost 1 + 1 + 5 = 7.

In general, we go back and forth between trying to make a larger matching
and pivoting, until we have a perfect matching. It is clear that if this
procedure works, we get an optimal assignment. The next task is to see why
it terminates.

If the costs are integers, we can reason as follows. Let there be k covered
rows and l (ell) covered columns, with k + l < n. Then we have (n-k)(n-l)
entries reduced by delta, and kl entries increased by delta. The net
reduction (in units of delta) is

    (n-k)(n-l) - kl = n^2 - n(k+l) > 0.

So the sum of the entries decreases with each pivot step. However, all
entries are non-negative integers. So the algorithm terminates.

This gives a proof of termination, but the running time bound is not
polynomial in the input length. To get such a bound we must choose a cover
in just the right way.

As is well known, bipartite matching can be reduced to maximum flow in a
network with unit capacities. Computation of a maximum flow proceeds by a
series of searches in a flow's residual graph. For a given residual graph R,
we call a node *wet* if this search reaches it, and *dry* otherwise. Since
left (resp. right) nodes in a bipartite graph are associated with rows
(resp. columns), we extend this terminology and refer to wet and dry lines.
The Konig cover (K-cover for short) consists of the dry rows and the wet
columns. (Intuitively, we imagine fluid percolating from left to right in
the network, so these are the rows and columns that are on the "wrong"
side.) This is a minimum cover.

We can now prove the following theorem: If the algorithm always uses the
K-cover, its running time is polynomial.
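The K-cover can be computed without building the flow network explicitly: a
search from the unmatched rows along 0-entries and matched edges marks
exactly the wet lines. Here is a sketch under that reading of the notes; the
function name konig_cover and the input conventions are mine.

```python
from collections import deque

def konig_cover(zeros, match_row, n):
    """K-cover from a maximum matching on the 0-entries.
    zeros[i]     : set of columns j with entry (i, j) == 0
    match_row[i] : column matched to row i, or None if row i is unmatched
    Returns (dry_rows, wet_cols), the minimum cover."""
    # Rows unreached by the search stay dry; start from unmatched rows.
    wet_rows = {i for i in range(n) if match_row[i] is None}
    wet_cols = set()
    queue = deque(wet_rows)
    while queue:
        i = queue.popleft()
        for j in zeros[i]:           # forward (unmatched) 0-edges wet column j
            if j in wet_cols:
                continue
            wet_cols.add(j)
            for i2 in range(n):      # follow the matched edge back to a row
                if match_row[i2] == j and i2 not in wet_rows:
                    wet_rows.add(i2)
                    queue.append(i2)
    dry_rows = set(range(n)) - wet_rows
    return dry_rows, wet_cols
```

For the example matrix with 0's at (0,1), (0,2), (1,0), (2,0) and the
size-two matching {(0,1), (1,0)}, this returns the cover used earlier: the
first row (dry) and the first column (wet).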
This will follow from the following observation: After each pivot step,
either the maximum matching has increased in size, or the set of wet nodes
has gotten bigger.

Proof: Number rows and columns so that the dry rows come first and the wet
columns come first:

               wet cols    dry cols
              +----------+----------+
    dry rows  |          |          |
              +----------+----------+
    wet rows  |          |  lower   |
              |          |  right   |
              +----------+----------+

Suppose we have a doubly covered 0 in position (i,j), so row i is dry and
column j is wet. This 0 cannot be in the current matching: if it were, the
residual network would have an edge

    i <------ j
  (dry)     (wet)

which is impossible. Since it is not in the matching, the residual network
has

    i ------> j
  (dry)     (wet)

which is removed by pivoting without affecting the set of wet nodes (this
edge was not traversed by the search).

Pivoting will create at least one new 0 in the lower right rectangle.
Suppose it is at position (i,j), so row i is wet and column j is dry. The
residual network now has a new edge

    i ------> j
  (wet)     (dry)

and the search will reach the previously dry column vertex j. So we have
strictly enlarged the set of wet nodes. If, moreover, column j is unmatched,
the search has found an alternating path of 0's from an unmatched row to an
unmatched column, so the matching can be enlarged.

Notes.

1. I first saw the "decreasing" version of the Hungarian algorithm, as done
here, in a lecture by R. M. Karp. The polynomial running time bound follows
ideas in Andras Frank, On Kuhn's Hungarian method -- a tribute from Hungary,
TR 2004-14, Egervary Research Group, Budapest, 2004.

2. The number of steps in the computation is bounded by a power of n, the
dimension of the problem. Since the bound does not involve the matrix
entries, it is a so-called strongly polynomial bound.

3. The Hungarian algorithm is often presented using dual variables. For
this, see L. Ford and D. Fulkerson, Flows in Networks, pp. 111-112. If one
compares the two presentations, one sees that the numbers in our algorithm
above are "slack" values, which indicate how far each dual constraint is
from being tight.

4. Kuhn's original paper is in Naval Research Logistics Quarterly, v. 2,
1955, pp. 83-97. The strongly polynomial time bound is due to J. Munkres,
J. SIAM, 1957.
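Putting the pieces together, the whole loop (reduce, match on 0's, K-cover,
pivot, repeat) fits in a short program. This is a minimal sketch of the
algorithm as described in these notes, not an optimized implementation: it
rebuilds the matching from scratch each round, and the names are mine.

```python
def hungarian(A):
    """Minimum-cost assignment by repeated matching and K-cover pivots.
    Returns (cost, sigma), where sigma[i] is the column assigned to row i."""
    n = len(A)
    C = [row[:] for row in A]
    # Initial row and column reduction.
    for i in range(n):
        m = min(C[i])
        C[i] = [x - m for x in C[i]]
    for j in range(n):
        m = min(C[i][j] for i in range(n))
        for i in range(n):
            C[i][j] -= m
    while True:
        # Maximum matching on the 0-entries via augmenting paths.
        match_col = [None] * n          # match_col[j] = row matched to column j

        def try_augment(i, seen):
            for j in range(n):
                if C[i][j] == 0 and j not in seen:
                    seen.add(j)
                    if match_col[j] is None or try_augment(match_col[j], seen):
                        match_col[j] = i
                        return True
            return False

        size = sum(try_augment(i, set()) for i in range(n))
        if size == n:                   # perfect matching of 0's: done
            sigma = [None] * n
            for j in range(n):
                sigma[match_col[j]] = j
            return sum(A[i][sigma[i]] for i in range(n)), sigma
        # K-cover: wet nodes are those reached from unmatched rows
        # by alternating (0-edge, matched-edge) paths.
        match_row = [None] * n
        for j in range(n):
            if match_col[j] is not None:
                match_row[match_col[j]] = j
        wet_rows = {i for i in range(n) if match_row[i] is None}
        wet_cols = set()
        frontier = list(wet_rows)
        while frontier:
            i = frontier.pop()
            for j in range(n):
                if C[i][j] == 0 and j not in wet_cols:
                    wet_cols.add(j)
                    i2 = match_col[j]
                    if i2 is not None and i2 not in wet_rows:
                        wet_rows.add(i2)
                        frontier.append(i2)
        # Pivot: cover = dry rows + wet columns. Subtract delta from
        # uncovered entries, add delta to doubly covered entries.
        delta = min(C[i][j] for i in wet_rows for j in range(n)
                    if j not in wet_cols)
        for i in range(n):
            for j in range(n):
                if i in wet_rows and j not in wet_cols:
                    C[i][j] -= delta
                elif i not in wet_rows and j in wet_cols:
                    C[i][j] += delta
```

On the 3 x 3 example from the start of the notes, this finds the assignment
of cost 7 after a single pivot.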