---------------------------------------------------------------------
CS 577 (Intro to Algorithms)
Lec 18 (11/07/06) Shuchi Chawla
---------------------------------------------------------------------
Today: Min-cost max-flow, min-cost circulation; Randomized algorithms
Min-cost max-flow
=================
Last time we discussed the min-cost bipartite maximum matching
problem: we are given a bipartite graph with costs on edges; of all
the matchings of maximum size in this graph, we want to find the one
with the minimum cost. This problem can be reduced to the min-cost
max-flow problem using the usual reduction from bipartite matching to
flow.
In the min-cost max-flow we are given a graph G with source s and sink
t. Edges in the graph have a capacity c(e) as well as a cost w(e)
associated with them. We want to find a maximum flow in this network,
but if there is more than one maximum flow, we want to find the one
the minimum cost.
How should we go about solving this? One approach is to modify the
Ford-Fulkerson algorithm to take costs into account. Let us recall the
FF algorithm:
1. Initialize flow to zero; construct residual graph G'
2. Repeat until no s-t path remains in G'
a. Find a path from s to t in G'
b. Add flow along this path
c. Update graph G'
We mainly need to modify step 2a, but first we need to incorporate
costs into the residual graph. This is simple: if an edge e=(u,v) in G
has cost w(e), we assign the forward edge (u,v) in G' a cost of w(e)
and the backward edge (v,u) in G' a cost of -w(e). Then, just modify
step 2a to always pick the minimum cost path from s to t. That's all!
How do we find the minimum cost path from s to t? Just treat the costs
as lengths and use Bellman-Ford to find the shortest path from s to
t. Note that the residual graph G' includes negative cost edges, so we
cannot use Dijkstra's algorithm to find the shortest path.
The running time of this algorithm is O(Fmn), where F is the value of
the maximum flow (assuming integer capacities): each of the at most F
augmentations runs one O(mn) Bellman-Ford computation. This can be
sped up by using smarter ways of picking s-t paths, but that is
beyond the scope of this course.
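The resulting algorithm can be sketched in Python. This is a minimal illustration under assumed conventions (the function name, and edges given as (u, v, capacity, cost) tuples over nodes 0..n-1), not code from the lecture:

```python
def min_cost_max_flow(n, edges, s, t):
    """Successive shortest paths: repeatedly augment along the cheapest
    s-t path in the residual graph, found with Bellman-Ford.
    edges: list of (u, v, capacity, cost). Returns (flow_value, total_cost)."""
    # Residual graph: each entry is [to, capacity, cost, index_of_reverse_edge]
    graph = [[] for _ in range(n)]
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])  # backward edge, cost -w(e)

    flow, total_cost = 0, 0
    while True:
        # Bellman-Ford from s (Dijkstra is unusable: residual costs can be negative)
        INF = float('inf')
        dist = [INF] * n
        dist[s] = 0
        prev = [None] * n  # (node, edge index in graph[node]) used to reach each node
        for _ in range(n - 1):
            updated = False
            for u in range(n):
                if dist[u] == INF:
                    continue
                for i, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, i)
                        updated = True
            if not updated:
                break
        if dist[t] == INF:      # no augmenting path left: the flow is maximum
            return flow, total_cost
        # Bottleneck capacity along the cheapest path
        push, v = INF, t
        while v != s:
            u, i = prev[v]
            push = min(push, graph[u][i][1])
            v = u
        # Augment: decrease forward capacities, increase backward capacities
        v = t
        while v != s:
            u, i = prev[v]
            graph[u][i][1] -= push
            graph[v][graph[u][i][3]][1] += push
            v = u
        flow += push
        total_cost += push * dist[t]   # dist[t] is the cost of the path
```

With integer capacities the loop augments at most F times, matching the O(Fmn) bound.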
---------------------------------------------------------------------
Min-cost Circulation
====================
Let us now look at an extension of the min-cost max-flow problem.
Suppose that instead of a flow from some s to some t, we just wanted a
flow that was balanced everywhere, that is, there is no source or
sink, but the flow just circulates through the network. This is called
the min-cost circulation problem.
Exercise: How would you reduce the min-cost max-flow problem to the
min-cost circulation problem?
Note that we don't have a max-flow requirement any more. If the
network has no negative cost cycles, then every circulation has
nonnegative cost, so the optimal solution sends no flow at
all. On the other hand, if the network has negative cost cycles, then
it helps to saturate those cycles completely.
This suggests the following algorithm for solving this problem (due
to Klein): run the Ford-Fulkerson loop, but in each iteration pick
any negative cost cycle in the residual graph G' and saturate it with
flow; stop when no negative cost cycle remains.
Why is this algorithm correct? Well, if there is any negative cost
cycle remaining in the residual graph in the end, then clearly we can
further reduce the cost of our circulation by sending more flow along
this cycle. On the other hand, suppose that our algorithm finds a
suboptimal circulation f, while the min-cost circulation is f*. Then
f*-f is itself a circulation, and every circulation decomposes into
cycles; since f* costs less than f, at least one of these cycles must
have negative cost, and it lies in the residual graph of f. This
contradicts the fact that our algorithm ended with flow f.
In order to complete the description of this algorithm, we must give
an algorithm for finding negative cost cycles. One way of doing this
is using the Bellman-Ford algorithm.
Recall that Bellman-Ford finds shortest paths from a source node to
all other nodes, assuming that there are no negative cost cycles in
the graph. (To detect negative cycles anywhere in the graph, we can
add an auxiliary source with a zero-cost edge to every node.) What
happens to the algorithm when the graph has negative cost cycles?
Let us recall Bellman-Ford in more detail. In the i-th iteration of
the algorithm, we construct shortest paths with at most i hops from
the source to every node, using the paths with fewer hops computed in
previous iterations. When the graph does not contain negative cost
cycles, this process converges after n-1 iterations and we obtain all
the shortest paths. If the graph does contain negative cost cycles,
the cost of some paths keeps decreasing even beyond the n-th
iteration: a walk that traverses a negative cost cycle becomes
shorter every time it goes around the cycle again.
How do we use this to find the negative cost cycle? Recall that
Bellman-Ford also keeps track, for every node v, of the predecessor
of v on the current shortest path from the source. We take some node
v whose shortest path length still decreases in the n-th iteration
and follow the predecessor pointers from v; within n steps we must
revisit a node, and the nodes between the two visits form a negative
cost cycle. This modification of Bellman-Ford has the same time
complexity O(mn) as the original algorithm.
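As a sketch, the cycle-finding step might look like this in Python; the function name and the (u, v, cost) edge-list format are illustrative assumptions. Initializing all distances to zero implicitly adds a source with a zero-cost edge to every node, so cycles anywhere in the graph are found:

```python
def find_negative_cycle(n, edges):
    """edges: list of (u, v, cost). Returns a list of nodes forming a
    negative cost cycle, or None if the graph has none. Runs in O(mn)."""
    dist = [0] * n       # distance from the implicit auxiliary source
    pred = [None] * n    # predecessor on the current shortest path
    x = None
    for _ in range(n):   # one extra (n-th) round witnesses non-convergence
        x = None
        for u, v, cost in edges:
            if dist[u] + cost < dist[v]:
                dist[v] = dist[u] + cost
                pred[v] = u
                x = v
    if x is None:
        return None      # converged within n-1 rounds: no negative cycle
    # x may lie on a path leading into the cycle; following predecessors
    # n times is guaranteed to land on the cycle itself.
    for _ in range(n):
        x = pred[x]
    cycle, v = [x], pred[x]
    while v != x:
        cycle.append(v)
        v = pred[v]
    cycle.reverse()
    return cycle
```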
Combining this with the cycle-canceling loop above, we get an O(mnF)
time algorithm for the min-cost circulation problem, where F bounds
the number of cycles we cancel.
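Putting the pieces together, Klein's algorithm might be sketched as follows. This is a minimal illustration with assumed names and an edge list of (u, v, capacity, cost) tuples; residual edges are stored in pairs so that edges i and i^1 are reverses of each other:

```python
def min_cost_circulation(n, edges):
    """Cycle canceling: start from the zero circulation and repeatedly
    saturate a negative cost cycle in the residual graph. Returns the
    cost of the min-cost circulation (<= 0, since zero flow costs 0)."""
    graph = []  # residual edges [u, v, capacity, cost]; 2i and 2i+1 are a pair
    for u, v, cap, cost in edges:
        graph.append([u, v, cap, cost])
        graph.append([v, u, 0, -cost])   # backward edge, cost -w(e)

    total = 0
    while True:
        cycle = _negative_cycle_edges(n, graph)
        if cycle is None:
            return total
        push = min(graph[i][2] for i in cycle)   # saturate the cycle
        for i in cycle:
            graph[i][2] -= push
            graph[i ^ 1][2] += push              # paired reverse edge
            total += push * graph[i][3]

def _negative_cycle_edges(n, graph):
    """Bellman-Ford over residual edges with positive capacity.
    Returns indices into `graph` of a negative cycle, or None."""
    dist = [0] * n
    pred = [None] * n   # index of the residual edge used to reach each node
    x = None
    for _ in range(n):
        x = None
        for i, (u, v, cap, cost) in enumerate(graph):
            if cap > 0 and dist[u] + cost < dist[v]:
                dist[v] = dist[u] + cost
                pred[v] = i
                x = v
    if x is None:
        return None
    for _ in range(n):                  # walk back onto the cycle itself
        x = graph[pred[x]][0]
    cycle, v = [], x
    while True:
        i = pred[v]
        cycle.append(i)
        v = graph[i][0]
        if v == x:
            break
    cycle.reverse()
    return cycle
```

With integer capacities and costs, each cancellation lowers the cost by at least one, so the loop terminates.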
---------------------------------------------------------------------
Randomized Algorithms
=====================
We will now look at a new technique for algorithm design -- the use of
randomness or coin tosses. Randomized algorithms are similar to
non-random or "deterministic" algorithms, except that sometimes they
toss coins and decide what to do next based on the outcomes of those
tosses.
One randomized algorithm that perhaps most of you are familiar with is
Quicksort. We will study this algorithm in detail in the next lecture.
How does randomness help?
- Sometimes it helps us save time or other resources.
- Sometimes it makes algorithm design simpler, while providing the
same time/space guarantees as some deterministic algorithm.
- Sometimes it provides us additional properties such as privacy.
Let us look at some examples of these.
Example #1: Comparing numbers
=============================
Alice and Bob have an n-bit number each -- A for Alice and B for
Bob. They want to find out whether the two numbers are the
same. However, they are communicating through telegrams and each bit
costs money to send. So they want to minimize the number of bits they
send to each other. Note that they can fix a protocol for what bits to
send before playing this game; deciding upon a protocol doesn't incur
any cost.
What is the best way for them to decide whether their numbers are the
same or not?
It turns out that if they use a deterministic algorithm and send fewer
than n bits through the telegram, there is always a pair of numbers A
and B on which they will get the wrong answer. This means that if they
don't use randomization, they must communicate at least n bits.
Can we use randomness to help here? Indeed we can! Here is the idea:
Suppose that they pick a random number x between (say) 1 and 10,
and compare A mod x to B mod x. If A and B are the same numbers, the
numbers A mod x and B mod x will also be the same, and they will get
the correct answer. However, if A and B are different, then what is
the chance that A mod x = B mod x? This can only be the case if x
divides A-B. Assuming that A-B has few factors in the range 1 to 10,
if we pick x randomly, there is only a small chance that we will pick
one of the factors of A-B.
But how does this help us? Well, now we are sending across only about
log 10, i.e. 4, bits instead of n bits (because both A mod x and
B mod x are numbers smaller than 10).
Note that we are trading off correctness with time. By sending fewer
bits, we may some times get the wrong answer, but this will only
happen with a certain small probability.
Let us make this protocol more precise. Since we are talking about
factors, let us pick x to be a prime number. Furthermore, in order to
make the failure probability small, let us pick a range larger than 1
to 10. In particular, we will pick x uniformly at random from all
prime numbers in the range 1 to n^2. (The phrase "uniformly at random"
means that we will pick each prime number with the same probability.)
Now, A-B is an n bit number, and so, |A-B| < 2^n. This implies that
A-B can have at most n prime factors: every prime factor is at least
2, so a number with more than n prime factors would be at least 2^n.
On the other hand, the prime number theorem says that there are about
n^2/ln(n^2) = Theta(n^2/log n) primes in the range 1 to n^2.
This means that at most an O((log n)/n) fraction of the prime numbers
between 1 and n^2 divide A-B. So the probability that we pick x to be
one of these factors is at most O((log n)/n).
In other words, the protocol fails with probability at most
O((log n)/n), and with the remaining probability it returns the
correct answer. (Note that (log n)/n goes to zero as n grows.) Alice
sends x and A mod x, each a number at most n^2, so the number of bits
sent across is at most 2 log(n^2) = O(log n).
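A sketch of one round of the protocol in Python. The helper names are invented for illustration; in the real setting Alice would transmit x and A mod x by telegram, and Bob would do the comparison on his end:

```python
import random

def primes_up_to(m):
    """Sieve of Eratosthenes: all primes <= m."""
    sieve = [True] * (m + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(m ** 0.5) + 1):
        if sieve[p]:
            for q in range(p * p, m + 1, p):
                sieve[q] = False
    return [p for p in range(2, m + 1) if sieve[p]]

def equal_check(A, B, n, rng):
    """One round of the protocol for n-bit numbers A and B: pick a
    uniformly random prime x <= n^2 and compare A mod x with B mod x.
    Always answers True when A == B; when A != B it errs (answers True)
    only if x happens to divide A-B, which occurs with probability
    roughly (log n)/n."""
    x = rng.choice(primes_up_to(n * n))
    return A % x == B % x
```

Equal inputs are never misreported; for distinct inputs a single round is wrong only with small probability, and repeating the round with fresh primes drives the error down further.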
---------------------------------------------------------------------
The message to take away from this example is that often you can
trade off the correctness of an algorithm for running time. This is
one of the guiding principles behind the design of randomized
algorithms.
We will see two kinds of randomized algorithms in this course:
1. Algorithms that always have a small running time but output the
wrong answer with a small probability.
2. Algorithms that always output the correct answer but have a large
running time with a small probability.
In the latter case, we will bound the "average" or expected running
time of the algorithm.
We will now look at an example of the second kind.
Example #2: Contention resolution
=================================
See section 13.1 of the book.
---------------------------------------------------------------------