Notes on Graphs

The World Wide Graph?

We can think of the Web as a directed graph. Each URL is a node. Each link is an edge from one URL to another.
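As a rough illustration of this view, the sketch below stores a tiny "web" as a map from each page to the pages it links to. The names WebGraph, addLink and outDegree, and the use of short page names in place of full URLs, are assumptions made for the example and are not part of these notes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Assumed representation: each key is a node (a page), and each entry in
// its vector is a directed edge (a link) to another page.
using WebGraph = std::map<std::string, std::vector<std::string>>;

// Record a link (directed edge) from one page to another.
void addLink(WebGraph &web, const std::string &from, const std::string &to)
{
  web[from].push_back(to);
  web[to];  // make sure the target exists as a node, even with no out-links
}

// Number of links leaving a page (its out-degree).
std::size_t outDegree(const WebGraph &web, const std::string &page)
{
  auto it = web.find(page);
  assert(it != web.end());  // the page must be a node in the graph
  return it->second.size();
}
```

Because the edges are directed, a link from "home" to "notes" says nothing about whether "notes" links back to "home".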

There are many different orders in which we can traverse the web. Often, we keep following links that appear interesting or relevant until either there are no more links or we are back at a page we have already visited. Then we backtrack to the last page that has a link we have yet to follow.

For example, from the CS 367-3 Home Page we might click on Assignments, then on Program Two, then on Program One, then on Program Zero. We realize we are looking for something else, so we backtrack to a point where we have further options (in this case, Assignments) and then proceed to Final Project, to Notes, back to Final Project, and finally to Program Three, where we find some information on AVL trees. This type of traversal should seem familiar: it is an example of depth-first search.

Breadth-First Search

There are many other ways in which we might traverse a graph. Imagine exploring a maze without a map (perhaps in the aMazed game). We might wish to "slowly" explore the maze without straying too far from our starting point until we understand all our options. For example, consider the maze represented by this graph:

Breadth-first search traverses a graph by starting with a node, marking it, and putting it on a queue. Then, as long as the queue is not empty, it removes the node at the front of the queue, marks each of that node's unmarked neighbors, and puts them on the back of the queue. For example, for our maze, we might have the following ordering:

Notice that by using breadth-first search we would find the exit before falling in the trap and that this would not be the case using depth-first search.

An interface for a graph class

class Graph {
public:
  static const size_t MAX_NODES = ...
  Graph(); // construct an empty graph
  size_t size() const; // number of nodes in graph
  bool isEdge(size_t i, size_t j) const; 
  void bfs(size_t start) const;
  void addNode();
  void addEdge(size_t i, size_t j);
  void removeEdge(size_t i, size_t j);
  // ...
private:
  bool adjMatrix[MAX_NODES][MAX_NODES];
  size_t _nodeCount;
};
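The interface above leaves the member functions unimplemented. The sketch below shows one way the adjacency-matrix operations might look. The class name MatrixGraph and the value 8 for MAX_NODES are assumptions made so that the example compiles on its own; the notes do not give the real value.

```cpp
#include <cassert>
#include <cstddef>

// Sketch only: MatrixGraph and MAX_NODES = 8 are assumptions standing in
// for the Graph class declared in the notes.
class MatrixGraph {
public:
  static const std::size_t MAX_NODES = 8;

  MatrixGraph() : _nodeCount(0) {}

  std::size_t size() const { return _nodeCount; }

  // Add a node with no edges; its index is the old node count.
  void addNode()
  {
    assert(_nodeCount < MAX_NODES);
    for (std::size_t i = 0; i <= _nodeCount; i++) {
      adjMatrix[_nodeCount][i] = false;  // clear the new row
      adjMatrix[i][_nodeCount] = false;  // clear the new column
    }
    _nodeCount++;
  }

  bool isEdge(std::size_t i, std::size_t j) const
  {
    assert(i < _nodeCount && j < _nodeCount);
    return adjMatrix[i][j];
  }

  void addEdge(std::size_t i, std::size_t j)
  {
    assert(i < _nodeCount && j < _nodeCount);
    adjMatrix[i][j] = true;  // directed: only i -> j is recorded
  }

  void removeEdge(std::size_t i, std::size_t j)
  {
    assert(i < _nodeCount && j < _nodeCount);
    adjMatrix[i][j] = false;
  }

private:
  bool adjMatrix[MAX_NODES][MAX_NODES];
  std::size_t _nodeCount;
};
```

With this layout, isEdge, addEdge and removeEdge each take constant time, at the cost of reserving MAX_NODES * MAX_NODES entries no matter how few edges the graph has.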

Implementing Breadth-First Traversal

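void Graph::bfs(size_t start) const
{
  if (start >= _nodeCount)  // nothing to traverse from an invalid node
    return;

  Queue q;
  // marked[i] is true once node i has been put on the queue
  bool *marked = new bool[_nodeCount];
  for(size_t i=0; i < _nodeCount; i++)
    marked[i] = false;

  q.enqueue(start);
  marked[start] = true;
  while(!q.isEmpty()) {
    size_t x = q.dequeue();
    // enqueue every unmarked neighbor of x
    for(size_t i=0; i < _nodeCount; i++) {
      if (!marked[i] && adjMatrix[x][i]) {
        q.enqueue(i);
        marked[i] = true;
      }
    }
    cout << x << endl;  // "visit" x
  }
  delete [] marked;  // release the array allocated above
}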

Thought question: What kind of traversal do we get if we replace the Queue in this function with a Stack (and enqueue with push and dequeue with pop)?

Run-time complexity of BFS and DFS

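Both BFS and DFS visit each node once and examine each edge once, so with an adjacency-list representation they run in O(N + E) time, where N is the number of nodes and E is the number of edges in the graph. With the adjacency matrix used in the bfs function above, finding the neighbors of a node means scanning a whole row of the matrix, so the traversal takes O(N^2) time.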

Using BFS to determine shortest paths

BFS processes nodes in order of their distance from the start node, from closest to farthest away:

     0 <-- 1 <--               
     | \         \
     |  --> 2 --> 3
     |            |
     --> 4 <-------

An order in which nodes may be "visited" as a result of BFS(0) (nodes at the same distance may appear in either order):

	0, 4, 2, 3, 1
	   \__/ \_/\_/
	1 step   |  |______ 3 steps away
	away     |____ 2 steps away

Suppose a graph is used to represent a street map, with a node for every intersection and an edge for every one-block stretch of street between two intersections. Given a starting intersection, we might use BFS to find all intersections reachable in a certain number of blocks or fewer. Or we might use it to find those intersections farthest away (the last nodes reached by BFS).

We can augment the BFS algorithm to compute the shortest distance (number of edges) from the start node to every other node in a graph. (Possibly part of a graph-related final project.)
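One way such an augmentation might look is sketched below: instead of a bool array of marks, keep an array of distances, and set a node's distance the first time it is reached. The function name bfsDistances, the AdjList type, and the use of adjacency lists with std::queue (rather than the adjacency matrix and Queue class of the Graph class in these notes) are assumptions made to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Adjacency lists: adj[u] holds the nodes reachable from u by one edge.
typedef std::vector<std::vector<std::size_t> > AdjList;

// Returns dist, where dist[v] is the number of edges on a shortest path
// from start to v, or -1 if v cannot be reached from start.
std::vector<int> bfsDistances(const AdjList &adj, std::size_t start)
{
  assert(start < adj.size());
  std::vector<int> dist(adj.size(), -1);  // -1 doubles as "not yet marked"
  std::queue<std::size_t> q;

  dist[start] = 0;
  q.push(start);
  while (!q.empty()) {
    std::size_t x = q.front();
    q.pop();
    for (std::size_t k = 0; k < adj[x].size(); k++) {
      std::size_t y = adj[x][k];
      if (dist[y] == -1) {      // first time y is reached
        dist[y] = dist[x] + 1;  // one edge farther from start than x
        q.push(y);
      }
    }
  }
  return dist;
}
```

For the five-node graph drawn above (edges 0 -> 2, 0 -> 4, 1 -> 0, 2 -> 3, 3 -> 1 and 3 -> 4), calling bfsDistances with start node 0 gives distance 1 for nodes 2 and 4, distance 2 for node 3, and distance 3 for node 1, which agrees with the grouping shown under the visit order.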

However, suppose we now consider the case where each edge has a numerical distance (or weight) associated with it. In that case, BFS is no longer good enough to compute shortest paths. (In the street-map example above, suppose the blocks were of varying lengths.)

Labeled graphs

We have already seen how nodes in graphs can have values (or names or labels) associated with them. We can also attach labels to edges:

In particular we can label the edges of a graph with numerical values indicating weights or distances. (Think about how we might implement this.)
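One possible answer, offered only as a sketch: replace the bool adjacency matrix with a matrix of integer weights and reserve one value to mean "no edge". The names WeightedGraph, NO_EDGE and edgeWeight, and the value 8 for MAX_NODES, are assumptions for this example and do not come from the Graph class in these notes.

```cpp
#include <cassert>
#include <cstddef>

// Sketch only: every name in this class is hypothetical.
class WeightedGraph {
public:
  static const std::size_t MAX_NODES = 8;
  static const int NO_EDGE = -1;  // sentinel; real weights must be >= 0

  // Build a graph with n nodes and no edges.
  explicit WeightedGraph(std::size_t n) : _nodeCount(n)
  {
    assert(n <= MAX_NODES);
    for (std::size_t i = 0; i < MAX_NODES; i++)
      for (std::size_t j = 0; j < MAX_NODES; j++)
        weight[i][j] = NO_EDGE;
  }

  // Add (or relabel) the directed edge i -> j with weight w.
  void addEdge(std::size_t i, std::size_t j, int w)
  {
    assert(i < _nodeCount && j < _nodeCount);
    assert(w >= 0);
    weight[i][j] = w;
  }

  bool isEdge(std::size_t i, std::size_t j) const
  {
    assert(i < _nodeCount && j < _nodeCount);
    return weight[i][j] != NO_EDGE;
  }

  // Weight of an existing edge i -> j.
  int edgeWeight(std::size_t i, std::size_t j) const
  {
    assert(isEdge(i, j));
    return weight[i][j];
  }

private:
  int weight[MAX_NODES][MAX_NODES];
  std::size_t _nodeCount;
};
```

Using a sentinel keeps the structure as simple as the bool matrix, but it only works when the sentinel can never be a legitimate weight; a separate bool matrix alongside the weights would avoid that restriction.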

Shortest paths

Problem: Given a graph with edges labeled with positive distances, and a distinguished node x, find the shortest distances between x and all other nodes in the graph.

There are several possible solutions. Probably the most famous is Dijkstra's algorithm. The idea:

- Use an array of bool called known: is the shortest path to this node known yet?
- Use an integer array called distance to keep track of the distances between x and the other nodes in the graph. Once a node becomes "known", its entry is the shortest distance from x; before that, the entry may be greater than the shortest distance from x. When the algorithm is finished, distance will contain the shortest distance from x to every other node in the graph.
- Iterate:
  - each time around, find the length of the path to the node "next closest" to x; that node's path length then becomes "known"
  - for all nodes whose path lengths are not yet known, keep track of the shortest path that goes via only "known" nodes

Pseudocode:

// initialize
set all elements of the known array to false
set distance[x] to 0
set all other elements of the distance array to infinity
(e.g., use -1 to represent this, taking care to treat -1 as
larger than every real distance when comparing)

// main loop: repeat until no unknown node has a finite distance
find the "unknown" node n with the smallest distance value
set known[n] to true
for all unknown successors m of n, set distance[m] =
        min (distance[m],
             distance[n] + length of edge n -> m)

Example:

                           /---> Green Bay
                     (31) /          ^
                         /           |
          /----> Appleton            | (108)
   (110) /                           |
        /   (56)            (23)     |
 Madison ------> Delafield -----> Milwaukee <--
        \                                     |
    (43) \                                    | (71)
          \--> Beloit -------------------------

After 1 time around the main loop:

          Known   Distance
          -----   -------
Madison   | T |   |   0 |
          -----   -------
Appleton  | F |   | 110 |
          -----   -------
Green Bay | F |   |     |
          -----   -------
Delafield | F |   |  56 |
          -----   -------
Milwaukee | F |   |     |
          -----   -------
Beloit    | F |   |  43 |
          -----   -------

After 3 times around the main loop:

          Known   Distance
          -----   -------
Madison   | T |   |   0 |
          -----   -------
Appleton  | F |   | 110 |
          -----   -------
Green Bay | F |   |     |
          -----   -------
Delafield | T |   |  56 |
          -----   -------
Milwaukee | F |   |  79 | <-- This entry was 114 after 2 times around the loop;
          -----   -------     that was the distance of the path via Beloit.
Beloit    | T |   |  43 |     The value was changed the third time around
          -----   -------     the loop to the (shorter) distance via Delafield.

After 6 times around the main loop (all distances are finally known):

          Known   Distance
          -----   -------
Madison   | T |   |   0 |
          -----   -------
Appleton  | T |   | 110 |
          -----   -------
Green Bay | T |   | 141 | <-- This entry was 187 (via Milwaukee) before being updated to 141 (via Appleton).
          -----   -------
Delafield | T |   |  56 |
          -----   -------
Milwaukee | T |   |  79 |
          -----   -------
Beloit    | T |   |  43 |
          -----   -------
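The pseudocode and the worked example can be checked against each other with a short program. What follows is a minimal sketch of the array-based algorithm, not a definitive implementation. The function name shortestDistances, the constants NO_EDGE and INF, and the weight-matrix representation are assumptions made for this example; the largest int stands in for infinity rather than -1 so that ordinary comparisons work.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

const int NO_EDGE = -1;                           // assumed "no edge" marker
const int INF = std::numeric_limits<int>::max();  // stands in for infinity

// w[i][j] is the length of edge i -> j, or NO_EDGE. Returns the shortest
// distance from x to every node (INF for nodes that cannot be reached).
std::vector<int> shortestDistances(const std::vector<std::vector<int> > &w,
                                   std::size_t x)
{
  const std::size_t n = w.size();
  assert(x < n);
  std::vector<bool> known(n, false);
  std::vector<int> distance(n, INF);
  distance[x] = 0;

  for (std::size_t round = 0; round < n; round++) {
    // Find the unknown node with the smallest finite distance value.
    std::size_t best = n;  // n means "none found yet"
    for (std::size_t i = 0; i < n; i++)
      if (!known[i] && distance[i] != INF &&
          (best == n || distance[i] < distance[best]))
        best = i;
    if (best == n)
      break;  // every remaining node is unreachable from x
    known[best] = true;

    // Update the distances of the unknown successors of best.
    for (std::size_t m = 0; m < n; m++)
      if (!known[m] && w[best][m] != NO_EDGE &&
          distance[best] + w[best][m] < distance[m])
        distance[m] = distance[best] + w[best][m];
  }
  return distance;
}
```

Numbering the cities in table order (Madison 0, Appleton 1, Green Bay 2, Delafield 3, Milwaukee 4, Beloit 5) and filling w with the seven edges from the map, a call with x = 0 returns 0, 110, 141, 56, 79 and 43, the same values as the final table.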

Run-time complexity for shortest-path algorithm

(for a graph with n vertices)

The initialization step takes time O(n).
The main loop has three steps:
- Find the unknown node k with the shortest distance: O(n) to look at all nodes' distances
- Set known[k] to true: O(1)
- Set distances of all unknown successors of k: O( # edges out of k ), which is at most O(n)
In the worst case, the main loop will execute n times (ALL nodes' distances will become known). The most expensive step in the main loop is the first one, which is O(n). That step dominates the other 2 steps.

So the worst-case time is:

iterations of main loop * cost of one iteration = 
                                          n * n = 
                                          O(n^2)