Introduction to Graphs

Introduction
Terminology
Some special kinds of graphs
Test Yourself #1
Uses for Graphs
Representing Graphs
Answers to Self-Study Questions

Introduction

Graphs are a generalization of trees. Like trees, graphs have nodes and edges. (The nodes are sometimes called vertices, and the edges are sometimes called arcs.) However, graphs are more general than trees: In a graph, a node can have any number of incoming edges (in a tree, the root node cannot have any incoming edges, and the other nodes can only have one incoming edge). Every tree is a graph, but not every graph is a tree.

There are two kinds of graphs, directed and undirected:

Note that in a directed graph, the edges are arrows (are directed from one node to another) while in the undirected graph the edges are plain lines (they have no direction). In a directed graph, you can only go from node to node following the direction of the arrows, while in an undirected graph, you can go either way along an edge. This means that in a directed graph it is possible to reach a "dead end" (to get to a node from which you cannot leave).

Terminology

Here are two example graphs (one directed and one undirected) and the terminology to describe them.

In the directed graph, there is an edge from node 2 to node 1; therefore:

The two nodes are adjacent (they are neighbors).
Node 2 is a predecessor of node 1.
Node 1 is a successor of node 2.
The source of the edge is node 2, and the target of the edge is node 1.

In the undirected graph, there is an edge between node 1 and node 3; therefore:

Nodes 1 and 3 are adjacent (they are neighbors).

Now consider the following (directed) graph:

In this graph, there is a path from node 2 to node 5: 2->1->5. There is a path from node 1 to node 2: 1->3->4->2. There is also a path from node 1 back to itself: 1->3->4->2->1. The first two paths are acyclic paths: no node is repeated; the last path is a cyclic path, because node 1 occurs twice.

Note that the layout of the graph is arbitrary -- the important thing is which nodes are connected to which other nodes. So, for example, the following graph is the same as the one given above, it's just been drawn differently:

Also note that an edge can connect a node to itself; for example:

Some special kinds of graphs

A directed graph that has no cyclic paths (that contains no cycles) is called a DAG (a Directed Acyclic Graph).
An undirected graph that has an edge between every pair of nodes is called a complete graph. Here's an example:
A directed graph can also be a complete graph; in that case, there must be an edge from every node to every other node.
A graph that has values associated with its edges is called a weighted graph. The graph can be either directed or undirected. The weights can represent things like:
- The cost of traversing the edge.
- The length of the edge.
- The time needed to traverse the edge.
Here's an example of a weighted, directed graph:
A graph is connected if, treating all edges as being undirected, there is a path from every node to every other node. For example:

TEST YOURSELF #1

For each of the following graphs, say whether it is:

connected or not connected
complete or not complete

If the graph is a directed graph, also say whether it is cyclic or acyclic.

solution

Uses for Graphs

In general, the nodes of a graph represent objects and the edges represent relationships. Here are some examples:

Flights between cities. The nodes represent the cities, and there is an edge j -> k iff there is a flight from city j to city k. This graph could be a weighted graph, using the weights to represent the distance, the flight time, or the cost.
Interdependent tasks to be done. The nodes represent the tasks, and there is an edge j -> k iff task j must be completed before task k. For example, we can use a graph to represent what must be done to finish a CS major, with the nodes representing the courses to be taken, and the edges representing prerequisites. Here's a partial example:
Flow charts (also known as control-flow graphs). The nodes represent the statements and conditions in a program, and the edges represent the flow of control.
State transition diagrams. The nodes represent states, and the edges represent legal moves from state to state. For example, we could use a graph to represent legal moves in a game of tic-tac-toe. Here's a small part of that graph:

The reason graphs are good representations in cases like those described above is that there are many standard graph algorithms (operations on graphs) that can be used to answer useful questions like:

What is the cheapest way to fly from Madison to Saskatoon?
Which CS classes must I take before I can take cs640?
Is it possible for variable k to be used before being initialized?
Can I win a game of tic-tac-toe starting from the current position?

Representing Graphs

In a tree, all nodes can be reached from the root node, so an entire tree can be represented by storing a single pointer to the root node. Some graphs have a similar property; i.e., there is a special "root" node from which all other nodes are reachable (control-flow graphs often have this property). In that case, a graph can also be represented using a single pointer to the root node. However, if there is no root node, then we must use an array or a linked list of (pointers to) graph nodes to ensure access to every node (i.e., the nodes are numbered from 0 to N-1, where N is the number of nodes in the graph, and the kth value in the array or in the list is a pointer to node k).

An advantage of using an array of pointers to nodes is that, given a node's number, you can access the node in constant time (because subscripting in an array is a constant-time operation). If you use a linked list, accessing node n takes time O(n). On the other hand, an advantage of using a linked list is that it makes it easy to add a new node (just add an item to the front of the list). If an array is used, then it will have to be expanded (doubled in size) when it becomes full.

The nodes will contain whatever data is stored in a node (e.g., the name of a city, the name of a CS class, the statement represented by a control-flow graph node). The nodes can also contain pointers to their successors, or that information can be stored in a separate array. These two approaches to storing edge information are discussed below.

Adjacency lists

The usual way to represent edges is to let each node contain pointers to its successors (for directed graphs) or neighbors (for undirected graphs). These pointers are sometimes called adjacency lists. The pointers could be stored in an array, or in a linked list, or in a Sequence (which itself could be implemented using either an array or a linked list). Assuming that a Sequence is used, the following class definitions can be used to implement directed graphs:

class GraphNode {
    // fields
    Object data;
    Sequence successors;

    // methods
    ...
}

class Graph {
    // fields
    GraphNode[] nodes;  // array of ptrs to nodes
    int numNodes;       // number of nodes in the graph

    // methods
    ...
}

Here's a picture of a graph, and of the Graph object that represents it:

Adjacency matrix

Another way to store information about the edges in a graph is to use an auxiliary array called an adjacency matrix. Instead of storing pointers to successors in the nodes themselves, edge information is stored in an N x N array of booleans in which entry [j][k] == true iff there is an edge from j to k. For example, here's the example graph used above, and its adjacency matrix:

Note that if there is an edge from node k to itself, entry [k][k] will be true.

If the graph is undirected, then the matrix is symmetric (because the edge between nodes j and k conceptually runs in both directions). For example:

TEST YOURSELF #2

Question 1. Draw the adjacency matrix for the following (unconnected) graph.

Question 2. Suppose we have a weighted graph (one in which each edge has an associated value). How could the edge weights be stored, using each of the two techniques for storing edge information discussed above (using adjacency lists, and using an adjacency matrix)?

solution

Comparison: Adjacency list vs adjacency matrix

To compare the two ways of representing edges, we will consider both the amount of space used, and the time required for some standard operations.

Regardless of which way edges are represented, O(N) space will be needed to store information about the nodes (the space for the node objects themselves, plus the array or linked list of pointers to the nodes). In addition, an adjacency matrix takes O(N²) space for a graph with N nodes, regardless of how many edges are in the graph. Assuming that adjacency lists are implemented using Sequences, the space used to store information about edges is proportional to N + E (where E is the number of edges). The N comes from the fact that every node has a Sequence (so some space is taken up even if the node has no successors); the E comes from the fact that the space required for a Sequence of k items is O(k).

If the graph is sparse (there are not many edges), then adjacency lists will probably be more space efficient than adjacency matrices; if the graph is dense (the number of edges is O(N²)), then the adjacency matrix will probably be more space efficient.

The following table provides information about time requirements for some simple graph operations. N is the number of nodes in the graph. You should think about each operation and be sure that you understand why its time requirements are as specified in the table.

Operation
Adjacency Matrix
Adjacency List

add edge
O(1)
O(1)

remove edge from node j to node k
O(1)
worst case: O(# successors of j), which can be as bad as O(N)

is there an edge from node j to node k?
O(1)
worst case: O(# successors of j), which can be as bad as O(N)

list all successors of a node j
O(N)
O(# successors of j), which can be as bad as O(N)

initialize
O(N²)
O(N)

Accessing nodes by name

Our assumption has been that nodes are identified by numbers in the range 0 to N-1. What if we really want to identify nodes some other way; e.g., by name? (For example, if we're using a graph to represent cities and airline flights, we'd want to identify the nodes using the city names.)

The solution is to use a Dictionary (e.g., a search tree or a hashtable) to map node names to numbers. That is, the key would be the name, and the associated information would be the corresponding number. For example, suppose that we are using an array named nodes to store pointers to nodes, and there are already 3 nodes (numbered 0-2) in the graph. Here's a picture of the way the graph is represented, using a binary-search tree for the dictionary, and using adjacency lists for edges:

The operation addNode( "Madison" ) would do the following:

Add the entry ("Madison", 3) to the dictionary.
Create a new node to represent Madison.
Set nodes[3] to point to the new node.
Increment the current number of nodes.

Here's what the graph looks like after "Madison" has been added:

The operation AddEdge( "Madison", "Milwaukee") would first look up the node numbers for Madison and Milwaukee, finding numbers 3 and 1, respectively, and then would add a pointer to the node representing "Milwaukee" -- the node pointed to by nodes[1] -- to the successor list in the node representing "Madison" -- the node pointed to by nodes[3]. If an adjacency matrix had been used instead of adjacency lists, entry[3][1] in the adjacency matrix would be set to true to represent the edge from Madison to Milwaukee.

Operation	Adjacency Matrix	Adjacency List
add edge	O(1)	O(1)
remove edge from node j to node k	O(1)	worst case: O(# successors of j), which can be as bad as O(N)
is there an edge from node j to node k?	O(1)	worst case: O(# successors of j), which can be as bad as O(N)
list all successors of a node j	O(N)	O(# successors of j), which can be as bad as O(N)
initialize	O(N²)	O(N)

Contents