Kruskal's Algorithm
This page goes over Kruskal's Algorithm, a way to find the Minimum Spanning Tree of certain graphs.
Below is the pseudocode for the algorithm and an example graph to facilitate understanding.
Definitions:
-
In Computer Science, a graph is a data structure used to represent relationships between objects.
It consists of a set of connected nodes (or vertices, #vertices = V) and a set of edges (#edges = E).
Nodes can represent individual objects, like cities in a map or computers in a network. Edges represent
connections between nodes, like roads between cities, or connections between computers.
-
In an undirected graph, its edges have no direction, so there is only one edge connecting each pair of
nodes, and the connectivity is mutual. In a directed graph, its edges do have directions, so every edge has
a specific direction of connection. For example, in a directed graph, an edge that goes from node A to node B
connects A to B only, and does not permit B going to A. We'll need another edge from B to A if we want
to go from B to A.
-
Sometimes, every edge is associated with a number. We call that number the edge weight.
-
A connected graph is a graph where there is a path between every pair of nodes. In other words, no node
is isolated, and we can traverse from any node to any other node by following edges. If a directed graph
is connected, it is called strongly connected. If, ignoring edge directions, a directed graph is connected,
it is called weakly connected. An undirected graph does not have this distinction because its having a
path between every pair of nodes implies traversibility.
-
For an undirected graph G with V nodes, if T is a connected subgraph of G containing all nodes of G and
has exactly (V-1) edges, then T is a spanning tree of G.
-
Minimum Spanning Trees (MSTs) are spanning trees with the lowest sum of edge weights. This is what Kruskal's
algorithm finds.
Steps of Algorithm (as taught in class):
-
The method expects a list of edges in the graph with their edge weights.
-
The algorithm undergoes a sorting algorithm that sorts the edges by their weights from smallest to largest.
-
Initialize a helper data structure where each node is placed in a set, so every set contains exactly
one node (a singleton set), and the sets do not overlap. Such sets are said to be disjoint. They help determine
if two nodes are connected, in which case they will belong to the same set. In the example graph, there
would be 9 singleton sets up to this step of the algorithm.
-
Then, we process edges one by one using a while loop, until all edges are exhausted.
-
For every loop, we pull off the next edge from the sorted list, which is the edge with the smallest weight
so far. The edges will need to have data fields like the start node and end node.
-
If the start and end nodes are in the same set, we know they are already connected in the spanning tree
due to previous operations through some intermediate edges. For example, A and B are in the same set because
A to E to B, which has the smallest sum of edge weights, was selected previously.
-
If the start and end nodes are not yet in the same set, we will include the edge in our spanning tree,
as it is guaranteed that this selection will constitute the spanning tree with the lowest sum of weights.
Right after the addition of the edge, union the two sets that used to be disjoint in the helper data structure
to show that the nodes are now connected and of the same set.
Complexity: O(Elog(V))
Additional Resources: