Kruskal's Algorithm

Contents

Introduction

Union-Find

Kruskal's Algorithm


Introduction

In Minimum-Spanning-Tree (MST) problems, we usually care to find a way to span/cover all nodes in a graph, while minimizing the total weight/cost of the edges. One intuitive and neat way to accomplish this is through Kruskal's MST Algorithm.

The idea is that we sort all edges that we could possibly use in the graph, ordering them by their weight. Then, we go from the cheapest to most expensive edge, processing them as follows. Let the (undirected) edge be (a,b) where a and b are its vertices.

  1. Find the "set" that a belongs to, call it A
  2. Find the "set" that b belongs to, call it B
  3. If A = B, ignore and continue
  4. Otherwise, use this edge and unite A and B
When we say "set", we usually mean the connected component that the vertex lies in. Using an edge and uniting these sets, implies that these two connected components become a single connected component.

Now we need a way to define these sets, their unions, and finding any vertex's set. The supplementary data-structure we will use to do this is called a Union-Find data structure. Java's TreeSet is not efficient enough to support these operations, so we will define our own.


Union-Find

Credit to Steven, Felix Halim and their book Competitive Programming 4:
Github Repo
        
// Union-Find Disjoint Sets Library written in OOP manner, 
// using both path compression and union by rank heuristics
class UnionFind 
{   
  private ArrayList p, rank, setSize;
  private int numSets;

  public UnionFind(int N) 
  {
    p = new ArrayList<>(N);
    rank = new ArrayList<>(N);
    setSize = new ArrayList<>(N);
    numSets = N;
    for (int i = 0; i < N; i++) 
    {
      p.add(i);
      rank.add(0);
      setSize.add(1);
    }
  }

  public int findSet(int i) 
  {
    if (p.get(i) == i) return i;
    else 
    {
      int ret = findSet(p.get(i)); p.set(i, ret);
      return ret; 
    } 
  }

  public Boolean isSameSet(int i, int j) 
  {
    return findSet(i) == findSet(j); 
  }

  public void unionSet(int i, int j) { 
    if (!isSameSet(i, j)) 
    { 
      numSets--; 
      int x = findSet(i), y = findSet(j);
      // rank is used to keep the tree short
      if (rank.get(x) > rank.get(y)) 
      { 
        p.set(y, x); setSize.set(x, setSize.get(x) + setSize.get(y)); 
      } 
      else 
      { 
        p.set(x, y); setSize.set(y, setSize.get(y) + setSize.get(x));
        if (rank.get(x) == rank.get(y)) 
          rank.set(y, rank.get(y) + 1); 
      } 
    } 
  }
  public int numDisjointSets() { return numSets; }
  public int sizeOfSet(int i) { return setSize.get(findSet(i)); }
}
        
        

Kruskal's Algorithm

Now that we have defined the supplementary Union-Find data structure, we can implement Kruskal's Algorithm, according to the outlined logic from before. We illustrate this with an example in Java (credit to Halim and Halim):
Github Repo
        
public class kruskal 
{
  public static void main(String[] args) throws Exception 
  {
    File f = new File("mst_in.txt");
    Scanner sc = new Scanner(f);

    // Kruskal's algorithm
    int V = sc.nextInt(), E = sc.nextInt();
    ArrayList EL = new ArrayList<>();
    for (int i = 0; i < E; ++i) 
    {
      // u and v are the vertices, w is the weight
      int u = sc.nextInt(), v = sc.nextInt(), w = sc.nextInt();

      // reorder as (w, u, v), we want to order by weight
      EL.add(new IntegerTriple(w, u, v));        
    }
    
    // sort by w, O(E log E)
    Collections.sort(EL);

    int mst_cost = 0, num_taken = 0;    // no edge has been taken
    UnionFind UF = new UnionFind(V);    // all V are disjoint sets
    for (int i = 0; i < E; ++i) 
    {   // up to O(E)
      IntegerTriple front = EL.get(i);

      // check
      if (UF.isSameSet(front.second(), front.third())) continue; 

      mst_cost += front.first();                 // add w of this edge
      UF.unionSet(front.second(), front.third());// link them
      ++num_taken;                               // 1 more edge is taken
      if (num_taken == V-1) break;               // tree is spanning
    }

    // note: the number of disjoint sets must eventually be 1 for a valid MST
    System.out.printf("MST cost = %d (Kruskal's)\n", mst_cost);
  }
}