Home / Coding / Union-Find

Union-Find

Union-Find (also called Disjoint Set Union) tracks which nodes belong to the same connected component, supporting merge and membership queries in near-constant time using two optimizations.

  • Replaces repeated BFS/DFS connectivity checks with O(α(n)) amortized operations — effectively O(1) — using path compression and union by rank
  • Two operations: find (returns the root representative of a node's component) and union (merges two components, returns false if they were already connected)
  • Reach for it when the problem involves counting connected components, detecting cycles in undirected graphs, or dynamically connecting nodes as edges are added

Quick Revision - Must Do Problems

If short on time, solve only these. They cover ~90% of Union-Find patterns.

# Problem Pattern Why It's Essential Time to Solve
1 Number of Connected Components Count Basic union-find — count successful unions 15 min
2 Redundant Connection Cycle Detection Return the edge whose union() returns false 20 min
3 Number of Provinces Count Same pattern, adj matrix input 15 min

Practice Tips

How to Identify a Union-Find Problem

  • "How many connected groups / components?"
  • "Is there a cycle in this undirected graph?"
  • "Are nodes X and Y in the same group?"
  • "Add edges one at a time — after each addition, how many components remain?"
  • BFS/DFS also works for static graphs; reach for Union-Find when edges are added dynamically or when you need repeated connectivity queries

Union-Find vs. Other Techniques

If you see... Use... Not...
Connected components (static graph) Union-Find or DFS Union-Find not required, but cleaner
Cycle detection in undirected graph Union-Find DFS with visited[] (either works)
Cycle detection in directed graph DFS with 3 states Union-Find (does NOT work for directed!)
Edges added dynamically, query connectivity Union-Find Re-running BFS/DFS each time

Interview Approach

  1. Initialize parent[i] = i and rank[i] = 0 for all nodes
  2. find(x): follow parent pointers to the root; apply path compression — parent[x] = find(parent[x])
  3. union(x, y): find both roots; if same root, they're already connected (return false / cycle detected); otherwise attach smaller-rank tree under larger-rank
  4. For component count: start with n components and decrement by 1 on every successful union
  5. For cycle detection: return the edge where union() returns false (the two nodes were already connected)

1 Main Pattern

Union-Find with Path Compression + Union by Rank

int[] parent, rank;

void init(int n) {
    parent = new int[n];
    rank   = new int[n];
    for (int i = 0; i < n; i++) {
        parent[i] = i;  // every node is its own root
    }
}

int find(int x) {
    if (parent[x] != x) {
        parent[x] = find(parent[x]); // path compression
    }
    return parent[x];
}

boolean union(int x, int y) {
    int rx = find(x), ry = find(y);
    if (rx == ry) {
        return false;                   // already connected
    }
    if (rank[rx] < rank[ry]) { int t = rx; rx = ry; ry = t; }  // rx = larger rank
    parent[ry] = rx;
    if (rank[rx] == rank[ry]) {
        rank[rx]++;
    }
    return true;                                  // successfully merged
}

General Templates

Component Count Template

// Start with n components; decrement on every successful union
int components = n;
UnionFind uf = new UnionFind(n);
for (int[] edge : edges) {
    if (uf.union(edge[0], edge[1])) components--;
}

Cycle Detection Template

// The first edge where union() returns false creates a cycle
for (int[] edge : edges) {
    if (!uf.union(edge[0], edge[1])) return edge;  // this edge is redundant
}

Problem 1: Number of Connected Components in Undirected Graph ⭐ MUST DO

Difficulty Time to Solve Pattern
Medium 15 min Union-Find (Component Count)

Given n nodes and a list of edges, return the number of connected components.

Example:

Input:  n = 5,  edges = [[0,1],[1,2],[3,4]]  →  2
Input:  n = 5,  edges = [[0,1],[1,2],[2,3],[3,4]]  →  1

Key Insight:

  • Initialize each node as its own component (count = n). For every edge, attempt a union. A successful union means two previously separate components merged — decrement the count. After all edges, the count is the number of remaining components.

Solution:

public int countComponents(int n, int[][] edges) {
    int[] parent = new int[n], rank = new int[n];
    for (int i = 0; i < n; i++) {
        parent[i] = i;
    }

    int components = n;
    for (int[] e : edges) {
        if (union(parent, rank, e[0], e[1])) components--;
    }

    return components;
}

private int find(int[] parent, int x) {
    if (parent[x] != x) {
        parent[x] = find(parent, parent[x]);
    }
    return parent[x];
}

private boolean union(int[] parent, int[] rank, int x, int y) {
    int rx = find(parent, x), ry = find(parent, y);
    if (rx == ry) {
        return false;
    }
    if (rank[rx] < rank[ry]) {
        int t = rx; rx = ry; ry = t;
    }
    parent[ry] = rx;
    if (rank[rx] == rank[ry]) {
        rank[rx]++;
    }
    return true;
}

Complexity:

Time O(E × α(n)) ≈ O(E) — α(n) is the inverse Ackermann function, effectively constant
Space O(n) — parent and rank arrays

Problem 2: Redundant Connection ⭐ MUST DO

Difficulty Time to Solve Pattern
Medium 20 min Union-Find (Cycle Detection)

Given a tree with one extra edge added (making one cycle), return that redundant edge. If multiple answers exist, return the one that appears last.

Example:

Input:  [[1,2],[1,3],[2,3]]  →  [2,3]
Input:  [[1,2],[2,3],[3,4],[1,4],[1,5]]  →  [1,4]

Key Insight:

  • Process edges one by one. A tree with n nodes has exactly n-1 edges — adding one creates exactly one cycle. The first edge whose two endpoints are already in the same component (union returns false) is the redundant one. Since we want the last such edge if there are ties, processing in order and returning the last failure handles it naturally.

Solution:

public int[] findRedundantConnection(int[][] edges) {
    int n = edges.length;
    int[] parent = new int[n + 1], rank = new int[n + 1];
    for (int i = 0; i <= n; i++) {
        parent[i] = i;
    }

    for (int[] e : edges) {
        if (!union(parent, rank, e[0], e[1])) return e;
    }

    return new int[]{};
}

private int find(int[] parent, int x) {
    if (parent[x] != x) {
        parent[x] = find(parent, parent[x]);
    }
    return parent[x];
}

private boolean union(int[] parent, int[] rank, int x, int y) {
    int rx = find(parent, x), ry = find(parent, y);
    if (rx == ry) {
        return false;
    }
    if (rank[rx] < rank[ry]) {
        int t = rx; rx = ry; ry = t;
    }
    parent[ry] = rx;
    if (rank[rx] == rank[ry]) {
        rank[rx]++;
    }
    return true;
}

Complexity:

Time O(E × α(n)) ≈ O(E)
Space O(n)

Problem 3: Number of Provinces ⭐ MUST DO

Difficulty Time to Solve Pattern
Medium 15 min Union-Find (Component Count)

Given an n×n adjacency matrix where isConnected[i][j] = 1 means cities i and j are directly connected, return the number of provinces (groups of directly or indirectly connected cities).

Example:

Input:  [[1,1,0],[1,1,0],[0,0,1]]  →  2
Input:  [[1,0,0],[0,1,0],[0,0,1]]  →  3

Key Insight:

  • Same connected-components count as Problem 1, but the graph is given as an adjacency matrix rather than an edge list. Iterate only the upper triangle (j > i) to avoid processing each edge twice. The union-find logic is identical.

Solution:

public int findCircleNum(int[][] isConnected) {
    int n = isConnected.length;
    int[] parent = new int[n], rank = new int[n];
    for (int i = 0; i < n; i++) {
        parent[i] = i;
    }

    int components = n;
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++)
    }
            if (isConnected[i][j] == 1 && union(parent, rank, i, j)) {
                components--;
            }

    return components;
}

private int find(int[] parent, int x) {
    if (parent[x] != x) {
        parent[x] = find(parent, parent[x]);
    }
    return parent[x];
}

private boolean union(int[] parent, int[] rank, int x, int y) {
    int rx = find(parent, x), ry = find(parent, y);
    if (rx == ry) {
        return false;
    }
    if (rank[rx] < rank[ry]) {
        int t = rx; rx = ry; ry = t;
    }
    parent[ry] = rx;
    if (rank[rx] == rank[ry]) {
        rank[rx]++;
    }
    return true;
}

Complexity:

Time O(n²) — must scan all matrix cells
Space O(n)

Quick Reference Table

# Problem Pattern Key Technique Time
1 Number of Connected Components ⭐ Component Count Start n; decrement on successful union O(E)
2 Redundant Connection ⭐ Cycle Detection Return edge where union() returns false O(E)
3 Number of Provinces ⭐ Component Count Adj matrix — upper triangle only, same logic O(n²)

When to Use Each Pattern

Component Count

  • Need the total number of isolated groups after all edges are added
  • Initialize count = n; decrement once per successful union() call
  • Works for both edge-list input and adjacency-matrix input

Cycle Detection

  • Process edges in order; the first edge where union() returns false creates a cycle
  • Only valid for undirected graphs — directed cycle detection requires DFS with 3-state visited array
  • When the problem asks to return the redundant edge, process in order (last qualifying edge is returned naturally)

Common Mistakes to Avoid

Structural Mistakes

  1. Using Union-Find on directed graphsunion(x, y) merges components symmetrically; it cannot model directed edges. Use DFS with 3-state visited array for directed cycle detection (Course Schedule pattern)
  2. Comparing raw node values instead of roots — always call find() on both nodes before comparing; comparing x == y instead of find(x) == find(y) gives wrong results when nodes are in the same component but have different parents

Initialization Mistakes

  1. Missing node 0 or off-by-one — if nodes are 1-indexed (common in LeetCode), allocate parent[n+1] and initialize parent[i] = i for i from 0 to n
  2. Forgetting rank initializationrank[] starts at 0 for all nodes; forgetting to initialize it leaves garbage values that corrupt union-by-rank

Optimization Mistakes

  1. Omitting path compression — without parent[x] = find(parent[x]), the tree can degenerate to a chain and find() becomes O(n) per call; path compression is what makes it near-O(1)