GVSU CIS 263

Week 9 / Day 2

Union of Disjoint sets

Imagine creating a random maze
- Begin with an m x n grid of boxes, each with four walls.
- Assign each box a unique “set”.
- Pick a wall at random.
- If the two boxes it separates are in separate sets (i.e., no path between them yet), remove the wall and merge the two sets into one.
- Repeat until every box is in the same set.
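The maze steps above can be sketched roughly as follows. This is only an illustration, not code from the course: the names DisjointSets, unite, and makeMaze are made up here, boxes are numbered row * n + column, and the walls are shuffled once and visited in that order rather than picked at random repeatedly (an assumed simplification with the same effect).

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: a bare-bones disjoint-set forest with no balancing.
struct DisjointSets {
    std::vector<int> parent;  // parent[x] == x marks a root

    explicit DisjointSets(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0);  // each box starts in its own set
    }

    int find(int x) const {
        while (parent[x] != x) x = parent[x];
        return x;
    }

    // Merge the sets holding a and b. Returns false if they were already one set.
    bool unite(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return false;
        parent[ra] = rb;
        return true;
    }
};

// Returns the walls that were knocked down, each as a pair of adjacent box numbers.
std::vector<std::pair<int, int>> makeMaze(int m, int n, unsigned seed) {
    assert(m > 0 && n > 0);

    // List every interior wall once: the one to the right of and the one below each box.
    std::vector<std::pair<int, int>> walls;
    for (int r = 0; r < m; ++r) {
        for (int c = 0; c < n; ++c) {
            int box = r * n + c;
            if (c + 1 < n) walls.push_back({box, box + 1});
            if (r + 1 < m) walls.push_back({box, box + n});
        }
    }

    // Visiting the walls in shuffled order stands in for "pick a wall at random".
    std::mt19937 rng(seed);
    std::shuffle(walls.begin(), walls.end(), rng);

    DisjointSets sets(m * n);
    std::vector<std::pair<int, int>> removed;
    for (const auto& w : walls) {
        // Remove the wall only if the two boxes are not connected yet.
        if (sets.unite(w.first, w.second)) removed.push_back(w);
    }
    return removed;
}
```

Every successful merge reduces the number of sets by one, so an m x n grid always ends with exactly m*n - 1 walls removed and exactly one path between any two boxes.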
This same process can be used for clustering.
- Put each object / person in its own set.
- Find two sets to merge.
- Repeat.
Note: Two key operations:
- find: get the name of the set an object is in.
- union: merge two sets.
Try to come up with the most efficient find/merge algorithm.
- “find fast” or “merge fast”?
- What is the cost of merging in a “find fast” setup, where we have a map of node names to set names?
  - O(n) per merge, since we need to scan the whole map to relabel every node belonging to one of the merged sets.
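- How can we merge fast? Use a tree: each node points to its parent, the root names the set, and a merge just points one root at the other.
- What is the cost of using a simple tree? find can get slow (O(n) in the worst case) if the tree is unbalanced.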
- How to reduce the cost of a find (i.e., depth of the tree)?
  - union-by-size: Always merge smaller tree into larger.
  - How does this minimize tree depth?
  - Tree can’t be deeper than log n. Why?
    - A node’s depth increases only when its tree is merged into one at least as large, so the size of its tree at least doubles. Size can only double log n times.
  - What does the tree look like in the best case? Depth of at most two. Lots of single nodes pointing to a root. This gives an average constant time per operation.
  - union-by-height works for the same reason.
- Can we do better (i.e., knock down the height of the tree)?
  - Path compression: Every find re-points every node along the search path directly at the root.
  - M operations can be completed in O(Mα) time, where α is the inverse Ackermann function: an extraordinarily slow-growing function, effectively constant.
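To see the log n depth bound for union-by-size in action, here is a minimal sketch. It deliberately leaves out path compression so that tree depth stays visible. The names SizedSets, unite, depth, and maxDepthAfterPairwiseMerges are hypothetical, and the merge pattern (always joining two trees of equal size) is chosen because it is the worst case for this rule.

```cpp
#include <cassert>
#include <vector>

// Union-by-size only (no path compression), so depths are not flattened by find.
struct SizedSets {
    std::vector<int> parent;  // parent[x] == x marks a root
    std::vector<int> size;    // size[r] is meaningful only while r is a root

    explicit SizedSets(int n) : parent(n), size(n, 1) {
        for (int i = 0; i < n; ++i) parent[i] = i;
    }

    int find(int x) const {
        while (parent[x] != x) x = parent[x];
        return x;
    }

    void unite(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return;
        // Always hang the smaller tree under the larger one.
        if (size[ra] < size[rb]) { int t = ra; ra = rb; rb = t; }
        parent[rb] = ra;
        size[ra] += size[rb];
    }

    // Number of parent links between x and its root.
    int depth(int x) const {
        int d = 0;
        while (parent[x] != x) { x = parent[x]; ++d; }
        return d;
    }
};

// Merge equal-sized trees pairwise, then report the deepest node.
int maxDepthAfterPairwiseMerges(int n) {
    assert(n > 0);
    SizedSets s(n);
    for (int step = 1; step < n; step *= 2) {
        for (int i = 0; i + step < n; i += 2 * step) {
            s.unite(i, i + step);
        }
    }
    int deepest = 0;
    for (int i = 0; i < n; ++i) {
        if (s.depth(i) > deepest) deepest = s.depth(i);
    }
    return deepest;
}
```

With n = 16 the deepest node ends up 4 links from the root, and with n = 1024 it is 10 links: each extra level costs a doubling of the tree, which matches the log n argument above.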

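// Both node numbers and set numbers are integers.
// An array S (declared elsewhere) maps each node to its parent node.
// Root nodes hold a negative value; a root's index is the name of its set.
int find(int x) {
    if (S[x] < 0) {
        // x is a root. Return the set name.
        return x;
    } else {
        // Path compression: point x directly at the root on the way back up.
        S[x] = find(S[x]);
        return S[x];
    }
}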
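A matching union for the find routine above might look like the sketch below. It assumes one common convention that the notes do not spell out: a root stores the negated size of its tree, so -1 means a singleton set. makeSets and unionSets are hypothetical names, and S is made a std::vector here so the example is self-contained.

```cpp
#include <cassert>
#include <vector>

// S[x] is x's parent, or -(size of the tree) when x is a root (assumed convention).
std::vector<int> S;

// Start with n singleton sets; every node is a root of size 1.
void makeSets(int n) {
    assert(n >= 0);
    S.assign(n, -1);
}

// Same logic as the find routine above, with path compression.
int find(int x) {
    if (S[x] < 0) {
        return x;
    }
    S[x] = find(S[x]);
    return S[x];
}

// Union-by-size: attach the root of the smaller tree to the root of the larger.
void unionSets(int a, int b) {
    int ra = find(a);
    int rb = find(b);
    if (ra == rb) return;  // already in the same set

    // Sizes are stored negated, so the larger value is the smaller tree.
    if (S[ra] > S[rb]) { int t = ra; ra = rb; rb = t; }

    S[ra] += S[rb];  // combined size, still negative
    S[rb] = ra;      // rb stops being a root
}
```

For example, after makeSets(6), unionSets(0, 1), unionSets(2, 3), and unionSets(1, 3), nodes 0 through 3 share root 0, S[0] holds -4, and a later find(3) leaves node 3 pointing straight at that root.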