GVSU CIS 263
Week 9 / Day 2
Union of Disjoint sets
- Imagine creating a random maze
- Begin with an m x n grid of boxes, each with four walls.
- Assign each box a unique “set”.
- Pick a wall at random.
- If the two boxes it separates are in separate sets (i.e., no path between them yet), remove
the wall and merge the two sets into one.
- Repeat until every box is in the same set.
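The maze steps above can be sketched in C as follows. This is only an illustration under stated assumptions: the names `build_maze` and `maze_find` are hypothetical, cells are numbered row by row, and instead of drawing one wall at a time it shuffles the list of all interior walls up front so that the loop is guaranteed to finish.

```c
#include <assert.h>
#include <stdlib.h>

/* Follow parent links to the root; a root stores a negative value. */
static int maze_find(const int *parent, int x) {
    while (parent[x] >= 0) x = parent[x];
    return x;
}

/* Removes walls from a rows x cols grid until all cells are connected.
 * wall_a[i] and wall_b[i] receive the two cells on either side of the
 * i-th removed wall; the caller provides room for rows*cols - 1 entries.
 * Returns the number of walls removed, or -1 if allocation fails. */
int build_maze(int rows, int cols, unsigned seed, int *wall_a, int *wall_b) {
    assert(rows > 0 && cols > 0);
    int cells = rows * cols;
    int total = rows * (cols - 1) + (rows - 1) * cols;   /* interior walls */
    size_t slots = (size_t)(total > 0 ? total : 1);
    int *parent = malloc(sizeof *parent * (size_t)cells);
    int *a = malloc(sizeof *a * slots);
    int *b = malloc(sizeof *b * slots);
    if (!parent || !a || !b) { free(parent); free(a); free(b); return -1; }

    for (int i = 0; i < cells; i++) parent[i] = -1;   /* one set per cell */

    /* List every interior wall as the pair of cells it separates. */
    int w = 0;
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            int cell = r * cols + c;
            if (c + 1 < cols) { a[w] = cell; b[w] = cell + 1;    w++; }
            if (r + 1 < rows) { a[w] = cell; b[w] = cell + cols; w++; }
        }
    }

    /* Fisher-Yates shuffle; the slight modulo bias is fine for a sketch. */
    srand(seed);
    for (int i = total - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        int t = a[i]; a[i] = a[j]; a[j] = t;
        t = b[i]; b[i] = b[j]; b[j] = t;
    }

    /* Visit walls in random order; remove one only if it joins two sets. */
    int removed = 0;
    for (int i = 0; i < total; i++) {
        int ra = maze_find(parent, a[i]);
        int rb = maze_find(parent, b[i]);
        if (ra != rb) {
            parent[ra] = rb;              /* merge the two sets */
            wall_a[removed] = a[i];
            wall_b[removed] = b[i];
            removed++;
        }
    }

    free(parent); free(a); free(b);
    return removed;
}
```

Because every removed wall joins two previously unconnected sets, the result has exactly one path between any two cells, and the number of removed walls is always one less than the number of cells.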
- This same process can be used for clustering.
- Put each object / person in its own set.
- Find two sets to merge.
- Repeat.
- Note: Two key operations:
- find: get the name of the set an object is in.
- union: merge two sets.
- Try to come up with the most efficient find/merge algorithm.
- “find fast” or “merge fast”?
- What is the cost of merging in a “find fast” setup, where we have a map of node names to set names? O(n) per merge, since we need to scan the whole map for all nodes belonging to the merged set.
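A minimal sketch of the “find fast” layout, assuming nodes are numbered 0 to `QF_N - 1` and that each set is named after one of its members. The array `set_of` and the `qf_*` function names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>

#define QF_N 8                 /* hypothetical number of nodes */

static int set_of[QF_N];       /* set_of[i] = name of the set containing i */

/* Start with every node in its own set, named after the node. */
void qf_init(void) {
    for (int i = 0; i < QF_N; i++) set_of[i] = i;
}

/* find is a single array lookup: O(1). */
int qf_find(int x) {
    assert(x >= 0 && x < QF_N);
    return set_of[x];
}

/* union must relabel every member of one set: O(n) per merge. */
void qf_union(int x, int y) {
    int from = qf_find(x);
    int to = qf_find(y);
    if (from == to) return;               /* already in the same set */
    for (int i = 0; i < QF_N; i++) {
        if (set_of[i] == from) set_of[i] = to;
    }
}
```

The loop in `qf_union` is the scan described above: it touches every entry no matter how small the set being relabeled is.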
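- How can we merge fast? Use a tree: each node points to its parent, the root names the set, and a merge just points one root at the other.
- What is the cost of using a simple tree? find can get slow if the tree is unbalanced; in the worst case the tree is a single long chain and find takes O(n).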
- How to reduce the cost of a find (i.e., the depth of the tree)?
- union-by-size: Always merge smaller tree into larger.
- How does this minimize tree depth?
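- The tree can’t be deeper than log₂ n. Why?
- Whenever a node’s depth increases, its tree was the smaller one in a merge, so the size of its tree must at least double. Starting from size 1, it can only double log₂ n times before reaching n.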
- What does the tree look like in the best case? Depth of at most two. Lots of single nodes
pointing to a root. This gives an average constant time per operation.
- union-by-height works for the same reason.
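A small sketch of union-by-size, assuming the same array convention used in these notes (a non-negative entry is a parent index, a negative entry marks a root) with the extra assumption that a root stores the negated size of its tree. The names `link_or_size` and `ubs_*` are hypothetical.

```c
#include <assert.h>

#define UBS_N 16                    /* hypothetical number of nodes */

/* >= 0: index of the parent; < 0: this is a root and -value is the size. */
static int link_or_size[UBS_N];

void ubs_init(void) {
    for (int i = 0; i < UBS_N; i++) link_or_size[i] = -1;   /* size 1 each */
}

/* Plain find without path compression: walk up to the root. */
int ubs_find(int x) {
    assert(x >= 0 && x < UBS_N);
    while (link_or_size[x] >= 0) x = link_or_size[x];
    return x;
}

/* Merge the smaller tree into the larger; returns the surviving root. */
int ubs_union(int x, int y) {
    int rx = ubs_find(x);
    int ry = ubs_find(y);
    if (rx == ry) return rx;
    /* Sizes are stored negated, so the larger tree has the smaller value. */
    if (link_or_size[rx] > link_or_size[ry]) {
        int t = rx; rx = ry; ry = t;        /* make rx the larger tree */
    }
    link_or_size[rx] += link_or_size[ry];   /* combine sizes */
    link_or_size[ry] = rx;                  /* smaller root points to larger */
    return rx;
}

/* Number of parent links between x and its root. */
int ubs_depth(int x) {
    int d = 0;
    while (link_or_size[x] >= 0) { x = link_or_size[x]; d++; }
    return d;
}
```

Merging eight singletons pairwise (0-1, 2-3, 4-5, 6-7, then 0-2, 4-6, then 0-4) is a worst case for this scheme: every merge joins equal-sized trees, and the deepest node still ends up only 3 links from the root, which matches log₂ 8.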
- Can we do better (i.e., knock down the height of the tree)?
- Path compression: Every find makes every node along the path point directly to the root.
Combined with union-by-size, M operations can be completed in O(Mα), where α is an extraordinarily
slow-growing function (the inverse Ackermann function) that is effectively constant.
// Both node numbers and set numbers are integers.
// An array S maps each node to its parent node.
// Root nodes hold a negative value.
int find(int x) {
    if (S[x] < 0) {
        // x is a root. Return it as the set name.
        return x;
    } else {
        // Path compression: point x directly at the root of its set.
        S[x] = find(S[x]);
        return S[x];
    }
}
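For very deep trees the recursive find above uses one stack frame per node on the path. An iterative two-pass variant is one possible alternative; the sketch below assumes the same array S with negative values at the roots, and the names `PC_N`, `pc_init`, and `find_iterative` are hypothetical.

```c
#include <assert.h>

#define PC_N 8             /* hypothetical number of nodes */

static int S[PC_N];        /* parent index, or a negative value at a root */

/* Put every node in its own set. */
void pc_init(void) {
    for (int i = 0; i < PC_N; i++) S[i] = -1;
}

/* find with path compression, written as two loops instead of recursion. */
int find_iterative(int x) {
    assert(x >= 0 && x < PC_N);

    /* Pass 1: walk up to locate the root. */
    int root = x;
    while (S[root] >= 0) root = S[root];

    /* Pass 2: walk the same path again, pointing each node at the root. */
    while (S[x] >= 0) {
        int next = S[x];
        S[x] = root;
        x = next;
    }
    return root;
}
```

After a single call on the deepest node of a chain 3 -> 2 -> 1 -> 0, nodes 1, 2, and 3 all point directly at 0, so later finds on any of them follow just one link.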