We can simply record this forest of trees with an array p of size N items where p[i] records the parent of item i and if p[i] = i, then i is the root of this tree and also the representative item of the set that contains item i.
Once again, look at the visualization above and determine the values inside this array p.
Discuss: If i is the root of the tree that contains it, can we set p[i] = -1 instead of p[i] = i? What are the implications?
Initialize(N, M): Create N items and form M disjoint sets with these N items. We randomly pick two disjoint sets and merge them until we have M random disjoint sets. Currently this setup is not random enough, i.e., it cannot create tall trees for example.
The default form is Initialize(N, N), i.e., M = N, all with p[i] = i and rank[i] = 0 (all these rank values are initially not shown). The time complexity of this operation is clearly O(N).
Due to the limitation of screen size, we set 1 ≤ N ≤ 16.
FindSet(i):从顶点i,递归地在树上往上移动。 也就是说,从顶点i,我们转到顶点p [i]),直到我们找到该树的根,这是该不相交集的的代表项(代表项具有p [i] = i的性质)。
在这个FindSet(i)操作中,我们在每次调用FindSet(i)之后使用路径压缩,因为现在沿着从顶点i到根的路径的每个顶点都知道根是它们的代表项,并且可以用O(1)时间直接指向它 。
If we execute FindSet(12), we will immediately get vertex 12.
If we execute FindSet(9) we will get vertex 6 after 1 step and no other change.
Now try executing
. If this is your first call on this default UFDS example, it will return vertex 3 after 2 steps and then modify the underlying UFDS structure due to path-compression in action (that is, vertex 0 points to vertex 3 directly). Notice that rank value of rank[1] = 1 is now wrong as vertex 1 becomes a new leaf. However, we will not bother to update its value.Notice that the next time you execute
again, it will be (much) faster as the path has been compressed. For now, we assume that FindSet(i) runs in O(1).UnionSet(i, j): If item i and j come from two disjoint sets initially, we link the representative item of the shorter tree/disjoint set to the representative item of the taller tree/disjoint set (otherwise, we do nothing). This is also done in O(1).
This is union-by-rank heuristic in action and will cause the resulting tree to be relatively short. Only if the two trees are equally tall before union (by comparing their rank values heuristically — note that we are not comparing their actual — the current — heights), then the rank of the resulting tree will increase by one unit.
On the same default example, try
. As the tree that represents disjoint set {6, 9} is currently taller (according to the value of rank[6] = 1), then the shorter tree that represents disjoint set {12} will be slotted under vertex 6, without increasing the height of the combined tree at all.On the same default example, try
. Notice that the ranks of vertex 3 and vertex 5 are the same rank[3] = rank[5] = 2. Thus, we can either put vertex 3 under vertex 5 (our implementation) or vertex 5 under vertex 3 (both will increase the resulting height of the combined tree by 1). Notice the indirect path-compression heuristic in action.Quiz: Starting with N=8 disjoint sets, how tall (heuristically) can the resulting final tree if we call 7 UnionSet(i, j) operations strategically?
Quiz: Starting with N=8 disjoint sets, how short (heuristically) can the resulting final tree if we call 7 UnionSet(i, j) operations strategically?
讨论:为什么?
So far, we say that FindSet(i), IsSameSet(i, j), and UnionSet(i, j) runs in O(1). Actually they run in O(α(N)) if the UFDS is implemented with both path-compression and union-by-rank heuristics. The analysis is quite involved and is skipped in this visualization.
This α(N) is called the inverse Ackermann function that grows extremely slowly. For practical usage of this UFDS data structure (assuming N ≤ 1M), we have α(1M) ≈ 1.