The Union-Find Disjoint Sets (UFDS) data structure is used to model a collection of disjoint sets, which is able to efficiently (i.e., in nearly constant time) determine which set an item belongs to, test if two items belong to the same set, and union two disjoint sets into one when needed. It can be used to find connected components in an undirected graph, and can hence be used as part of Kruskal's algorithm for the Minimum Spanning Tree (MST) problem.
Note that this data structure has another alternative name: Disjoint Sets Union (DSU).
Remarks: By default, we show e-Lecture Mode for first time (or non logged-in) visitor.
If you are an NUS student and a repeat visitor, please login.
Lihatlah visualisasi dari contoh UFDS (Himpunan Lepas) di sini!
Setiap pohon melambangkan sebuah himpunan lepas (maka sebuah kumpulan dari himpunan-himpunan lepas tersebut membentuk sebuah hutan) dan akar (root) dari setiap pohon adalah item representatif dari himpunan lepas ini.
Sekarang berhenti dan lihatlah pohon-pohon yang sekarang sedang divisualisasikan. Ada berapa jumlah item-item disana? Berapa jumlah himpunan lepas di sana? Siapa saja anggota dari setiap himpunan lepas tersebut? Apakah item representasi dari setiap himpunan lepas yang ada?
Pro-tip 1: Since you are not logged-in, you may be a first time visitor (or not an NUS student) who are not aware of the following keyboard shortcuts to navigate this e-Lecture mode: [PageDown]/[PageUp] to go to the next/previous slide, respectively, (and if the drop-down box is highlighted, you can also use [→ or ↓/← or ↑] to do the same),and [Esc] to toggle between this e-Lecture mode and exploration mode.
Pro-tip 2: We designed this visualization and this e-Lecture mode to look good on 1366x768 resolution or larger (typical modern laptop resolution in 2021). We recommend using Google Chrome to access VisuAlgo. Go to full screen mode (F11) to enjoy this setup. However, you can use zoom-in (Ctrl +) or zoom-out (Ctrl -) to calibrate this.
We can simply record this forest of trees with an array p of size N items where p[i] records the parent of item i and if p[i] = i, then i is the root of this tree and also the representative item of the set that contains item i.
Once again, look at the visualization above and determine the values inside this array p.
Discuss: If i is the root of the tree that contains it, can we set p[i] = -1 instead of p[i] = i? What are the implications?
Pro-tip 3: Other than using the typical media UI at the bottom of the page, you can also control the animation playback using keyboard shortcuts (in Exploration Mode): Spacebar to play/pause/replay the animation, ←/→ to step the animation backwards/forwards, respectively, and -/+ to decrease/increase the animation speed, respectively.
The content of this interesting slide (the answer of the usually intriguing discussion point from the earlier slide) is hidden and only available for legitimate CS lecturer worldwide. This mechanism is used in the various flipped classrooms in NUS.
If you are really a CS lecturer (or an IT teacher) (outside of NUS) and are interested to know the answers, please drop an email to stevenhalim at gmail dot com (show your University staff profile/relevant proof to Steven) for Steven to manually activate this CS lecturer-only feature for you.
FAQ: This feature will NOT be given to anyone else who is not a CS lecturer.
Pada contoh tetap yang sama, jawaban-jawaban anda harusnya p = [1, 3, 3, 3, 3, 5, 6, 5, 5, 6, 4, 8,12] dengan ukuran N = 13 untuk p[0] sampai p[12].
Anda dapat mengecek bahwa p[3] = 3, p[5] = 5, p[6] = 6, dan p[12] = 12, yang adalah konsisten dengan fakta bahwa {3,5,6,12} adalah item-item representatif (dari himpunan lepas mereka sendiri).
Kita juga menyimpan satu lagi informasi di dalam larik rank juga dengan ukuran N. Nilai dari rank[i] adalah batas-atas dari tinggi sub-pohon yang berakar pada simpul i yang akan digunakan sebagai heuristik pembimbing untuk operasi UnionSet(i, j). Anda akan menyadari nanti bahwa setelah heuristik 'kompresi-jalur' (akan dijelaskan segera) mengkompres sebuah jalur, nilai-nilai peringkat tidak lagi merefleksikan tinggi sesungguhnya dari sub-pohon tersebut.
Karena banyak item-item dengan peringkat 0, kami mengatur visualisasi sebagai berikut untuk mengurangi kekacauan: Hanya jika peringkat dari sebuah simpul i lebih besar dari 0, maka VisuAlgo akan menunjukkan nilai dari rank[i] (disingkat sebagai satu karakter r) sebagai teks berwarna merah dibawah simpul i.
Pada contoh tetap yang sama, verifikasi bahwa {1,4,6,8} memiliki peringkat 1 dan {3,5} memiliki peringkat 2, dan yang lainnya memiliki peringkat 0 (tidak ditunjukkan).
Pada saat ini, semua nilai-nilai peringkat adalah benar, yaitu mereka benar-benar mendeskripsikan tinggi dari sub-pohon yang berakar pada simpul tersebut. Kita akan segera melihat bahwa mereka tidak akan selalu benar di beberapa slide-slide berikutnya.
Terdapat lima operasi-operasi UFDS (Himpunan Lepas) dalam halaman visualisasi ini:
Contoh-Contoh, Inisialisasi(N), FindSet(i), IsSameSet(i, j), dan UnionSet(i, j).
Operasi pertama (Contoh-Contoh) adalah sederhana: Berikan daftar struktur-struktur Himpunan Lepas dengan berbagai karakteristik-karakteristik untuk titik permulaan anda. Mode kuliah maya ini selalu menggunakan contoh 'Empat Himpunan Lepas' sebagai titik permulaan.
Juga sadari bahwa tidak ada satupun dari contoh-contoh yang memiliki 'pohon yang tinggi'. Anda akan segera mengerti alasannya setelah kami menjelaskan dua heuristik-heuristik yang dipakai.
Initialize(N, M): Create N items and form M disjoint sets with these N items. We randomly pick two disjoint sets and merge them until we have M random disjoint sets. Due to the union-by-rank heuristics used and the randomness, it is very unlikely that this initialization process creates a tall tree.
The default form is Initialize(N, N), i.e., M = N, all with p[i] = i and rank[i] = 0 (all these rank values are initially not shown). The time complexity of this operation is clearly O(N).
Due to the limitation of screen size, we set 1 ≤ N ≤ 32. Obviously M ≤ N.
FindSet(i): Dari simpul i, pergi ke arah atas di dalam pohon secara rekursif. Yaitu, dari simpul i, kita pergi ke simpul p[i]) hingga kita sampai pada akar dari pohon tersebut, yang adalah item representasi dengan p[i] = i dari himpunan lepas ini.
Dalam operasi FindSet(i), kami menggunakan heuristik kompresi-jalur setelah setiap panggilan kepada FindSet(i) karena sekarang setiap simpul yang terdapat dalam jalur dari simpul i ke akar dari pohon ini mengetahui bahwa akar tersebut adalah item representatif mereka dan dapat langsung menunjuk kepada akar tersebut secara langsung dalam O(1).
If we execute FindSet(12), we will immediately get vertex 12.
If we execute FindSet(9) we will get vertex 6 after 1 step and no other change.
Now try executing
. If this is your first call on this default UFDS example, it will return vertex 3 after 2 steps and then modify the underlying UFDS structure due to path-compression in action (that is, vertex 0 points to vertex 3 directly). Notice that rank value of rank[1] = 1 is now wrong as vertex 1 becomes a new leaf. However, we will not bother to update its value.Notice that the next time you execute
again, it will be (much) faster as the path has been compressed. For now, we assume that FindSet(i) runs in O(1).IsSameSet(i, j): Cek saja apakah FindSet(i) == FindSet(j). Fungsi ini digunakan secara ektensif pada algoritma MST Kruskal. Karena fungsi ini hanya memanggil operasi FindSet dua kali, kita akan mengasumsikan bahwa fungsi ini juga berjalan dalam O(1).
Perlu diingat bahwa fungsi FindSet dipanggil di dalam fungsi isSameSet, maka heuristik kompresi-jalur juga digunakan secara tidak langsung.
Jika kita memanggil IsSameSet(3, 5), kita harusnya mendapatkan false karena simpul 3 dan simpul 5 adalah item-item representatif dari himpunan-himpunan lepas mereka dan mereka berbeda.
Sekarang cobalah
padah contoh default yang sama untuk melihat kompresi-jalur secara tidak langsung pada simpul 0 dan simpul 11. Kita harusnya mendapatkan false karena dua item-item representatif: simpul 3 dan simpul 5, adalah berbeda. Sadari bahwa nilai-nilai peringkat pada simpul {1, 5, 8} sekarang semuanya salah. Tetapi kita tidak akan memperbaikinya.UnionSet(i, j): If item i and j come from two disjoint sets initially, we link the representative item of the shorter tree/disjoint set to the representative item of the taller tree/disjoint set (otherwise, we do nothing). This is also done in O(1).
This is union-by-rank heuristic in action and will cause the resulting tree to be relatively short. Only if the two trees are equally tall before union (by comparing their rank values heuristically — note that we are not comparing their actual — the current — heights), then the rank of the resulting tree will increase by one unit.
On the same default example, try
. As the tree that represents disjoint set {6, 9} is currently taller (according to the value of rank[6] = 1), then the shorter tree that represents disjoint set {12} will be slotted under vertex 6, without increasing the height of the combined tree at all.On the same default example, try
. Notice that the ranks of vertex 3 and vertex 5 are the same rank[3] = rank[5] = 2. Thus, we can either put vertex 3 under vertex 5 (our implementation) or vertex 5 under vertex 3 (both will increase the resulting height of the combined tree by 1). Notice the indirect path-compression heuristic in action.Quiz: Starting with N=8 disjoint sets, how tall (heuristically) can the resulting final tree if we call 7 UnionSet(i, j) operations strategically?
Quiz: Starting with N=8 disjoint sets, how short (heuristically) can the resulting final tree if we call 7 UnionSet(i, j) operations strategically?
Diskusi: Kenapa?
The content of this interesting slide (the answer of the usually intriguing discussion point from the earlier slide) is hidden and only available for legitimate CS lecturer worldwide. This mechanism is used in the various flipped classrooms in NUS.
If you are really a CS lecturer (or an IT teacher) (outside of NUS) and are interested to know the answers, please drop an email to stevenhalim at gmail dot com (show your University staff profile/relevant proof to Steven) for Steven to manually activate this CS lecturer-only feature for you.
FAQ: This feature will NOT be given to anyone else who is not a CS lecturer.
So far, we say that FindSet(i), IsSameSet(i, j), and UnionSet(i, j) runs in O(1). Actually they run in O(α(N)) if the UFDS is implemented with both path-compression and union-by-rank heuristics. The analysis is quite involved and is skipped in this visualization.
This α(N) is called the inverse Ackermann function that grows extremely slowly. For practical usage of this UFDS data structure (assuming N ≤ 1M), we have α(1M) ≈ 1.
For a few more interesting questions about this data structure, please practice on Union-Find Disjoint Sets training module.
Even after clearing the Online Quiz of this UFDS module, do you think that you have really mastered this data structure?
Let us challenge you by asking you to solve two programming problems that somewhat requires the usage of this Union-Find Disjoint Sets data structure: UVa 01329 - Corporative Network and Kattis - control.
Beware that both problems are actual International Collegiate Programming Contest (ICPC) problems, i.e., they are "not trivial".
The content of this interesting slide (the answer of the usually intriguing discussion point from the earlier slide) is hidden and only available for legitimate CS lecturer worldwide. This mechanism is used in the various flipped classrooms in NUS.
If you are really a CS lecturer (or an IT teacher) (outside of NUS) and are interested to know the answers, please drop an email to stevenhalim at gmail dot com (show your University staff profile/relevant proof to Steven) for Steven to manually activate this CS lecturer-only feature for you.
FAQ: This feature will NOT be given to anyone else who is not a CS lecturer.
Notice that there is no 'undo' operation for Union-Find Disjoint Sets (UFDS) data structure. Once two initially disjoint sets were union-ed, it is not easy to split them back into the original two disjoint sets, especially when path compressions have flattened the combined tree.
Discussion: So what to do if we need this 'de-Union' or 'split' or 'cut' operation?
The content of this interesting slide (the answer of the usually intriguing discussion point from the earlier slide) is hidden and only available for legitimate CS lecturer worldwide. This mechanism is used in the various flipped classrooms in NUS.
If you are really a CS lecturer (or an IT teacher) (outside of NUS) and are interested to know the answers, please drop an email to stevenhalim at gmail dot com (show your University staff profile/relevant proof to Steven) for Steven to manually activate this CS lecturer-only feature for you.
FAQ: This feature will NOT be given to anyone else who is not a CS lecturer.
You have reached the last slide. Return to 'Exploration Mode' to start exploring!
Note that if you notice any bug in this visualization or if you want to request for a new visualization feature, do not hesitate to drop an email to the project leader: Dr Steven Halim via his email address: stevenhalim at gmail dot com.
Contoh-contoh
Inisialisasi
FindSet
IsSameSet
UnionSet