
A Hash Table is a data structure that maps keys to values (also called the Table or Map Abstract Data Type/ADT). It uses a hash function to map large or even non-Integer keys into a small range of Integer indices (typically [0..hash_table_size-1]).


The probability of two distinct keys colliding into the same index is relatively high, and each such potential collision needs to be resolved to maintain data integrity.


There are several collision resolution strategies that will be highlighted in this visualization: Open Addressing (Linear Probing, Quadratic Probing, and Double Hashing) and Closed Addressing (Separate Chaining). Try clicking Search(7) for a sample animation of searching for a specific value 7 in a randomly created Hash Table using the Separate Chaining technique (duplicates are allowed).


🕑

Hashing is an algorithm (via a hash function) that maps large data sets of variable length, called keys and not necessarily Integers, into smaller Integer data sets of a fixed length.


A Hash Table is a data structure that uses a hash function to efficiently map keys to values (Table or Map ADT), for efficient search/retrieval, insertion, and/or removal.


Hash Tables are widely used in many kinds of computer software, particularly for associative arrays, database indexing, caches, and sets.


In this e-Lecture, we will digress to the Table ADT, the basic ideas of Hashing, and the discussion of Hash Functions before going into the details of the Hash Table data structure itself.


🕑

A Table ADT must support at least the following three operations as efficiently as possible:

  1. Search(v): determine whether v exists in the ADT or not,
  2. Insert(v): insert v into the ADT,
  3. Remove(v): remove v from the ADT.

Hash Table is one possible implementation of this Table ADT (there are other options).


PS1: For two weaker implementations of the Table ADT, see the in-depth discussions of the unsorted array and the sorted array.


PS2: In a live class, you may want to compare the requirements of the Table ADT versus the List ADT.


🕑

When the range of the Integer keys is small, e.g., [0..M-1], we can use an initially empty (Boolean) array A of size M and implement the Table ADT operations directly:

  1. Search(v): Check whether A[v] is true (filled) or false (empty),
  2. Insert(v): Set A[v] to true (filled),
  3. Remove(v): Set A[v] to false (empty).

That's it: we use the small Integer key itself to determine its address in array A, hence the name Direct Addressing, and the resulting structure is a Direct Addressing Table (DAT). It is clear that all three major Table ADT operations are O(1).


Note: This idea is also used elsewhere, e.g., in Counting Sort.


🕑

In Singapore (as of Apr 2023), bus routes are numbered from [2..991].


Not all integers between [2..991] are currently used, e.g., there is no bus route 989 — Search(989) should return false. A new bus route x may be introduced, i.e., Insert(x) or an existing bus route y may be discontinued, i.e., Remove(y).


As the range of possible bus routes is small, we can record whether a bus route number exists or not using a DAT with a Boolean array of size 1000 (generally, it is useful to give a few extra buffer cells on top of the current largest bus number of 991).
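The bus route DAT can be sketched in a few lines of C++. This is a minimal illustration under stated assumptions, not the visualization's own code: the struct name, the member names, and the choice to silently ignore out-of-range keys are all hypothetical.

```cpp
#include <array>
#include <cstddef>

// Direct Addressing Table for small non-negative Integer keys in [0..M-1].
// M = 1000 follows the bus route example; all names here are illustrative.
struct DirectAddressTable {
    static constexpr std::size_t M = 1000;
    std::array<bool, M> present{};  // value-initialized: every cell starts as false (empty)

    // Each operation touches exactly one cell, hence O(1).
    bool search(std::size_t v) const { return v < M && present[v]; }
    void insert(std::size_t v) { if (v < M) present[v] = true; }   // out-of-range keys are ignored
    void remove(std::size_t v) { if (v < M) present[v] = false; }
};
```

For example, after insert(2) and insert(991), search(989) returns false, matching the Search(989) case described above.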

🕑

Notice that we can always add satellite data instead of just using a Boolean array to record the existence of the keys.


For example, we can use an associative String array A instead to map a bus route number to its operator name, e.g.,

A[2] = "Go-Ahead Singapore",
A[10] = "SBS Transit",
A[183] = "Tower Transit Singapore",
A[188] = "SMRT Buses", etc.

Discussion: Can you think of a few other real-life DAT examples?

🕑

The content of this interesting slide (the answer of the usually intriguing discussion point from the earlier slide) is hidden and only available for legitimate CS lecturer worldwide. This mechanism is used in the various flipped classrooms in NUS.


If you are really a CS lecturer (or an IT teacher) (outside of NUS) and are interested to know the answers, please drop an email to stevenhalim at gmail dot com (show your University staff profile/relevant proof to Steven) for Steven to manually activate this CS lecturer-only feature for you.


FAQ: This feature will NOT be given to anyone else who is not a CS lecturer.

🕑

The keys must be (or can be easily mapped to) non-negative Integer values. Note that the basic DAT has a problem with the full version of the bus route example from the earlier slides, as there are actually variations of bus route numbers in Singapore, e.g., 96B, 151A, NR10, etc.


The range of keys must be small. The memory usage will be (insanely) large if we have an (insanely) large range.


The keys must be dense, i.e., there are not too many gaps in the key values. Otherwise, the DAT will contain too many empty cells.


We will overcome these restrictions with hashing.

🕑

With hashing, we can:

  1. Map (some) non-Integer keys to Integer keys,
  2. Map large Integers to smaller Integers.
🕑

For example, we have N = 400 Singapore phone numbers (a Singapore phone number has 8 digits, so there are up to 10^8 = 100M possible phone numbers in Singapore).


Instead of using a DAT with a gigantic array of size up to M = 100 Million, we can use the following simple hash function: h(v) = v%997.


This way, we map the 8-digit phone numbers 6675 2378 and 6874 4483 into values of up to 3 digits, h(6675 2378) = 237 and h(6874 4483) = 336, respectively. Therefore, we only need to prepare an array of size M = 997 (997 is a prime) instead of M = 100 Million.
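The two hash values above can be reproduced with a one-line function. The sketch below assumes the phone numbers are stored as plain unsigned integers without the space; the constant name kTableSize is an illustrative choice.

```cpp
#include <cstdint>

// The simple hash function from the text: h(v) = v % 997.
constexpr std::uint32_t kTableSize = 997;  // a prime, as in the text

constexpr std::uint32_t h(std::uint32_t v) { return v % kTableSize; }
```

Calling h(66752378) yields 237 and h(68744483) yields 336, so both numbers fit in an array of 997 cells.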

🕑

With hashing, we can now implement the following Table ADT operations using an Integer array (instead of a Boolean array) as follows:

  1. Search(v): Check if A[h(v)] != -1 (we use -1 for an empty cell, assuming v ≥ 0),
  2. Insert(v): Set A[h(v)] = v (we hash v into h(v), so we also need to record key v),
  3. Remove(v): Set A[h(v)] = -1 (to be elaborated further).
🕑

If we have keys that map to satellite data and we want to record the original keys too, we can implement the Hash Table using an array of pairs (Integer, satellite-data-type) as follows:

  1. Search(v): Return A[h(v)], which is a pair (v, satellite-data), possibly empty,
  2. Insert(v, satellite-data): Set A[h(v)] = pair(v, satellite-data),
  3. Remove(v): Set A[h(v)] = (empty pair) (to be elaborated further).

However, by now you should notice that something is incomplete...

🕑

A hash function may, and quite likely will, map different keys (Integer or not) into the same Integer slot, i.e., a many-to-one mapping instead of a one-to-one mapping.


For example, h(6675 2378) = 237 from three slides earlier, and if we want to insert another phone number 6675 4372, we will have a problem as h(6675 4372) = 237 too.


This situation is called a collision, i.e., two (or more) keys have the same hash value.
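To see why a collision is a real problem, here is a minimal sketch of the Integer-array table described a few slides earlier, with no collision handling at all. The struct name is hypothetical, and search here compares the stored key with v (slightly stricter than the A[h(v)] != -1 check) so that the lost key becomes visible.

```cpp
#include <cstdint>
#include <vector>

// Naive hash table: one key per cell, no collision resolution.
// -1 marks an empty cell, so keys are assumed to be non-negative.
struct NaiveHashTable {
    static constexpr std::int64_t M = 997;
    std::vector<std::int64_t> A = std::vector<std::int64_t>(M, -1);

    static std::int64_t h(std::int64_t v) { return v % M; }
    void insert(std::int64_t v) { A[h(v)] = v; }  // silently overwrites on collision
    bool search(std::int64_t v) const { return A[h(v)] == v; }
};
```

Inserting 66752378 and then 66754372 writes both keys to cell 237, so the first phone number is overwritten and a later search for it fails.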

🕑

The (von Mises) Birthday Paradox asks: 'How many people (number of keys) must be in a room (Hash Table) of size 365 seats (cells) before the probability that some persons share a birthday (collision, two keys are hashed to the same cell), ignoring leap years (i.e., all years have 365 days), becomes > 50 percent (i.e., more likely than not)?'


The answer, which may be surprising for some of us, is 23.


Let's do some calculation.

🕑

Let Q(n) be the probability of unique birthdays for n people in a room.
Q(n) = 365/365 × 364/365 × 363/365 × ... × (365-n+1)/365,
i.e., the first person's birthday can be any of the 365 days, the second person's birthday can be any of the 365 days except the first person's birthday, and so on.


Let P(n) be the probability of a shared birthday (collision) for n people in a room.
P(n) = 1-Q(n).


We compute that P(23) = 0.507 > 0.5 (50%).


Thus, we only need 23 people (a small number of keys) in a room (Hash Table) of size 365 seats (cells) for a (more than) 50% chance of collision to happen (the birthdays of two different people in that room fall on the same one of the 365 days/slots).
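The calculation can be checked numerically. The sketch below implements P(n) = 1-Q(n) directly from the product formula; the function names are illustrative, and the days parameter is an added convenience for trying other table sizes.

```cpp
// Probability that at least two of n people share a birthday (collision).
// Implements P(n) = 1 - Q(n), Q(n) = 365/365 * 364/365 * ... * (365-n+1)/365.
double collision_probability(int n, int days = 365) {
    double q = 1.0;  // probability that all birthdays so far are unique
    for (int i = 0; i < n; ++i)
        q *= static_cast<double>(days - i) / days;  // next person must avoid i taken days
    return 1.0 - q;
}

// Smallest n for which the collision probability exceeds 0.5.
int smallest_group_over_half(int days = 365) {
    int n = 0;
    while (collision_probability(n, days) <= 0.5) ++n;
    return n;
}
```

With the default of 365 days, smallest_group_over_half() returns 23, and collision_probability(23) is roughly 0.507.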

🕑

Issue 1: We have seen a simple hash function like h(v) = v%997 used in the Phone Numbers example, which maps a large range of Integer keys into a smaller range of Integer keys, but how about non-Integer keys? How can we hash them efficiently?


Issue 2: We have seen that by hashing, or mapping, a large range into a smaller range, there will very likely be collisions. How do we deal with them?

🕑

How do we create a good hash function with the following desirable properties?

  1. Fast to compute, i.e., in O(1),
  2. Uses as few slots as possible, i.e., a minimal Hash Table size M,
  3. Scatters the keys into different base addresses ∈ [0..M-1] as uniformly as possible,
  4. Experiences as few collisions as possible.
🕑

Suppose we have a hash table of size M where keys are used to identify the satellite data and a specific hash function is used to compute a hash value.


A hash value/hash code of key v is computed from the key v with the use of a hash function to get an Integer in the range 0 to M-1. This hash value is used as the base/home index/address of the Hash Table entry for the satellite data.

🕑

Using the Phone Numbers example, suppose we define h(v) = floor(v/1 000 000),
i.e., we select the first two digits of a phone number.

h(66 75 2378) = 66
h(68 74 4483) = 68

Discuss: What happens when you use that hash function?

🕑

Before discussing the reality, let's discuss the ideal case: perfect hash functions.


A perfect hash function is a one-to-one mapping between keys and hash values, i.e., no collision at all. It is possible if all keys are known beforehand, for example, a compiler/interpreter search for reserved keywords. However, such cases are rare.


A minimal perfect hash function is achieved when the table size is the same as the number of keywords supplied. This case is even rarer.


If you are interested, you can explore GNU gperf, a freely available perfect hash function generator written in C++ that automatically constructs perfect functions (a C++ program) from a user-supplied list of keywords.

🕑

People have tried various ways to hash a large range of Integers into a smaller range of Integers as uniformly as possible. In this e-Lecture, we jump directly to one of the best and most popular versions: h(v) = v%M, i.e., map v into a Hash Table of size M slots. The (%) is the modulo operator that gives the remainder after division. This is clearly fast, i.e., O(1), assuming that v does not exceed the natural Integer data type limit.


The Hash Table size M is set to be a reasonably large prime not near a power of 2, about 2+ times larger than the expected number of keys N that will ever be used in the Hash Table. This way, the load factor α = N/M < 0.5. We will see later that having a low load factor, thereby sacrificing empty space, helps improve Hash Table performance.
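One way to turn the sizing rule into code is to pick the smallest prime larger than 2N. This is only a sketch of one possible policy: the function names are hypothetical, and the "not near a power of 2" condition from the text is deliberately not checked here.

```cpp
// Trial-division primality test; adequate for the small table sizes in this sketch.
bool is_prime(long long x) {
    if (x < 2) return false;
    for (long long d = 2; d * d <= x; ++d)
        if (x % d == 0) return false;
    return true;
}

// Smallest prime strictly greater than 2*N, so that the load factor N/M < 0.5.
// Assumption: closeness to a power of 2 is NOT checked in this sketch.
long long choose_table_size(long long N) {
    long long M = 2 * N + 1;
    while (!is_prime(M)) ++M;
    return M;
}
```

For the N = 400 phone numbers from the earlier example, choose_table_size(400) gives 809, so α = 400/809 is just under 0.5.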


Discussion: What happens if we set M to be a power of 10 (decimal) or a power of 2 (binary)?

🕑

People have also tried various ways to hash Strings into a small range of Integers as uniformly as possible. In this e-Lecture, we jump directly to one of the best and most popular versions, shown below:

#include <string>
using namespace std;

const int M = 997; // table size (997 is only an example value, a prime)

int hash_function(const string& v) { // assumption 1: v uses ['A'..'Z'] only
  int sum = 0; // assumption 2: v is a short string
  for (auto& c : v) // for each character c in v
    sum = ((sum*26)%M + (c-'A'+1))%M; // M is table size
  return sum;
}
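To experiment with small inputs, the same rolling computation can be written with the table size passed in as a parameter. This variant and its name hash_string_mod are illustrative, not part of the original slide; it keeps the same two assumptions (uppercase letters only, short strings).

```cpp
#include <string>

// Same computation as hash_function above, with table size M as a parameter.
// Assumes v contains only 'A'..'Z' and M > 0.
int hash_string_mod(const std::string& v, int M) {
    int sum = 0;
    for (char c : v)
        sum = ((sum * 26) % M + (c - 'A' + 1)) % M;  // 'A' -> 1, ..., 'Z' -> 26
    return sum;
}
```

With M = 997, the string "AB" hashes to 1*26 + 2 = 28 and "BA" hashes to 2*26 + 1 = 53, so the order of characters matters. With M = 7, "AB" hashes to 28 % 7 = 0.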


Discussion: In a real-life class, discuss the components of the hash function above, e.g., why loop through all characters? Will that be slower than O(1)? Why multiply by 26? What if the string v uses more than just UPPERCASE chars?

🕑

There are two major ideas for resolving collisions: the Open Addressing method and the Closed Addressing method.


In Open Addressing, all hashed keys are located in a single array. The hash code of a key gives its base address. Collision is resolved by checking/probing multiple alternative addresses (hence the name open) in the table based on a certain rule.


In Closed Addressing, the Hash Table looks like an Adjacency List (a graph data structure). The hash code of a key gives its fixed/closed base address. Collision is resolved by appending the collided keys inside an auxiliary data structure (usually any form of List ADT) identified by the base address.

🕑

There are three Open Addressing (OA) collision resolution techniques discussed in this visualization: Linear Probing (LP), Quadratic Probing (QP), and Double Hashing (DH).


To switch between the three modes, please click on the respective header.


Let:
M = HT.length = the current hash table size,
base = (key%HT.length),
step = the current probing step,
secondary = smaller_prime - key%smaller_prime (to avoid zero — elaborated soon)

We will soon see that the probing sequences of the three modes are:
Linear Probing: i=(base+step*1) % M,
Quadratic Probing: i=(base+step*step) % M, and
Double Hashing: i=(base+step*secondary) % M.
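The three probing formulas translate directly into code. The sketch below assumes non-negative keys and takes smaller_prime as a parameter, since its value is not fixed in the text; the function names are illustrative.

```cpp
// Probe index for each Open Addressing mode, given the current probing step.
// key is assumed non-negative; smaller_prime is assumed to be a prime smaller than M.
int lp_index(int key, int step, int M) { return (key % M + step * 1) % M; }

int qp_index(int key, int step, int M) { return (key % M + step * step) % M; }

int dh_index(int key, int step, int M, int smaller_prime) {
    int secondary = smaller_prime - key % smaller_prime;  // in [1..smaller_prime], never zero
    return (key % M + step * secondary) % M;
}
```

With step = 0, all three return the base address key%M. For key = 38 and M = 7 (base 3), lp_index at step 1 gives 4, qp_index at step 2 gives 0, and dh_index at step 1 with smaller_prime = 5 gives (3 + 2) % 7 = 5.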


All three OA techniques require that the load factor α = N/M < 1.0 (otherwise no more insertion is possible). If we can bound α to be a small constant (true if we know the expected largest N in our Hash Table application so that we can set up M accordingly, preferably < 0.5 for most OA variants), then all Search(v), Insert(v), and Remove(v) operations using Open Addressing will be O(1) — details omitted.

🕑

The Separate Chaining (SC) collision resolution technique is simple. We use M copies of an auxiliary data structure, usually Doubly Linked Lists. If two keys a and b both have the same hash value i, both will be appended to the (front/back) of Doubly Linked List i (in this visualization, we append to the back in O(1) with the help of a tail pointer). That's it: where the keys will be slotted in is completely dependent on the hash function itself, hence we also call Separate Chaining a Closed Addressing collision resolution technique.


If we use Separate Chaining, the load factor α = N/M is the average length of the M lists (unlike in Open Addressing, α can be "slightly over 1.0") and it will determine the performance of Search(v) as we may have to explore α elements on average. As Remove(v) also requires Search(v), its performance is similar to that of Search(v). Insert(v) is clearly O(1).


If we can bound α to be a small constant (true if we know the expected largest N in our Hash Table application so that we can set up M accordingly), then all Search(v), Insert(v), and Remove(v) operations using Separate Chaining will be O(1).
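A minimal Separate Chaining sketch, using std::list as the Doubly Linked List, could look as follows. The class and member names are illustrative; keys are assumed to be non-negative Integers and duplicates are allowed, as in the visualization.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <vector>

// Separate Chaining: M doubly linked lists, one per base address.
class ChainedHashTable {
public:
    explicit ChainedHashTable(std::size_t M) : lists_(M) {}

    void insert(int v) { lists_[index(v)].push_back(v); }  // O(1) append to the back

    bool search(int v) const {  // scans only the list at the base address of v
        const auto& chain = lists_[index(v)];
        return std::find(chain.begin(), chain.end(), v) != chain.end();
    }

    // Removes one occurrence of v; returns false if v is absent.
    bool remove(int v) {
        auto& chain = lists_[index(v)];
        auto it = std::find(chain.begin(), chain.end(), v);
        if (it == chain.end()) return false;
        chain.erase(it);
        return true;
    }

private:
    std::size_t index(int v) const { return static_cast<std::size_t>(v) % lists_.size(); }
    std::vector<std::list<int>> lists_;
};
```

With M = 7, keys 18 and 4 both land in list 4 and coexist there, so neither insertion disturbs the other.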

🕑

View the visualization of Hash Table above.


In this visualization, we allow the insertion of duplicate keys (i.e., a multiset). Since a multiset is more general than a set, simply insert distinct integers in this visualization if you want to see how a Hash Table works on distinct integer keys only.


Due to limited screen space, we switch from default (1.0x) scale to 0.5x scale whenever you want to visualize Hash Table size M ∈ [46..90] for OA techniques. The limit is a bit lower, i.e., M ∈ [20..31] for SC technique.


The Hash Table is visualized horizontally like an array where index 0 is placed at the leftmost of the first row and index M-1 is placed at the rightmost of the last row but the details are different when we are visualizing Open Addressing (usually spans multiple rows) versus Separate Chaining (only the top row) collision resolution techniques.

🕑

There are three Open Addressing collision resolution techniques discussed in this visualization: Linear Probing (LP), Quadratic Probing (QP), and Double Hashing (DH).


For all three techniques, each Hash Table cell is displayed as a vertex with cell value of [0..99] displayed as the vertex label (in 0.5x scale, the vertex label is displayed on top of the smaller black dot). Without loss of generality, we do not show any satellite data in this visualization as we concentrate only on the arrangement of the keys. We reserve value -1 to indicate an 'EMPTY cell' (visualized as a blank vertex) and -2 to indicate a 'DELETED cell' (visualized as a vertex with abbreviated label "DEL"). The cell indices ranging from [0..M-1] are shown as red label below each vertex (rows of 15 indices in 1.0x scale or rows of 25 indices in 0.5x scale).

🕑

For Separate Chaining (SC) collision resolution technique, the first row contains the M "H" (Head) pointers of M Doubly Linked Lists.


Then, each Doubly Linked List i contains all keys that are hashed into i in arbitrary order (in 0.5x scale, the vertex label is displayed on top of the smaller black dot). Mathematically, all keys that can be expressed as i (mod M) — including all duplicates of i — are hashed into DLL i. Again, we do not store any satellite data in this visualization.

🕑

In the Linear Probing collision resolution technique, whenever there is a collision, we scan forwards one index at a time for the next empty/deleted slot (wrapping around to the front when we have reached the last slot).


For example, let's assume we start with an empty Hash Table HT with table size M = HT.length = 7 as shown above, which uses indices 0 to M-1 = 7-1 = 6. Notice that 7 is a prime number. The (primary) hash function is simple, h(v) = v%M.


This walk-through will show you the steps taken by the Insert(v), Search(v), and Remove(v) operations when using Linear Probing as the collision resolution technique.

🕑

Now click Insert([18,14,21]) — three individual insertions in one command.


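Recap: 18%7 = 4, so 18 is placed at index 4; 14%7 = 0, so 14 is placed at index 0; 21%7 = 0 collides with 14, so we probe one step forward and place 21 at index 1.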


Formally, we describe Linear Probing index i as i = (base+step*1) % M where base is the (primary) hash value of key v, i.e., h(v) and step is the Linear Probing step starting from 1.


Tips: To do a quick mental calculation of a (small) Integer V modulo M, we simply subtract from V the largest multiple of M that is ≤ V, e.g., 18%7 = 18-14 = 4, as 14 is the largest multiple of 7 that is ≤ 18.

🕑

Now click Insert([1,35]) (on top of the three values already inserted in the previous slide).


Recap: 1%7 = 1 collides with 21, so 1 is placed at index 2; 35%7 = 0 collides with 14, then with 21 at index 1 and with 1 at index 2, so 35 is placed at index 3.

🕑

Now we illustrate the Search(v) operation while using Linear Probing as the collision resolution technique. The steps taken are very similar to the Insert(v) operation: we start from the (primary) hash value of the key and check if we have found v; otherwise we move forward one index at a time (wrapping around if necessary) and recheck whether we have found v. We stop when we encounter an empty cell, which means that v is not in the Hash Table at all (as an earlier Insert(v) operation would have placed v there otherwise).


Now click Search(35): you should see the probing sequence [0,1,2,3 (key 35 found)].


Now click Search(8): [1,2,3,4,5 (empty cell, so key 8 is not found in the Hash Table)].
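The Insert(v) and Search(v) walk-through above can be sketched as follows. This version covers insertion and search only (no removal), uses -1 for an empty cell as in the visualization, and assumes non-negative keys; the function names are illustrative.

```cpp
#include <vector>

constexpr int EMPTY = -1;  // empty-cell marker, as in the visualization

// Inserts v with Linear Probing; returns the index used, or -1 if the table is full.
int lp_insert(std::vector<int>& HT, int v) {
    const int M = static_cast<int>(HT.size());
    for (int step = 0; step < M; ++step) {
        int i = (v % M + step) % M;  // wraps around to the front when needed
        if (HT[i] == EMPTY) { HT[i] = v; return i; }
    }
    return -1;
}

// Returns the index of v, or -1 if an empty cell is reached first (v is absent).
int lp_search(const std::vector<int>& HT, int v) {
    const int M = static_cast<int>(HT.size());
    for (int step = 0; step < M; ++step) {
        int i = (v % M + step) % M;
        if (HT[i] == v) return i;
        if (HT[i] == EMPTY) return -1;
    }
    return -1;
}
```

Starting from an empty table of size 7, inserting 18, 14, 21, 1, 35 places them at indices 4, 0, 1, 2, 3, respectively. A search for 35 then ends at index 3, and a search for 8 stops at the empty cell at index 5.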

🕑

Now let's discuss the Remove(v) operation.


If we just set HT[i] = EMPTY directly, where i is the index that contains v (after linear probing if necessary), do you realize that we will cause a problem? Why?


Hint: Review the past three slides on how Insert(v) and Search(v) behave.

🕑

Now let's see the complete Remove(v). If we find v at index i (after Linear Probing if necessary), we have to set HT[i] = DELETED (abbreviated as DEL in this visualization), where DEL is a special symbol (in general, you should only use a symbol that is not used in your application) indicating that the cell can be bypassed if necessary by a future Search(v), but can be overwritten by a future Insert(w). This strategy is called Lazy Deletion.


Now click Remove(21): [0,1 (key 21 found and we set HT[1] = DEL)].


Afterwards, please continue the discussion in the next slide.

🕑

Now click Search(35): [0,1 (bypassing the DELETED cell),2,3 (found key 35)].


Imagine what would happen if we had wrongly set HT[1] = EMPTY.

🕑

Now click Insert(28): you should see the probing sequence [0,1 (found a cell with the DEL symbol)], so this cell can in fact be overwritten with a new value without affecting the correctness of future Search(v). Therefore, we put 28 at index 1.
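Lazy Deletion can be sketched by adding a second marker to a Linear Probing table. The values -1 (EMPTY) and -2 (DEL) follow the visualization's convention; the function names are illustrative and keys are assumed to be non-negative.

```cpp
#include <vector>

constexpr int EMPTY = -1;  // never-used cell: a search may stop here
constexpr int DEL = -2;    // lazily deleted cell: a search must continue past it

int lp_search(const std::vector<int>& HT, int v) {
    const int M = static_cast<int>(HT.size());
    for (int step = 0; step < M; ++step) {
        int i = (v % M + step) % M;
        if (HT[i] == v) return i;
        if (HT[i] == EMPTY) return -1;  // DEL cells are bypassed, EMPTY ends the search
    }
    return -1;
}

// Reuses the first EMPTY or DEL cell on the probe sequence; -1 if none exists.
int lp_insert(std::vector<int>& HT, int v) {
    const int M = static_cast<int>(HT.size());
    for (int step = 0; step < M; ++step) {
        int i = (v % M + step) % M;
        if (HT[i] == EMPTY || HT[i] == DEL) { HT[i] = v; return i; }
    }
    return -1;
}

// Marks the cell holding v as DEL instead of EMPTY; false if v is absent.
bool lp_remove(std::vector<int>& HT, int v) {
    int i = lp_search(HT, v);
    if (i == -1) return false;
    HT[i] = DEL;
    return true;
}
```

On the table [14, 21, 1, 35, 18, EMPTY, EMPTY], removing 21 leaves DEL at index 1, a search for 35 still reaches index 3, and inserting 28 reuses index 1. Had index 1 been set to EMPTY instead, the search for 35 would stop there and wrongly report that 35 is absent.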

🕑

Although we can resolve collision with Linear Probing, it is not the most effective way.


We define a cluster to be a collection of consecutive occupied slots. A cluster that covers the base address of a key is called the primary cluster of the key.


Now notice that Linear Probing can create large primary clusters that will increase the running time of Search(v)/Insert(v)/Remove(v) operations beyond the advertised O(1).


See an example above with M = 31 and we have inserted 15 keys [0..14] so that they occupy cells [0..14] (α = 15/31 < 0.5). Now see how 'slow' Insert(31) (the 16th key) is.
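The cost of this large primary cluster can be measured by counting probing steps. The sketch below is an illustrative Linear Probing insert that reports how many steps beyond the base address were needed; the function name is hypothetical and keys are assumed to be non-negative.

```cpp
#include <vector>

constexpr int EMPTY = -1;  // empty-cell marker, as in the visualization

// Inserts v with Linear Probing. Returns how many probing steps were needed
// beyond the base address (0 means no collision), or -1 if the table is full.
int lp_insert_steps(std::vector<int>& HT, int v) {
    const int M = static_cast<int>(HT.size());
    for (int step = 0; step < M; ++step) {
        int i = (v % M + step) % M;
        if (HT[i] == EMPTY) { HT[i] = v; return step; }
    }
    return -1;
}
```

With M = 31, inserting keys 0 to 14 takes 0 steps each, but inserting 31 (base address 0) needs 15 probing steps before it lands at index 15.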

🕑

The probe sequence of Linear Probing can be formally described as follows:

 h(v) // base address
(h(v) + 1*1) % M // 1st probing step if there is a collision
(h(v) + 2*1) % M // 2nd probing step if there is still a collision
(h(v) + 3*1) % M // 3rd probing step if there is still a collision
...
(h(v) + k*1) % M // k-th probing step, etc...

During Insert(v), if there is a collision but an empty (or DEL) slot remains in the Hash Table, we are sure to find it after at most M Linear Probing steps, i.e., in O(M). And when we do, the collision will be resolved, but the primary cluster of the key v is expanded as a result and future Hash Table operations will get slower too. Try the slow Search(31) on the same Hash Table as in the previous slide but with many DEL markers (suppose {4, 5, 8, 9, 10, 12, 14} have just been deleted).

🕑

In the previous slide (Primary Clustering, Part 1), we broke the assumption that the hash function should uniformly distribute keys around [0..M-1]. In the next example, we will show that the problem of primary clustering can still happen even if the hash function distributes the keys into several relatively short primary clusters around [0..M-1].


On screen, you see M = 31 with 15 random integers between [0..99] inserted (there are several random but short primary clusters). If we then insert these next 4 keys {2, 9, 12, 1}, the first three keys will "plug" the three empty cells and accidentally annex (or combine) those neighboring (but previously disjointed) clusters into a (very) long primary cluster. So the next insertion of a key 1 that lands at (the beginning of) this long primary cluster will end up performing almost O(M) probing steps just to find an empty cell. Try Insert([2,9,12,1]).

🕑

To reduce primary clustering, we can modify the probe sequence to:

 h(v) // base address
(h(v) + 1*1) % M // 1st probing step if there is a collision
(h(v) + 2*2) % M // 2nd probing step if there is still a collision
(h(v) + 3*3) % M // 3rd probing step if there is still a collision
...
(h(v) + k*k) % M // k-th probing step, etc...

That's it: the probe jumps quadratically, wrapping around the Hash Table as necessary.


A very common mistake is to implement a different kind of Quadratic Probing:
Doing h(v), (h(v)+1) % M, (h(v)+1+4) % M, (h(v)+1+4+9) % M, ...

🕑

Assume that we have called Insert(18) and Insert(10) into an initially empty Hash Table of size M = HT.length = 7. Since 18%7 = 4 and 10%7 = 3, 18 and 10 do not collide and are placed at indices 4 and 3, respectively, as shown above.


Now, let's click Insert(38).


Recap: 38%7 = 3 is occupied by 10, (3+1*1) % 7 = 4 is occupied by 18, and (3+2*2) % 7 = 0 is empty, so 38 is placed at index 0.

🕑

The Remove(x) and Search(y) operations are defined similarly, except that this time we use Quadratic Probing instead of Linear Probing.


For example, assume that we have called Remove(18) after the previous slide and marked HT[4] = DEL. If we then call Search(38), we will use the same Quadratic Probing sequence as in the previous slide, but pass through HT[4], which is marked as DEL.

🕑

At a glance, Quadratic Probing, which jumps +1, +4, +9, +16, ... quadratically, seems to solve the primary clustering issue that we had with Linear Probing, but is it the perfect collision resolution technique?


Try Insert([12,17]).


Do you realize what just happened?

🕑

We can insert 12 easily as h(12) = 12%7 = 5 was previously empty (see above).


However, we have a major problem inserting key 17 even though we still have 3 empty slots, because:
h(17) = 17%7 = 3 is already occupied by key 10,
(3+1*1) % 7 = 4 is already occupied by key 18,
(3+2*2) % 7 = 0 is already occupied by key 38,
(3+3*3) % 7 = 5 is already occupied by key 12,
(3+4*4) % 7 = 5 again is already occupied by key 12,
(3+5*5) % 7 = 0 again is already occupied by key 38,
(3+6*6) % 7 = 4 again is already occupied by key 18,
(3+7*7) % 7 = 3 again is already occupied by key 10,
and the cycle will go on forever if we continue this Quadratic Probing...


Although we still have a few (3) empty cells, we are unable to insert the new value 17 into the Hash Table...
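The failed insertion of 17 can be reproduced with a small Quadratic Probing sketch. It gives up after M probing steps instead of cycling forever; the function name is illustrative and keys are assumed to be non-negative.

```cpp
#include <vector>

constexpr int EMPTY = -1;  // empty-cell marker, as in the visualization

// Quadratic Probing insert that stops after M probing steps.
// Returns the index used, or -1 if no free cell lies on the probe sequence.
int qp_insert(std::vector<int>& HT, int v) {
    const int M = static_cast<int>(HT.size());
    for (int step = 0; step < M; ++step) {
        int i = (v % M + step * step) % M;  // base + step^2, wrapped around
        if (HT[i] == EMPTY) { HT[i] = v; return i; }
    }
    return -1;
}
```

With M = 7, inserting 18, 10, 38, 12 places them at indices 4, 3, 0, 5. Inserting 17 then returns -1 even though indices 1, 2, and 6 are still empty, because its probe sequence only ever visits indices 3, 4, 0, and 5.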

🕑

If α < 0.5 and M is a prime (> 3), then we can always find an empty slot using (this form of) Quadratic Probing. Recall: α is the load factor and M is the Hash Table size (HT.length).


If the two requirements above are satisfied, we can prove that the first M/2 Quadratic Probing indices, including the base address h(v), are all distinct.


But there is no such guarantee beyond that. Hence if we want to use Quadratic Probing, we need to ensure that α < 0.5 (not enforced in this visualization but we do break the loop after M steps to prevent infinite loop).

🕑

We will use proof by contradiction. We first assume that two Quadratic Probing steps
x and y, x != y (say x < y), can yield the same address modulo M.

h(v) + x*x = h(v) + y*y (mod M)
x*x = y*y (mod M) // remove h(v) from both sides
x*x - y*y = 0 (mod M) // move y*y to the left-hand side
(x-y)*(x+y) = 0 (mod M) // rearrange the formula

Now, because M is a prime, either (x-y) or (x+y) has to be 0 modulo M.
As 0 ≤ x < y ≤ (M/2), we have 0 < y-x < M, so (x-y) cannot be 0 modulo M.
As 0 ≤ x < y ≤ (M/2) and M is a prime > 3 (an odd integer), we have 0 < x+y < M,
so (x+y) cannot be 0 modulo M either.


Contradiction!


So the first M/2 Quadratic Probing steps cannot yield the same address modulo M

(if we set M to be a prime number greater than 3).
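
As a sanity check (not a substitute for the proof), the claim can be tested numerically. The helper first_probes_distinct below is a name of our own; it relies on the fact that the base address h(v) cancels out, so only the offsets k*k % M matter.

```python
def first_probes_distinct(M):
    """Check that (k*k) % M is distinct for k = 0..M//2 (the base address cancels out)."""
    offsets = [(k * k) % M for k in range(M // 2 + 1)]
    return len(set(offsets)) == len(offsets)

for M in (5, 7, 11, 13, 31):
    print(M, first_probes_distinct(M))   # True for each of these primes

print(8, first_probes_distinct(8))       # False: 1*1 and 3*3 are both 1 (mod 8)
```

The non-prime table size 8 shows why the primality requirement matters: two of its early probing steps already collide.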


Discussion: Can we make Quadratic Probing use the other ~50% of the table cells?

🕑

The content of this interesting slide (the answer of the usually intriguing discussion point from the earlier slide) is hidden and only available for legitimate CS lecturer worldwide. This mechanism is used in the various flipped classrooms in NUS.


If you are really a CS lecturer (or an IT teacher) (outside of NUS) and are interested to know the answers, please drop an email to stevenhalim at gmail dot com (show your University staff profile/relevant proof to Steven) for Steven to manually activate this CS lecturer-only feature for you.


FAQ: This feature will NOT be given to anyone else who is not a CS lecturer.

🕑

In Quadratic Probing, clusters are formed along the path of probing, instead of around the base address like in Linear Probing. These clusters are called Secondary Clusters and they are 'less visible' compared to the Primary Clusters that plague Linear Probing.


Secondary clusters are formed as a result of using the same pattern in probing by colliding keys, i.e., if two distinct keys have the same base address, their Quadratic Probing sequences are going to be the same.


To illustrate this, see the screen with M = 31. We have populated this Hash Table with only 10 keys (so load factor α = 10/31 ≤ 0.5) and the Hash Table looks 'sparse enough' (no visibly big primary cluster). However, if we then try Insert(62,93), despite the fact that there are many (31-10 = 21) empty cells and 62 != 93 (different keys that end up hashed into index 0), we end up doing 10 probing steps along this 'less visible' secondary cluster (notice that both {62, 93} follow similar Quadratic Probing sequences).


Secondary clustering in Quadratic Probing is not as bad as primary clustering in Linear Probing as a good hash function should theoretically disperse the keys into different base addresses ∈ [0..M-1] in the first place.
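
A minimal sketch of why the two keys share a secondary cluster: with h(v) = v % M and M = 31 as in the example, 62 and 93 have the same base address, so their Quadratic Probing sequences are identical. The function name is ours.

```python
def quadratic_probe_sequence(v, M, steps):
    """Indices (h(v) + k*k) % M for k = 0..steps, with h(v) = v % M."""
    base = v % M
    return [(base + k * k) % M for k in range(steps + 1)]

M = 31
seq_62 = quadratic_probe_sequence(62, M, 5)
seq_93 = quadratic_probe_sequence(93, M, 5)
print(seq_62)             # [0, 1, 4, 9, 16, 25]
print(seq_93)             # [0, 1, 4, 9, 16, 25]
print(seq_62 == seq_93)   # True
```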

🕑

To reduce primary and secondary clustering, we can modify the probe sequence to:

 h(v) // base address
(h(v) + 1*h2(v)) % M // 1st probing step if there is a collision
(h(v) + 2*h2(v)) % M // 2nd probing step if there is still a collision
(h(v) + 3*h2(v)) % M // 3rd probing step if there is still a collision
...
(h(v) + k*h2(v)) % M // k-th probing step, etc...

That is, the probe jumps according to the value of the second hash function h2(v), wrapping around the Hash Table as necessary.

🕑

If h2(v) = 1, then Double Hashing works exactly the same as Linear Probing.
So we generally want h2(v) > 1 to avoid primary clustering.


If h2(v) = 0, then Double Hashing does not work, for an obvious reason: any probing step multiplied by 0 remains 0, i.e., we stay at the base address forever during a collision. We need to avoid this.


Usually (for Integer keys), h2(v) = M' - v%M' where M' is a prime smaller than M.
This makes h2(v) ∈ [1..M'], which is diverse enough to avoid secondary clustering.


The usage of the secondary hash function makes Double Hashing theoretically hard to suffer from either primary or secondary clustering issues.
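
The secondary hash function and the resulting probe sequence can be sketched as follows. The values M = 7 and M' = 5 are illustrative assumptions of ours (M' only needs to be a prime smaller than M), and the function names are hypothetical.

```python
def h2(v, M_prime):
    """Secondary hash h2(v) = M' - v % M'; the result is always in [1..M']."""
    return M_prime - v % M_prime

def double_hash_probes(v, M, M_prime, steps):
    """Indices (h(v) + k*h2(v)) % M for k = 0..steps, with h(v) = v % M."""
    base, jump = v % M, h2(v, M_prime)
    return [(base + k * jump) % M for k in range(steps + 1)]

print(h2(35, 5))                         # 5, since 35 % 5 == 0
print(double_hash_probes(35, 7, 5, 3))   # [0, 5, 3, 1]
```

Because v % M' is at most M' - 1, h2(v) can never be 0, which avoids the degenerate case described above.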

🕑

Click Insert([35,42]) to insert 35 and then 42 into the current Hash Table above.


Recap (to be shown after you click the button above).

🕑

The Remove(x) and Search(y) operations are defined similarly. Only this time we use Double Hashing instead of Linear Probing or Quadratic Probing.


For example, assume that we have called Remove(17) after the previous slide and we marked HT[3] = DELETED. If we then call Search(35), we will use the same Double Hashing sequence as in the previous slide, but pass through HT[3] that has been marked as DELETED.
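
The following sketch shows how a search can pass through a DELETED cell. The table contents are an assumption of ours (the actual state of the visualization is not reproduced here): M = 7, M' = 5, keys 14, 12, 17 inserted first, then 35 placed at index 1 after probing 0, 5, 3, and finally 17 removed.

```python
EMPTY = None
DELETED = object()   # sentinel marker for lazily deleted cells (name is ours)

def h2(v, M_prime):
    return M_prime - v % M_prime

def search(HT, v, M_prime):
    """Return the index of v in HT, or -1 if v is not present."""
    M = len(HT)
    base, jump = v % M, h2(v, M_prime)
    for k in range(M):                 # give up after M probing steps
        i = (base + k * jump) % M
        if HT[i] is EMPTY:             # a truly empty cell ends the search
            return -1
        if HT[i] is not DELETED and HT[i] == v:
            return i
        # occupied by another key, or DELETED: keep probing
    return -1

# Assumed table after Remove(17) turned HT[3] into DELETED.
HT = [14, 35, EMPTY, DELETED, EMPTY, 12, EMPTY]
print(search(HT, 35, 5))   # 1 (probes 0, 5, 3, then 1)
print(search(HT, 17, 5))   # -1
```

If the search stopped at DELETED cells as if they were empty, key 35 would wrongly be reported as missing.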

🕑

In summary, a good Open Addressing collision resolution technique needs to:

  1. Always find an empty slot if it exists,
  2. Minimize clustering (of any kind),
  3. Give different probe sequences when 2 different keys collide,
  4. Be fast, i.e., O(1).

Now, let's see the same test case that plagued Quadratic Probing earlier. Now try Insert(62,93) again. Although h(62) = h(93) = 0 and they collide with 31 that already occupies index 0, their probing steps are not the same: h2(62) = 29-62%29 = 25 is not the same as h2(93) = 29-93%29 = 23.
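
A small sketch of the two probe sequences, using M = 31 and M' = 29 as in the text; the function name is our own.

```python
def double_hash_probes(v, M, M_prime, steps):
    """Indices (h(v) + k*h2(v)) % M for k = 0..steps."""
    base = v % M
    jump = M_prime - v % M_prime   # h2(v)
    return [(base + k * jump) % M for k in range(steps + 1)]

M, M_prime = 31, 29
print(double_hash_probes(62, M, M_prime, 3))   # [0, 25, 19, 13]
print(double_hash_probes(93, M, M_prime, 3))   # [0, 23, 15, 7]
```

Both keys start at index 0 but diverge from the first probing step onward.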


Discussion: Double Hashing seems to fit the bill. But... Is the Double Hashing strategy flexible enough to be used as the default library implementation of a Hash Table? Let's see...

🕑

Try Insert([9,16,23,30,37,44]) to see how Insert(v) operation works if we use Separate Chaining as collision resolution technique. On such random insertions, the performance is good and each insertion is clearly O(1).


However, if we try Insert([68,90]), notice that all Integers {68,90} are 2 (modulo 11), so all of them will be appended to (the back of) Doubly Linked List 2. We will have a long chain in that list. Note that due to the screen limitation, we limit the length of each Doubly Linked List to at most 6.
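
A minimal sketch of insertion with Separate Chaining, assuming M = 11, h(v) = v % M, and an initially empty table (the visualization may start from a different state). A Python list stands in for each Doubly Linked List.

```python
M = 11
table = [[] for _ in range(M)]   # one chain per index

def insert(v):
    table[v % M].append(v)       # append to the back of chain h(v)

for v in [9, 16, 23, 30, 37, 44]:
    insert(v)
print([len(chain) for chain in table])   # [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0]

for v in [68, 90]:
    insert(v)
print(table[2])                          # [68, 90]
```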

🕑

Try Search(35) to see that Search(v) can be made to run in O(1+α).


Try Remove(35) to see that Remove(v) can be made to run in O(1+α) too.


If α is large, Separate Chaining performance is not really O(1). However, if we roughly know the potential maximum number of keys N that our application will ever use, then we can set table size M accordingly such that α = N/M is a very low positive (floating-point) number, thereby making Separate Chaining operations run in expected O(1).
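
The O(1+α) behaviour can be seen in a small sketch: Search and Remove each scan a single chain, whose expected length is α = N/M. The class and method names below are our own, and Python lists again stand in for linked lists.

```python
class ChainedHashTable:
    def __init__(self, M):
        self.M = M
        self.N = 0
        self.chains = [[] for _ in range(M)]

    def insert(self, v):
        self.chains[v % self.M].append(v)
        self.N += 1

    def search(self, v):
        # Scans one chain only; its expected length is alpha = N/M.
        return v in self.chains[v % self.M]

    def remove(self, v):
        chain = self.chains[v % self.M]
        if v in chain:
            chain.remove(v)
            self.N -= 1
            return True
        return False

    def load_factor(self):
        return self.N / self.M

ht = ChainedHashTable(11)
for v in [9, 16, 23, 30, 35, 37, 44]:
    ht.insert(v)
print(ht.search(35))      # True
print(ht.remove(35))      # True
print(ht.search(35))      # False
print(ht.load_factor())   # 6/11, about 0.545
```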

🕑
Discussion: After all these explanations, which of the two collision resolution techniques is the better one?
🕑


You have reached the end of the basic material on this Hash Table data structure and we encourage you to explore further in the Exploration Mode.


However, we still have a few more interesting Hash Table challenges for you that are outlined in this section.

🕑

The performance of a Hash Table degrades as the load factor α gets higher. For the (standard) Quadratic Probing collision resolution technique, insertions may fail when the Hash Table has α > 0.5.


If that happens, we can rehash. We build another Hash Table about twice as large with a new hash function. We go through all keys in the original Hash Table, recompute the new hash values, and re-insert the keys (along with their satellite data) into the new, bigger Hash Table, before we finally delete the old, smaller Hash Table.


A rule of thumb is to rehash when α ≥ 0.5 if we use Open Addressing and when α > a small constant (close to 1.0, as per requirement) if we use Separate Chaining.


If we know the maximum possible number of keys, we can always set M so that α is a low number.
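
One way to sketch rehashing for Quadratic Probing is shown below. It is an illustration under our own assumptions, not the visualization's implementation: the new size is the smallest prime at least twice the old size, the "new hash function" is simply v % M with the new M, and all names are hypothetical. Deletion is omitted for brevity.

```python
def is_prime(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def next_prime(n):
    """Smallest prime >= n."""
    while not is_prime(n):
        n += 1
    return n

class QuadraticProbingTable:
    def __init__(self, M=7):
        self.M = M                 # should be a prime > 3
        self.N = 0
        self.cells = [None] * M

    def _place(self, v):
        for k in range(self.M):
            i = (v % self.M + k * k) % self.M
            if self.cells[i] is None:
                self.cells[i] = v
                return
        raise RuntimeError("no empty cell found")

    def _rehash(self):
        old_keys = [v for v in self.cells if v is not None]
        self.M = next_prime(2 * self.M)   # roughly twice as large, still prime
        self.cells = [None] * self.M
        for v in old_keys:                # re-insert using the new h(v) = v % M
            self._place(v)

    def insert(self, v):
        # Rehash first if this insertion would push alpha to 0.5 or above.
        if (self.N + 1) / self.M >= 0.5:
            self._rehash()
        self._place(v)
        self.N += 1

ht = QuadraticProbingTable(7)
for v in [10, 18, 38, 12, 17]:
    ht.insert(v)
print(ht.M)                 # 17: the table was rehashed once, from 7 to 17
print(ht.cells.index(17))   # 0, since 17 % 17 == 0
```

Because α stays below 0.5 and M stays prime, the earlier guarantee applies and _place always finds an empty cell.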

🕑

However, if you ever need to implement a Hash Table in C++, Python, or Java, and your keys are either Integers or Strings, you can use the built-in C++ STL, Python standard library, or Java API, respectively. They already have good built-in implementations of default hash functions for Integers or Strings.


See C++ STL unordered_map, unordered_set, Python dict, set, or Java HashMap, HashSet.
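
For Python, a brief usage sketch of the built-in dict and set types follows. The keys and values are made up for illustration.

```python
# dict: a Hash Table that maps keys to satellite data.
age = {}
age["alice"] = 23        # insert
age["bob"] = 31
print("alice" in age)    # search: True
print(age["bob"])        # retrieve: 31
del age["alice"]         # remove
print("alice" in age)    # False

# set: a Hash Table that stores keys only.
seen = set()
seen.add(7)
seen.add(7)              # duplicates are ignored
print(len(seen))         # 1
seen.discard(7)
print(7 in seen)         # False
```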


For C++, note that std::multimap/std::multiset (and their hash-based counterparts std::unordered_multimap/std::unordered_multiset) are also available, where duplicate keys are allowed.


For OCaml, we can use Hashtbl.


However, here is our simple Separate Chaining implementation: HashTableDemo.cpp | py | java.

🕑


A Hash Table is an extremely good data structure for implementing the Table ADT if the (Integer or String) keys only need to be mapped to satellite data, with O(1) performance for the Search(v), Insert(v), and Remove(v) operations if the Hash Table is set up properly.


However, if we need to do much more with the keys, we may need to use an alternative data structure.

🕑

For a few more interesting questions about this data structure, please practice on the Hash Table training module (no login is required).


However, registered users should log in and go to the Main Training Page to officially clear this module; such achievement will be recorded in your user account.

🕑

Try solving a few basic programming problems that somewhat require the usage of a Hash Table (especially if the input size is much larger):

  1. Kattis - cd (the input is already sorted, so an alternative non-Hash-Table solution exists; if the input were not sorted, this set intersection problem would be best solved with the help of a Hash Table),
  2. Kattis - oddmanout (we can map large invitation codes into a smaller range of Integers; this is a practice of hashing Integers (of a large range)),
  3. Kattis - whatdoesthefoxsay (we put the sounds that are not the fox's into an unordered set; this is a practice of hashing Strings).

You have reached the last slide. Return to 'Exploration Mode' to start exploring!

Note that if you notice any bug in this visualization or if you want to request for a new visualization feature, do not hesitate to drop an email to the project leader: Dr Steven Halim via his email address: stevenhalim at gmail dot com.


About

Initially conceived in 2011 by Associate Professor Steven Halim, VisuAlgo aimed to facilitate a deeper understanding of data structures and algorithms for his students by providing a self-paced, interactive learning platform.

Featuring numerous advanced algorithms discussed in Dr. Steven Halim's book, 'Competitive Programming' — co-authored with Dr. Felix Halim and Dr. Suhendry Effendy — VisuAlgo remains the exclusive platform for visualizing and animating several of these complex algorithms even after a decade.

While primarily designed for National University of Singapore (NUS) students enrolled in various data structure and algorithm courses (e.g., CS1010/equivalent, CS2040/equivalent (including IT5003), CS3230, CS3233, and CS4234), VisuAlgo also serves as a valuable resource for inquisitive minds worldwide, promoting online learning.

Initially, VisuAlgo was not designed for small touch screens like smartphones, as intricate algorithm visualizations required substantial pixel space and click-and-drag interactions. For an optimal user experience, a minimum screen resolution of 1366x768 is recommended. However, since April 2022, a mobile (lite) version of VisuAlgo has been made available, making it possible to use a subset of VisuAlgo features on smartphone screens.

VisuAlgo remains a work in progress, with the ongoing development of more complex visualizations. At present, the platform features 24 visualization modules.

Equipped with a built-in question generator and answer verifier, VisuAlgo's "online quiz system" enables students to test their knowledge of basic data structures and algorithms. Questions are randomly generated based on specific rules, and students' answers are automatically graded upon submission to our grading server. As more CS instructors adopt this online quiz system worldwide, it could effectively eliminate manual basic data structure and algorithm questions from standard Computer Science exams in many universities. By assigning a small (but non-zero) weight to passing the online quiz, CS instructors can significantly enhance their students' mastery of these basic concepts, as they have access to an almost unlimited number of practice questions that can be instantly verified before taking the online quiz. Each VisuAlgo visualization module now includes its own online quiz component.

VisuAlgo has been translated into three primary languages: English, Chinese, and Indonesian. Additionally, we have authored public notes about VisuAlgo in various languages, including Indonesian, Korean, Vietnamese, and Thai:

id, kr, vn, th.

Team

Project Leader & Advisor (Jul 2011-present)
Associate Professor Steven Halim, School of Computing (SoC), National University of Singapore (NUS)
Dr Felix Halim, Senior Software Engineer, Google (Mountain View)

Undergraduate Student Researchers 1
CDTL TEG 1: Jul 2011-Apr 2012: Koh Zi Chun, Victor Loh Bo Huai

Final Year Project/UROP Students 1
Jul 2012-Dec 2013: Phan Thi Quynh Trang, Peter Phandi, Albert Millardo Tjindradinata, Nguyen Hoang Duy
Jun 2013-Apr 2014 Rose Marie Tan Zhao Yun, Ivan Reinaldo

Undergraduate Student Researchers 2
CDTL TEG 2: May 2014-Jul 2014: Jonathan Irvin Gunawan, Nathan Azaria, Ian Leow Tze Wei, Nguyen Viet Dung, Nguyen Khac Tung, Steven Kester Yuwono, Cao Shengze, Mohan Jishnu

Final Year Project/UROP Students 2
Jun 2014-Apr 2015: Erin Teo Yi Ling, Wang Zi
Jun 2016-Dec 2017: Truong Ngoc Khanh, John Kevin Tjahjadi, Gabriella Michelle, Muhammad Rais Fathin Mudzakir
Aug 2021-Apr 2023: Liu Guangyuan, Manas Vegi, Sha Long, Vuong Hoang Long, Ting Xiao, Lim Dewen Aloysius

Undergraduate Student Researchers 3
Optiver: Aug 2023-Oct 2023: Bui Hong Duc, Oleh Naver, Tay Ngan Lin

Final Year Project/UROP Students 3
Aug 2023-Apr 2024: Xiong Jingya, Radian Krisno, Ng Wee Han

List of translators who have contributed ≥ 100 translations can be found at statistics page.

Acknowledgements
NUS CDTL gave Teaching Enhancement Grant to kickstart this project.

For Academic Year 2023/24, a generous donation from Optiver will be used to further develop VisuAlgo.

Terms of Use

VisuAlgo is generously offered at no cost to the global Computer Science community. If you appreciate VisuAlgo, we kindly request that you spread the word about its existence to fellow Computer Science students and instructors. You can share VisuAlgo through social media platforms (e.g., Facebook, YouTube, Instagram, TikTok, Twitter, etc), course webpages, blog reviews, emails, and more.

Data Structures and Algorithms (DSA) students and instructors are welcome to use this website directly for their classes. If you capture screenshots or videos from this site, feel free to use them elsewhere, provided that you cite the URL of this website (https://visualgo.net) and/or the list of publications below as references. However, please refrain from downloading VisuAlgo's client-side files and hosting them on your website, as this constitutes plagiarism. At this time, we do not permit others to fork this project or create VisuAlgo variants. Personal use of an offline copy of the client-side VisuAlgo is acceptable.

Please note that VisuAlgo's online quiz component has a substantial server-side element, and it is not easy to save server-side scripts and databases locally. Currently, the general public can access the online quiz system only through the 'training mode.' The 'test mode' offers a more controlled environment for using randomly generated questions and automatic verification in real examinations at NUS.

List of Publications

This work has been presented at the CLI Workshop at the ICPC World Finals 2012 (Poland, Warsaw) and at the IOI Conference at IOI 2012 (Sirmione-Montichiari, Italy). You can click this link to read our 2012 paper about this system (it was not yet called VisuAlgo back in 2012) and this link for the short update in 2015 (to link VisuAlgo name with the previous project).

Bug Reports or Request for New Features

VisuAlgo is not a finished project. Associate Professor Steven Halim is still actively improving VisuAlgo. If you are using VisuAlgo and spot a bug in any of our visualization page/online quiz tool or if you want to request for new features, please contact Associate Professor Steven Halim. His contact is the concatenation of his name and add gmail dot com.

Privacy Policy

Version 1.2 (Updated Fri, 18 Aug 2023).

Since Fri, 18 Aug 2023, we no longer use Google Analytics. Thus, all cookies that we use now are solely for the operations of this website. The annoying cookie-consent popup is now turned off even for first-time visitors.

Since Fri, 07 Jun 2023, thanks to a generous donation by Optiver, anyone in the world can self-create a VisuAlgo account to store a few customization settings (e.g., layout mode, default language, playback speed, etc).

Additionally, for NUS students, by using a VisuAlgo account (a tuple of NUS official email address, student name as in the class roster, and a password that is encrypted on the server side — no other personal data is stored), you are giving a consent for your course lecturer to keep track of your e-lecture slides reading and online quiz training progresses that is needed to run the course smoothly. Your VisuAlgo account will also be needed for taking NUS official VisuAlgo Online Quizzes and thus passing your account credentials to another person to do the Online Quiz on your behalf constitutes an academic offense. Your user account will be purged after the conclusion of the course unless you choose to keep your account (OPT-IN). Access to the full VisuAlgo database (with encrypted passwords) is limited to Prof Halim himself.

For other CS lecturers worldwide who have written to Steven, a VisuAlgo account (your (non-NUS) email address, you can use any display name, and encrypted password) is needed to distinguish your online credential versus the rest of the world. Your account will have CS lecturer specific features, namely the ability to see the hidden slides that contain (interesting) answers to the questions presented in the preceding slides before the hidden slides. You can also access Hard setting of the VisuAlgo Online Quizzes. You can freely use the material to enhance your data structures and algorithm classes. Note that there can be other CS lecturer specific features in the future.

For anyone with VisuAlgo account, you can remove your own account by yourself should you wish to no longer be associated with VisuAlgo tool.