7    VisuAlgo.net / /hashtable Login LP QP DH SC
示例模式 ▿

>

>
减速
加速
go to beginning previous frame pause play next frame go to end
哈希表是一种将键映射到值的数据结构。它用哈希方程来将键映射到小范围的指数(一般为[0..哈希表大小-1])。 两个键冲突为 同样指数的几率相对较高并且每次潜在的冲突需要被解决才能维持数据完整。在这个可视化中有一些解决冲突的策略会被高亮: 开放寻址法(线性探测,二次探测 和 双倍散列)以及 分离连接法(即将上线)。

Remarks: By default, we show e-Lecture Mode for first time (or non logged-in) visitor.
Please login if you are a repeated visitor or register for an (optional) free account first.

X Esc
下一个 PgDn

散列是一种算法(通过散列函数),将大型可变长度数据集(称为键,不一定是整数)映射为固定长度的较小整数数据集。
哈希表是一种数据结构,它使用哈希函数有效地将键映射到值(表或地图ADT),以便进行高效的搜索/检索,插入和/或删除。
散列表广泛应用于多种计算机软件中,特别是关联数组,数据库索引,缓存和集合。
在这个电子讲座中,我们将深入讨论表ADT,哈希表的基本概念,讨论哈希函数,然后再讨论哈希表数据结构本身的细节。


Pro-tip: Since you are not logged-in, you may be a first time visitor who are not aware of the following keyboard shortcuts to navigate this e-Lecture mode: [PageDown] to advance to the next slide, [PageUp] to go back to the previous slide, [Esc] to toggle between this e-Lecture mode and exploration mode.

X Esc
上一个 PgUp
下一个 PgDn

A Table ADT must support at least the following three operations as efficient as possible:

  1. Search(v) — determine if v exists in the ADT or not,
  2. Insert(v) — insert v into the ADT,
  3. Remove(v) — remove v from the ADT.

Hash Table is one possible good implementation for this Table ADT (the other one is this).


PS1: For two weaker implementations of Table ADT, you can click the respective link: unsorted array or a sorted array to read the detailed discussions.


PS2: In live class, you may want to compare the requirements of Table ADT vs List ADT.


Another pro-tip: We designed this visualization and this e-Lecture mode to look good on 1366x768 resolution or larger (typical modern laptop resolution in 2017). We recommend using Google Chrome to access VisuAlgo. Go to full screen mode (F11) to enjoy this setup. However, you can use zoom-in (Ctrl +) or zoom-out (Ctrl -) to calibrate this.

X Esc
上一个 PgUp
下一个 PgDn

When the range of the Integer keys is small, e.g. [0..M-1], we can use an initially empty (Boolean) array A of size M and implement the following Table ADT operations directly:

  1. Search(v): Check if A[v] is true (filled) or false (empty),
  2. Insert(v): Set A[v] to be true (filled),
  3. Remove(v): Set A[v] to be false (empty).

That's it, we use the small Integer key itself to determine the address in array A, hence the name Direct Addressing. It is clear that all three major Table ADT operations are O(1).


PS: This idea is also used elsewhere, e.g., in Counting Sort.

X Esc
上一个 PgUp
下一个 PgDn

In Singapore (as of Sep 2021), bus routes are numbered from [2..991].


Not all integers between [2..991] are currently used, e.g., there is no bus route 989 — Search(989) should return false. A new bus route x may be introduced, i.e., Insert(x) or an existing bus route y may be discontinued, i.e., Remove(y).


As the range of possible bus routes is small, to record the data whether a bus route number exists or not, we can use a DAT with a Boolean array of size 1000.


Discussion: In real life class, we may discuss on why we use 1000 instead of 991 (or 992).

X Esc
上一个 PgUp
下一个 PgDn

Notice that we can always add satellite data instead of just using a Boolean array to record the existence of the keys.


For example, we can use an associative String array A instead to map a bus route number to its operator name, e.g.,

A[2] = "Go-Ahead Singapore",
A[10] = "SBS Transit",
A[183] = "Tower Transit Singapore",
A[188] = "SMRT Buses", etc.

Discussion: Can you think of a few other real-life DAT examples?

X Esc
上一个 PgUp
下一个 PgDn

e-Lecture: The content of this slide is hidden and only available for legitimate CS lecturer worldwide. Drop an email to visualgo.info at gmail dot com if you want to activate this CS lecturer-only feature and you are really a CS lecturer (show your University staff profile).

X Esc
上一个 PgUp
下一个 PgDn

The keys must be (or can be easily mapped to) non-negative Integer values. Note that basic DAT has problem in the full version of the example in the previous few slides as there are actually variations of bus route numbers in Singapore, e.g., 96B, 151A, NR10, etc.


The range of keys must be small.
The memory usage will be (insanely) large if we have (insanely) large range.


The keys must be dense, i.e., not many gaps in the key values.
DAT will contain too many empty (and wasted) cells otherwise.


We will overcome these restrictions with hashing.

X Esc
上一个 PgUp
下一个 PgDn

Using hashing, we can:

  1. Map (some) non-Integer keys (e.g., Strings) to Integers keys,
  2. Map large Integers to smaller Integers,
X Esc
上一个 PgUp
下一个 PgDn

For example, we have N = 400 Singapore phone numbers (Singapore phone number has 8 digits, so there are up to 10^8 = 100M possible phone numbers in Singapore).


Instead of using a DAT and use a gigantic array up to size M = 100 Million, we can use the following simple hash function h(v) = v%997.


This way, we map 8 digits phone numbers 6675 2378 and 6874 4483 into up to 3 digits h(6675 2378) = 237 and h(6874 4483) = 336, respectively. Therefore, we only need to prepare array of size M = 997 (or just simplify to 1000) instead of M = 100 Million.

X Esc
上一个 PgUp
下一个 PgDn
使用散列,我们现在可以使用整数数组(而不是布尔数组)执行下面的表ADT操作,如下所示:
  1. 搜索(v):检查A [h(v)]!= -1(假设 v≥0,我们对空单元使用-1)
  2. 插入(v):设置A [h(v)] = v(我们把 v 散列到h(v)中,所以我们需要以某种方式记录键 v),
  3. 删除(v):设置A [h(v)] = -1 --- 将进一步阐述。
X Esc
上一个 PgUp
下一个 PgDn

如果我们有映射到卫星数据的键,并且我们也想记录原始键,我们可以使用如下一对(整数,卫星数据类型)数组实现哈希表:

  1. 搜索(v):返回A [h(v)],它是一对v,卫星数据),可能是空的,
  2. 插入(v,卫星数据):设置A [h(v)] =对(v,卫星数据)
  3. 删除(v):设置A [h(v)] =(空对) - 将进一步详细阐述。

但是,现在你应该注意到有些地方是不完整的......

X Esc
上一个 PgUp
下一个 PgDn

A hash function may, and quite likely, map different keys (Integer or not) into the same Integer slot, i.e., a many-to-one mapping instead of one-to-one mapping.


For example, h(6675 2378) = 237 from three slides earlier and if we want to insert another phone number 6675 4372, we will have a problem as h(6675 4372) = 237 too.


This situation is called a collision, i.e., two (or more) keys have the same hash value.

X Esc
上一个 PgUp
下一个 PgDn

The Birthday (von Mises) Paradox asks: 'How many people (number of keys) must be in a room (Hash Table) of size 365 seats (cells) before the probability that some person share a birthday (collision, two keys are hashed to the same cell), ignoring the leap years (i.e., all years have 365 days), becomes > 50 percent (i.e., more likely than not)?'


The answer, which maybe surprising for some of us, is Reveal.


Let's do some calculation.

X Esc
上一个 PgUp
下一个 PgDn

Let Q(n) be the probability of unique birthday for n people in a room.
Q(n) = 365/365 × 364/365 × 363/365 × ... × (365-n+1)/365,
i.e., the first person's birthday can be any of the 365 days, the second person's birthday can be any of the 365 days except the first person's birthday, and so on.


Let P(n) be the probability of same birthday (collision) for n people in a room.
P(n) = 1-Q(n).


We compute that P(23) = 0.507 > 0.5 (50%).


Thus, we only need 23 people (a small amount of keys) in the room (Hash Table) of size 365 seats (cells) for a (more than) 50% chance collision to happen (the birthday of two different people in that room is one of 365 days/slots).

X Esc
上一个 PgUp
下一个 PgDn

问题1:我们已经看到了一个简单的散列函数,如电话号码示例中使用的h(v)= v%997,它将大范围的整数键映射到整数键的较小范围内,但非整数键的情况如何? 如何有效地做这样的散列?
问题2:我们已经看到,通过将大范围散列或映射到更小范围,很可能会发生碰撞。 如何处理它们?

X Esc
上一个 PgUp
下一个 PgDn

How to create a good hash function with these desirable properties?

  1. Fast to compute, i.e., in O(1),
  2. Uses as minimum slots/Hash Table size M as possible,
  3. Scatter the keys into different base addresses as uniformly as possible ∈ [0..M-1],
  4. Experience as minimum collisions as possible.
X Esc
上一个 PgUp
下一个 PgDn
假设我们有一个大小为 M 的散列表,其中键用于标识卫星数据,并且使用特定的散列函数来计算散列值。
使用散列函数从键v计算键v的散列值/散列码以获得范围从0到M-1的整数。 该散列值用作卫星数据的散列表条目的基址/归位索引/地址。
X Esc
上一个 PgUp
下一个 PgDn


X Esc
上一个 PgUp
下一个 PgDn

e-Lecture: The content of this slide is hidden and only available for legitimate CS lecturer worldwide. Drop an email to visualgo.info at gmail dot com if you want to activate this CS lecturer-only feature and you are really a CS lecturer (show your University staff profile).

X Esc
上一个 PgUp
下一个 PgDn

Before discussing the reality, let's discuss the ideal case: perfect hash functions.


A perfect hash function is a one-to-one mapping between keys and hash values, i.e., no collision at all. It is possible if all keys are known beforehand. For example, a compiler/interpreter search for reserved keywords. However, such cases are rare.


A minimal perfect hash function is achieved when the table size is the same as the number of keywords supplied. This case is even rarer.


If you are interested, you can explore GNU gperf, a freely available perfect hash function generator written in C++ that automatically constructs perfect functions (a C++ program) from a user supplied list of keywords.

X Esc
上一个 PgUp
下一个 PgDn

People has tried various ways to hash a large range of Integers into a smaller range of Integers as uniformly as possible. In this e-Lecture, we jump directly to one of the best and most popular version: h(v) = v%M, i.e., map v into Hash Table of size M slots. The (%) is a modulo operator that gives the remainder after division. This is clearly fast, i.e., O(1) assuming that v does not exceed the natural Integer data type limit.


The Hash Table size M is set to be a reasonably large prime not near a power of 2, about 2+ times larger than the expected number of keys N that will ever be used in the Hash Table. This way, the load factor α = N/M < 0.5 — we shall see later that having low load factor, thereby sacrificing empty spaces, help improving Hash Table performance.


Discuss: What if we set M to be a power of 10 (decimal) or power of 2 (binary)?

X Esc
上一个 PgUp
下一个 PgDn

e-Lecture: The content of this slide is hidden and only available for legitimate CS lecturer worldwide. Drop an email to visualgo.info at gmail dot com if you want to activate this CS lecturer-only feature and you are really a CS lecturer (show your University staff profile).

X Esc
上一个 PgUp
下一个 PgDn

People has also tried various ways to hash Strings into a small range of Integers as uniformly as possible. In this e-Lecture, we jump directly to one of the best and most popular version, shown below:

int hash_function(string v) { // assumption 1: v uses ['A'..'Z'] only
int sum = 0; // assumption 2: v is a short string
for (auto& c : v) // for each character c in v
sum = ((sum*26)%M + (c-'A'+1))%M; // M is table size
return sum;
}

Discussion: In real life class, discuss the components of the hash function above, e.g., why loop through all characters?, will that be slower than O(1)?, why multiply with 26?, what if the string v uses more than just UPPERCASE chars?, etc.

X Esc
上一个 PgUp
下一个 PgDn

e-Lecture: The content of this slide is hidden and only available for legitimate CS lecturer worldwide. Drop an email to visualgo.info at gmail dot com if you want to activate this CS lecturer-only feature and you are really a CS lecturer (show your University staff profile).

X Esc
上一个 PgUp
下一个 PgDn
选择分别的页眉以在不同使用模式之间切换:分离连接法或者开放寻址法(线性探测,二次探测 和 双倍散列)
设:
HT_size = 现在哈希表的大小
基础 = (值%HT_size) 
步骤 =  现在探测的步骤 
第二级 = 1+值%(HT_size-2) ( 防止为0 ), 则: 

线性探测: i=(基础+步骤*1)%HT_size,
二次探测: i=(基础+步骤*步骤)%HT_size, 以及
双倍散列: i=(基础+步骤*第二级)%HT_size

X Esc
上一个 PgUp
下一个 PgDn
在此可视化中讨论了三种开放寻址(OA)冲突解决技术:线性探测(LP),二次探测(QP)和双散列(DH)。
要在三种模式之间切换,请点击相应的标题。
假设:M = HT.length =当前散列表大小,base =(key%HT.length),step =当前探测步骤,secondary = smaller_prime - key%smaller_prime(避免零 - 很快就会阐述)我们很快就会看到 三种模式的探测序列为:线性探测:i =(基础+步骤* 1)%M,二次探测:i =(基础+步骤*步骤)%M和双散列:i =(基础+步骤 *次要)%M.
X Esc
上一个 PgUp
下一个 PgDn

Separate Chaining (SC) collision resolution technique is simple. We use M copies of auxiliary data structures, usually Doubly Linked Lists. If two keys a and b both have the same hash value i, both will be appended to the (front/back) of Doubly Linked List i (in this visualization, we append to the back in O(1) with help of tail pointer). That's it, where the keys will be slotted in is completely dependent on the hash function itself, hence we also call Separate Chaining as Closed Addressing collision resolution technique.


If we use Separate Chaining, the load factor α = N/M describes the average length of the M lists and it will determine the performance of Search(v) as we may have to explore α elements on average. As Remove(v) — also requires Search(v), its performance will be similar as Search(v). Insert(v) is clearly O(1).


If we can bound α to be a small constant (true if we know the expected largest N in our Hash Table application so that we can set up M accordingly), then all Search(v), Insert(v), and Remove(v) operations using Separate Chaining will be O(1).

X Esc
上一个 PgUp
下一个 PgDn

View the visualisation of Hash Table above.


In this visualization, we prevent insertion of duplicate keys.


Due to limited screen space, we limit the maximum Hash Table size to be M = 19.


The Hash Table is visualized horizontally like an array where index 0 is placed leftmost and index M-1 is placed rightmost but the details are different when we are visualizing Open Addressing versus Separate Chaining collision resolution techniques.


Discuss: What to modify in order to allow duplicate keys?

X Esc
上一个 PgUp
下一个 PgDn

There are three Open Addressing collision resolution techniques discussed in this visualization: Linear Probing (LP), Quadratic Probing (QP), and Double Hashing (DH).


For all three techniques, each Hash Table cell is displayed as a vertex with cell value of [0..99] displayed as the vertex label. Without loss of generality, we do not show any satellite data in this visualization as we concentrate only on the arrangement of the keys. We reserve value -1 to indicate an 'EMPTY cell' (visualized as a blank vertex) and -2 to indicate a 'DELETED cell' (visualized as a vertex with abbreviated label "DEL"). The cell indices ranging from [0..M-1] are shown as red label below each vertex.

X Esc
上一个 PgUp
下一个 PgDn

对于分离连接法(SC)冲突解决方法,第一行包含M双向链表M“H”(头)指针。
然后,每个双向链表我包含按任意顺序散列到 i 中的所有密钥。 在数学上,所有可以表示为i(mod M)的键都被散列到DLL i 中。 再次,我们不在此可视化中存储任何卫星数据。

X Esc
上一个 PgUp
下一个 PgDn
在线性探测冲突解决方法中,每当发生冲突时,我们会一次扫描一个索引,以查找下一个空/被删除的插槽(当我们到达最后一个插槽时再重新环绕)。
例如,我们假设我们以表格大小M = HT.length = 7的空的散列表 HT开始,如上所示,使用索引0到 M-1 = 7-1 = 6。请注意,7是素数。 (主)散列函数很简单,h(v)= v%M
此演练将向您展示使用线性探测作为冲突解决技术时Insert(v),Search(v)和Remove(v)操作所采取的步骤。
X Esc
上一个 PgUp
下一个 PgDn

Now click Insert([18,14,21]) — three individual insertions in one command.


Recap (to be shown after you click the button above).


Formally, we describe Linear Probing index i as i = (base+step*1) % M where base is the (primary) hash value of key v, i.e., h(v) and step is the Linear Probing step starting from 1.


Tips: To do a quick mental calculation of a (small) Integer V modulo M, we simply subtract V with the largest multiple of MV, e.g., 18%7 = 18-14 = 4, as 14 is the largest multiple of 7 that is ≤ 18.

X Esc
上一个 PgUp
下一个 PgDn
现在点击Insert([1,35])(在上一张幻灯片中插入的前三个值的顶部)。
回顾(在点击上面的按钮后显示)
X Esc
上一个 PgUp
下一个 PgDn

Now we illustrate Search(v) operation while using Linear Probing as collision resolution technique. The steps taken are very similar as with Insert(v) operation, i.e., we start from the (primary) hash key value and check if we have found v, otherwise we move one index forward at a time (wrapping around if necessary) and recheck on whether we have found v. We stop when we encounter an empty cell which implies that v is not in Hash Table at all (as earlier Insert(v) operation would have placed v there otherwise).


Now click Search(35) — you should see probing sequence [0,1,2,3 (key 35 found)].


Now click Search(8) — [1,2,3,4, 5 (empty cell, so key 8 is not found in the Hash Table)].

X Esc
上一个 PgUp
下一个 PgDn
现在我们来讨论移除(v)操作。
如果我们直接设置HT [i] = EMPTY单元,其中i是包含v的索引(在必要时进行线性探测之后),您是否意识到我们会导致问题? 为什么?

提示:查看以前的三张幻灯片,了解插入(v)和搜索(v)的行为。
X Esc
上一个 PgUp
下一个 PgDn

e-Lecture: The content of this slide is hidden and only available for legitimate CS lecturer worldwide. Drop an email to visualgo.info at gmail dot com if you want to activate this CS lecturer-only feature and you are really a CS lecturer (show your University staff profile).

X Esc
上一个 PgUp
下一个 PgDn
现在让我们看看完整的删除(v)。 如果我们在索引 i 处找到 v(必要时在线性探测之后),我们必须设置 HT [i] = DELETED(在此可视化中缩写为DEL),其中DEL是一个特殊符号(通常你只能在你的应用程序种没有使用过的使用一个特殊符号)指示该单元格可以在将来的搜索(v)需要时绕过,但可以被将来的插入(w)覆盖。 这种策略被称为懒惰删除。
现在点击Remove(21) - [0,1(找到关键21,我们设置H [1] = DEL)]。
然后请继续讨论下一张幻灯片。
X Esc
上一个 PgUp
下一个 PgDn
现在点击Search(35) - [0,1(绕过那个DELETED cell),2,3(找到键35)]。
想象一下,如果我们错误地设置H [1] = EMPTY,会发生什么。
X Esc
上一个 PgUp
下一个 PgDn
现在单击Insert(28) - 您应该看到探测序列[0,1(使用DEL符号找到一个单元格)],因此,实际上可以用新值覆盖,而不会影响将来搜索(v)的正确性。 因此,我们把28放在索引1中。
X Esc
上一个 PgUp
下一个 PgDn

Although we can resolve collision with Linear Probing, it is not the most effective way.


We define a cluster to be a collection of consecutive occupied slots. A cluster that covers the base address of a key is called the primary cluster of the key.


Now notice that Linear Probing can create large primary clusters that will increase the running time of Search(v)/Insert(v)/Remove(v) operations beyond the advertised O(1).


See an example above with M = 11 and we have inserted keys that are all 6 (modulo 11), i.e., all have remainder 6 when divided by 11. Now see how 'slow' Insert(94) is.

X Esc
上一个 PgUp
下一个 PgDn
线性探测的探测序列可以正式描述如下:

  h(v)//基地址
(h(v)+ 1 * 1)%M //第一个探测步骤,如果发生碰撞
(h(v)+ 2 * 1)%M //第二次探测步骤,如果仍有碰撞
(h(v)+ 3 * 1)%M //第三次探测步骤,如果仍有冲突
...
(h(v)+ k * 1)%M //第k个探测步骤等...

在Insert(v)过程中,如果发生冲突,但在Hash Table中仍有一个空(或DEL)时隙,我们肯定会在最多M个线性探测步骤后找到它。 而当我们这样做时,冲突将得到解决,但是键 v 的主集群会因此而扩展,未来的哈希表操作也会变得更慢。
由于相邻(但先前不相连)的群集的兼并(或组合),主群集的大小可能非常大。
X Esc
上一个 PgUp
下一个 PgDn
为了减少主聚类,我们可以将探针序列修改为:

  h(v)//基地址
(h(v)+ 1 * 1)%M //第一个探测步骤,如果发生碰撞
(h(v)+ 2 * 2)%M //第2次探测步骤,如果仍有冲突
(h(v)+ 3 * 3)%M //第三次探测步骤,如果仍有冲突
...
(h(v)+ k * k)%M //第k个探测步骤等...
就是这样,探针按照二次方跳转,根据需要环绕哈希表。
由于这是一种不同的二次探测,所以一个非常常见的错误就是:做H(V),(H(v)的1)%M,(H(v)的+ 1 + 4)%M,(H(V)+ 1 + 4 + 9)%M,...
X Esc
上一个 PgUp
下一个 PgDn
假设我们已经调用 Insert(18)和Insert(10)到大小为M = HT.length = 7的初始空Hash表中。由于18%7 = 4和10%7 = 3,18和3不会相互碰撞, 如上所示分别位于指数4和3中。
现在,我们点击Insert(38)
回顾(点击上面的按钮后显示)。
X Esc
上一个 PgUp
下一个 PgDn
删除(x)和搜索(y)操作的定义类似。 只是这次我们使用二次探测而不是线性探测。
例如,假设我们在上一张幻灯片之后调用了Remove(18),并且标记了HT [4] = DEL。 如果我们调用Search(38),我们将使用与上一张幻灯片相同的二次探测序列,但是会通过标记为DELETED的HT [4]。
X Esc
上一个 PgUp
下一个 PgDn
一目了然,二次探测(Quadratic Probing)能够快速解决我们之前对线性探测的主要聚类问题,但它是完美的碰撞解决技术吗?
尝试Insert([12,17])
你意识到刚刚发生了什么吗?
X Esc
上一个 PgUp
下一个 PgDn
我们可以很容易地插入12,因为h(12)= 12%7 = 5 以前是空的(见上文)。
但是,即使我们仍然有3个空槽,我们仍然存在主要问题:h(17)= 17%7 = 3已被键10占用,(3 + 1 * 1)%7 = 4已被占用 (3 + 2 * 2)%7 = 0已被键38占用,(3 + 3 * 3)%7 = 5已被键12占用,(3 + 4 * 4)%7 = 5 (3 + 5 * 5)%7 = 0再次被键38占据,(3 + 6 * 6)%7 = 4又被键18占据,(3 + 7 * 7)%7 = 3再次被键10占用,如果我们继续二次探测,它将永远循环...
尽管我们仍然有几个(3)空单元格,但我们无法将此新值17插入到哈希表中。
X Esc
上一个 PgUp
下一个 PgDn
如果α<0.5且M是质数(> 3),那么我们总是可以使用二次探测找到一个空槽。 回想一下:α是加载因子,M是散列表大小(HT.length)。

如果满足上述两个要求,我们可以证明包括基地址h(v)的第一M / 2二次探测指数都是不同且唯一的。

但除此之外没有这样的保证。 因此,如果我们想要使用二次探测,我们需要确保α<0.5(在这个可视化中没有强制执行,但是为了防止无限循环,我们在M步之后打破了循环)。
X Esc
上一个 PgUp
下一个 PgDn
我们将通过矛盾使用证据。 我们首先假设两个二次探测步骤:x和y,x!= y(假设x
h(v) + x*x = h(v) + y*y (mod M)
x*x = y*y (mod M)//从两边敲出h(v)
x*x - y*y = 0 (mod M) //将y * y移动到左边
(x-y)*(x+y) = 0 (mod M)//重新排列公式

现在,(x-y)或(x + y)必须等于零。 我们的假设是x != y,  那么(x-y)不能为0. 由于0≤ x
矛盾!
因此,第一个M / 2二次探测步骤不能产生相同的地址模M(如果我们将M设置为大于3的质数)。
讨论:我们可以使二次探测能够使用其他约50%的表格单元吗?
X Esc
上一个 PgUp
下一个 PgDn

e-Lecture: The content of this slide is hidden and only available for legitimate CS lecturer worldwide. Drop an email to visualgo.info at gmail dot com if you want to activate this CS lecturer-only feature and you are really a CS lecturer (show your University staff profile).

X Esc
上一个 PgUp
下一个 PgDn
在二次探测中,群集(clusters)沿着探测路径形成,而不是像线性探测那样围绕基地址形成。 这些群集称为次级群集(Secondary Clusters)。
由于在所有密钥的探测中使用相同的模式,所以形成次级群集。 请注意,如果两个不同的键具有相同的基址,则它们的二次探测序列将是相同的。
二次探测中的次级群集不如线性探测中的主群集那样糟糕,因为理论上散列函数理论上应该首先将键分散到不同的基地址∈[0..M-1]中。
X Esc
上一个 PgUp
下一个 PgDn
为了减少主要和次要群集,我们可以修改探针序列为:

  h(v)//基地址
(h(v)+ 1 * h2(v))%M //第一个探测步骤,如果有碰撞
(h(v)+ 2 * h2(v))%M //第2次探测步骤,如果仍有冲突
(h(v)+ 3 * h2(v))%M //第三次探测步骤,如果仍有冲突
...
(h(v)+ k * h2(v))%M //第k个探测步骤等...
就是这样,探测器根据第二个散列函数h2(v)的值跳转,根据需要环绕散列表。
X Esc
上一个 PgUp
下一个 PgDn
如果h2(v)= 1,则双散列(Double Hashing)的工作方式与线性探测(Linear Probing)完全相同。 所以我们通常希望h2(v)> 1来避免主聚类。
如果h2(v)= 0,那么Double Hashing不起作用的原因很明显,因为任何探测步数乘以0仍然是0,即我们在碰撞期间永远停留在基地址我们需要避免这种情况。
通常(对于整数键),h2(v)= M' - v%M'其中M'是一个小于M的质数。这使得h2(v)∈[1..M'],它足够多样 二次聚类。
二次散列函数的使用使得理论上难以产生主要或次要群集问题。
X Esc
上一个 PgUp
下一个 PgDn
点击Insert([35,42])插入35,然后插入42到上面的当前哈希表。
回顾(点击上面的按钮后显示)。
X Esc
上一个 PgUp
下一个 PgDn
删除(x)和搜索(y)操作的定义类似。 只是这次我们使用双散列(Double Hashing)而不是线性探测或二次探测。
例如,假设我们在上一张幻灯片之后调用了Remove(17),并且标记了HT [3] = DEL。 如果我们调用Search(35),我们将使用与前一张幻灯片相同的双哈希序列,但是会通过标记为DELETED的HT [3]。
X Esc
上一个 PgUp
下一个 PgDn
总之,一个好的开放寻址冲突解决技术需要:
  1. 如果它存在,总是找到一个空插槽,
  2. 最小化聚类(任何类型),
  3. 当两个不同的键碰撞时, 给出不同的探针序列,
  4. 快,O(1)。
讨论:双散列(Double Hashing)似乎符合。 但是...... 双散列策略是否足够灵活,可用作哈希表的默认库的实现? 让我们来看看...
X Esc
上一个 PgUp
下一个 PgDn

Try Insert([9,16,23,30,37,44]) to see how Insert(v) operation works if we use Separate Chaining as collision resolution technique. On such random insertions, the performance is good and each insertion is clearly O(1).


However if we try Insert([79,68,90]), notice that all Integers {79,68,90} are 2 (modulo 11) so all of them will be appended into the (back of) Doubly Linked List 2. We will have a long chain in that list. Note that due to the screen limitation, we limit the length of each Doubly Linked List to be 6.

X Esc
上一个 PgUp
下一个 PgDn

Try Search(35) to see that Search(v) can be made to run in O(1+α).


Try Remove(7) to see that Remove(v) can be made to run in O(1+α) too.


If α is large, Separate Chaining performance is not really O(1). However, if we roughly know the potential maximum number of keys N that our application will ever use, then we can set table size M accordingly such that α = N/M is a very low positive (floating-point) number, thereby making Separate Chaining performances all O(1).

X Esc
上一个 PgUp
下一个 PgDn
讨论:在所有这些解释之后,两种冲突解决方法的哪一种更好?
X Esc
上一个 PgUp
下一个 PgDn

e-Lecture: The content of this slide is hidden and only available for legitimate CS lecturer worldwide. Drop an email to visualgo.info at gmail dot com if you want to activate this CS lecturer-only feature and you are really a CS lecturer (show your University staff profile).

X Esc
上一个 PgUp
下一个 PgDn

您已经完成了这个哈希表数据结构的基本工作,我们鼓励您在探索模式中进一步探索。

但是,我们仍然会为您提供一些本节中概述的更有趣的哈希表的挑战。

X Esc
上一个 PgUp
下一个 PgDn
当负载因子α变高时,哈希表的性能会降低。 对于(标准)二次探测冲突解决方法,当哈希表的α> 0.5时,插入可能失败。
如果发生这种情况,我们可以重新散列(rehash)。 我们用一个新的散列函数构建另一个大约两倍的散列表。 我们遍历原始哈希表中的所有键,重新计算新的哈希值,然后将键(及其卫星数据)重新插入新的更大的哈希表中,最后删除较早的较小哈希表。
一个经验法则是,如果使用开放寻址,并且当α>小的常数(根据需要接近1.0),如果使用单独链接,当α≥0.5时重新散列。
如果我们知道可能的最大键数,我们可以始终将α作为一个低的数字。
X Esc
上一个 PgUp
下一个 PgDn

However, if you ever need to implement a Hash Table in C++ or Java and your keys are either Integers or Strings, you can use the built-in C++ STL or Java API. They already have good built-in implementation of default hash functions for Integers or Strings.


See C++ STL unordered_map, unordered_set or Java HashMap, HashSet.
Notice that multimap/multiset implementations also exist (duplicate keys are allowed).


For Python, we can use dict/set. For OCaml, we can use Hashtbl.


However, here is our take of a simple Separate Chaining implementation: HashTableDemo.cpp|py|java (not generic enough though).

X Esc
上一个 PgUp
下一个 PgDn

e-Lecture: The content of this slide is hidden and only available for legitimate CS lecturer worldwide. Drop an email to visualgo.info at gmail dot com if you want to activate this CS lecturer-only feature and you are really a CS lecturer (show your University staff profile).

X Esc
上一个 PgUp
下一个 PgDn

如果(整数或字符串)键只需要映射到卫星数据,则散列表是实现Table ADT的非常好的数据结构,对于Search(v),Insert(v)和Remove( v)如果哈希表设置正确,则执行时间复杂度为O(1)。
但是,如果我们需要更多地使用 键,我们可能需要使用另一种数据结构。

X Esc
上一个 PgUp
下一个 PgDn

关于这个数据结构的一些更有趣的问题,请在哈希表培训模块上练习(不需要登录,但是只需短暂且中等难度设置)。
但是,对于注册用户,您应该登录,然后转到主要培训页面以正式清除此模块,这些成果将记录在您的用户帐户中。

X Esc
上一个 PgUp
下一个 PgDn

尝试解决一些基本的编程问题,就很可能需要使用哈希表(特别是如果输入大小很大的时候):

  1. Kattis - cd(输入已经被排序,因此存在替代的非哈希表解决方案; 如果未对输入进行排序,则在哈希表的帮助下最好解决此组交集问题),
  2. Kattis - oddmanout(我们可以将较大的邀请代码映射到较小范围的整数;这是整数散列(大范围)的一种做法),
  3. Kattis - whatdoesthefoxsay(我们把不是狐狸的声音放入无序集合;这是散列字符串的做法)。

X Esc
上一个 PgUp
下一个 PgDn
当操作进行时,状态面板将会有每个步骤的描述。
X Esc
上一个 PgUp
下一个 PgDn

e-Lecture: The content of this slide is hidden and only available for legitimate CS lecturer worldwide. Drop an email to visualgo.info at gmail dot com if you want to activate this CS lecturer-only feature and you are really a CS lecturer (show your University staff profile).

X Esc
上一个 PgUp
下一个 PgDn
使用用户控件控制动画!可用的快捷键有:
空格键:绘制/停止/重绘
左/右箭头:上一步/下一步
-/+:减缓/增加速度

X Esc
上一个 PgUp
下一个 PgDn

Return to 'Exploration Mode' to start exploring!


Note that if you notice any bug in this visualization or if you want to request for a new visualization feature, do not hesitate to drop an email to the project leader: Dr Steven Halim via his email address: stevenhalim at gmail dot com.

X Esc
上一个 PgUp

创建

插入(v)

移除(v)

>

创建空的哈希表大小为 =

执行

v =

执行

v =

执行

关于 团队 使用条款

关于

VisuAlgo在2011年由Steven Halim博士概念化,作为一个工具,帮助他的学生更好地理解数据结构和算法,让他们自己和自己的步伐学习基础。
VisuAlgo包含许多高级算法,这些算法在Steven Halim博士的书(“竞争规划”,与他的兄弟Felix Halim博士合作)和其他书中讨论。今天,一些高级算法的可视化/动画只能在VisuAlgo中找到。
虽然专门为新加坡国立大学(NUS)学生采取各种数据结构和算法类(例如CS1010,CS1020,CS2010,CS2020,CS3230和CS3230),作为在线学习的倡导者,我们希望世界各地的好奇心发现这些可视化也很有用。
VisuAlgo不是从一开始就设计为在小触摸屏(例如智能手机)上工作良好,因为需要满足许多复杂的算法可视化,需要大量的像素和点击并拖动手势进行交互。一个令人尊敬的用户体验的最低屏幕分辨率为1024x768,并且只有着陆页相对适合移动设备。
VisuAlgo是一个正在进行的项目,更复杂的可视化仍在开发中。
最令人兴奋的发展是自动问题生成器和验证器(在线测验系统),允许学生测试他们的基本数据结构和算法的知识。这些问题是通过一些规则随机生成的,学生的答案会在提交给我们的评分服务器后立即自动分级。这个在线测验系统,当它被更多的世界各地的CS教师采用,应该技术上消除许多大学的典型计算机科学考试手动基本数据结构和算法问题。通过在通过在线测验时设置小(但非零)的重量,CS教练可以(显着地)增加他/她的学生掌握这些基本问题,因为学生具有几乎无限数量的可以立即被验证的训练问题他们参加在线测验。培训模式目前包含12个可视化模块的问题。我们将很快添加剩余的8个可视化模块,以便VisuAlgo中的每个可视化模块都有在线测验组件。
另一个积极的发展分支是VisuAlgo的国际化子项目。我们要为VisuAlgo系统中出现的所有英语文本准备一个CS术语的数据库。这是一个很大的任务,需要众包。一旦系统准备就绪,我们将邀请VisuAlgo游客贡献,特别是如果你不是英语母语者。目前,我们还以各种语言写了有关VisuAlgo的公共注释:
zh, id, kr, vn, th.

团队

项目领导和顾问(2011年7月至今)
Dr Steven Halim, Senior Lecturer, School of Computing (SoC), National University of Singapore (NUS)
Dr Felix Halim, Software Engineer, Google (Mountain View)

本科生研究人员 1 (Jul 2011-Apr 2012)
Koh Zi Chun, Victor Loh Bo Huai

最后一年项目/ UROP学生 1 (Jul 2012-Dec 2013)
Phan Thi Quynh Trang, Peter Phandi, Albert Millardo Tjindradinata, Nguyen Hoang Duy

最后一年项目/ UROP学生 2 (Jun 2013-Apr 2014)
Rose Marie Tan Zhao Yun, Ivan Reinaldo

本科生研究人员 2 (May 2014-Jul 2014)
Jonathan Irvin Gunawan, Nathan Azaria, Ian Leow Tze Wei, Nguyen Viet Dung, Nguyen Khac Tung, Steven Kester Yuwono, Cao Shengze, Mohan Jishnu

最后一年项目/ UROP学生 3 (Jun 2014-Apr 2015)
Erin Teo Yi Ling, Wang Zi

最后一年项目/ UROP学生 4 (Jun 2016-Dec 2017)
Truong Ngoc Khanh, John Kevin Tjahjadi, Gabriella Michelle, Muhammad Rais Fathin Mudzakir

List of translators who have contributed ≥100 translations can be found at statistics page.

致谢
这个项目是由来自NUS教学与学习发展中心(CDTL)的慷慨的教学增进赠款提供的。

使用条款

VisuAlgo是地球上的计算机科学社区免费。如果你喜欢VisuAlgo,我们对你的唯一的要求就是通过你知道的方式,比如:Facebook、Twitter、课程网页、博客评论、电子邮件等告诉其他计算机方面的学生/教师:VisuAlgo网站的神奇存在
如果您是数据结构和算法学生/教师,您可以直接将此网站用于您的课程。如果你从这个网站拍摄截图(视频),你可以使用屏幕截图(视频)在其他地方,只要你引用本网站的网址(http://visualgo.net)或出现在下面的出版物列表中作为参考。但是,您不能下载VisuAlgo(客户端)文件并将其托管在您自己的网站上,因为它是剽窃。到目前为止,我们不允许其他人分叉这个项目并创建VisuAlgo的变体。使用(客户端)的VisuAlgo的离线副本作为您的个人使用是很允许的。
请注意,VisuAlgo的在线测验组件本质上具有沉重的服务器端组件,并且没有简单的方法来在本地保存服务器端脚本和数据库。目前,公众只能使用“培训模式”来访问这些在线测验系统。目前,“测试模式”是一个更受控制的环境,用于使用这些随机生成的问题和自动验证在NUS的实际检查。其他感兴趣的CS教练应该联系史蒂文如果你想尝试这样的“测试模式”。
出版物名单
这项工作在2012年ACM ICPC世界总决赛(波兰,华沙)和IOI 2012年IOI大会(意大利Sirmione-Montichiari)的CLI讲习班上进行了简要介绍。您可以点击此链接阅读我们2012年关于这个系统的文章(它在2012年还没有被称为VisuAlgo)。
这项工作主要由我过去的学生完成。最近的最后报告是:Erin,Wang Zi,Rose,Ivan。
错误申报或请求添加新功能
VisuAlgo不是一个完成的项目。 Steven Halim博士仍在积极改进VisuAlgo。如果您在使用VisuAlgo并在我们的可视化页面/在线测验工具中发现错误,或者如果您想要求添加新功能,请联系Dr Steven Halim博士。他的联系邮箱是他的名字加谷歌邮箱后缀:StevenHalim@gmail.com。