Hash Table is a data structure that maps keys to values (also called the Table or Map Abstract Data Type/ADT). It uses a **hash function** to map large or even non-Integer keys into a small range of Integer indices (typically [0..hash_table_size-1]).

The probability of two distinct keys colliding into the same index is __relatively high__, and each of these potential collisions needs to be resolved to maintain data integrity.

There are several collision resolution strategies that will be highlighted in this visualization: Open Addressing (Linear Probing, Quadratic Probing, and Double Hashing) and Closed Addressing (Separate Chaining). Try the sample animation of searching for a value in a Hash Table that uses the Separate Chaining technique.

**Remarks**: By default, we show e-Lecture Mode for first-time (or non-logged-in) visitors.

If you are an NUS student and a repeat visitor, please __login__.

Hashing is an algorithm (via a hash function) that maps large, variable-length datasets (called keys, not necessarily Integers) into smaller, fixed-length Integer datasets.

A Hash Table is a data structure that uses a hash function to efficiently map keys to values (Table or Map ADT), for efficient search/retrieval, insertion, and/or removal.

Hash Tables are widely used in many kinds of computer software, particularly for __associative arrays__, database indexing, caches, and sets.

In this e-Lecture, we will dig deeper into the Table ADT and the basic concepts of the __Hash Table__, discuss __hash functions__, and then go into the details of the __Hash Table__ data structure itself.

Pro-tip 1: Since you are not __logged-in__, you may be a first time visitor (or not an NUS student) who is not aware of the following keyboard shortcuts to navigate this e-Lecture mode: **[PageDown]**/**[PageUp]** to go to the next/previous slide, respectively (and if the drop-down box is highlighted, you can also use **[→ or ↓/← or ↑]** to do the same), and **[Esc]** to toggle between this e-Lecture mode and exploration mode.

A Table ADT must support **at least** the following three operations as efficiently as possible:

- Search(v) — determine if **v** exists in the ADT or not,
- Insert(v) — insert **v** into the ADT,
- Remove(v) — remove **v** from the ADT.

Hash Table is one possible good implementation for this Table ADT (the other one is __this__).

PS1: For two weaker implementations of Table ADT, you can click the respective link: __unsorted array__ or a __sorted array__ to read the detailed discussions.

PS2: In live class, you may want to compare the requirements of Table ADT vs __List ADT__.

Pro-tip 2: We designed this visualization and this e-Lecture mode to look good on 1366x768 resolution **or larger** (typical modern laptop resolution in 2021). We recommend using Google Chrome to access VisuAlgo. Go to full screen mode (**F11**) to enjoy this setup. However, you can use zoom-in (**Ctrl +**) or zoom-out (**Ctrl -**) to calibrate this.

When the range of the **Integer** keys is **small**, e.g., [0..**M**-1], we can use an initially empty (Boolean) array **A** of size **M** and implement the following Table ADT operations **directly**:

- Search(v): Check if **A[v]** is true (filled) or false (empty),
- Insert(v): Set **A[v]** to be true (filled),
- Remove(v): Set **A[v]** to be false (empty).

That's it, we use the small Integer key itself to determine the address in array **A**, hence the name **Direct Addressing**. It is clear that all three major Table ADT operations are O(**1**).
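A minimal sketch of this Direct Addressing Table, assuming Integer keys in [0..**M**-1] (the `DAT` struct name is ours, purely for illustration):

```cpp
#include <vector>
using namespace std;

// Direct Addressing Table sketch: the key itself is the array index.
// All three Table ADT operations are O(1).
struct DAT {
  vector<bool> A;                      // A[v] == true iff key v is present
  DAT(int M) : A(M, false) {}          // initially empty Boolean array of size M
  bool Search(int v) { return A[v]; }  // check if A[v] is true (filled)
  void Insert(int v) { A[v] = true; }  // set A[v] to be true (filled)
  void Remove(int v) { A[v] = false; } // set A[v] to be false (empty)
};
```

With the bus-route example, `DAT t(1000)` records route existence for routes in [2..991].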

PS: This idea is also used elsewhere, e.g., in __Counting Sort__.

Pro-tip 3: Other than using the typical media UI at the bottom of the page, you can also control the animation playback using keyboard shortcuts (in Exploration Mode): **Spacebar** to play/pause/replay the animation, **←**/**→** to step the animation backwards/forwards, respectively, and **-**/**+** to decrease/increase the animation speed, respectively.

In Singapore (__as of Sep 2021__), bus routes are numbered from [2..991].

Not all integers between [2..991] are currently used, e.g., there is no bus route 989 — Search(989) should return false. A new bus route **x** may be introduced, i.e., Insert(x) or an existing bus route **y** may be discontinued, i.e., Remove(y).

As the range of possible bus routes is **small**, to record the data whether a bus route number exists or not, we can use a DAT with a Boolean array of size 1000.

Discussion: In real life class, we may discuss why we use 1000 instead of 991 (or 992).

Notice that we can always add __satellite data__ instead of just using a Boolean array to record the existence of the keys.

For example, we can use an **associative** String array **A** instead to map a bus route number to its operator name, e.g.,

A[2] = "Go-Ahead Singapore",

A[10] = "SBS Transit",

A[183] = "Tower Transit Singapore",

A[188] = "SMRT Buses", etc.

Discussion: Can you think of a few other real-life DAT examples?

The content of this interesting slide (the answer of the usually intriguing discussion point from the earlier slide) is hidden and only available for legitimate CS lecturer worldwide. This mechanism is used in the various __flipped classrooms__ in NUS.

**If you are really a CS lecturer (or an IT teacher)** (outside of NUS) and are interested to know the answers, please drop an email to stevenhalim at gmail dot com (**show your University staff profile/relevant proof to Steven**) for Steven to manually activate this CS lecturer-only feature for you.

FAQ: This feature will **NOT** be given to anyone else who is not a CS lecturer.

The keys must be (or can be easily mapped to) **non-negative Integer** values. Note that basic DAT has problems in the full version of the example in the previous few slides, as there are actually variations of bus route numbers in Singapore, e.g., 96B, 151A, NR10, etc.

The range of keys must be **small**.

The memory usage will be (insanely) large if we have (insanely) large range.

The keys must be dense, i.e., not many gaps in the key values.

DAT will contain too many empty (and wasted) cells otherwise.

We will overcome these restrictions with hashing.

Using hashing, we can:

- Map (some) **non-Integer** keys (e.g., Strings) to Integer keys,
- Map **large** Integers to **smaller** Integers.

For example, we have **N** = 400 Singapore phone numbers (Singapore phone number has 8 digits, so there are up to 10^8 = 100M possible phone numbers in Singapore).

Instead of using a DAT and use a **gigantic** array up to size **M** = 100 Million, we can use the following simple hash function **h(v) = v%997**.

This way, we map 8-digit phone numbers **6675 2378** and **6874 4483** into up to 3 digits: **h(6675 2378) = 237** and **h(6874 4483) = 336**, respectively. Therefore, we only need to prepare an array of size **M** = 997 (or just simplify to 1000) instead of **M** = 100 Million.
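A quick sketch of this slide's arithmetic (just the modulo operation, with the same **M** = 997):

```cpp
// h(v) = v % 997 maps an 8-digit phone number into [0..996].
int h(int v) { return v % 997; }
```

For instance, h(66752378) yields 237 and h(68744483) yields 336, matching the slide.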

- Search(v): Check whether **A[h(v)] != -1** (assuming **v ≥ 0**; we use -1 for an empty cell),
- Insert(v): Set **A[h(v)] = v** (we hash **v** into **h(v)**, so we need to somehow record the key **v**),
- Remove(v): Set **A[h(v)] = -1** — to be elaborated further.

If we have keys that map to satellite data, and we also want to record the original keys, we can implement the Hash Table as a pair of (Integer, satellite-data-type) arrays as follows:

- Search(v): Return **A[h(v)]**, which is a **pair** (**v, satellite data**), possibly empty,
- Insert(v, satellite data): Set **A[h(v)] = pair(v, satellite data)**,
- Remove(v): Set **A[h(v)] = (empty pair)** — to be elaborated further.

However, by now you should notice that something is incomplete...

A hash function may, and quite likely, map **different keys (Integer or not)** into the **same Integer slot**, i.e., a **many-to-one** mapping instead of **one-to-one** mapping.

For example, **h(6675 2378) = 237** from __three slides earlier__ and if we want to insert another phone number **6675 4372**, we will have a problem as **h(6675 4372) = 237** too.

This situation is called a **collision**, i.e., two (or more) keys have the **same hash value**.

The __Birthday (von Mises) Paradox__ asks: 'How many people (number of keys) must be in a room (Hash Table) of size 365 seats (cells) before the probability that two people **share a birthday** (collision, two keys hashed to the same cell), ignoring leap years (i.e., all years have 365 days), becomes > 50 percent (i.e., more likely than not)?'

The answer, which may be surprising for some of us, is **23**.

Let's do some calculation.

Let **Q(n)** be the probability of **unique** birthdays for **n** people in a room:

**Q(n) = 365/365 × 364/365 × 363/365 × ... × (365-n+1)/365**,

i.e., the first person's birthday can be any of the 365 days, the second person's birthday can be any of the 365 days except the first person's birthday, and so on.

Let **P(n)** be the probability of a **same birthday** (collision) for **n** people in a room: **P(n) = 1-Q(n)**.

We compute that **P(23) = 0.507 > 0.5 (50%)**.

Thus, we only need **23 people** (a small amount of keys) in the room (Hash Table) of size 365 seats (cells) for a (more than) 50% chance collision to happen (the birthday of two different people in that room is one of 365 days/slots).
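The computation above can be replayed with a short loop (a sketch following the slide's definitions of Q(n) and P(n)):

```cpp
// P(n) = 1 - Q(n), where Q(n) is the probability that all n birthdays differ.
double P(int n) {
  double Q = 1.0;
  for (int i = 0; i < n; ++i)
    Q *= (365.0 - i) / 365.0;  // the (i+1)-th person avoids i earlier birthdays
  return 1.0 - Q;
}
```

P(22) is about 0.476 while P(23) is about 0.507, so 23 is the smallest n with P(n) > 0.5.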

Issue 1: We have seen a simple hash function, like the **h(v) = v%997** used in the __phone numbers example__, that maps a large range of Integer keys into a smaller range of Integer keys, but what about non-Integer keys? How to do such hashing efficiently?

Issue 2: We have seen that by hashing, or mapping, a large range into a smaller range, collisions are very likely to happen. How to deal with them?

How to create a good hash function with these desirable properties?

- Fast to compute, i.e., in O(**1**),
- Uses as small a number of slots/Hash Table size **M** as possible,
- Scatters the keys into different base addresses ∈ [0..**M**-1] as uniformly as possible,
- Experiences as few collisions as possible.

Suppose we have a Hash Table of size **M**, where keys are used to identify the satellite data and a specific hash function is used to compute the hash value.

The hash value/hash code of key **v** is computed from the key **v** with the hash function, to get an Integer in the range 0 to **M**-1. This hash value is used as the base address/home index/address of the Hash Table entry for the satellite data.


Before discussing the reality, let's discuss the ideal case: __perfect hash functions__.

A perfect hash function is a **one-to-one** mapping between keys and hash values, i.e., no collision at all. It is possible if all keys are known beforehand. For example, a compiler/interpreter search for reserved keywords. However, such cases are rare.

A minimal perfect hash function is achieved when the table size is the same as the number of keywords supplied. This case is even rarer.

If you are interested, you can explore __GNU gperf__, a freely available perfect hash function generator written in C++ that automatically constructs perfect functions (a C++ program) from a user supplied list of keywords.

People have tried various ways to hash a large range of Integers into a smaller range of Integers as uniformly as possible. In this e-Lecture, we jump directly to one of the best and most popular versions: **h(v) = v%M**, i.e., map **v** into a Hash Table of size **M** slots. The (%) is a modulo operator that gives the remainder after division. This is clearly fast, i.e., O(**1**), assuming that **v** does not exceed the natural Integer data type limit.

The Hash Table size **M** is set to be a reasonably large prime not near a power of 2, about 2+ times larger than the expected number of keys **N** that will ever be used in the Hash Table. This way, the __load factor__ α = N/M < 0.5 — we shall see later that having a low load factor (thereby sacrificing some empty space) helps improve Hash Table performance.

Discuss: What if we set **M** to be a power of 10 (decimal) or power of 2 (binary)?


People have also tried various ways to hash Strings into a small range of Integers as uniformly as possible. In this e-Lecture, we jump directly to one of the best and most popular versions, shown below:

    int hash_function(string v) {         // assumption 1: v uses ['A'..'Z'] only
      int sum = 0;                        // assumption 2: v is a short string
      for (auto& c : v)                   // for each character c in v
        sum = ((sum*26)%M + (c-'A'+1))%M; // M is the table size
      return sum;
    }

Discussion: In real life class, discuss the components of the hash function above, e.g., why loop through all characters?, will that be slower than O(**1**)?, why multiply with 26?, what if the string v uses more than just UPPERCASE chars?, etc.
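A self-contained variant of the hash function above can look like this (here **M** is passed as a parameter purely for illustration; in the slides it is the globally known table size):

```cpp
#include <string>
using namespace std;

// Polynomial rolling hash over base 26, reduced modulo M at every step
// so that sum never overflows; assumes v uses uppercase 'A'..'Z' only.
int hash_function(const string& v, int M) {
  int sum = 0;
  for (char c : v)
    sum = ((sum * 26) % M + (c - 'A' + 1)) % M;  // map 'A'..'Z' to 1..26
  return sum;  // base address in [0..M-1]
}
```

The loop over all characters makes this O(|v|), which is still effectively O(**1**) for short strings.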


Double Hashing: i = (base + step*secondary) % HT_size

To switch between the three modes, click the respective header.

Assumptions:

**M** = HT.length = the current Hash Table size, base = (key % HT.length), step = the current probing step, secondary = smaller_prime - key % smaller_prime (to avoid zero — to be elaborated soon). We will soon see that the probe sequences of the three modes are: Linear Probing: i = (base + step*1) % M, Quadratic Probing: i = (base + step*step) % M, and Double Hashing: i = (base + step*secondary) % M.

The Separate Chaining (SC) collision resolution technique is simple. We use **M** copies of an auxiliary data structure, usually __Doubly Linked Lists__. If two keys **a** and **b** both have the same hash value **i**, both will be appended to the (front/back) of Doubly Linked List **i** (in this visualization, we append to the back in O(**1**) with the help of a tail pointer). That's it: where the keys will be slotted in is completely dependent on the hash function itself, hence we also call Separate Chaining a Closed Addressing collision resolution technique.

If we use Separate Chaining, the load factor α = **N/M** describes the average length of the **M** lists, and it will determine the performance of Search(v), as we may have to explore α elements on average. As Remove(v) also requires Search(v), its performance will be similar to Search(v). Insert(v) is clearly O(**1**).

If we can bound α to be a small constant (true if we know the expected largest **N** in our Hash Table application so that we can set up **M** accordingly), then all Search(v), Insert(v), and Remove(v) operations using Separate Chaining will be O(**1**).
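The Separate Chaining operations above can be sketched as follows (a minimal illustration with Integer keys only and no satellite data; the struct name and the use of std::list are our choices):

```cpp
#include <list>
#include <vector>
using namespace std;

// Separate Chaining sketch: M chains, keys appended at the back in O(1).
struct HashTableSC {
  int M;
  vector<list<int>> L;                          // M (doubly) linked lists
  HashTableSC(int M) : M(M), L(M) {}
  int h(int v) { return v % M; }                // primary hash function
  void Insert(int v) { L[h(v)].push_back(v); }  // O(1)
  bool Search(int v) {                          // O(1 + alpha) on average
    for (int x : L[h(v)])
      if (x == v) return true;
    return false;
  }
  void Remove(int v) { L[h(v)].remove(v); }     // O(1 + alpha) on average
};
```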

View the visualisation of Hash Table above.

In this visualization, we prevent insertion of duplicate keys.

Due to limited screen space, we limit the maximum Hash Table size to be **M = 19**.

The Hash Table is visualized horizontally like an array where index 0 is placed leftmost and index **M**-1 is placed rightmost but the details are different when we are visualizing Open Addressing versus Separate Chaining collision resolution techniques.

Discuss: What to modify in order to allow duplicate keys?

There are three Open Addressing collision resolution techniques discussed in this visualization: Linear Probing (LP), Quadratic Probing (QP), and Double Hashing (DH).

For all three techniques, each Hash Table cell is displayed as a vertex with cell value of [0..99] displayed as the vertex label. Without loss of generality, we do not show any satellite data in this visualization as we concentrate only on the arrangement of the keys. We reserve value -1 to indicate an 'EMPTY cell' (visualized as a blank vertex) and -2 to indicate a 'DELETED cell' (visualized as a vertex with abbreviated label "DEL"). The cell indices ranging from [0..**M**-1] are shown as red label below each vertex.

For the Separate Chaining (SC) collision resolution technique, the first row contains the **M** 'H' (Head) pointers of the **M** __Doubly Linked Lists__.

Then, each Doubly Linked List **i** contains all keys that are hashed into **i**, in arbitrary order. Mathematically, all keys that can be expressed as **i** (mod **M**) are hashed into DLL **i**. Again, we do not store any satellite data in this visualization.

the **next empty/DELETED slot** (wrapping around if we reach the last slot).

For example, assume that we start with an empty Hash Table **HT** with table size M = HT.length = 7, as shown above, using indices 0 to **M**-1 = 7-1 = 6. Notice that 7 is a prime number. The (primary) hash function is simple: **h(v) = v%M**.

This walkthrough will show you the steps taken by the Insert(v), Search(v), and Remove(v) operations when using Linear Probing as the collision resolution technique.

Now click — three individual insertions in one command. Recap (to be shown after you click the button above).

Formally, we describe the __Linear Probing index i__ as **i = (base+step*1) % M**, where **base** is the (primary) hash value of key **v**, i.e., **h(v)**, and **step** is the Linear Probing step starting from 1.

Tips: To do a quick mental calculation of a (small) Integer **V** modulo **M**, we simply subtract **V** with the largest multiple of **M** ≤ **V**, e.g., 18%7 = 18-14 = 4, as 14 is the largest multiple of 7 that is ≤ 18.

Recap (to be shown after you click the button above).

Now we illustrate Search(v) operation while using Linear Probing as collision resolution technique. The steps taken are very similar as with Insert(v) operation, i.e., we start from the (primary) hash key value and check if we have found **v**, otherwise we move one index forward at a time (wrapping around if necessary) and recheck on whether we have found **v**. We stop when we encounter an empty cell which implies that **v** is not in Hash Table at all (as earlier Insert(v) operation would have placed **v** there otherwise).
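The behavior described in the last few slides can be sketched as follows (a minimal Linear Probing illustration; the sentinels follow this visualization's convention of -1 for EMPTY and -2 for DEL, and deletion uses the lazy-deletion idea discussed on the coming slides):

```cpp
#include <vector>
using namespace std;

const int EMPTY = -1, DEL = -2;  // non-negative values are real keys

struct HashTableLP {
  int M;
  vector<int> HT;
  HashTableLP(int M) : M(M), HT(M, EMPTY) {}
  int h(int v) { return v % M; }
  void Insert(int v) {                   // assumes the table is not full
    int i = h(v);
    while (HT[i] >= 0) i = (i + 1) % M;  // probe forward past occupied cells
    HT[i] = v;                           // EMPTY and DEL cells are reusable
  }
  bool Search(int v) {
    int i = h(v);
    for (int step = 0; step < M; ++step) {
      if (HT[i] == v) return true;
      if (HT[i] == EMPTY) return false;  // stop at EMPTY, but skip over DEL
      i = (i + 1) % M;
    }
    return false;
  }
  void Remove(int v) {
    int i = h(v);
    for (int step = 0; step < M; ++step) {
      if (HT[i] == v) { HT[i] = DEL; return; }  // lazy deletion
      if (HT[i] == EMPTY) return;
      i = (i + 1) % M;
    }
  }
};
```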

Now try Search(35) — you should see probing sequence [0,1,2,3 (key 35 found)]. Then try Search(8) — [1,2,3,4,5 (empty cell, so key 8 is not found in the Hash Table)].

Do you realize that we would cause a problem if we directly set **HT[i] = EMPTY**, where **i** is the index of the cell containing key **v** (after Linear Probing, if necessary)? Why?

Hint: Review the previous three slides on how Insert(v) and Search(v) behave.


If we find **v** at index **i** (after Linear Probing, if necessary), we must set **HT[i] = DELETED** (abbreviated as DEL in this visualization), where DEL is a special symbol (commonly, you can only use a symbol that is not otherwise used in your application) to indicate that the cell can be bypassed if needed by a future Search(v), but can be overwritten by a future Insert(w). This strategy is called lazy deletion.

Now try Remove(21) — [0,1 (key 21 found, and we set HT[1] = DEL)].

Then please continue with the discussion on the next slide.

Imagine what would happen if we wrongly set HT[1] = EMPTY.

Although we can resolve collision with Linear Probing, it is not the most effective way.

We define a **cluster** to be a collection of consecutive occupied slots. A cluster that covers the base address of a key is called the **primary cluster** of the key.

Now notice that Linear Probing can create large primary clusters that will increase the running time of Search(v)/Insert(v)/Remove(v) operations beyond the advertised O(**1**).

See an example above with **M** = 11 where we have inserted 8 keys that are all 6 (modulo 11), i.e., all have remainder 6 when divided by 11. Now see how 'slow' inserting the 9th such key is.

The probe sequence of Linear Probing can be formally described as follows:

     h(v)             // base address
    (h(v) + 1*1) % M  // 1st probing step if there is a collision
    (h(v) + 2*1) % M  // 2nd probing step if there is still a collision
    (h(v) + 3*1) % M  // 3rd probing step if there is still a collision
    ...
    (h(v) + k*1) % M  // k-th probing step, etc...

During Insert(v), if there is a collision but an empty (or DEL) slot remains in the Hash Table, we are sure to find it after at most **M** Linear Probing steps, i.e., in O(**M**). And when we do, the collision will be resolved, but the primary cluster of the key **v** is expanded as a result, and future Hash Table operations will get slower too. Try the slow operation on the same Hash Table as in the previous slide, but with many DEL markers.

In the previous slide (Primary Clustering, Part 1), we break the assumption that the hash function should uniformly distribute keys around [0..**M**-1]. In the next example, we will show that the problem of primary clustering can still happen even if the hash function distribute the keys somewhat uniformly around [0..**M**-1].

On screen, you see **M** = 11 where the first 4 keys have been inserted {11, 2, 4, 6}. If we then insert the next 4 keys {22, 24, 26, 28}, these keys will initially collide with the cells that already contain {11, 2, 4, 6}, have "short" one-step probes, and by doing so "plug" the empty cells and accidentally annex (or combine) those neighboring (but previously disjoint) clusters into a long primary cluster. So the next insertion of a key {33} that lands at (the beginning of) this long primary cluster will end up performing almost O(M) probing steps just to find an empty cell. Try it.

     h(v)             // base address
    (h(v) + 1*1) % M  // 1st probing step if there is a collision
    (h(v) + 2*2) % M  // 2nd probing step if there is still a collision
    (h(v) + 3*3) % M  // 3rd probing step if there is still a collision
    ...
    (h(v) + k*k) % M  // k-th probing step, etc...

That's it, the probe jumps quadratically, wrapping around the Hash Table as necessary.

Since this is a different form of Quadratic Probing, a very common mistake is to instead do: h(v), (h(v)+1) % M, (h(v)+1+4) % M, (h(v)+1+4+9) % M, ...

Recap (to be shown after you click the button above).

For example, assume that we have called Remove(18) after the previous slide and marked HT[4] = DEL. If we then run the same operation, we will use the same Quadratic Probing sequence as in the previous slide, but we will pass through HT[4], which is marked DELETED.

Try it.

Do you realize what just happened?

Even though we still have 3 empty slots, we have a major problem: h(17) = 17%7 = 3 is already occupied by key 10, (3 + 1*1) % 7 = 4 is already occupied by key 18, (3 + 2*2) % 7 = 0 is already occupied by key 38, (3 + 3*3) % 7 = 5 is already occupied by key 12, (3 + 4*4) % 7 = 5 is again occupied by key 12, (3 + 5*5) % 7 = 0 is again occupied by key 38, (3 + 6*6) % 7 = 4 is again occupied by key 18, (3 + 7*7) % 7 = 3 is again occupied by key 10, and if we continue Quadratic Probing, it will **cycle forever**...

Although we still have a few (3) empty cells, we are unable to insert this new value 17 into the Hash Table.

If the two requirements above are satisfied, we can prove that the first M/2 Quadratic Probing indices, including the base address h(v), are all distinct and unique.

But beyond that, there is no such guarantee. Hence, if we want to use Quadratic Probing, we need to ensure that α < 0.5 (this is not enforced in this visualization, but to prevent an infinite loop, we break out of the loop after M steps).
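The cycling behavior above can be checked with a small helper (purely illustrative): for M = 7 and base address 3, k*k mod 7 only takes values in {0, 1, 2, 4}, so the probe sequence visits only indices {0, 3, 4, 5} and the empty cells 1, 2, and 6 are never reached:

```cpp
#include <set>
using namespace std;

// Collect every index visited by (h(v) + k*k) % M for k = 0..maxSteps.
set<int> probedIndices(int v, int M, int maxSteps) {
  set<int> probed;
  for (int k = 0; k <= maxSteps; ++k)
    probed.insert((v % M + k * k) % M);
  return probed;
}
```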

h(v) + x*x = h(v) + y*y (mod M)

x*x - y*y = 0 (mod M) // move y*y to the left side

(x-y)*(x+y) = 0 (mod M) // factorize

Now, since M is prime, either (x-y) or (x+y) must be equal to zero (mod M). Our assumption is `x != y`, so (x-y) cannot be 0. And since 0 ≤ x < y ≤ M/2, we have 0 < |x-y| < M and 0 < (x+y) < M, so neither (x-y) nor (x+y) is divisible by M — a contradiction.

Hence, the first M/2 Quadratic Probing steps cannot yield the same address modulo M (if we set M to be a prime greater than 3).

Discussion: Can we make Quadratic Probing able to use the other ~50% of the table cells?


In Quadratic Probing, clusters are formed along the path of probing, instead of around the base address like in Linear Probing. These clusters are called **Secondary Clusters**, and they are 'less visible' compared to the Primary Clusters that plague Linear Probing.

Secondary clusters are formed as a result of using the same pattern in probing by colliding keys, i.e., if two distinct keys have the same base address, their Quadratic Probing sequences are going to be the same.

To illustrate this, see the screen with **M** = 19. We have populated this Hash Table with only 7 keys (so load factor α = 7/19 < 0.5) and the Hash Table looks 'sparse enough' (no visibly big primary cluster). However, if we then insert 38, despite the fact that there are many (19-7 = 12) empty cells and 19 != 38 (different keys that end up hashed into index 0), we end up doing 7 probing steps along this 'less visible' secondary cluster.

Secondary clustering in Quadratic Probing is not as bad as primary clustering in Linear Probing as a good hash function should theoretically disperse the keys into different base addresses ∈ [0..**M**-1] in the first place.

     h(v)                 // base address
    (h(v) + 1*h2(v)) % M  // 1st probing step if there is a collision
    (h(v) + 2*h2(v)) % M  // 2nd probing step if there is still a collision
    (h(v) + 3*h2(v)) % M  // 3rd probing step if there is still a collision
    ...
    (h(v) + k*h2(v)) % M  // k-th probing step, etc...

That's it, the probe jumps according to the value of the **second hash function** h2(v), wrapping around the Hash Table as necessary.

If h2(v) = 0, Double Hashing does not work, for the obvious reason that any probing step multiplied by 0 remains 0, i.e., we would be stuck at the base address forever during a collision. We need to avoid this.

Usually (for Integer keys), h2(v) = M' - v%M' where M' is a prime smaller than M. This makes h2(v) ∈ [1..M'], which is diverse enough to avoid secondary clustering.

The usage of a secondary hash function makes it theoretically hard to produce either primary or secondary clustering issues.

Recap (to be shown after you click the button above).

For example, assume that we have called Remove(17) after the previous slide and marked HT[3] = DEL. If we then run the same operation, we will use the same Double Hashing sequence as in the previous slide, but we will pass through HT[3], which is marked DELETED.

In summary, a good Open Addressing collision resolution technique needs to:

- Always find an empty slot if it exists,
- Minimize clustering (of any kind),
- Give different probe sequences when 2 different keys collide,
- Fast, O(**1**).

Now, let's see __the same test case that plagues Quadratic Probing__ earlier. Try it again with Double Hashing. Although h(19) = h(38) = 0, i.e., they collide, their probing steps are not the same: h2(19) = 17-19%17 = 15 is not the same as h2(38) = 17-38%17 = 13.
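The probe arithmetic of this test case can be sketched directly (M = 19 and the smaller prime M' = 17 as in this visualization; the helper names are ours):

```cpp
// Primary hash, secondary hash, and the k-th Double Hashing probe index.
int h(int v) { return v % 19; }
int h2(int v) { return 17 - v % 17; }  // always in [1..17], never 0
int probe(int v, int step) { return (h(v) + step * h2(v)) % 19; }
```

Since h2(19) = 15 and h2(38) = 13 differ, keys 19 and 38 follow different probe sequences even though both start at index 0.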

Discussion: Double Hashing seems to fit the bill. But... Is Double Hashing strategy flexible enough to be used as the default library implementation of a Hash Table? Let's see...

Try a few insertions — each runs in O(**1**).

However, if we try inserting {79,68,90}, notice that all these Integers are 2 (modulo 11), so all of them will be appended into the (back of) Doubly Linked List 2. We will have a long chain in that list. Note that due to the screen limitation, we limit the length of each Doubly Linked List to be 6. Try a search — it runs in O(**1+α**).

Try a removal — it also runs in O(**1+α**).

If **α** is large, Separate Chaining performance is not really O(**1**). However, if we roughly know the potential maximum number of keys **N** that our application will ever use, then we can set the table size **M** accordingly such that **α = N/M** is a very low positive (floating-point) number, thereby making the expected Separate Chaining performance O(**1**).


You have completed the basic material on this Hash Table data structure, and we encourage you to explore further in **Exploration Mode**.

However, we still have a few more interesting Hash Table challenges for you that are outlined in this section.

When that happens, we can rehash. We build another Hash Table that is about twice as big, with a new hash function. We go through all keys in the original Hash Table, recompute the new hash values, re-insert the keys (and their satellite data) into the new, bigger Hash Table, and finally delete the earlier, smaller Hash Table.

A rule of thumb is to rehash when α ≥ 0.5 if using Open Addressing, and when α > a small constant (close to 1.0, as you prefer) if using Separate Chaining.

If we know the potential maximum number of keys, we can always keep α low.
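The rehashing recipe above can be sketched as follows (a minimal illustration on top of Linear Probing; doubling **M** directly and the fixed 0.5 threshold are our simplifications, since a production table would pick the next prime not near a power of 2):

```cpp
#include <vector>
using namespace std;

// Grows itself whenever the load factor alpha = N/M would exceed 0.5.
struct GrowingTable {
  int M, N;
  vector<int> HT;                   // -1 marks an EMPTY cell
  GrowingTable(int M) : M(M), N(0), HT(M, -1) {}
  void Insert(int v) {
    if (2 * (N + 1) > M) Rehash();  // keep alpha <= 0.5
    int i = v % M;                  // Linear Probing insertion
    while (HT[i] != -1) i = (i + 1) % M;
    HT[i] = v; ++N;
  }
  bool Search(int v) {
    for (int i = v % M, s = 0; s < M; ++s, i = (i + 1) % M) {
      if (HT[i] == v) return true;
      if (HT[i] == -1) return false;
    }
    return false;
  }
  void Rehash() {                   // rebuild into a ~2x bigger table
    vector<int> old = HT;
    M *= 2; N = 0;
    HT.assign(M, -1);
    for (int v : old)
      if (v != -1) Insert(v);       // re-insert with the new hash v % M
  }
};
```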

However, if you ever need to implement a Hash Table in C++, Python, or Java, and your keys are either Integers or Strings, you can use the built-in C++ STL, Python standard library, or Java API, respectively. They already have good built-in implementation of default hash functions for Integers or Strings.

See C++ STL __std::unordered_map__, __std::unordered_set__, Python __dict__, __set__, or Java __HashMap__, __HashSet__.

For C++, note that std::multimap/std::multiset implementations also exist, where duplicate keys are allowed.

For OCaml, we can use __Hashtbl__.

However, here is our take of a simple Separate Chaining implementation: __HashTableDemo.cpp__ | __py__ | __java__.


If your (Integer or String) keys only need to be mapped to satellite data, the Hash Table is a very good data structure for implementing the Table ADT, with O(**1**) performance for the Search(v), Insert(v), and Remove(v) operations if the Hash Table is set up properly.

However, if we need to __do more with the keys__, we may need to use another data structure.

For a few more interesting questions about this data structure, please practice on the __Hash Table__ training module (no login is required, but it is short and set at medium difficulty only).

However, for registered users, you should log in and then go to the __Main Training Page__ to officially clear this module; such achievements will be recorded in your user account.

Try to solve a few basic programming problems that quite likely require the use of a Hash Table (especially if the input size is large):

__Kattis - cd__ (the input is already sorted, so an alternative non-Hash-Table solution exists; if the input were not sorted, this set-intersection problem would be best solved with the help of a Hash Table), __Kattis - oddmanout__ (we can map the large invitation codes into a smaller range of Integers; this is practice for hashing Integers of a large range), __Kattis - whatdoesthefoxsay__ (we put the sounds that are not the fox's into an unordered set; this is practice for hashing Strings).

You have reached the last slide. Return to 'Exploration Mode' to start exploring!

Note that if you notice any bug in this visualization or if you want to request for a new visualization feature, do not hesitate to drop an email to the project leader: Dr Steven Halim via his email address: stevenhalim at gmail dot com.
