二叉搜索树

1. Introduction

A Binary Search Tree (BST) is a binary tree in which each vertex has only up to 2 children that satisfies BST property: All vertices in the left subtree of a vertex must hold a value smaller than its own and all vertices in the right subtree of a vertex must hold a value larger than its own (we have assumption that all values are distinct integers in this visualization and small tweak is needed to cater for duplicates/non integer). Try clicking Search(7) for a sample animation on searching a random value ∈ [1..99] in the random BST above.

An Adelson-Velskii Landis (AVL) tree is a self-balancing BST that maintains it's height to be O(log N) when having N vertices in the AVL tree.

2. BST和平衡BST（AVL树）

To toggle between the standard Binary Search Tree and the AVL Tree (only different behavior during Insertion and Removal of an Integer), select the respective header.

We also have URL shortcut to quickly access the AVL Tree mode, which is https://visualgo.net/en/avl (you can change the 'en' to your two characters preferred language - if available).

3. 动机

BST (and especially balanced BST like AVL Tree) is an efficient data structure to implement a certain kind of Table (or Map) Abstract Data Type (ADT).

A Table ADT must support at least the following three operations as efficient as possible:

1. Search(v) — determine if v exists in the ADT or not,
2. Insert(v) — insert v into the ADT,
3. Remove(v) — remove v from the ADT.

Reference: See similar slide in Hash Table e-Lecture.

3-2. 使用Sorted Array / Vector

2. 插入（v）可以在O(1)中实现，只需将v放在数组的后面，
3. 删除（v）也在O(N)中运行，因为我们必须首先搜索已经O(N)的v，然后在删除后关闭产生的间隙 - 也在O(N)中。

3-3. 使用Sorted Array / Vector

1. 搜索（v）现在可以在O(log N),中实现，因为我们现在可以在已排序的数组上使用二进制搜索，
2. 插入（v）现在在O(N)中运行，因为我们需要实现类似插入的策略以使数组保持排序状态，
3. 删除（v）在O(N)中运行，因为即使 Search(v) O(log N)中运行，我们仍然需要在删除后关闭间隙 - 在O（N）中。

3-4. O（log N）复杂性？

 N ≈ 1 000 ≈ 1 000 000 ≈ 1 000 000 000 log N 10 Only 20 :O Only 30 :O:O

On top of the basic three, there are a few other possible Table ADT operations:

1. Find the Min()/Max() element,
2. Find the Successor(v) — 'next larger'/Predecessor(v) — 'previous smaller' element,
3. List elements in sorted order,
4. Operation X & Y - hidden for pedagogical purpose in an NUS module,
5. There are others possible operations.

Discussion: What are the best possible implementation for the first three additional operations if we are limited to use [sorted|unsorted] array/vector?

3-6. 答案

[This is a hidden slide]

3-7. 链接表怎么样？

Quiz: Can we perform all basic three Table ADT operations: Search(v)/Insert(v)/Remove(v) efficiently (read: faster than O(N)) using Linked List?

No
Yes

3-8. 答案

[This is a hidden slide]

3-9. 哈希表怎么样？

Quiz: So what is the point of learning this BST module if Hash Table can do the crucial Table ADT operations in unlikely-to-be-beaten expected O(1) time?

There is no point, so this BST module can be ignored
There are valid reasons, which are ____

3-10. The Solution

[This is a hidden slide]

4. 可视化

We will now introduce BST data structure. See the visualization of an example BST above!

Root vertex does not have a parent. There can only be one root vertex in a BST. Leaf vertex does not have any child. There can be more than one leaf vertex in a BST. Vertices that are not leaf are called the internal vertices. Sometimes root vertex is not included as part of the definition of internal vertex as the root of a BST with only one vertex can actually fit into the definition of a leaf too.

In the example above, vertex 15 is the root vertex, vertex {5, 7, 50} are the leaves, vertex {4, 6, 15 (also the root), 23, 71} are the internal vertices.

5. BST的操作

1. .查询操作（BST结构保持不变）：
1. 搜索（v）
2. 前身（v）（和类似的继承人（v）），和
3. Inorder Traversal（我们将很快添加Preorder和Postorder Traversal），
2. 更新操作（BST结构可能会改变）：
1.插入（v）
2.删除（v），
3.创建BST（几个标准）。

5-1. 一些其他的 BST操作

There are a few other BST (Query) operations that have not been visualized in VisuAlgo:

1. Rank(v): Given a key v, determine what is its rank (1-based index) in the sorted order of the BST elements. That is, Rank(FindMin()) = 1 and Rank(FindMax()) = N. If v does not exist, we can report -1.
2. Select(k): Given a rank k, 1 ≤ kN, determine the key v that has that rank k in the BST. Or in another word, find the k-th smallest element in the BST. That is, Select(1) = FindMin() and Select(N) = FindMax().

The details of these two operations are currently hidden for pedagogical purpose in a certain NUS module.

6. 搜索(v)

Because of the way data (distinct integers for this visualization) is organised inside a BST, we can binary search for an integer v efficiently (hence the name of Binary Search Tree).

First, we set the current vertex = root and then check if the current vertex is smaller/equal/larger than integer v that we are searching for. We then go to the right subtree/stop/go the left subtree, respectively. We keep doing this until we either find the required vertex or we don't.

On the example BST above, try clicking Search(23) (found after 2 comparisons), Search(7) (found after 3 comparisons), Search(21) (not found after 3 comparisons).

6-2. O（h）时间复杂性

Search（v）/ FindMin（）/ FindMax（）操作在O（h）中运行，其中h是BST的高度。

7. 后继(v)

1. 如果 v 具有右子树，则 v 的右子树中的最小整数必须是 v 的后继。尝试Successor(23)（应为50）。
2. 如果 v 没有右子树，我们需要遍历 v 的祖先，直到我们找到'右转'到顶点 w（或者，直到我们发现第一个顶点 w 大于顶点 v）。 一旦我们找到顶点 w，我们将看到顶点 v w 的左子树中的最大元素。 试试Successor(7)（应该是15）。
3. 如果 v 是BST中的最大整数，则 v 没有后继。 试试Successor(71)（应该没有）。

8. 中序遍历

We can perform an Inorder Traversal of this BST to obtain a list of sorted integers inside this BST (in fact, if we 'flatten' the BST into one line, we will see that the vertices are ordered from smallest/leftmost to largest/rightmost).

Inorder Traversal is a recursive method whereby we visit the left subtree first, exhausts all items in the left subtree, visit the current root, before exploring the right subtree and all items in the right subtree. Without further ado, let's try Inorder Traversal to see it in action on the example BST above.

8-1. O(N)的时间复杂度

PS：有些人调用N个无序整数插入O（N log N）中的BST，然后执行O（N）Inorder Traversal作为'BST sort'。 它很少使用，因为有几种比这更容易使用（基于比较）的排序算法

8-2. 答案

[This is a hidden slide]

9. 插入(v)

We can insert a new integer into BST by doing similar operation as Search(v). But this time, instead of reporting that the new integer is not found, we create a new vertex in the insertion point and put the new integer there. Try Insert(60) on the example above.

9-2. 在线测验

Quiz: Inserting integers [1,10,2,9,3,8,4,7,5,6] one by one in that order into an initially empty BST will result in a BST of height:

8
The height cannot be determined
9
10

10. 删除（v） - 三个可能的案例

10-1. 删除（v） - 案例1

The first case is the easiest: Vertex v is currently one of the leaf vertex of the BST.

Deletion of a leaf vertex is very easy: We just remove that leaf vertex — try Remove(5) on the example BST above (second click onwards after the first removal will do nothing — please refresh this page or go to another slide and return to this slide instead).

This part is clearly O(1) — on top of the earlier O(h) search-like effort.

10-2. 删除（v） - 案例2

The second case is also not that hard: Vertex v is an (internal/root) vertex of the BST and it has exactly one child. Removing v without doing anything else will disconnect the BST.

Deletion of a vertex with one child is not that hard: We connect that vertex's only child with that vertex's parent — try Remove(23) on the example BST above (second click onwards after the first removal will do nothing — please refresh this page or go to another slide and return to this slide instead).

This part is also clearly O(1) — on top of the earlier O(h) search-like effort.

10-3. 删除（v） - 案例3

The third case is the most complex among the three: Vertex v is an (internal/root) vertex of the BST and it has exactly two children. Removing v without doing anything else will disconnect the BST.

Deletion of a vertex with two children is as follow: We replace that vertex with its successor, and then delete its duplicated successor in its right subtree — try Remove(6) on the example BST above (second click onwards after the first removal will do nothing — please refresh this page or go to another slide and return to this slide instead).

This part requires O(h) due to the need to find the successor vertex — on top of the earlier O(h) search-like effort.

10-4. 删除（v） - 案例3讨论

1. 为什么用后继C替换有两个子节点的顶点B总是一个有效的策略？
2. 我们可以用它的前身A替换有两个子节点的顶点B吗？ 为什么或者为什么不？

10-5. 答案

[This is a hidden slide]

10-6. O（h）时间复杂性

Remove(v) runs in O(h) where h is the height of the BST. Removal case 3 (deletion of a vertex with two children is the 'heaviest' but it is not more than O(h)).

As you should have fully understand by now, h can be as tall as O(N) in a normal BST as shown in the random 'skewed right' example above. If we call Remove(FindMax()), i.e. we remove the current max integer, we will go from root down to the last leaf in O(N) time before removing it — not efficient.

11. 创建BST

1. BST（然后你可以逐个插入几个整数），
2. 你可能已经看过几次的两个电子讲座例子
3. 随机BST（不太可能非常高），
4. 向左/向右倾斜BST（具有N个顶点的高BST和具有N-1个链接列表的边缘，以展示BST操作的最坏情况行为;在AVL树模式中禁用）。

12. 间奏曲

PS：如果你想研究如何在真实程序中实现这些基本的BST操作，你可以下载这个BSTDemo.cpp

13. 平衡 BST

13-3. 高度的正式定义（v）

v.height = -1 (if v is an empty tree)
v.height = max(v.left.height, v.right.height) + 1 (otherwise)

13-4. 小测试

Quiz: What are the values of height(20), height(65), and height(41) on the BST above?

height(41) = 4
height(65) = 2
height(65) = 3
height(20) = 3
height(20) = 2
height(41) = 3

13-5. BST高度的下限

If we have N elements/items/keys in our BST, the lower bound height h > log2 N if we can somehow insert the N elements in perfect order so that the BST is perfectly balanced.

See the example shown above for N = 15 (a perfect BST which is rarely achievable in real life — try inserting any other integer and it will not be perfect anymore).

13-6. 下限的推导

N ≤ 1 + 2 + 4 + ... + 2h
N ≤ 20 + 21 + 22 + … + 2h
N < 2h+1 (几何级数之和)
log2 N < log2 2h+1
log2 N < (h+1) * log2 2 (log2 2 is 1)
h > (log2 N)-1 (代数操作)
h > log2 N

13-7. BST高度的上限

If we have N elements/items/keys in our BST, the upper bound height h < N if we insert the elements in ascending order (to get skewed right BST as shown above).

The height of such BST is h = N-1, so we have h < N.

Discussion: Do you know how to get skewed left BST instead?

13-8. 答案

[This is a hidden slide]

13-9. 联合约束

We have seen that most BST operations are in O(h) and combining the lower and upper bounds of h, we have log2 N < h < N.

There is a dramatic difference between log2 N and N and we have seen from the discussion of the lower bound that getting perfect BST (at all times) is near impossible...

So can we have BST that has height closer to log2 N, i.e. c * log2 N, for a small constant factor c? If we can, then BST operations that run in O(h) actually run in O(log N)...

14. AVL 树

Introducing AVL Tree, invented by two Russian (Soviet) inventors: Georgy Adelson-Velskii and Evgenii Landis, back in 1962.

In AVL Tree, we will later see that its height h < 2 * log N (tighter analysis exist, but we will use easier analysis in VisuAlgo where c = 2). Therefore, most AVL Tree operations run in O(log N) time — efficient.

Insert(v) and Remove(v) update operations may change the height h of the AVL Tree, but we will see rotation operation(s) to maintain the AVL Tree height to be low.

14-1. 步骤1：有效维持高度（v）

To have efficient performance, we shall not maintain height(v) attribute via the O(N) recursive method every time there is an update (Insert(v)/Remove(v)) operation.

Instead, we compute O(1): x.height = max(x.left.height, x.right.height) + 1 at the back of our Insert(v)/Remove(v) operation as only the height of vertices along the insertion/removal path may be affected. Thus, only O(h) vertices may change its height(v) attribute and in AVL Tree, h < 2 * log N.

Try Insert(37) on the example AVL Tree (ignore the resulting rotation for now, we will come back to it in the next few slides). Notice that only a few vertices along the insertion path: {41,20,29,32} increases their height by +1 and all other vertices will have their heights unchanged.

14-3. Proof - 1

Adelson-Velskii and Landis claim that an AVL Tree (a height-balanced BST that satisfies AVL Tree invariant) with N vertices has height h < 2 * log2 N.

The proof relies on the concept of minimum-size AVL Tree of a certain height h.

Let Nh be the minimum number of vertices in a height-balanced AVL Tree of height h.

The first few values of Nh are N0 = 1 (a single root vertex), N1 = 2 (a root vertex with either one left child or one right child only), N2 = 4, N3 = 7, N4 = 12, N5 = 20 (see the background picture), and so on (see the next two slides).

14-4. 证明 - 2 14-5. 证明 - 3

Nh = 1 + Nh-1 + Nh-2 (高度为h的最小尺寸AVL树的公式)
Nh > 1 + 2*Nh-2 (as Nh-1 > Nh-2)
Nh > 2*Nh-2 (obviously)
Nh > 4*Nh-4 (递归)
Nh > 8*Nh-6 (另一个递归步骤)
... (假设初始h是偶数，我们只能这样做h/2次)
Nh > 2h/2*N0 (我们到了base case)
Nh > 2h/2 (as N0 = 1) 14-6. 证明 - 4

N ≥ Nh > 2h/2 (结合前两张幻灯片)
N > 2h/2
log2(N) > log2(2h/2) (两张幻灯片都是log2 )
log2(N) > h/2 (等式化简)
2 * log2(N) > h or h < 2 * log2(N)
h = O(log(N)) (最后结论) 14-7. 第3步：保持不变性

Look at the example BST again. See that all vertices are height-balanced, an AVL Tree.

To quickly detect if a vertex v is height balanced or not, we modify the AVL Tree invariant (that has absolute function inside) into: bf(v) = v.left.height - v.right.height.

Now try Insert(37) on the example AVL Tree again. A few vertices along the insertion path: {41,20,29,32} increases their height by +1. Vertices {29,20} will no longer be height-balanced after this insertion (and will be rotated later — discussed in the next few slides), i.e. bf(29) = -2 and bf(20) = -2 too. We need to restore the balance.

14-8. 介绍旋转树 Tree Rotation保留BST属性。 在旋转之前，P≤B≤Q。旋转之后，请注意以B为根的子树（如果存在）更改父级，但P≤B≤Q不会更改。

14-9. 非平凡的O（1）树旋转的伪码

BSTVertex rotateLeft(BSTVertex T) // pre-req: T.right != null
BSTVertex w = T.right // rotateRight is the mirror copy of this
w.parent = T.parent // this method is hard to get right for newbie
T.parent = w
T.right = w.left
if (w.left != null) w.left.parent = T
w.left = T
// update the height of T and then w here
return w

14-10. 四个重新平衡案例 14-11. 在AVL树中插入（v）

1. Just insert v as in normal BST,
2. Walk up the AVL Tree from the insertion point back to the root and at every step, we update the height and balance factor of the affected vertices:
1. Stop at the first vertex that is out-of-balance (+2 or -2), if any,
2. Use one of the four tree rotation cases to rebalance it again, e.g. try Insert(37) on the example above and notice by calling rotateLeft(29) once, we fix the imbalance issue.

Discussion: Is there other tree rotation cases for Insert(v) operation of AVL Tree?

14-12. 答案

[This is a hidden slide]

14-13. 在AVL树中删除（v）

1. 只需删除正常BST中的v（三个删除案例中的一个），
2. 将AVL树从删除点向上移回根，每一步，我们都会更新受影响顶点的高度和平衡因子：
1. 现在，对于每个不平衡的顶点（+2或-2），我们使用四个树旋转情况中的一个来重新平衡它们（可能多于一个）。

14-14. AVL树摘要

[This is a hidden slide]

15. 附加功能

We will end this module with a few more interesting things about BST and balanced BST (especially AVL Tree).

15-1. 那2个额外的BST操作

[This is a hidden slide]

15-2. Side Usage of Balanced BST?

[This is a hidden slide]

15-4. 在线评测练习

We also have a few programming problems that somewhat requires the usage of this balanced BST (like AVL Tree) data structure: Kattis - compoundwords and Kattis - baconeggsandspam.

Try them to consolidate and improve your understanding about this data structure. You are allowed to use C++ STL map/set, Java TreeMap/TreeSet, or OCaml Map/Set if that simplifies your implementation (Note that Python doesn't have built-in bBST implementation).

15-5. 答案

[This is a hidden slide]