A Binary (Max) Heap is a complete binary tree that maintains the Max Heap property.
Binary Heap is one possible data structure to model an efficient Priority Queue (PQ) Abstract Data Type (ADT). In a PQ, each element has a "priority" and an element with higher priority is served before an element with lower priority (ties are broken with standard First-In First-Out (FIFO) rule as with normal Queue). Try clicking ExtractMax() for a sample animation on extracting the max value of random Binary Heap above.
To focus the discussion scope, this visualization show a Binary Max Heap of integers where duplicates are allowed. See this for an easy conversion to Binary Min Heap.
完整二叉树: 每一层都是二叉树 都被填满 ，除了最低/最下面的一层，并且所有的最底层顶点都尽力向左靠拢。
最大二叉堆特性: 每个顶点的父元素 - 除了根元素 - 都比当前元素的值要大。这种定义比下面这种更容易验证：一个顶点的值 - 除了叶顶点 - 必须必它的一个或者两个子元素要大。
Priority Queue (PQ) Abstract Data Type (ADT) is similar to normal Queue ADT, but with these two major operations:
想象：你是一个在机场控制塔工作的 空中交通管理员 (ATC)
你已经安排了飞机 X/Y 在接下来的 3/6 分钟分别降落。两辆飞机都有足够飞行15分钟的油并且都在距机场2分钟的位置。据你观察机场跑道现在空旷。
如果你不知道的话，飞机可以听令在机场附近飞 等待航线 直到指定的降落时间。
You have to attend the live lecture to figure out what happens next...
There will be two options presented to you and you will have to decide:
If none of the two options is reasonable for you, simply do nothing.
We are able to implement this PQ ADT using (circular) array or Linked List but we will have slow (i.e., in O(N)) Enqueue or Dequeue operation.
Now, let's view the visualisation of a (random) Binary (Max) Heap above. You should see a complete binary tree and all vertices except the root satisfy the Max Heap property (A[parent(i)] > A[i] — which is still fine even with the presence of duplicate integers).
You can Toggle the Visualization Mode between the visually more intuitive complete binary tree form or the underlying compact array based implementation of a Binary (Max) Heap.
Quiz: Based on this Binary (Max) Heap property, where will the largest integer be located?
A complete binary tree can be stored efficiently as a compact array A as there is no gap between vertices of a complete binary tree/elements of a compact array. To simplify the navigation operations below, we use 1-based array. VisuAlgo displays the index of each vertex as a red label below each vertex. Read those indices in sorted order from 1 to N, then you will see the vertices of the complete binary tree from top to down, left to right. To help you understand this, Toggle the Visualization Mode several times.
Pro tip: Try opening two copies of VisuAlgo on two browser windows. Try to visualize the same Binary Max Heap in two different modes and compare them.
This way, we can implement basic binary tree traversal operations with simple index manipulations (with help of bit shift manipulation):
In this visualization, you can perform several Binary (Max) Heap operations:
There are a few other possible Binary (Max) Heap operations, but currently we do not elaborate them for pedagogical reason in a certain NUS module.
Insert(v): 当我们在往最大二叉堆插入元素 v 的时候 我们只能插入到最后一个元素 N + 1 来维持一个紧凑的数组 = 完整二叉树特性。然而 最大堆的特性仍然可能没被满足。这时我们从插入点往上去修复最大堆特性（如果需要的话）直到整个堆符合最大堆特性。现在点 Insert(v) 几次来插入几个随机的 v 进去现在显示的最大二叉堆。
这个向上走并且修复最大堆特性的操作没有公认的名字。我们叫它 ShiftUp （上移）但是还有人叫它 BubbleUp 或者 IncreaseKey 操作.
The time complexity of this Insert(v) operation is O(log N).
Discussion: Do you understand the derivation?
ExtractMax(): 提取并删除最大二叉堆中的最大元素（根）的操作需要一个在堆中的元素来替换根，不然最大堆（一个完整的二叉树，在中文中的'林‘）会变成两个断开的子树（中文中两个'木’）。这同样也是这个元素必须是最后一个元素N的原因：维持紧凑数组 = 完整二叉树性质。
因为我们要把最大二叉堆中的一个叶顶点晋升成根顶点，它很可能是会违背最大堆特性。ExtractMax() 这时从根往下比较当前值和它的两个子元素的值（如果必要的话）来修复最大二叉堆特性。现在在当前显示的最大二叉堆中试试 ExtractMax()
这个向下走并修复最大堆特性的操作没有公认的名字。我们叫它 ShiftDown （下移）但是还有人叫它 BubbleDown 或者 Heapify 操作。.
The time complexity of this ExtractMax() operation is O(log N).
Discussion: Do you understand the derivation?
Up to here, we have a data structure that can implement the two major operations of Priority Queue (PQ) ADT efficiently:
However, we can do a few more operations with Binary Heap.
Create(A): 用含有 N 个逗号分隔的数组 A 从一个空的最大二叉堆开始创建一个合法的最大二叉堆
有两种方法完成创建操作一个简单但是慢 -> O(N log N) 另一个用更复杂的技术但是快 -> O(N).
教你一招：在两个浏览器里面同时打开 VisuAlgo。用最差情况 '已排序例子' 去执行不同版本的 Create(A)，你会看到可怕的差距。
Create(A) - O(N log N): 单纯的一个接一个的插入 (用 Insert(v) 操作的意思) 所有 N 个在输入数组中的元素进入到一个本来是空的最大二叉堆中。
分析: 这个操作显然是 O(N log N) 我们用了 O(log N) Insert(v) 操作 N 次. 让我们来研究 ’排好序的例子‘ 它是这个操作中比较复杂的例子之一（现在试试 Hard Case - O(N log N) 我们显示了这个例子 A=[1,2,3,4,5,6,7] -- 请耐心等待 这个例子会比较耗时). 如果我们用递增的顺序往一个空的最大二叉堆中插入，每一次插入都会触发一次从插入点（新叶元素）到根的上移操作。
Create(A) - O(N): This faster version of Create(A) operation was invented by Robert W. Floyd in 1964. It takes advantage of the fact that a compact array = complete binary tree and all leaves (i.e., half of the vertices — see the next slide) are Binary Max Heap by default. This operation then fixes Binary Max Heap property (if necessary) only from the last internal vertex back to the root.
Analysis: A loose analysis gives another O(N/2 log N) = O(N log N) complexity but it is actually just O(2*N) = O(N) — details in the next few slides. Now try the Hard Case - O(N) on the same input array A=[1,2,3,4,5,6,7] and see that on the same hard case as with the previous slide (but not the one that generates maximum number of swaps), this operation is far superior than the O(N log N) version.
Simple explanation on why half of Binary (Max) Heap of N (without loss of generality, let's assume that N is even) elements are leaves are as follows:
Suppose that the last leaf is at index N, then the parent of that last leaf is at index i = N/2 (remember this slide). The left child of vertex i+1, if exists (it actually does not exist), will be 2*(i+1) = 2*(N/2+1) = N+2, which exceeds index N (the last leaf) so index i+1 must also be a leaf vertex that has no child. As Binary Heap indexing is consecutive, basically indices [i+1 = N/2+1, i+2 = N/2+2, ..., N], or half of the vertices, are leaves.
Create(A) 用 O(N) 的版本是这样：
如果以上公式太复杂的话，现代学生会想到用 WolframAlpha 来解决。
堆排序函数HeapSort()： John William Joseph Williams 在1964年发明了堆排序算法和这个二叉堆数据结构. 堆排序的操作 （假定最大二叉堆已经在O(n)的时间复杂度内建立）非常简单。只需要调用提取堆顶函数ExtractMax() n次即可，每次的时间复杂度为O(log N). 现在你可以试一下屏幕上的这个最大二叉堆 HeapSort()
Quiz: In worst case scenario, HeapSort() is asymptotically faster than...Selection Sort
Although HeapSort() runs in θ(N log N) time for all (best/average/worst) cases, is it really the best comparison-based sorting algorithm?
Discussion: How about caching performance of HeapSort()?
We can actually just call the O(log N) ExtractMax() operation K times if we are only interested in the top K largest elements in the Binary (Max) Heap. Now try [tobeadded] on the currently displayed Binary (Max) Heap. This operation is called PartialSort().
Simple Analysis: PartialSort() clearly runs in O(K log N) — an output-sensitive algorithm where the time complexity depends on the output size K.
You have reached the end of the basic stuffs of this Binary (Max) Heap data structure and we encourage you to explore further in the Exploration Mode.
However, we still have a few more interesting Binary (Max) Heap challenges for you that are outlined in this section.
When you have cleared them all, we invite you to study more advanced algorithms that use Priority Queue as (one of) its underlying data structure, like Prim's MST algorithm, Dijkstra's SSSP algorithm, A* search algorithm (not in VisuAlgo yet), a few other greedy-based algorithms, etc.
If we only deal with numbers (including this visualization that is restricted to integers only), it is easy to convert a Binary Max Heap into a Binary Min Heap without changing anything, or vice versa.
We can re-create a Binary Heap with the negation of every integer in the original Binary Heap. If we start with a Binary Max Heap, the resulting Binary Heap is a Binary Min Heap (if we ignore the negative symbols — see the picture above), and vice versa.
For some Priority Queue applications (e.g., HeapDecreaseKey in Dijkstra's algorithm), we may have to modify (increase or decrease) the priority of an existing value that is already inserted into a Binary (Max) Heap. If the index i of the value is known, we can do the following simple strategy: Simply update A[i] = newv and then call both shiftUp(i) and shiftDown(i). Only at most one of this Max Heap property restoration operation will be successfully, i.e., shiftUp(i)/shiftDown(i) will be triggered if newv >/< old value of A[parent(i)]/A[larger of the two children of i], respectively.
Thus, UpdateKey(i, newv) can be done in O(log N), provided we know index i.
For some Priority Queue applications, we may have to delete an existing value that is already inserted into a Binary (Max) Heap (and this value happens to be not the root). Again, if the index i of the value is known, we can do the following simple strategy: Simply update A[i] = A+1 (a larger number greater than the current root), call shiftUp(i) (technically, UpdateKey(i, A+1)). This will floats index i to be the new root, and from there, we can easily call ExtractMax() once to remove it.
Thus, Delete(i) can be done in O(log N), provided we know index i.
Discussion: Now for UpdateKey(i, newv) and Delete(i), what if we are given oldv instead and thus we have to search for its location in the Binary (Max) Heap? Can we do this faster than O(N)?
If you are looking for an implementation of Binary (Max) Heap to actually model a Priority Queue, then there is a good news.
C++ and Java already have built-in Priority Queue implementations that very likely use this data structure. They are C++ STL priority_queue (the default is a Max Priority Queue) and Java PriorityQueue (the default is a Min Priority Queue). However, the built-in implementation may not be suitable to do some PQ extended operations efficiently (details omitted for pedagogical reason in a certain NUS module).
Python heapq exists but its performance is rather slow. OCaml doesn't have built-in Priority Queue but we can use something else that is going to be mentioned in the other modules in VisuAlgo (the reason on why the details are omitted is the same as above).
PS: Heap Sort is likely used in C++ STL algorithm partial_sort.
Nevertheless, here is our implementation of BinaryHeapDemo.cpp.
For a few more interesting questions about this data structure, please practice on Binary Heap training module (no login is required).
However, for NUS students, you should login using your official class account, officially clear this module, and such achievement will be recorded in your user account.
We also have a few programming problems that somewhat requires the usage of this Binary Heap data structure: UVa 01203 - Argus and Kattis - numbertree.
Try them to consolidate and improve your understanding about this data structure. You are allowed to use C++ STL priority_queue, Python heapq, or Java PriorityQueue if that simplifies your implementation.
After spending one long lecture on Binary (Max) Heap, here is a jaw-dropping moment...
Binary (Max) Heap data structure is probably not the best data structure to implement (certain operations of) ADT Priority Queue...
Discussion: So what is the alternative data structure?