【Data Structure】堆(Heap)/ 二叉堆(binary heap)

Posted by 西维蜀黍 on 2019-07-29, Last Modified on 2024-01-23

Background

这里来说明一下满二叉树(full binary tree)的概念与完全二叉树(complete binary tree)的概念。

满二叉树(Full Binary Tree)

A full binary tree is a binary tree in which all of the nodes have either 0 or 2 offspring. In other terms, a full binary tree is a binary tree in which all nodes, except the leaf nodes, have two offspring.

可以看出:满二叉树所有的节点都拥有左孩子,又拥有右孩子。

完全二叉树(Complete Binary Tree)

A binary tree is said to be a complete binary tree if all its levels, except possibly the last level, have the maximum number of possible nodes, and all the nodes in the last level appear as far left as possible.

堆(Heap)/ 二叉堆(binary heap)

而堆又是实现优先队列(Priority Queue)较为高效的方式(这意味着优先队列还有其他实现方式),因此,在不严谨的情况下,我们有时也将堆和优先队列相互等同。

基于完全二叉树(complete binary tree),是实现堆的一个经典方式,称为二叉堆(binary heap),而又由于其它几种堆(二项式堆,斐波纳契堆等)用的较少,因此一般说堆,就是指二叉堆(binary heap)

二叉堆(binary heap)的特性

堆的特性:

  • 必须是完全二叉树
  • 任一结点的值是其子树中所有结点的最大值或最小值
    • 当为最大值时,称为“最大堆”,也称大顶堆
    • 当为最小值时,称为“最小堆”,也称小顶堆

最大堆(Max Heap)

  • 最大堆中的最大元素值出现在根结点(堆顶)
  • 堆中每个父节点的元素值都大于等于其孩子结点(如果存在)

最小堆(Min Heap)

  • 最小堆中的最小元素值出现在根结点(堆顶)
  • 堆中每个父节点的元素值都小于等于其孩子结点(如果存在)

三个重要的性质

  • 数组里的第一个元素array[0]拥有最高的优先级别。
  • 给定一个下标 i,那么对于元素 array[i] 而言:
    • 它的父节点所对应的元素下标是 (i-1)/2 or (i-2)/2 -> 即 (i-1) // 2
    • 它的左孩子所对应的元素下标是 2*i + 1
    • 它的右孩子所对应的元素下标是 2*i + 2
  • 数组里每个元素的优先级别都要高于它两个孩子的优先级别。

二叉堆的操作

堆化(heapifying)

There are two primary ways to do this: top-down and bottom-up. The bottom-up approach is more efficient and commonly used.

Bottom-Up Approach (More Efficient)

  1. Start from the Middle: The process starts from the middle of the array and works backward. The reason for starting from the middle is that elements in the second half of the array are leaves in the heap and already follow the heap property.
  2. Heapify Each Node: Apply the heapify operation to each non-leaf node. You start from the first non-leaf node and move backward towards the root.
  3. Heapify Operation: For each node, you ensure that it satisfies the heap property (i.e., it is larger than its children in a max heap, or smaller in a min heap). If it doesn’t, swap it with its largest (or smallest for min heap) child and continue heapifying down the tree.
  4. Complete When Root is Reached: Once you have heapified the root, the entire array represents a heap.

Implementation

def heapify(arr, n, i):
	"""
	Function to heapify a subtree rooted with node i which is an index in arr[]. n is size of heap
	"""
	largest = i  # Initialize largest as root
	left = 2 * i + 1     # left = 2*i + 1
	right = 2 * i + 2    # right = 2*i + 2

	# See if left child of root exists and is greater than root
	if left < n and arr[left] > arr[largest]:
		largest = left

	# See if right child of root exists and is greater than the root
	if right < n and arr[right] > arr[largest]:
		largest = right

	# Change root, if needed
	if largest != i:
		arr[i], arr[largest] = arr[largest], arr[i]  # swap

		# Heapify the root
		heapify(arr, n, largest)

# Example usage
arr = [12, 11, 13, 5, 6, 7]
n = len(arr)

# Build a maxheap
for i in range(n//2 - 1, -1, -1):
	heapify(arr, n, i)

Time Complexity

  • Bottom-up: O(n)
  • heapify by adding one element at a time, which would have a complexity of O(nlogn)

The time complexity of building a heap from an array using the bottom-up approach is O(n), which is more efficient than calling heapify n times (which would be O(nlogn)). The O(n) complexity is a bit counterintuitive, but it comes from the fact that the number of operations needed for each level of the tree decreases exponentially as you move up the tree.

Sift Up(上浮)- 最小堆中的插入(push)操作

当有新的数据加入到优先队列中,新的数据首先被放置在二叉堆的底部,然后不断地对它进行向上筛选的操作,即如果发现它的优先级别比父节点的优先级别还要高,那么就和父节点的元素相互交换,再接着往上进行比较,直到无法再继续交换为止。由于二叉堆是一棵完全二叉树,并假设堆的大小为n,因此整个过程其实就是沿着树的高度往上爬,所以只需要O(logn)的时间。

假设现有元素 4 需要插入,为了维持完全二叉树的特性,新插入的元素一定是放在整棵树的最右下方(在本例子新插入元素位于结点 9 的右子树中);同时为了满足任一结点的值要小于左右子树的值这一特性,新插入的元素要和其父结点作比较,如果比父结点小,则进行交换操作(新插入元素顶替父节点的位置,自然地,父节点则需要下移),不断向上寻找,找到比自己大的父结点就进行交换操作,直到没有符合条件的值为止。

Implementation

def sift_up(arr, index):
	"""
	Function to sift up a node in a min heap.
	The function assumes that the rest of the heap is valid.
	"""
	parent_index = (index - 1) // 2

	# If we are not at the root node and the current node is less than its parent
	while index > 0 and arr[index] < arr[parent_index]:
		# Swap the current node with its parent
		arr[index], arr[parent_index] = arr[parent_index], arr[index]

		# Move up to the parent node
		index = parent_index
		parent_index = (index - 1) // 2

# Example usage
min_heap = [5, 9, 11, 14, 18, 19, 21, 33, 17, 27]
new_element = 4
min_heap.append(new_element)  # Adding new element to the end
sift_up(min_heap, len(min_heap) - 1)  # Sift up the new element

min_heap

Sift Down(下沉) - 最小堆中的删除(pop)操作

核心点:将最后一个元素填充到堆顶,然后不断的下沉这个元素。

当堆顶的元素被取出时,我们要更新堆顶的元素来作为下一次按照优先级顺序被取出的对象,我们所需要的是将堆底部的元素放置到堆顶,然后不断地对它执行向下筛选的操作,在这个过程中,该元素和它的两个孩子节点对比,看看哪个优先级最高,如果优先级最高的是其中一个孩子,就将该元素和那个孩子进行交换,然后反复进行下去,直到无法继续交换为止,整个过程就是沿着树的高度往下爬,所以时间复杂度也是O(logn)。

Implementation

def sift_down(arr, n, i):
	"""
	Function to sift down a node in a min heap.
	"""
	smallest = i  # Initialize smallest as root
	left = 2 * i + 1  # left = 2*i + 1
	right = 2 * i + 2  # right = 2*i + 2

	# If left child exists and is smaller than the root
	if left < n and arr[left] < arr[smallest]:
		smallest = left

	# If right child exists and is smaller than smallest so far
	if right < n and arr[right] < arr[smallest]:
		smallest = right

	# If smallest is not root
	if smallest != i:
		arr[i], arr[smallest] = arr[smallest], arr[i]  # Swap

		# Recursively heapify the affected subtree
		sift_down(arr, n, smallest)

# Example usage
min_heap = [9, 14, 11, 33, 18, 19, 21, 5, 17, 27]
n = len(min_heap)
sift_down(min_heap, n, 0)  # Sift down the first element

min_heap

Peek (Find max/min)

Peek operation returns the maximum element from Max Heap or minimum element from Min Heap without deleting the node.

  • time: O(1)

时间复杂度分析

由于二叉堆有 $log_2n$ 层深,n为元素的数量,因此二叉堆节点入队(上浮)和出队(下沉)操作的时间复杂度都是$O(log_2n)$,所以二叉堆的入队和出队操作的平均时间和最差时间都是 $O(log_2n)$。

  • Push: O(logn)
  • Search: O(n)
    • binary heap is not designed for seaching
  • findMin/findMax: O(1)
  • Pop/deleteMin/deleteMax: O(logn)

Reference