5.6. Heapsort
File:
Heaps.ml
(continued)
Let us now exploit the ability of a max-heap to always keep the element with the largest key at the beginning, as well as being able to restore a heap from an “almost-heap” (i.e., the one that only has one offending triple) in \(\Theta(n \log n)\), for construct a very efficient sorting algorithm — Heapsort.
Heapsort starts by turning an arbitrary array into a max-heap (by
means of build_max_heap
). It then repeatedly takes the first
element and swaps it with the “working” element in the tail, building
a sorted array backwards. After each swap, it shifts the “front” of
what is considered to be a heap (i.e., the mentioned above
heap_size
), and what is an already sorted array suffix, and
restores the heap structure up to this front. The following code the
final addition to the Heaps
functor:
let heapsort arr =
let len = Array.length arr in
let heap_size = ref len in
build_max_heap arr;
for i = len - 1 downto 1 do
swap arr 0 i;
heap_size := !heap_size - 1;
max_heapify !heap_size arr 0;
done
5.6.1. Heapsort Complexity
The main bulk of complexity is taken by build_max_heap arr
(which
results in \(O(n \log n)\)) and in running the loop. Since the
loop runs \(n/2\) iteration, and each reconstruction of the tree
takes \(O(\log n)\) swaps, the overall complexity of Heapsort is
\(O(n \log n)\).
5.6.2. Evaluating Heapsort
We can now use our checked to make sure that it indeed delivers sorted arrays:
module Checker = SortChecker(KV)
let c = generate_key_value_array 1000
let d = Array.copy c
The following are the results of the experiment:
# heapsort d;;
- : unit = ()
# Checker.sorted_spec c d;;
- : bool = true
5.6.3. Which sorting algorithm to choose?
By now we have seen three linearithmic sorting algorithms: merge sort, Quicksort and Heapsort. The first two achieve efficiency via divide-and-conquer strategy (structuring the computations in a tree). The last one exploits the properties of a maintained data structures (i.e., a heap), which also coincidentally turns out to be a tree.
It would be interesting to compare the relative performance of the three implementations we have, by running them on three copies of the same array:
let x = generate_key_value_array 100000
let y = Array.copy x
let z = Array.copy x
let quicksort = kv_sort_asc
Let us now time the executions:
# time heapsort x;;
Execution elapsed time: 0.511102 sec
- : unit = ()
# time quicksort y;;
Execution elapsed time: 0.145787 sec
- : unit = ()
# time merge_sort z;;
Execution elapsed time: 0.148201 sec
- : unit = ()
We can repeat an experiment on a larger array (e.g., \(10^6\) elements):
# time heapsort x;;
Execution elapsed time: 6.943117 sec
- : unit = ()
# time quicksort y;;
Execution elapsed time: 2.049979 sec
- : unit = ()
# time merge_sort z;;
Execution elapsed time: 2.192766 sec
- : unit = ()
As we can see, the relative performance of the three algorithms
remains the same, with our implementation of heapsort
being about
3.5 slower than both quicksort
and merge_sort
.
The reason why quicksort
beats heapsort
by a constant factor
is because is almost doesn’t perform “unnecessary” element swapping,
which is time consuming. In contrast, even if all of the array is
already ordered, heapsort
is going to swap all of them in order to
make a heap structure.
However, on an almost-sorted array, heapsort
will perform
significantly better than quicksort
and, unlike merge_sort
, it
will not require extra memory:
# let x = generate_key_value_array 10000;;
...
# time quicksort x;;
Execution elapsed time: 0.014825 sec
- : unit = ()
# time quicksort x;;
Execution elapsed time: 3.650797 sec
- : unit = ()
# time heapsort x;;
Execution elapsed time: 0.044624 sec
- : unit = ()