5.3. Sorting in Linear Time
File:
LinearTimeSorting.ml
As we have just determined, one cannot do comparison-based sorting better than in \(O(n \log n)\) in the worst case. However, we can improve this complexity if we base the logic of our algorithm not just on comparisons, but will also exploit the intrinsic properties of the data used as keys for elements to be sorted (e.g., integers). In this chapter we will see some examples of such specialised sorting procedures.
5.3.1. Simple Bucket Sort
Bucket sort works well for the case, when the size of the set, from which we draw the keys is limited by a certain number bnum. In this case, we can allocate an auxiliary array of “buckets” (implemented as lists), which will serve to collect elements with the key corresponding to the bucket number. The code is as follows:
let simple_bucket_sort bnum arr =
let buckets = Array.make bnum [] in
let len = Array.length arr in
for i = 0 to len - 1 do
let key = fst arr.(i) in
let bindex = key mod bnum in
let b = buckets.(bindex) in
buckets.(bindex) <- arr.(i) :: b
done;
let res = ref [] in
for i = bnum - 1 downto 0 do
res := List.append (List.rev (buckets.(i))) !res
done;
list_to_array !res
Having created an array buckets
, the sort than traverses the
initial array arr
, and puts each element with a key key
into
the bucket with the corresponding index, obtained as bindex = key
mod bnum
. Notice that if the all keys are in range limited by
bnum
, the mod
operation returns the key itself.
Therefore, the first for
-loop has a complexity \(\Theta(n)\), where
\(n\) is the size of arr
. The second loop walks through the array of
buckets all the buckets (making bnum
iterations) and concatenates all the
lists, returning the result as the array. It is straightforward to show that the
resulting complexity of the algorithm is in \(O(\mathtt{bnum} \cdot n)\) (it
can be made \(O(\mathtt{bnum} + n)\) if we use append-only buffers, so we
don’t have to re-traverse the lists), i.e., it is linear in n
.
We can see simple_bucket_sort
in action:
# let c =[|9; 9; 0; 9; 4; 7; 9; 2; 3; 3|];;
# simple_bucket_sort 10 c;;
- : int array = [|0; 2; 3; 3; 4; 7; 9; 9; 9; 9|]
5.3.2. Enhanced Bucket Sort
If the size of the space of keys exceeds the number of the buckets, one can still use the same idea, while also sorting each bucket individually with a suitable sorting, such as insertion sort (implemented for lists), as it will be operating on small and almost sorted sub-arrays:
let bucket_sort max ?(bnum = 1000) arr =
let buckets = Array.make bnum [] in
let len = Array.length arr in
for i = 0 to len - 1 do
let key = arr.(i) in
let bind = key * bnum / max in
let b = buckets.(bind) in
buckets.(bind) <- arr.(i) :: b
done;
let res = ref [] in
for i = bnum - 1 downto 0 do
let bucket_contents = List.rev (buckets.(i)) in
let sorted_bucket = InsertSort.insert_sort bucket_contents in
res := List.append sorted_bucket !res
done;
list_to_array !res
The code of bucket_sort
above takes an optional parameter bnum
for the number of buckets (default is 10, if omitted) and a parameter
max
to indicate the maximal possible key (should be guessed by the
client of the sorting). When allocating elements to the corresponding
buckets, it divides the entire space of keys (up to the maximal one)
into bnum
portions, and puts the corresponding element into the
appropriate bucket. Since elements with different keys (from the same
segment) may end up in the same bucket, and additional sorting is
required. Let us test this implementation:
# let e = generate_int_array 10000;;
val e : int array = [|4505; 6905; 5076; 9250; 5101; 2539; 1721; ... |]
# bucket_sort 10000 e;;
- : int array = [|0; 1; 3; 3; 5; 5; 5; 6; 6; 9; 10; ... |]
5.3.3. Stability of sorting
An important property of a sorting algorithm is stability. A sorting algorithms is stable if it preserves the ordering between the elements with equal keys in the initial array.
An example of a stable sorting algorithm is kv_bucket_sort
shown
below, which sorts an array of key-value pairs based on the keys:
let kv_bucket_sort bnum arr =
let buckets = Array.make bnum [] in
let len = Array.length arr in
for i = 0 to len - 1 do
let key = fst arr.(i) in
let bindex = key mod bnum in
let b = buckets.(bindex) in
buckets.(bindex) <- arr.(i) :: b
done;
let res = ref [] in
for i = bnum - 1 downto 0 do
res := List.rev_append buckets.(i) !res
done;
list_to_array !res
As an example, consider its following execution:
# let f = [|(3, "zqped"); (8, "esmup"); (7, "tvqej"); (8, "xhlzj"); (4, "blann");
(9, "ouors"); (0, "iocvx"); (3, "dacht"); (7, "rncpn");
(7, "khott")|];;
# kv_bucket_sort 10 f;;
- : (int * string) array =
[|(0, "iocvx"); (3, "zqped"); (3, "dacht"); (4, "blann"); (7, "tvqej");
(7, "rncpn"); (7, "khott"); (8, "esmup"); (8, "xhlzj"); (9, "ouors")|]
The initial array has elements (7, "rncpn")
and (7, "khott")
in this very order. In the same order, the appear in the resulting
array. Other stable sorting algorithm is insertion sort. Not all
sorting algorithms are stable though. Try to answer, whether merge
sort is stable? What about Quicksort?
5.3.4. Radix Sort
The stability comes into play, when one sorting algorithm uses another one as a black-box, relying on the fact that original order of elements in partially-sorted arrays with “almost-same” keys will be preserved.
As an example, radix sort is a linear-time sorting, building on the idea of bucket-sort, but making it scale logarithmically, which is necessary if the space of possible keys is too large (e.g., comparable with the length of an array, in which case bucket sort’s complexity becomes quadratic). It makes use of bucket sort as its component, applying it iteratively and sorting a list of integer-keyed elements per key digit, startgin from the smallest register:
let radix_sort arr =
let len = Array.length arr in
let max_key =
let res = ref 0 in
for i = 0 to len - 1 do
if arr.(i) > !res
then res := arr.(i)
done; !res
in
if len = 0 then arr
else
let radix = ref max_key in
let ls = array_to_list arr in
let combined = list_to_array (list_zip ls ls) in
let res = ref combined in
while !radix > 0 do
res := kv_bucket_sort 10 !res;
for i = 0 to len - 1 do
let (k, v) = !res.(i) in
!res.(i) <- (k / 10, v)
done;
radix := !radix / 10
done;
let result_list = array_to_list !res in
list_to_array result_list |> Array.map snd
It starts by determining the largest key max_key
in the initial
array. Next, it creates an array combined
, which pairs all
elements in the original array with their keys. In the while
loop,
it sorts elements, using kv_bucket_sort
, based on their digit. It
starts from the lowest register, and then keeps dividing the key
component of each element, “attached” for the sorting purposes, by 10,
repeating the bucket sort, until it runs out of registers.
How many iterations the while
-loop will make? Notice that each
time it divides the key space by 10, so it will only run for
\(\log_{10}( \mathtt{max\_key})\) iterations. This determines the
complexity of the radix sort, which is, therefore \(O(n
\log(\mathtt{max\_key}))\), i.e., it is linear if max_key
is
considered as a constant.
One can test the implementation of radix sort as follows:
let%test "radix-sort" =
let a = generate_int_array 1000 in
let b = radix_sort a in
array_sorted b &&
same_elems (array_to_list a) (array_to_list b)