2.1. Arrays and Operations on Them
File:
ArrayUtil.ml
So far the main data structure we’ve been looking at and using as a
container is an algebraic list. While simple to work with and grow by
adding new elements to the beginning, algebraic lists have a
significant shortcoming: they do not allow an instant access to their
elements. For instance, in a list [6; 8; 5; 2; 3; 7; 0]
, in order
to obtain its fourth element, one needs to “peel off” four previous
elements by means of deconstructing the list, as implemented by the
function nth
from the standard OCaml library:
let nth l n =
if n < 0 then invalid_arg "List.nth" else
let rec walk l n =
match l with
| [] -> failwith "nth"
| a::l -> if n = 0 then a else walk l (n-1)
in walk l n
Arrays are similar but also complementary to lists. They also encode data structured in a sequence, but allow immediate access to their elements, referred to by an index (i.e., position in an array). At the low-level, arrays are implemented by means of fixed offsets, and take the full advantage of the random-access memory (RAM), implemented by the modern computer architectures and allowing one to access a location with a known address almost immediately.
The price to pay for the instant access is the inability to change the size of an array dynamically. In essence, once array is created, it “reserves” a fixed sequence of memory locations in RAM. Indeed, since more data can be allocated after the array, it is not easy to allow for its future growth. Therefore, the only way to extend (or shrink) and array is to allocate a new array of the necessary size.
In OCaml, arrays with all known elements can be created using the following syntax:
let a1 = [|6; 8; 5; 2; 3; 7; 0|]
creates an array with 7 numbers and assigns its reference to a1
.
It is also possible to create an array of a fixed size, filled with
some “default” element. For instance,:
let a2 = Array.make 10 0
Creates an array of size 10, filled with zeroes.
Elements of an array are accessed using their indices:
# a1.(2);;
- : int = 5
# a1.(0);;
- : int = 6
Notice that the indices start from 0 (not from 1), and end with the number equal to an array’s length minus one. This is often confusing and might lead to the infamous Off-by-one error. An attempt to address the elements outside of this range lead to an exception:
# a1.(7);;
Exception: Invalid_argument "index out of bounds".
One can determine the range of indices as the length of an array as follows:
# Array.length a1;;
- : int = 7
# a1.((Array.length a1) - 1);;
- : int = 0
The elements of an array can be altered using the following syntax.
Notice, that upon changing an array’s element, no new array is created
(hence the update’s type is unit
), and it’s the initial array that
is modified. In this sense, arrays are similar to references, that are
modified in-place:
# a1;;
- : int array = [|6; 8; 5; 2; 3; 7; 0|]
# a1.(0) <- 12;;
- : unit = ()
# a1;;
- : int array = [|12; 8; 5; 2; 3; 7; 0|]
The following functions swaps two elements of an array indexed via
i
and j
(assuming they are within the array indexing bounds):
let swap arr i j =
let tmp = arr.(i) in
arr.(i) <- arr.(j);
arr.(j) <- tmp
It is useful to be able to print a sub-array arr.(l) .. arr.(u - 1)
of
a given array arr
for debugging purposes. Here, for the sake of
simplicity we assume the array to be filled with integers:
let print_int_sub_array l u arr =
assert (l <= u);
assert (u <= Array.length arr);
Printf.printf "[| ";
for i = l to u - 1 do
Printf.printf "%d" arr.(i);
if i < u - 1
then Printf.printf "; "
else ()
done;
Printf.printf " |] "
let print_int_array arr =
let len = Array.length arr in
print_int_sub_array 0 len arr
Notice that the procedure print_int_sub_array
employs the special
bounded iteration loop of a general form:
for variable = start_value to end_value do
expression
done
for variable = start_value downto end_value do
expression
done
In both cases start_value
and end_value
must be of type
int
, and expression
is of type unit
. There is no way to
“break” from the iteration in OCaml, interrupting it, hence sometimes
it is more preferable to use a more general while
-loop.