In the fourth chapter of “Programming Pearls”, Jon Bentley discusses program correctness and tells us that as part of some of his programming classes, he asks the attendees to implement the binary search algorithm. Although simple in appearance (“look at the middle element, if it’s the target terminate, if it’s smaller look in the upper half, otherwise look in the lower half”), it can be a surprisingly tricky algorithm to implement. He cites TAoCP Vol. 3 wherein Don Knuth mentions that although the first paper on binary search was published in 1946, it wasn’t until 1962 that a bug-free version was presented.
In this literate org file, I’ll attempt to implement binary search in Rust, and hopefully, get it right.
My favorite way of implementing binary search is with recursion; the recursive algorithm works fine in a language with tail-call elimination, however, this optimization is not available in Rust, and thus is not how I’d implement the algorithm if I wanted to put it in production. Therefore, I’ll use a loop to implement the algorithm instead.
Our function will have the following signature:
fn binsearch<T: PartialOrd>(target: &T, collection: &[T]) -> Option<usize>;
Our function is parametrized over all types T
that are partial
orderable (e.g., all integer types, floating number types, strings,
chars, etc.) We used the word “array” before to talk about the
collection being searched, however we’ll be more general and use a
slice in our Rust program; therefore the function works with arrays,
but also vectors and other types of collections.
Traditionally, the return value of binary search is a signed integer;
the slice is indexed from 0 to usize
, an unsigned, machine-sized integer, so
-1 is really 264-1 (on a 64-bit machine), which is a valid slice
index. We could return an isize
value, but instead, the return type
of our function shall be Option<usize>
: the value Some(i)
is
returned if the target is found at index i
and None
if the target
is absent. This pattern is also more consistent with idiomatic Rust
and more robust (we cannot forget to check the return value before
attempting to use it).
Rust allows unit tests to be written quite easily by marking a
function with the #[test]
attribute. We can then use the command
rustc --test
or cargo test
to build and execute those tests.
Let us start by writing some unit tests where the target is part of the array.
#[test]
fn test_present() {
// i32
assert_eq!(Some(0), binsearch(&0_i32, &[0, 1, 2, 4, 8, 16]));
assert_eq!(Some(1), binsearch(&1_i32, &[0, 1, 2, 4, 8, 16]));
assert_eq!(Some(2), binsearch(&2_i32, &[0, 1, 2, 4, 8, 16]));
assert_eq!(Some(3), binsearch(&4_i32, &[0, 1, 2, 4, 8, 16]));
assert_eq!(Some(4), binsearch(&8_i32, &[0, 1, 2, 4, 8, 16]));
assert_eq!(Some(5), binsearch(&16_i32, &[0, 1, 2, 4, 8, 16]));
// char
assert_eq!(Some(0), binsearch(&'a', &['a', 'b', 'c']));
assert_eq!(Some(1), binsearch(&'b', &['a', 'b', 'c']));
assert_eq!(Some(2), binsearch(&'c', &['a', 'b', 'c']));
// vector
assert_eq!(Some(0), binsearch(&0.0, &vec![0.0, 1.0, 2.0]));
assert_eq!(Some(1), binsearch(&1.0, &vec![0.0, 1.0, 2.0]));
assert_eq!(Some(2), binsearch(&2.0, &vec![0.0, 1.0, 2.0]));
}
And some cases where the target is not part of the slice. We’ll include the edge case of an empty slice here.
#[test]
fn test_absent() {
assert_eq!(None, binsearch(&0, &[]));
assert_eq!(None, binsearch(&0, &[1]));
assert_eq!(None, binsearch(&2, &[1]));
assert_eq!(None, binsearch(&0, &[1, 3]));
assert_eq!(None, binsearch(&2, &[1, 3]));
assert_eq!(None, binsearch(&4, &[1, 3]));
assert_eq!(None, binsearch(&3.14, &vec![]));
assert_eq!(None, binsearch(&3.14, &vec![0.0, 1.0, 2.0]));
}
Our implementation first needs to establish the bounds of the
sub-slice that we will be examining. Initially, that interval is
If at any point the lower bound of the interval equals or exceeds the upper bound, we break and we know we haven’t found our target; the sub-slice delimited by those bounds is empty.
The computation of the mid-point uses a small trick to avoid an integer overflow. The following equations show that this is equivalent to computing the mid-point naively.
If the target is found, we perform an early return, otherwise we
update the interval’s bounds: if the target is contained in the lower
half of the slice, we modify the bounds to be
fn binsearch<T: PartialOrd>(target: &T, collection: &[T]) -> Option<usize> {
let mut lo: usize = 0;
let mut hi: usize = collection.len();
while lo < hi {
let m: usize = (hi - lo) / 2 + lo;
if *target == collection[m] {
return Some(m);
} else if *target < collection[m] {
hi = m;
} else {
lo = m+1;
}
}
return None;
}