Introduction

In the fourth chapter of “Programming Pearls”, Jon Bentley discusses program correctness and tells us that as part of some of his programming classes, he asks the attendees to implement the binary search algorithm. Although simple in appearance (“look at the middle element, if it’s the target terminate, if it’s smaller look in the upper half, otherwise look in the lower half”), it can be a surprisingly tricky algorithm to implement. He cites TAoCP Vol. 3 wherein Don Knuth mentions that although the first paper on binary search was published in 1946, it wasn’t until 1962 that a bug-free version was presented.

In this literate org file, I’ll attempt to implement binary search in Rust, and hopefully, get it right.

Plan

My favorite way of implementing binary search is with recursion; the recursive algorithm works fine in a language with tail-call elimination, however, this optimization is not available in Rust, and thus is not how I’d implement the algorithm if I wanted to put it in production. Therefore, I’ll use a loop to implement the algorithm instead.

Our function will have the following signature:

fn binsearch<T: PartialOrd>(target: &T, collection: &[T]) -> Option<usize>;

Our function is parametrized over all types T that are partial orderable (e.g., all integer types, floating number types, strings, chars, etc.) We used the word “array” before to talk about the collection being searched, however we’ll be more general and use a slice in our Rust program; therefore the function works with arrays, but also vectors and other types of collections.

Traditionally, the return value of binary search is a signed integer; the slice is indexed from 0 to $N-1$, and the special return value -1 is used to denote the absence of the target. In Rust, slices are indexed with the type usize, an unsigned, machine-sized integer, so -1 is really 2⁶⁴-1 (on a 64-bit machine), which is a valid slice index. We could return an isize value, but instead, the return type of our function shall be Option<usize>: the value Some(i) is returned if the target is found at index i and None if the target is absent. This pattern is also more consistent with idiomatic Rust and more robust (we cannot forget to check the return value before attempting to use it).

Tests

Rust allows unit tests to be written quite easily by marking a function with the #[test] attribute. We can then use the command rustc --test or cargo test to build and execute those tests.

Let us start by writing some unit tests where the target is part of the array.

#[test]
fn test_present() {
    // i32
    assert_eq!(Some(0), binsearch(&0_i32, &[0, 1, 2, 4, 8, 16]));
    assert_eq!(Some(1), binsearch(&1_i32, &[0, 1, 2, 4, 8, 16]));
    assert_eq!(Some(2), binsearch(&2_i32, &[0, 1, 2, 4, 8, 16]));
    assert_eq!(Some(3), binsearch(&4_i32, &[0, 1, 2, 4, 8, 16]));
    assert_eq!(Some(4), binsearch(&8_i32, &[0, 1, 2, 4, 8, 16]));
    assert_eq!(Some(5), binsearch(&16_i32, &[0, 1, 2, 4, 8, 16]));

    // char
    assert_eq!(Some(0), binsearch(&'a', &['a', 'b', 'c']));
    assert_eq!(Some(1), binsearch(&'b', &['a', 'b', 'c']));
    assert_eq!(Some(2), binsearch(&'c', &['a', 'b', 'c']));

    // vector
    assert_eq!(Some(0), binsearch(&0.0, &vec![0.0, 1.0, 2.0]));
    assert_eq!(Some(1), binsearch(&1.0, &vec![0.0, 1.0, 2.0]));
    assert_eq!(Some(2), binsearch(&2.0, &vec![0.0, 1.0, 2.0]));
}

And some cases where the target is not part of the slice. We’ll include the edge case of an empty slice here.

#[test]
fn test_absent() {
    assert_eq!(None, binsearch(&0, &[]));
    assert_eq!(None, binsearch(&0, &[1]));
    assert_eq!(None, binsearch(&2, &[1]));
    assert_eq!(None, binsearch(&0, &[1, 3]));
    assert_eq!(None, binsearch(&2, &[1, 3]));
    assert_eq!(None, binsearch(&4, &[1, 3]));

    assert_eq!(None, binsearch(&3.14, &vec![]));
    assert_eq!(None, binsearch(&3.14, &vec![0.0, 1.0, 2.0]));
}

Implementation

Our implementation first needs to establish the bounds of the sub-slice that we will be examining. Initially, that interval is $[0, N)$ (where $N$ is the number of elements in the slice).

If at any point the lower bound of the interval equals or exceeds the upper bound, we break and we know we haven’t found our target; the sub-slice delimited by those bounds is empty.

The computation of the mid-point uses a small trick to avoid an integer overflow. The following equations show that this is equivalent to computing the mid-point naively.

If the target is found, we perform an early return, otherwise we update the interval’s bounds: if the target is contained in the lower half of the slice, we modify the bounds to be $[lo, m)$, otherwise the new bounds are $[m+1, hi)$.

fn binsearch<T: PartialOrd>(target: &T, collection: &[T]) -> Option<usize> {
    let mut lo: usize = 0;
    let mut hi: usize = collection.len();

    while lo < hi {
        let m: usize = (hi - lo) / 2 + lo;

        if *target == collection[m] {
            return Some(m);
        } else if *target < collection[m] {
            hi = m;
        } else {
            lo = m+1;
        }
    }
    return None;
}

gnuvince/binsearch.org

Introduction

Plan

Tests

Implementation