I wrote this code when a new edition of a book series I like came out and I wanted to find out what had changed between the old and new versions. The goal was to create readable diffs of English prose so that I could scan through the books and easily spot the changes.
Most diffing tools are built on the assumption that their input is source code or some other machine-optimized format rather than prose, so I had to build some of my own tools.
The format that I found most readable had the following features: