I would've derived it differnetly. Rather than explicitly setting up the equation for squared distances then finding the solution that minimizes it, I would've just used the normal equations to do the least squares part for me. IMO this makes it a bit more "mechanical" in figuring out the soluion (I just need to figure out what equation I'm trying to solve, the normal equation can be memorized and will figure out how to compute the least squares solution to it).
So we already have the equation from the stackexchange thing for computing the "offset vector" from a point to the line. You can express this in different ways, but they way they did it is fine. Basically the 3D vector telling us how far a point x is from the line described by an origin o and a direction u.
ray_to_point_offset = (I - u*ut)*x - (I-u*ut)*o
That first term is "x projected onto a plane with normal u", and the second term is "ray origin projected to the same plane". So the difference between the two is just finding the perpendicu