Two lenses are not a requirement for capturing stereoscopic images: it can be done with a single (big) lens and two slightly offset sensor positions. But then you're limited by the distance between those two sensors, and a single large lens isn't necessarily cheaper or easier to use than two smaller ones.
What this system does is use the out-of-focus areas as a sort of "displaced" sensor: light passing through different parts of the aperture lands in different parts of the blur, so sampling those regions is like moving the sensor within a small circle, still inside the projection cone of the lens - and therefore simulating two (or more) images captured through opposite edges of the lens.
But unless the lens is wider than the distance between your eyes, you can't use this to create realistic stereoscopic images at a macroscopic scale; the information simply isn't there. Even if you can extract accurate depth information, that is not quite the same as 3D. A Z-buffer is not a 3D scene - a single depth map records nothing about surfaces occluded from that one viewpoint, so it isn't sufficient for functional stereoscopy.
Microscopy is a different matter. In fact, there are already several stereoscopic microscopes and endoscopes that use a single lens to capture two images (with offset sensors). Since the subject is very small and very close, a stereo baseline narrower than the lens itself can still produce a good 3D effect. Scaling that up to macroscopic photography would require lenses wider than a human head.
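To put rough numbers on that scaling argument: the strength of the stereo effect depends on the angle the two viewpoints subtend at the subject, which collapses once the subject distance dwarfs the baseline. Here's a back-of-the-envelope sketch (the baselines and distances are illustrative assumptions, not the specs of any real system):

```python
import math

def vergence_deg(baseline_m: float, distance_m: float) -> float:
    """Angle subtended at the subject by two viewpoints
    separated by baseline_m, at distance_m away."""
    return math.degrees(2 * math.atan(baseline_m / (2 * distance_m)))

# Human eyes (~65 mm apart) looking at a subject 2 m away:
print(vergence_deg(0.065, 2.0))   # ≈ 1.86 degrees

# Sub-aperture views through a 30 mm-wide lens, same subject:
print(vergence_deg(0.030, 2.0))   # ≈ 0.86 degrees - noticeably flatter

# Microscopy: 10 mm baseline inside one lens, subject 20 mm away:
print(vergence_deg(0.010, 0.02))  # ≈ 28.1 degrees - strong 3D effect
```

So a baseline that fits comfortably inside one microscope objective gives far more parallax than your own eyes get at normal viewing distances, which is why the single-lens trick works at small scales and falls apart at large ones.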