Actually, these steps are highly interesting, and I think this is the future of representing our world "as we see it". Photography claims to be a more objective way of representing the world around us (say, compared to drawings or paintings), but the photographer still has abundant power over the framing of a scene and the timing of the capture.
Why not capture an entire scene over a period of time in 3D? The viewer can then choose his own standpoint. This is certainly the goal in crime reconstruction.
That said, I don't think we're there yet: how would this system work in scenes where surfaces (such as walls) have a uniform colour and so offer no features for tracking?
At present you still have to project visual markers into the scene (laser, structured light or whatever) in order to extract its 3D structure. Otherwise it's a pretty poor "we're here" visual representation of our surroundings.
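To make the uniform-wall problem concrete, here is a rough sketch of my own (not how any particular system does it). It scores each pixel with the smaller eigenvalue of the local gradient structure tensor, which is the usual "good feature to track" criterion, and counts the pixels above a threshold. The function names, the window radius, the threshold and the synthetic "projected dots" are all assumptions made up for illustration.

```python
import numpy as np

def box_sum(a, r):
    """Sum of `a` over a (2r+1) x (2r+1) window around each pixel."""
    padded = np.pad(a, r, mode="edge")
    h, w = a.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out

def count_trackable(img, r=2, thresh=1e-3):
    """Count pixels whose smaller structure-tensor eigenvalue exceeds thresh.

    r and thresh are illustrative values, not tuned for real images.
    """
    gy, gx = np.gradient(img.astype(float))
    sxx = box_sum(gx * gx, r)
    syy = box_sum(gy * gy, r)
    sxy = box_sum(gx * gy, r)
    # Smaller eigenvalue of the 2x2 matrix [[sxx, sxy], [sxy, syy]].
    lam_min = 0.5 * ((sxx + syy) - np.sqrt((sxx - syy) ** 2 + 4.0 * sxy ** 2))
    return int(np.count_nonzero(lam_min > thresh))

# A uniformly coloured wall: every pixel has the same intensity.
wall = np.full((64, 64), 0.7)

# The same wall with a sparse random dot pattern added, standing in for
# projected markers (hypothetical pattern, fixed seed for repeatability).
rng = np.random.default_rng(0)
dots = (rng.random((64, 64)) < 0.05).astype(float)
lit_wall = np.clip(wall + 0.3 * dots, 0.0, 1.0)

print("trackable pixels, plain wall:", count_trackable(wall))
print("trackable pixels, wall with projected dots:", count_trackable(lit_wall))
```

The plain wall has zero gradient everywhere, so nothing passes the threshold, while the dotted wall suddenly has plenty of candidates. Real scenes add noise, shading and perspective, so take this as a toy illustration only.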