Conceptually interesting even without the AI
This sounds like a somewhat interesting problem. If you ignore the AI aspect and think about how you could solve for "what is in the room" from the timing frame data with direct mathematical/computational approaches, I can see how you could probably make some reasonable estimates as long as you are allowed to make a few assumptions.
For example:
Start with a mostly empty room, assume it is a normal "rectangular" room with 90-degree angles, etc., and that you can filter out / ignore "secondary" reflections (at least initially). Then you can look for things like the "last" strong reflection (probably indicating how far away the furthest corner of the room is), and certain "edges" in the data (corresponding to how different percentages of the light come from different walls at different times), and from the timing and shapes of those edges derive reasonable estimates for the length, width, and height of the room, as well as the position of the sensor point within the room.
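To make that concrete, here is a minimal Python sketch of that first step, under the same assumptions (single co-located emitter/receiver, primary reflections only). The bin width, noise floor, and smoothing window are made-up values for illustration, not anything from the actual system:

    import numpy as np

    C = 299_792_458.0     # speed of light, m/s
    BIN_WIDTH_S = 50e-12  # hypothetical 50 ps timing bins

    def room_extent(counts, noise_floor=5.0):
        """One-way distance (m) to the last strong reflection,
        i.e. roughly the farthest corner of the room."""
        strong = np.flatnonzero(counts > noise_floor)
        if strong.size == 0:
            return None
        round_trip_s = strong[-1] * BIN_WIDTH_S
        return C * round_trip_s / 2.0

    def edge_bins(counts, window=5, k=3.0):
        """Bins where the smoothed histogram slope changes sharply --
        candidate 'edges' where a different wall starts contributing."""
        smooth = np.convolve(counts, np.ones(window) / window, mode="same")
        slope = np.gradient(smooth)
        return np.flatnonzero(np.abs(slope) > k * np.std(slope))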
From there you could look for spikes in the signal, corresponding to (point?) objects within the room. Initially you just know how far away they are (a spherical shell if the emitter and receiver are co-located, an elliptical one if they are separated), but then you can also look for corresponding dips in the wall signals later on, caused by the object's shadow, to help narrow it down to curved lines on the shell (or maybe sets of possible points). You may also be able to use the initially-ignored secondary reflections to further refine things.
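Continuing the same made-up setup: a spike at a given time bin constrains the object to a shell, and a direction toward the dimmed wall patch (the shadow) picks a candidate point on it. The constants and `shadow_dir` are illustrative assumptions:

    import numpy as np

    C = 299_792_458.0     # speed of light, m/s
    BIN_WIDTH_S = 50e-12  # same made-up 50 ps bins as above

    def shell_radius(spike_bin):
        """Radius of the sphere the object must lie on (co-located
        emitter/receiver assumed, so round trip / 2)."""
        return C * spike_bin * BIN_WIDTH_S / 2.0

    def candidate_position(spike_bin, shadow_dir):
        """shadow_dir: vector from the sensor toward the dimmed wall patch."""
        d = np.asarray(shadow_dir, dtype=float)
        return shell_radius(spike_bin) * d / np.linalg.norm(d)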
If you include multiple pulses (over a longer timescale), you might assume the room size stays constant and that objects can move between "pulse frames", but probably not far.
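A naive sketch of that frame-to-frame association, with a made-up bound on how far an object can plausibly move between pulses:

    import numpy as np

    MAX_STEP_M = 0.5  # invented bound on per-frame movement

    def match_frames(prev_pts, curr_pts):
        """Greedy nearest-neighbor matching between two lists of 3D points;
        returns (prev_idx, curr_idx) pairs within the movement gate."""
        pairs, used = [], set()
        for i, p in enumerate(prev_pts):
            if len(curr_pts) == 0:
                break
            dists = [np.linalg.norm(p - q) for q in curr_pts]
            j = int(np.argmin(dists))
            if dists[j] <= MAX_STEP_M and j not in used:
                pairs.append((i, j))
                used.add(j)
        return pairs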
You could also potentially generalize some of the assumptions by doing similar analysis of what to expect from triangular rooms, rooms with curved walls, the Oval Office, parabolic focusing walls, etc. There are probably enough differences between most of these to distinguish several of them fairly robustly. But as you add more complex possibilities, the likelihood of making mistakes while trying to disambiguate them increases.
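One hedged way to frame that disambiguation is as model selection: simulate the expected histogram under each room-shape hypothesis and keep the one with the lowest residual. The per-shape simulators here are hypothetical stand-ins:

    import numpy as np

    def best_shape(observed, simulators):
        """simulators: dict mapping shape name -> function returning an
        expected histogram the same length as `observed`."""
        scores = {name: float(np.sum((observed - sim()) ** 2))
                  for name, sim in simulators.items()}
        return min(scores, key=scores.get), scores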
I suspect such an algorithm would be confused in some way by the perspective-distorting "Ames room": https://www.researchgate.net/f... or https://en.wikipedia.org/wiki/... unless it was specifically accounted for in the algorithm planning/design (or AI training). And there are probably some timing-based analogs of the Ames room that such an algorithm simply could not distinguish at all.
It also gets more and more difficult to handle all these possibilities "by hand". Which is where the neural net comes in. You can basically give up trying to figure out these subtleties yourself, and just hope the neural net can figure them out as part of its training.
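Purely as an illustration of that hand-off, here is a toy network that maps a timing histogram straight to room dimensions, to be trained on simulated examples. All sizes are invented, and where the training data comes from is left entirely open:

    import torch
    import torch.nn as nn

    N_BINS = 1024  # made-up histogram length
    model = nn.Sequential(
        nn.Linear(N_BINS, 256), nn.ReLU(),
        nn.Linear(256, 64), nn.ReLU(),
        nn.Linear(64, 3),   # predicted (length, width, height)
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    def train_step(histograms, true_dims):
        """histograms: (batch, N_BINS) float tensor; true_dims: (batch, 3)."""
        opt.zero_grad()
        loss = loss_fn(model(histograms), true_dims)
        loss.backward()
        opt.step()
        return loss.item()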
-----
However, ultimately I doubt it would be worth it to spend a lot of time and effort trying to improve an algorithm that ONLY uses the timing frame information.
For example, adding a simple 2D camera at the same observation point as the timing pulse receiver should only barely increase the various costs (size, weight, power, and/or simply monetary), while dramatically reducing the need for a lot of not-always-valid assumptions. And/or use 2 or 4 or more timing receivers at different positions, analogous to our two eyes. That still requires some assumptions to match up the correct parts of the different inputs to each other, but that is a much more straightforward problem to solve with fewer assumptions. As for quality: although I expect this approach would still be less robust/accurate than a high-end scanning lidar system, I expect it could get fairly close, at lower cost.
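For the multi-receiver idea, once the matching problem is solved and each receiver has a range estimate to the same object, recovering the position is standard least-squares trilateration (the usual linearization of |x - p_i| = d_i). Note you need four or more receivers for a unique 3D fix this way; with two you only get the curves/shells described above. A sketch, assuming the timing at each receiver has already been converted into a range:

    import numpy as np

    def trilaterate(positions, ranges):
        """positions: (n, 3) known receiver positions, n >= 4;
        ranges: (n,) estimated distances from each receiver to the object.
        Subtracting the first sphere equation from the rest gives a
        linear system 2(p_i - p_0) . x = d_0^2 - d_i^2 + |p_i|^2 - |p_0|^2."""
        p = np.asarray(positions, dtype=float)
        d = np.asarray(ranges, dtype=float)
        A = 2.0 * (p[1:] - p[0])
        b = (d[0] ** 2 - d[1:] ** 2
             + np.sum(p[1:] ** 2, axis=1) - np.sum(p[0] ** 2))
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
        return x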