I *gasp* read the actual document (http://www.pnas.org/content/early/2013/06/12/1221464110.full.pdf+html) and it sounds like some pretty complicated work. It relies on a bunch of separate microphones listening in an absolutely silent room for the same noise and the echoes of its bounces. Since you know where the microphones are in relation to each other, you can compute when the initial sound and the echoes hit each microphone, and from there work backwards to where the sound must have originated; the echoes tell you what it bounced off of.
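To make the "work backwards" step concrete, here's a toy sketch of the forward model: arrival time is just distance over the speed of sound, so with known mic positions you can brute-force search for the point whose predicted arrival times best match what the mics heard. (Room size, mic layout, and the grid search are all my made-up simplifications; the paper handles echoes and unknown geometry, which is much harder.)

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, roughly, at room temperature

# Hypothetical setup: four mics in the corners of a 4m x 3m room
mics = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (4.0, 3.0)]
source = (1.2, 2.1)  # the unknown we pretend not to know

def arrival_time(src, mic):
    return math.dist(src, mic) / SPEED_OF_SOUND

observed = [arrival_time(source, m) for m in mics]

# Brute-force grid search (1 cm steps): pick the point whose predicted
# arrival times best match the observed ones, least-squares over the mics.
best, best_err = None, float("inf")
for ix in range(401):
    for iy in range(301):
        p = (ix / 100.0, iy / 100.0)
        err = sum((arrival_time(p, m) - t) ** 2 for m, t in zip(mics, observed))
        if err < best_err:
            best, best_err = p, err

print(best)  # -> (1.2, 2.1)
```

Real systems don't know the emission time either, so they work off time *differences* of arrival between mic pairs, but the inversion idea is the same.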
The math is a bit beyond me after being out of university for so long, but it seems similar to the trilateration used in GPS, where thanks to very fast sensor readings you can figure out where you are in relation to a fixed signal. To compute the shape in a noisy environment, I wonder if you could use a "known" sound, so you could listen for only that and filter out the regular noise. Either way the computation involved would be impressive, but maybe not for the elusive "5 years time" computer.
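The "known sound" trick is basically a matched filter: cross-correlate the noisy recording against the probe signal you emitted and look for the correlation peak, which marks the probe's arrival even under loud background noise. A minimal sketch with a made-up probe and synthetic noise (nothing here comes from the paper):

```python
import random

random.seed(0)

# Known probe signal: a random +/-1 sequence stands in for a broadband chirp
probe = [random.choice([-1.0, 1.0]) for _ in range(64)]

# Noisy recording with the probe buried at sample 200
recording = [random.gauss(0, 0.5) for _ in range(500)]
true_offset = 200
for i, s in enumerate(probe):
    recording[true_offset + i] += s

def xcorr_peak(rec, sig):
    """Slide the known signal over the recording; return the lag with the
    highest correlation, i.e. the most likely arrival sample."""
    best_lag, best_val = 0, float("-inf")
    for lag in range(len(rec) - len(sig) + 1):
        val = sum(rec[lag + k] * sig[k] for k in range(len(sig)))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

print(xcorr_peak(recording, probe))  # recovers the offset, 200
```

The peak at the true offset is roughly the probe's energy (64 here), while correlations against pure noise stay much smaller, which is why this survives a noisy environment.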
It would be cool to have something like this in my fishing boat where instead of a dot on the screen I could get something that tells me where the fish are and what kind too.
Maybe you could arrange them in a Golomb ruler layout to further speed up processing... *sigh* Making websites pays well, but I miss computer science.
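For anyone who hasn't run into Golomb rulers: the idea is to place marks (mics) so that every pairwise spacing is distinct, so no two mic pairs give you a redundant baseline. Radio telescope arrays use the same trick. A quick check function, with a known order-5 ruler as the example:

```python
from itertools import combinations

def is_golomb(marks):
    """True if all pairwise differences between marks are distinct."""
    diffs = [b - a for a, b in combinations(sorted(marks), 2)]
    return len(diffs) == len(set(diffs))

print(is_golomb([0, 1, 4, 9, 11]))  # -> True  (an optimal order-5 ruler)
print(is_golomb([0, 1, 2, 4]))      # -> False (spacing 1 occurs twice)
```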