Sounds good if they make the corpus freely available. Having lots of free high quality audio ...
I agree, but from a quick look at their page, I see a lot of problems with reaching that goal.
1: Most computers I've seen have pretty wretched audio inputs: tiny microphones near the screen, nowhere near the speaker's mouth. So we can expect lots of noise, echo, and other artifacts. Good for simulating the real world (because it basically is the real world), but not what I would call high quality. Some gamers and others probably use good quality headsets, but I doubt they will make up the majority of the database. Audio might be pretty good if the speakers use cell phones.
2: People reading written text don't talk the same way as in natural conversation. That's going to be a limitation for some developers.
3: They seem to be depending on the generosity/curiosity of people to generate and validate the samples. That's a hard way to get thousands to enroll. If they had some kind of game or other system that provides a psychic reward/incentive to the users I'd be more confident of a good response.
And a final comment: I hope they're sampling at 16 kHz instead of 8 kHz. To explain: Nyquist's theorem says the sampling rate needs to be more than twice the highest frequency component in the analog signal. Speech typically contains components up to about 6 or 7 kHz, so 16 kHz is a good number. Unfortunately, the carbon microphones that phones used for the first 100 years or so only go up to about 4 kHz, so Ma Bell (remember her?) settled on an 8 kHz rate in the middle of the last century, and most everybody else has accepted that ever since.
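For anyone who wants to check the arithmetic, here is a rough sketch of the Nyquist reasoning in Python. The function names are my own, and the 6 kHz test tone is just an illustrative stand-in for the upper speech components mentioned above. It also assumes no anti-aliasing filter, which is the worst case; a real recording chain would normally filter out content above the limit rather than let it fold back.

```python
def nyquist_limit(sample_rate_hz):
    """Highest frequency a given sampling rate can represent (half the rate)."""
    return sample_rate_hz / 2.0


def is_captured(f_hz, sample_rate_hz):
    """True if a component at f_hz lies strictly below the Nyquist limit."""
    return f_hz < nyquist_limit(sample_rate_hz)


def aliased_frequency(f_hz, sample_rate_hz):
    """Apparent frequency of a pure tone sampled with no anti-aliasing filter.

    Components above the Nyquist limit fold back into the representable band.
    """
    folded = f_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


if __name__ == "__main__":
    tone_hz = 6000  # illustrative upper speech component (assumption)
    for rate_hz in (8000, 16000):
        print(
            rate_hz,
            nyquist_limit(rate_hz),
            is_captured(tone_hz, rate_hz),
            aliased_frequency(tone_hz, rate_hz),
        )
```

At 8 kHz the 6 kHz component is above the 4 kHz limit, so it is either filtered away or shows up as a spurious 2 kHz tone; at 16 kHz it comes through untouched.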