This is where Indic languages shine
Indian languages derived from Sanskrit are written phonetically. Once one learns to read and write the language, there is no concept of mispronunciation while reading or of misspelling while writing. A writer using an Indic script converts spoken syllables into a phonetic description on paper. The mapping also works in reverse: a reader encountering a new word can reconstruct its sounds instantly and completely just by parsing the script.
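To make the round trip concrete, here is a minimal sketch using a handful of Devanagari letters. The names `write` and `read`, the informal romanizations, and the tiny letter tables are my own assumptions for illustration. It only handles plain consonant + vowel syllables and ignores conjuncts, independent vowels, and schwa deletion.

```python
# Tiny illustrative subset of Devanagari; romanizations are informal assumptions.
CONSONANTS = {"k": "\u0915", "m": "\u092E", "l": "\u0932", "n": "\u0928"}
# Dependent vowel signs; the inherent vowel "a" is written with no sign at all.
VOWEL_SIGNS = {"a": "", "aa": "\u093E", "i": "\u093F", "ii": "\u0940", "u": "\u0941"}

# Reverse lookup tables, used when reading script back into sounds.
GLYPH_TO_CONSONANT = {glyph: sound for sound, glyph in CONSONANTS.items()}
SIGN_TO_VOWEL = {sign: sound for sound, sign in VOWEL_SIGNS.items() if sign}


def write(syllables):
    """Turn a list of (consonant, vowel) sound pairs into script."""
    return "".join(CONSONANTS[c] + VOWEL_SIGNS[v] for c, v in syllables)


def read(text):
    """Turn script back into a list of (consonant, vowel) sound pairs."""
    syllables = []
    i = 0
    while i < len(text):
        consonant = GLYPH_TO_CONSONANT[text[i]]
        i += 1
        if i < len(text) and text[i] in SIGN_TO_VOWEL:
            vowel = SIGN_TO_VOWEL[text[i]]
            i += 1
        else:
            vowel = "a"  # no sign written means the inherent vowel
        syllables.append((consonant, vowel))
    return syllables


word = [("n", "aa"), ("m", "a")]
written = write(word)
assert read(written) == word  # sounds -> script -> sounds loses nothing
```

Within this subset each syllable has exactly one spelling and each spelling has exactly one reading, which is the property described above.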
Consider English by contrast: the set of vowel sounds that can follow a consonant differs from one consonant to the next, and the script has no systematic way of representing these variations.
A writer of Hindi (for example) has 30 consonants and 12 vowel sounds, and every vowel can be applied to every consonant. Of course this is not unique to Indian languages. In conversations with native speakers of East African languages, I've gathered that most of their languages are similar in these respects, though with only 9 vowel sounds. The universal theme is that in all (or perhaps almost all) phonetic languages, one can derive a uniform matrix of sounds in which each sound is well represented by the script of the language.
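A rough sketch of that matrix idea, using only three Devanagari consonants and ten vowel forms rather than the full inventory. The function name `syllable_matrix` and the romanized keys are hypothetical, chosen just for this example.

```python
# Illustrative subset only; romanized keys are informal, not a standard scheme.
CONSONANTS = {"k": "\u0915", "g": "\u0917", "m": "\u092E"}
# Dependent vowel signs; the inherent "a" takes no sign.
VOWEL_SIGNS = {
    "a": "", "aa": "\u093E", "i": "\u093F", "ii": "\u0940", "u": "\u0941",
    "uu": "\u0942", "e": "\u0947", "ai": "\u0948", "o": "\u094B", "au": "\u094C",
}


def syllable_matrix():
    """Cross every consonant with every vowel: one written form per sound."""
    return {
        c_sound + v_sound: c_glyph + v_sign
        for c_sound, c_glyph in CONSONANTS.items()
        for v_sound, v_sign in VOWEL_SIGNS.items()
    }


matrix = syllable_matrix()

# Print the grid: one row per consonant, one column per vowel.
for c_sound in CONSONANTS:
    print(" ".join(matrix[c_sound + v_sound] for v_sound in VOWEL_SIGNS))
```

Every cell of the 3 x 10 grid is filled, and no two cells share a spelling. Scaling the two tables up to the full consonant and vowel sets gives the uniform matrix described above.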
So powerful are phonetic languages that Gmail's initial transliteration feature supported five Indian languages and no others. The service has since been expanded to support even more phonetic languages.
It is my opinion that many of the NLP problems that remain unsolved for Western languages will be solved first for phonetic languages, owing to the relatively low complexity and the richness of their scripts.
Cheers.
Disclaimer: I am not a linguist, though I have worked on some language translation problems and have, over the years, gained accidental exposure to many languages, to unequal extents.