The report I read said that the neural net which distinguishes phonemes is trained up to around age 10-14.
Out of the roughly 110 human phonemes (IIRC), most languages use no more than about 85, sometimes far fewer.
The classic Japanese/English "L"/"R" problem is a symptom of this: a Japanese person who wasn't regularly exposed to the "L" sound at a young age maps it to an "R" sound.
Note also that the single "R" sound the Japanese-language person makes in place of both "L" and "R" may not be the "R" sound the English-language person hears. The Japanese-language person may be speaking different "L" and "R" sounds, but the English-language person may hear them all as a single "R" sound. Since there's no common frame of reference, the phoneme corruption could be happening in either or both directions for any phoneme mapping.
I recall reading somewhere else that the French language has three different sounds which map to the English "R" sound. That's my excuse for only scraping through high-school French, anyway.
There are people who are exceptions to the rule, of course, and there's also the possibility of learning to speak a language correctly through an external feedback loop. All you need to do is make different sounds until a person who can hear the difference confirms that the sound is correct, then use that mouth/larynx shape when appropriate. Easy!