Lots of people ask about this. If we did pure speech-to-text and text-to-speech, it would take about half the bandwidth but everybody would have the same synthesized voice. Once you start trying to add parameters to the synthesized voice such as pitch, speed, and tonality, those take as much bandwidth as we are using for the entire codec, because they are essentially the same parameters.
The government presently uses commercial codecs that get to slightly lower data rates.
I once had to ask the Pakistani military not to use the mailing list to ask questions, as I didn't want our ham radio project to get in ITAR trouble. Of course they can still use the code; it's Open Source. But they have to get help elsewhere.
"If you own a machine, you are in turn owned by it, and spend your time serving it..." -- Marion Zimmer Bradley, _The Forbidden Tower_