Zyphra is excited to announce the beta release of Zonos-v0.1, featuring two real-time text-to-speech models with high-fidelity voice cloning. We are releasing a 1.6B transformer and a 1.6B hybrid model, both under the Apache 2.0 license. Although audio quality is difficult to assess quantitatively, we believe the generation quality of Zonos matches or exceeds that of leading proprietary TTS models. We also believe that making models of this quality openly available will significantly advance TTS research. The Zonos model weights are available on Hugging Face, and sample inference code is available in our GitHub repository. Zonos can also be used through our model playground and API, which offers simple and competitive flat-rate pricing. To illustrate its performance, we have prepared a set of sample comparisons between Zonos and existing proprietary models. This release reflects our commitment to fostering innovation in text-to-speech technology.