Alibaba upgrades its next-generation voice model Qwen3-TTS, which generates human-like voices from text and audio.

07/02/2026

On December 24, Alibaba upgraded its Qwen3-TTS family of voice models and launched two new models: Qwen3-TTS-VD for voice creation and Qwen3-TTS-VC for voice cloning. The new models support do-it-yourself voice design and highly faithful ("pixel-level") voice imitation, even letting animals speak human language "natively." The generated voices sound natural, the output is stable, and generation is efficient, which could accelerate the adoption of large voice models in professional fields such as audiobooks, AI comics, and film and television dubbing.