The Kling 2.6 model introduces synchronized audio-video generation, and its Chinese speech generation is world-leading.

09:04 04/12/2025
GMT Eight
On December 3, Kling launched its video generation 2.6 model, which delivers a milestone "synchronized audio and video" capability and fundamentally changes the traditional AI video workflow of generating silent footage first and then dubbing it manually. In a single generation pass it can output a complete video containing natural speech, action sound effects, and ambient environmental sound, restructuring the AI video creation workflow and greatly improving creative efficiency. The upgrade spans two functions: text-to-video and image-to-video, both with audio. Speech is currently supported in Chinese and English, and generated videos can be up to 10 seconds long. By deeply aligning sound with the dynamics of the physical world, the Kling 2.6 model excels in audio-visual synchronization, audio quality, and semantic understanding, and its Chinese speech generation remains world-leading.