Ant Group releases open-source full-modal large model Ming-Flash-Omni 2.0

15:17 11/02/2026
GMT Eight
On February 11th, Ant Group officially open-sourced its latest-generation full-modal large model, Ming-Flash-Omni 2.0. The model performs strongly across multiple public benchmarks, particularly in core capabilities such as visual-language understanding, controllable speech generation, and image generation and editing, surpassing Gemini 2.5 Pro on some metrics.

Ming-Flash-Omni 2.0 is also the industry's first model to support unified all-scenario audio generation, synthesizing speech, ambient sound, and background music synchronously within a single audio track. Users can finely control attributes such as tone, speaking rate, pitch, volume, emotion, and dialect through natural-language instructions.

On inference efficiency, the model runs at a very low inference frame rate of 3.1 Hz and can generate high-fidelity, minute-long audio in real time, significantly reducing compute cost and response latency while maintaining generation quality.

Ant Group has invested in the full-modal direction for several years, and the Ming-Omni series has now iterated through three versions. Open-sourcing Ming-Flash-Omni 2.0 releases its core capabilities to the public as a reusable foundation, providing a unified capability entry point for end-to-end multimodal application development. Users can also try the model and call it online through Ling Studio, Ant's official Bailing platform.
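As an illustration only, the sketch below shows how a self-hosted, open-sourced omni model of this kind might be called through an OpenAI-compatible chat endpoint, with a natural-language instruction steering speech attributes such as pace, emotion, and background sound. The endpoint URL, model identifier, and request fields here are assumptions for illustration, not the official Ling Studio API.

```python
# Hypothetical sketch: calling a locally served Ming-Flash-Omni 2.0-style model
# through an OpenAI-compatible chat endpoint. The URL, model name, and the
# exact request/response fields are assumptions, not Ant's official interface.
import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed local server

payload = {
    "model": "ming-flash-omni-2.0",  # assumed model identifier
    "messages": [
        {
            "role": "user",
            # Natural-language control of speech attributes, as described in
            # the announcement (pace, emotion, background sound).
            "content": (
                "Read this sentence aloud slowly, in a calm and warm tone, "
                "with light rain in the background: 'Welcome to the demo.'"
            ),
        }
    ],
    "modalities": ["text", "audio"],  # assumed flag requesting audio output
}

resp = requests.post(ENDPOINT, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json())  # inspect the returned transcript or audio reference
```

The point of the sketch is that, in such a setup, control over the generated audio would be expressed entirely in the prompt text rather than through dedicated synthesis parameters.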