Ali Qianwen launches native vision-language model Qwen3.5-397B-A17B

08:40 17/02/2026
GMT Eight
On February 16th, Ali Qianwen officially released Qwen3.5 and introduced the first model in the series, the open-weight Qwen3.5-397B-A17B. As a native vision-language model, Qwen3.5-397B-A17B performed strongly across comprehensive benchmarks covering reasoning, coding, agentic capabilities, and multimodal understanding.

The model adopts a hybrid architecture combining linear attention (Gated Delta Networks) with a sparse mixture of experts (MoE) for high inference efficiency: of its 397 billion total parameters, only about 17 billion are activated in each forward pass, optimizing speed and cost while preserving capability. Language and dialect support has expanded from 119 to 201 varieties, broadening availability and improving support for global users.

According to the release, the post-training gains of Qwen3.5 over the Qwen3 series come mainly from a comprehensive expansion of reinforcement learning (RL) tasks and environments, with the emphasis on the difficulty and generalization of those environments rather than on optimizing for specific metrics or narrow query categories.

Qwen3.5 achieves efficient native multimodal training on heterogeneous infrastructure: the parallelization strategies of the vision and language components are decoupled, avoiding the inefficiencies of a single unified scheme, and sparse activation is used to overlap computation across modules, reaching nearly 100% of the pure-text baseline's training throughput on mixed text-image-video data. Building on this, a native FP8 pipeline applies low precision to activations, MoE routing, and GEMM operations while keeping sensitive layers in BF16 via runtime monitoring, cutting activation memory by roughly 50%, delivering more than a 10% speedup, and scaling stably to tens of trillions of tokens.
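
To illustrate the sparse mixture-of-experts idea behind the 397B-total / 17B-active design, the sketch below shows top-k expert routing, where only a few experts run per token so the activated parameter count is a small fraction of the total. All dimensions, expert counts, and the top-k value are illustrative assumptions, not Qwen3.5's actual configuration.

```python
# Minimal sparse MoE routing sketch: only the top-k experts run per token, so
# activated parameters are a small fraction of total parameters.
# Sizes, expert count, and top_k are illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=1024, d_ff=4096, num_experts=64, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                          # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in idx[:, slot].unique():              # dispatch tokens to their chosen experts
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

tokens = torch.randn(8, 1024)
layer = SparseMoELayer()
print(layer(tokens).shape)  # torch.Size([8, 1024]); only 4 of 64 experts ran per token
```

The payoff of this routing pattern is exactly the trade-off described in the announcement: total capacity scales with the number of experts, while per-token compute and memory scale only with the few experts that are activated.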
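The mixed-precision recipe reported for the FP8 pipeline (low-precision storage and GEMMs, with BF16 retained at sensitive layers via runtime monitoring) can be approximated conceptually as below. This is a simplified sketch assuming PyTorch 2.1+ for the float8 dtype; the threshold, the monitoring rule, and the dequantize-then-matmul step are assumptions for illustration, not details disclosed for Qwen3.5, and real FP8 pipelines would call hardware FP8 GEMM kernels instead.

```python
# Conceptual sketch of selective FP8: most activations are stored in FP8 to cut
# memory, while tensors flagged as numerically sensitive by a crude runtime
# check stay in BF16. Threshold and rule are illustrative assumptions only.
import torch

def quantize_activation(x: torch.Tensor, amax_threshold: float = 200.0) -> torch.Tensor:
    """Store in FP8 unless the dynamic range suggests the layer is sensitive."""
    if x.abs().max().item() > amax_threshold:   # stand-in for "runtime monitoring"
        return x.to(torch.bfloat16)             # keep sensitive activations in BF16
    return x.to(torch.float8_e4m3fn)            # 1 byte per element instead of 2

def matmul_mixed(a: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # Dequantize to BF16 for the GEMM itself in this CPU-friendly sketch.
    return a.to(torch.bfloat16) @ w.to(torch.bfloat16)

x = torch.randn(16, 512)
w = torch.randn(512, 512)
x_q = quantize_activation(x)
print(x_q.dtype, x_q.element_size(), "byte(s) per activation element")
print(matmul_mixed(x_q, w).shape)
```

Halving the bytes per stored activation is where the reported ~50% activation-memory saving would come from, with the BF16 fallback protecting layers whose value range would overflow the narrow FP8 format.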