Alibaba's Qwen launches native vision-language model Qwen3.5-397B-A17B
On February 16th, Alibaba's Qwen team officially released Qwen3.5.
The release introduces the first model in the Qwen3.5 series, the open-weight Qwen3.5-397B-A17B. As a native vision-language model, Qwen3.5-397B-A17B performs strongly across benchmark evaluations covering reasoning, coding, agent capabilities, and multimodal understanding. The model adopts a hybrid architecture that combines linear attention (Gated Delta Networks) with a sparse mixture of experts (MoE) to improve inference efficiency: of its 397 billion total parameters, only 17 billion are activated in each forward pass, which reduces latency and cost while preserving capability. Language and dialect support has expanded from 119 to 201, broadening availability for users worldwide.
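To make the sparse-activation figures concrete, the sketch below computes the share of parameters active per forward pass from the two numbers in the model name (397B total, 17B active) and shows a generic top-k expert router. The router is a textbook illustration only: the function names, the example scores, and the choice of k are assumptions, not details of the Qwen3.5 implementation.

```python
import math

# Figures taken from the model name Qwen3.5-397B-A17B.
TOTAL_PARAMS = 397e9   # total parameters
ACTIVE_PARAMS = 17e9   # parameters activated per forward pass


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used in one forward pass."""
    return active / total


def top_k_route(scores: list[float], k: int) -> tuple[list[int], list[float]]:
    """Generic top-k MoE routing (hypothetical helper, not Qwen's router).

    Picks the k experts with the highest router scores and returns their
    indices together with softmax weights normalized over those k experts.
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)               # subtract max for stability
    exps = [math.exp(scores[i] - peak) for i in top]
    norm = sum(exps)
    return top, [e / norm for e in exps]


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active share per forward pass: {share:.1%}")

    # Four made-up router scores for one token; route to the best two experts.
    experts, weights = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
    print(experts, [round(w, 3) for w in weights])
```

Only the selected experts run for a given token, which is why compute per token tracks the 17 billion active parameters rather than the 397 billion total.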
Compared with the Qwen3 series, the post-training gains in Qwen3.5 reportedly come mainly from a comprehensive expansion of reinforcement learning (RL) tasks and environments. The emphasis is on the difficulty and generalization of the RL environments rather than on optimizing for specific metrics or narrow query categories.
Qwen3.5 achieves efficient native multimodal training through heterogeneous infrastructure: the parallelism strategies for the vision and language components are decoupled, avoiding the inefficiencies of a single unified scheme. Sparse activation is used to overlap computation across components, so that training throughput on mixed text, image, and video data reaches nearly 100% of the text-only baseline. Building on this, the native FP8 pipeline uses low precision for activations, MoE routing, and GEMM operations, while runtime monitoring keeps sensitive layers in BF16. This yields an approximately 50% reduction in activation memory, a speedup of more than 10%, and stable scaling to tens of trillions of tokens.
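The roughly 50% activation-memory figure is consistent with simple byte counting: a BF16 value occupies 2 bytes and an FP8 value 1 byte. The sketch below works through that arithmetic under stated assumptions. The function name and the example fractions are hypothetical, and the calculation ignores the small overhead of FP8 scaling factors.

```python
BYTES_BF16 = 2  # bytes per BF16 element
BYTES_FP8 = 1   # bytes per FP8 element


def activation_memory_reduction(fp8_fraction: float) -> float:
    """Fraction of activation memory saved versus an all-BF16 baseline.

    fp8_fraction is the assumed share of activation elements stored in FP8;
    the remainder stays in BF16 (for example, layers flagged as sensitive).
    """
    if not 0.0 <= fp8_fraction <= 1.0:
        raise ValueError("fp8_fraction must be between 0 and 1")
    mixed_bytes = fp8_fraction * BYTES_FP8 + (1.0 - fp8_fraction) * BYTES_BF16
    return 1.0 - mixed_bytes / BYTES_BF16


if __name__ == "__main__":
    # Illustrative fractions only; the article does not state the real split.
    for frac in (1.0, 0.9, 0.5):
        saved = activation_memory_reduction(frac)
        print(f"{frac:.0%} of activations in FP8 -> {saved:.0%} memory saved")
```

Under this simplified model the saving is half the FP8 share, so a figure near 50% implies that most activations are held in FP8 and only a small portion stays in BF16.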