Alibaba releases flagship reasoning model Qwen3-Max-Thinking
Alibaba officially launches Qwen3-Max-Thinking, the flagship reasoning model in its Tongyi Qianwen (Qwen) series.
On January 26th, Alibaba officially launched Qwen3-Max-Thinking, the flagship reasoning model in its Tongyi Qianwen (Qwen) series. According to the announcement, Qwen3-Max-Thinking shows significant gains across several key dimensions, including factual knowledge, complex reasoning, instruction following, alignment with human preferences, and overall intelligence. Across 19 authoritative benchmarks, its performance is comparable to top models such as GPT-5.2-Thinking, Claude-Opus-4.5, and Gemini 3 Pro.
Qwen3-Max-Thinking introduces two core innovations:
(1) Adaptive tool invocation capabilities, allowing on-demand access to search engines and code interpreters, now available in Qwen Chat;
(2) Test-time scaling technology, significantly improving reasoning performance, surpassing Gemini 3 Pro on critical reasoning benchmarks.
The table below shows the more comprehensive evaluation scores:
Adaptive Tool Invocation Capability
Unlike earlier approaches that required users to select tools manually, Qwen3-Max-Thinking can autonomously choose and use its built-in search, memory, and code-interpreter functions during a conversation. This capability stems from a purpose-built training process: after initial fine-tuning on tool usage, the model undergoes further training on diverse tasks using rule-based and model-based feedback. Experiments show that the search and memory tools effectively reduce hallucinations, provide real-time information access, and support more personalized responses, while the code interpreter allows code snippets to be executed and computational reasoning to be applied to complex problems. Together, these functions deliver a smooth and powerful conversation experience.
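The "adaptive" part of this design is that the model, not the user, decides when a tool call is needed. A minimal sketch of such a dispatch loop is below; the tool names, turn format, and stub implementations are illustrative assumptions, since the article does not publish Qwen3-Max-Thinking's actual interface.

```python
# Hypothetical tool registry mirroring the built-in search, memory, and
# code-interpreter functions described in the article (all stubs).
def web_search(query: str) -> str:
    return f"[search results for: {query}]"          # stub

def recall_memory(key: str) -> str:
    return f"[memory entry for: {key}]"              # stub

def run_code(source: str) -> str:
    # Toy interpreter: evaluates arithmetic expressions only, no builtins.
    return str(eval(source, {"__builtins__": {}}))

TOOLS = {"search": web_search, "memory": recall_memory, "code_interpreter": run_code}

def handle_model_turn(turn: dict) -> str:
    """Dispatch one model turn: either plain text, or an autonomous tool call.

    The model emits `tool_call` on its own when it judges a tool is needed --
    the user never selects a tool explicitly.
    """
    call = turn.get("tool_call")
    if call is None:
        return turn["content"]
    return TOOLS[call["name"]](call["arguments"])

# Example: the model chose to invoke the code interpreter on its own.
print(handle_model_turn({"tool_call": {"name": "code_interpreter",
                                       "arguments": "17 * 23"}}))
```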
Test-Time Scaling Technology
Test-time scaling refers to allocating additional computational resources during inference to enhance model performance. The Qwen team proposes an experience-accumulative, multi-round iterative test-time scaling strategy. Instead of simply increasing the number of parallel reasoning paths (which often leads to redundant reasoning), it caps the parallel paths and reallocates the saved compute to iterative self-reflection guided by an "experience extraction" mechanism. This mechanism distils key insights from earlier reasoning rounds, so the model avoids re-deriving known conclusions and instead focuses on unresolved uncertainties. Crucially, compared with feeding back the raw reasoning trajectory, the mechanism uses context more efficiently, integrating historical information within the same context window. At roughly the same token consumption, the method consistently outperforms standard parallel sampling-and-aggregation methods: GPQA (90.3 → 92.8), HLE (34.1 → 36.5), LiveCodeBench v6 (88.0 → 91.4), IMO-AnswerBench (89.5 → 91.5), and HLE (w/ tools) (55.8 → 58.3).
Qwen3-Max-Thinking is now available on Qwen Chat, where users can interact directly with the model and its adaptive tool invocation capabilities. Additionally, the Qwen3-Max-Thinking API (model name qwen3-max-2026-01-23) is now open for use.
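For illustration, a chat request to the released model might be built as below. The model name is the one given in the announcement; the assumption that the API accepts an OpenAI-style `messages` payload is based on Alibaba Cloud's compatible-mode endpoints, and the actual base URL and API key depend on your account, so the network call itself is omitted here.

```python
import json

def build_chat_request(prompt: str) -> dict:
    """Build an OpenAI-compatible chat payload for qwen3-max-2026-01-23.

    Endpoint URL and authentication are account-specific and intentionally
    left out; check the Alibaba Cloud Model Studio console for yours.
    """
    return {
        "model": "qwen3-max-2026-01-23",   # model name from the announcement
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_chat_request("What is 17 * 23?")
print(json.dumps(req, indent=2))
```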