Ant Open Source Ring-2.6-1T: Comprehensive Enhancement of Agent Execution Capability, Supporting two types of reasoning strength, high and xhigh.

date
15:40 15/05/2026
avatar
GMT Eight
On May 15th, Ant Financial officially announced the open sourcing of its flagship thinking model Ring-2.6-1T, which can handle billions of operations.
On May 15th, Ant Babel officially announced the open sourcing of its flagship mega-scale thinking model Ring-2.6-1T, and has released the weights on Hugging Face and ModelScope. Previously, the model was launched on OpenRouter and offered a limited-time free API experience. The goal of Ring-2.6-1T is not just to pursue larger parameter scales or higher single-point scores, but to cater to the real production environment that large models are entering: agent workflows, engineering development, scientific analysis, complex business systems, and enterprise automation processes. The model is specifically strengthened for agent scenarios, not only improving the quality of single answers, but optimizing the complete execution chain in real workflows: from task decomposition, step planning, tool invocation, context maintenance, to feedback correction and continuous progress in the execution process. Ring-2.6-1T introduces an adjustable Reasoning Effort mechanism, supporting two reasoning intensities: high and xhigh. The high mode is designed for more efficient agent workflows with high frequency, suitable for multi-round interactions, tool collaboration, task decomposition, and default production calls; while the xhigh mode is for high-difficulty complex reasoning tasks such as mathematics, scientific research, complex logical analysis, and multi-path exploration. Through the two configurations of high and xhigh, developers can dynamically allocate reasoning resources according to the nature of tasks: use high for daily workflow for higher efficiency, switch to xhigh for complex reasoning tasks to unleash maximum capabilities. In terms of training, Ring-2.6-1T adopts an asynchronous (Async) reinforcement learning training architecture, decoupling policy sampling from parameter updates into independent pipelines, achieving a significant increase in training throughput and resource utilization: parallel execution of sampling and updating greatly improves GPU utilization, and training efficiency is increased several times; supporting longer training cycles: the decoupling architecture naturally adapts to large-scale and long-term continuous training, avoiding training interruptions caused by synchronous bottlenecks.