MiniMax (00100) releases new-generation model M3: 1-million-token context, flagship coding, and native multimodal capabilities
On June 1, Chinese large-model company MiniMax (00100) officially released MiniMax M3, its new-generation general-purpose model.
China's large-model sector has seen another heavyweight release. On June 1, MiniMax (00100) officially released its new-generation general-purpose model, MiniMax M3. M3 adopts MiniMax Sparse Attention (MSA), a new sparse attention architecture developed in-house, and delivers generational advances in several key directions: coding and agent capabilities, ultra-long context, and native multimodality. M3 is reported to be the first large model in China to combine three core capabilities, namely frontier-level coding, a 1M-token ultra-long context, and native multimodality, and is currently the only open-source option offering all three.
Core breakthrough: Rebuilding million-token context from the attention mechanism up
Underpinning the integration of M3's three major capabilities is MSA (MiniMax Sparse Attention), its in-house sparse attention architecture. Compared with conventional full attention, MSA significantly reduces computational cost in long contexts and extends the context window to 1 million tokens. This means the model can keep a more complete chain of information within a single inference pass when handling long documents, complex code repositories, multi-turn task collaboration, and similar workloads. MiniMax said that at a context length of 1 million tokens, M3's per-token computational cost is only about 1/20 that of the previous-generation model, significantly improving inference efficiency.
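To make the cost argument concrete, here is a minimal arithmetic sketch of why a sparse scheme can be cheaper per token than full attention at long context. It does not describe how MSA works, and it does not reproduce MiniMax's comparison against its previous-generation model; the fixed attention budget (`BUDGET`) and the function names are hypothetical, chosen only so that the ratio comes out at 1/20.

```python
def full_attention_cost(context_len: int) -> int:
    """Positions a new token attends to under full attention: all of them."""
    return context_len


def sparse_attention_cost(context_len: int, budget: int) -> int:
    """Positions attended under a hypothetical fixed-budget sparse scheme."""
    return min(context_len, budget)


def cost_ratio(context_len: int, budget: int) -> float:
    """Sparse cost as a fraction of full-attention cost for one new token."""
    return sparse_attention_cost(context_len, budget) / full_attention_cost(context_len)


CONTEXT = 1_000_000   # 1M-token context, as stated in the article
BUDGET = 50_000       # hypothetical budget, NOT a published MSA parameter

print(cost_ratio(CONTEXT, BUDGET))  # 0.05, i.e. 1/20
```

Under this toy model the saving only appears once the context exceeds the budget; for shorter inputs the two costs are the same.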
Beyond the architecture upgrade, MiniMax also optimized the underlying inference operators. By redesigning the data-read and compute paths, it improved performance by more than 4 times compared with mainstream open-source solutions.
Industry observers see this as an important new variable in the global large-model race. As agent tasks grow more complex, "longer context, more stable memory, lower-cost inference" is becoming a key set of capabilities that determines product usability.
Coding and agents: Multiple metrics on par with top international closed-source models
M3 improves significantly in coding and agentic capabilities, reaching internationally leading levels across authoritative international evaluations covering software engineering, terminal execution, efficiency, and protocol understanding. On SWE-Bench Pro, which evaluates coding ability, MiniMax M3 surpasses GPT-5.5 and Gemini 3.1 Pro and approaches Opus 4.7. On SVG-Bench, a benchmark for SVG generation, MiniMax M3 exceeds Opus 4.7.
On the multimodal test set OmniDocBench, MiniMax M3 scores higher than Gemini 3.1 Pro, and on Claw-Eval, an end-to-end evaluation framework for autonomous agents, MiniMax M3 achieves the highest score.
According to reports, M3 introduces an interactive user-simulator framework into coding and agent training: by simulating how real developers behave during collaboration, it exposes the model to interaction scenarios closer to production environments during both training and evaluation. Industry observers believe coding and agentic capabilities are becoming a new focal point of competition among top global models. MiniMax's emphasis on these capabilities in this update is also seen by outside observers as laying the groundwork for the next stage of AI products.
Native multimodality: Training data scaled to the 100-trillion-token level
MiniMax stated that M3 was trained from the outset on a multimodal mix of text, images, video, and other data, with further expansion of both data scale and the training pipeline. The model not only supports image and video understanding but also has desktop operation capabilities, allowing it to carry out Computer Use tasks in complex cross-application environments.
M3 uses mixed multimodal training from Step 0. In its report, MiniMax emphasizes that interleaved data, in which text, images, and other modalities alternate naturally within a sequence, matters more to model performance than is generally believed. After rebuilding its entire data pipeline around this kind of data, MiniMax was able to raise the training data scale to the order of 100 trillion tokens.
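As a rough illustration of what "interleaved" means here, the sketch below represents a training sample as an ordered list of segments and counts how often the modality changes. The `Segment` type, the file names, and the "at least two switches" threshold are all assumptions made for this example; the article does not describe MiniMax's actual data format or filtering rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    modality: str  # e.g. "text", "image", "video"
    content: str   # raw text, or a placeholder path for non-text data


def modality_switches(sample: List[Segment]) -> int:
    """Count adjacent segment pairs whose modalities differ."""
    return sum(1 for a, b in zip(sample, sample[1:]) if a.modality != b.modality)


def is_interleaved(sample: List[Segment]) -> bool:
    """Hypothetical rule: at least two switches (e.g. text, image, text)."""
    return modality_switches(sample) >= 2


# A document whose text and figures alternate in reading order.
article = [
    Segment("text", "Step 1: open the settings panel."),
    Segment("image", "settings.png"),
    Segment("text", "Step 2: enable the option shown above."),
    Segment("image", "option.png"),
]

# A plain image-caption pair: two modalities, but only one switch.
caption_pair = [Segment("image", "cat.png"), Segment("text", "A cat.")]

print(is_interleaved(article), is_interleaved(caption_pair))  # True False
```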
This means the model's capabilities are extending further, from language understanding into real digital environments. Whether in office automation, enterprise software operations, or more complex productivity scenarios, AI is moving into practical execution at a markedly faster pace.
Trained alongside M3: MiniMax Code receives a major update
On the same day, MiniMax Code was also updated. As an agent product designed specifically for M3 and trained together with it, MiniMax Code can make full use of M3's long-context, coding and agentic, and native multimodal capabilities, making it the preferred agent to pair with MiniMax-M3. For complex long-horizon tasks, MiniMax Code's Agent Team can break a large task down into a multi-stage, concurrent, dynamically adjustable workflow and complete it collaboratively as an agent cluster.
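The general shape of a multi-stage, concurrent workflow can be sketched as follows: stages run in order, and the subtasks within a stage run concurrently. This is a generic pattern built on Python's `asyncio`, not MiniMax Code's implementation; `run_subtask`, `run_workflow`, and the stage names are hypothetical, and the sketch leaves out the dynamic adjustment the article mentions.

```python
import asyncio
from typing import List


async def run_subtask(name: str) -> str:
    """Stand-in for one agent working on one subtask."""
    await asyncio.sleep(0)  # placeholder for real agent work
    return f"{name}: done"


async def run_workflow(stages: List[List[str]]) -> List[List[str]]:
    """Run stages sequentially; run the subtasks inside each stage concurrently."""
    results = []
    for stage in stages:
        # gather() returns results in the same order as its arguments
        stage_results = await asyncio.gather(*(run_subtask(n) for n in stage))
        results.append(list(stage_results))
    return results


STAGES = [
    ["plan"],
    ["implement-api", "implement-ui"],  # independent work, run in parallel
    ["integrate-and-test"],
]

if __name__ == "__main__":
    print(asyncio.run(run_workflow(STAGES)))
```

Keeping stages sequential while parallelizing within a stage is the simplest way to respect dependencies between phases of a large task.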
On the commercial side, MiniMax also launched the Token Plan subscription. The Plus tier costs 49 yuan per month and provides 6 billion tokens; the Max tier costs 119 yuan per month and provides 18 billion tokens; and the Ultra tier costs 469 yuan per month and provides 55 billion tokens.
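For comparison across tiers, the effective price per billion tokens follows directly from the figures quoted above. The sketch below is plain arithmetic on those numbers; the `PLANS` table and function name are just illustrative.

```python
# (monthly price in yuan, monthly allowance in billions of tokens), as quoted
PLANS = {
    "Plus": (49, 6),
    "Max": (119, 18),
    "Ultra": (469, 55),
}


def yuan_per_billion_tokens(price_yuan: float, tokens_billion: float) -> float:
    """Effective monthly price per one billion tokens."""
    return price_yuan / tokens_billion


for tier, (price, tokens) in PLANS.items():
    print(f"{tier}: {yuan_per_billion_tokens(price, tokens):.2f} yuan per billion tokens")
# Plus: 8.17 yuan per billion tokens
# Max: 6.61 yuan per billion tokens
# Ultra: 8.53 yuan per billion tokens
```

On the figures as quoted, the Max tier has the lowest per-token price.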
Industry insiders believe that with the release of M3, MiniMax's position in the global AI race is becoming clearer: it differentiates itself with a frontier model that is both open source and combines multiple capabilities in one, filling a gap in China's AI ecosystem.
The AI industry is still evolving at high speed. As model capabilities move ever closer to real work scenarios, the next round of competition around agents has already begun.