What were the highlights of the Moore Threads (688795.SH) Developer Conference? A new architecture, a ten-thousand-card cluster, and an intelligent computing platform

07:12 21/12/2025
GMT Eight
Moore Threads has released "Huagang," a new-generation full-function GPU architecture, and unveiled two chips based on it: "Huashan," aimed at AI training, and "Lushan," aimed at high-performance graphics rendering.
The leading domestic GPU company Moore Threads (688795.SH) is accelerating the expansion of its ecosystem. On December 20, Moore Threads held its inaugural "MUSA Developer Conference" (MDC 2025) in Beijing, where founder, chairman, and CEO Zhang Jianzhong introduced the core achievement of the company's five years of research and development: the new-generation full-function GPU architecture "Huagang."

In a lengthy keynote, Zhang Jianzhong emphasized that "full functionality" is the technical foundation of Moore Threads. In his view, the evolution of the full-function GPU mirrors the history of computing itself: the chip must be able to handle most data types and formats. The "Huagang" architecture uses a new-generation instruction set, delivers a 50% increase in compute density and a tenfold improvement in energy efficiency over the previous generation, and will enter production next year. Notably, "Huagang" supports the full range of precisions from FP4 to FP64 and, on the graphics side, integrates a first-generation AI Generative Rendering architecture (AGR) and a second-generation hardware ray-tracing acceleration engine.

Based on the "Huagang" architecture, Moore Threads also announced two core chip plans: the "Huashan" chip targets unified AI training and inference, while the "Lushan" chip is dedicated to high-performance graphics rendering. According to details disclosed at the event, "Huashan" features a new-generation asynchronous programming model, efficient thread synchronization, and warp specialization. For tensor computation, the chip provides full-precision MMA along with Moore Threads' MTFP8/6/4 mixed low-precision computing technology. The "Lushan" chip, for its part, optimizes task distribution and load balancing.
Its AI computing performance is 64 times that of the previous generation, its geometry processing performance 16 times, and it fully supports DirectX 12 Ultimate.

Chip iterations ultimately have to prove themselves in large-scale engineering deployments. Against the backdrop of growing industry demand for intelligent computing centers, Moore Threads announced the launch of the "Kuawo" ten-thousand-card intelligent computing cluster, with floating-point capacity reaching 10 exaFLOPS. In practice, the ten-thousand-card cluster achieves a model FLOPS utilization (MFU) of 60% when training dense large models and 40% on MoE large models, with effective training time exceeding 90%.

In response to strong market demand for inference performance, Moore Threads showcased a collaboration with SiliconFlow: on the full DeepSeek R1 671B model, a single Moore Threads MTT S5000 card exceeded 4,000 tokens/s of prefill throughput and 1,000 tokens/s of decode throughput. These figures indicate that domestic GPUs have made substantial breakthroughs in system-level engineering optimization for extremely large models. Moore Threads also shared development plans for the MTT C256 super-node architecture aimed at next-generation intelligent computing centers, with the goal of further improving cluster efficiency through high-density hardware design.

Software-ecosystem compatibility has always been regarded as the lifeblood of domestic GPUs. At the conference, Zhang Jianzhong announced a full-stack upgrade of the company's self-developed MUSA architecture. The upgraded MUSA 5.0 makes breakthroughs in stack uniformity and performance: the GEMM/FlashAttention efficiency of the core compute library muDNN exceeds 98%, and communication efficiency reaches 97%.
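For readers unfamiliar with the MFU metric cited above: model FLOPS utilization compares the training FLOPs a cluster actually sustains against its theoretical hardware peak. The sketch below shows how such a figure is computed; all concrete numbers in it (model size, per-card peak, throughput) are hypothetical illustrations, not Moore Threads' disclosed specifications.

```python
def mfu(tokens_per_sec: float, flops_per_token: float,
        n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Model FLOPS Utilization: achieved training FLOPS / theoretical peak."""
    achieved = tokens_per_sec * flops_per_token
    peak = n_gpus * peak_flops_per_gpu
    return achieved / peak

# Hypothetical example: a 70B-parameter dense model costs roughly
# 6 * params = 4.2e11 training FLOPs per token (forward + backward,
# per the standard transformer training-cost rule of thumb).
# On a 10,000-card cluster whose cards each peak at 100 TFLOPS,
# sustaining ~1.43M tokens/s corresponds to an MFU of about 60%.
print(round(mfu(1.43e6, 4.2e11, 10_000, 1e14), 3))  # → 0.601
```

The 60% dense vs. 40% MoE gap reported at the conference is typical: MoE models route tokens to sparse expert subsets, which makes it harder to keep every card saturated.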
Moore Threads also clarified its open-source plan: it will gradually open core components of its compute acceleration libraries, communication libraries, and system management frameworks to the developer community. In addition, the company plans to introduce MTX, an intermediate language compatible across GPU instruction-set generations, as well as muLang, a programming language for fused rendering and AI computing, to lower the adaptation barrier for developers.

At the conference, Moore Threads also made a move that surprised the market by officially entering personal intelligent-computing terminal hardware. Zhang Jianzhong introduced the company's first AI-compute notebook, the MTT AIBOOK, priced at 9,999 yuan for the 32GB+1TB version and expected to go on sale on January 10, 2026. The notebook is powered by Moore Threads' self-developed intelligent SoC "Changjiang," which integrates a high-performance all-core CPU and a full-function GPU, delivering 50 TOPS of heterogeneous AI computing power.

In terms of product logic, the MTT AIBOOK is more of an "out-of-the-box" tool that Moore Threads is offering its 200,000 developers. The device ships with an AI agent and a 2D digital human, "Xiaomai," capable of generating digital-human images within 0.5 seconds, and comes pre-installed with the Qwen3-8B large model. By supporting Windows, Linux, Android containers, and all domestic operating systems, Moore Threads is attempting to bring its MUSA ecosystem from cloud data centers to developers' desktops, closing the loop from code debugging to application development.

Zheng Weimin, an academician of the Chinese Academy of Engineering, pointed out at the conference that "sovereign AI" rests on autonomous computing power, self-improving algorithms, and an autonomous ecosystem.
He believes that although building a domestic hundred-thousand-card-class intelligent computing system poses significant challenges, it is a necessary piece of industrial infrastructure. Zheng emphasized that domestic chip platforms must establish a user-friendly development environment to truly retain the developer community.

In the capital market, Moore Threads, the "first domestic GPU stock," has seen significant volatility recently. Under the influence of multiple factors, the stock closed at 664.10 yuan per share on December 19, down 5.9%; measured from the December 11 high, the cumulative decline has reached 29.4%. Relative to the issue price, however, the stock is still up more than 481%, and the company's total market value remains high at 312.146 billion yuan.

The global computing-power market is shifting from the pure pursuit of parameter scale toward inference efficiency and ecosystem deployment. With the "Huagang" architecture and the full-stack "core-edge-end-cloud" system demonstrated at this conference, Moore Threads signaled its strategic intent to evolve from a standalone hardware supplier into a platform-level computing-infrastructure provider. The efficiency data from the ten-thousand-card cluster and the DeepSeek model inference tests strengthen its footing in this capital-intensive computing marathon.

This article is from "CaiLian News," by Wang Biwei. GMT Eight editorial by Chen Qiuda.