Soochow: Full-process training of large models on domestic computing power; focus on the prospects for adaptation.

14:05 30/04/2026
GMT Eight
Regardless of how DeepSeek V4 performs, its strategic significance is substantial; the focus should be on the prospects for training on, and adapting to, domestic computing power.
Soochow released a research report stating that DeepSeek V4 is an attempt to train a large model from scratch on domestic computing power. Previously, domestic large models used domestic chips mainly for inference; with DeepSeek V4, domestic computing power is now present in the training stack as well as the entire inference process. This is a significant milestone. Therefore, regardless of how DeepSeek V4 performs, its strategic significance is substantial, and the focus should be on the prospects for the training adaptation of domestic computing power.

Key points highlighted by Soochow are as follows:

DeepSeek V4 is the first model to involve Huawei's Ascend chips in training. DeepSeek V4 Flash is the first publicly disclosed general-purpose large model to describe training on domestic computing power, laying out a technical path away from dependence on NVIDIA. This is achieved through three core designs: (1) MXFP4 quantization-aware training for the MoE expert weights and the indexer QK path, reducing reliance on NVIDIA's FP8 ecosystem and enabling compatibility with domestic chips such as Huawei Ascend and Cambricon; (2) the TileLang domain-specific language for developing low-level operators, decoupling the stack from tight integration with the CUDA ecosystem to enable cross-hardware compilation and lower the cost of migrating to domestic chips; (3) the self-developed MegaMoE2 fused kernel, which achieves fine-grained overlap of expert-parallel communication and computation and has been adapted to and run on the Huawei Ascend platform, addressing the communication bottleneck of MoE models on domestic hardware.

Performance: the model ranks in the top tier globally, with many core metrics on par with or surpassing top international closed-source models.
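MXFP4 refers to the OCP Microscaling 4-bit floating-point format: E2M1 elements that share one power-of-two scale per 32-element block. As a minimal illustrative sketch of what such block quantization does to expert weights (not DeepSeek's actual training code):

```python
import numpy as np

# Illustrative MXFP4-style block quantization (OCP Microscaling format):
# each 32-element block shares one power-of-two scale, and each element
# is rounded to the nearest 4-bit E2M1 value.

FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes
FP4_VALUES = np.concatenate([FP4_GRID, -FP4_GRID])

def quantize_mxfp4(block):
    """Quantize a block of 32 floats to MXFP4 and dequantize back."""
    amax = np.abs(block).max()
    if amax == 0:
        return np.zeros_like(block)
    # Shared scale: emax of E2M1 is 2 (largest representable value 6.0 = 1.5 * 2**2)
    scale = 2.0 ** (np.floor(np.log2(amax)) - 2)
    scaled = block / scale
    # Round each element to the nearest representable FP4 value
    idx = np.abs(scaled[:, None] - FP4_VALUES[None, :]).argmin(axis=1)
    return FP4_VALUES[idx] * scale

rng = np.random.default_rng(0)
w = rng.normal(size=32)
w_hat = quantize_mxfp4(w)
err = np.abs(w - w_hat).max()
print(f"max abs quantization error: {err:.3f}")
```

Quantization-aware training would apply this rounding in the forward pass while letting gradients flow through unchanged (a straight-through estimator). The point is that hardware only needs to support 4-bit values plus one shared exponent per block, which is easier to match on non-NVIDIA chips than NVIDIA's full FP8 ecosystem.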
Knowledge: DeepSeek-V4-Pro-Max scored 57.9 on the SimpleQA-Verified benchmark, leading other mainstream open-source models by a wide margin, and 84.4 on Chinese SimpleQA, significantly narrowing the gap with Gemini-3.1-Pro. It also performed strongly on academic knowledge benchmarks such as MMLU-Pro and GPQA-Diamond.

Reasoning and code: the Pro-Max version scored 3206 on Codeforces, equivalent to 23rd place on the human player leaderboard; the Flash version scored 3052, matching the reasoning performance of GPT-5.2 and other closed-source models. Its agent capability also stands out, with high scores across a range of benchmarks. In 1M-token long-context scenarios, the model scored 83.5 on MRCR and 62.0 on CorpusQA, surpassing Gemini-3.1-Pro, while maintaining stable retrieval at 128K context.

Model architecture: CSA+HCA+mHC further compresses inference cost. A hybrid attention architecture alternating between CSA and HCA layers, combined with layered KV-cache compression and sparse attention, reduces single-token inference FLOPs at 1M-token context to 27% of V3.2's, relieving the long-context compute bottleneck. mHC (manifold-constrained hyper-connections) upgrades the traditional residual structure, improving signal-propagation stability and expressive capacity in deep models. Independently trained domain experts, combined with post-training via full-vocabulary online distillation, mitigate the performance degradation caused by fusing multiple capabilities.

Risk factors: slower-than-expected iteration of large models; slower-than-expected progress in adapting domestic computing power across hardware and software ecosystems; intensifying competition in the large-model industry; and stricter industry regulation.
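The report gives no equations for the full-vocabulary online distillation used in post-training, but the standard form of such a loss (assumed here, not DeepSeek's published objective) has the student match each domain-expert teacher's distribution over the entire vocabulary with a KL divergence, rather than learning only from the sampled token:

```python
import numpy as np

# Sketch of a full-vocabulary distillation loss (assumed standard form):
# KL(teacher || student) computed over every vocabulary entry,
# averaged across token positions.

def log_softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def distill_loss(student_logits, teacher_logits, temperature=1.0):
    """Mean per-token KL(teacher || student) over the full vocabulary."""
    t_logp = log_softmax(teacher_logits / temperature)
    s_logp = log_softmax(student_logits / temperature)
    kl = (np.exp(t_logp) * (t_logp - s_logp)).sum(axis=-1)  # per-token KL
    return kl.mean() * temperature ** 2

rng = np.random.default_rng(0)
tokens, vocab = 8, 16  # toy sizes for illustration
student = rng.normal(size=(tokens, vocab))
teacher = rng.normal(size=(tokens, vocab))
print(f"distillation loss: {distill_loss(student, teacher):.3f}")
```

Because every vocabulary entry contributes to the loss at every position, the student receives a dense training signal from each teacher, which is what makes online distillation a plausible remedy for the degradation that comes from merging independently trained domain experts.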