CICC: DeepSeek's technology breakthroughs and cost reductions pave the way for diverse applications.
11/02/2025
GMT Eight
CICC released a research report stating that DeepSeek broke out globally at the end of January 2025, reaching 22.15 million daily active users (DAU) on the app side, second only to ChatGPT among AI products by total DAU, and ranking first by downloads in the Apple App Store across 157 countries and regions. The technical innovation and outstanding engineering capabilities behind DeepSeek's rise have led global technology trends and are expected to further drive the popularization and scaling of AI applications, laying the foundation for cost reduction in cloud- and device-side applications. CICC suggests paying attention to investment opportunities in the application layer in 2025 against the backdrop of continued performance gains in domestic models.
The main points of CICC are as follows:
DeepSeek V3 achieves leading cost-effectiveness through technological innovation and engineering optimization.
It adopts a self-developed MoE architecture with 671 billion total parameters, of which 37 billion are activated per token, and benchmarks against GPT-4o across multiple dimensions. Its technological breakthroughs include the sparse mixture-of-experts (MoE) design, the multi-head latent attention (MLA) mechanism, and the multi-token prediction (MTP) training objective, which together significantly improve inference efficiency. In addition, V3 applies an FP8 mixed-precision training strategy at scale for the first time, balancing stability and cost-effectiveness: training took less than two months at a cost of only 5.57 million US dollars. V3's API pricing is as low as 0.5 yuan per million input tokens, greatly reducing the cost of use and promoting the widespread deployment of large models on the device side.
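To illustrate why a 671-billion-parameter MoE model only pays the compute cost of roughly 37 billion parameters per token, below is a minimal sketch of sparse top-k expert routing in PyTorch. The layer sizes, expert count, and top-k value are illustrative assumptions, not DeepSeek V3's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Routes each token to its top-k experts, so only a fraction of the
    layer's parameters run per token (illustrative sizes, not V3's)."""
    def __init__(self, d_model=512, n_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)  # learned router
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (n_tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)        # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in idx[:, k].unique().tolist():       # only chosen experts run
                mask = idx[:, k] == e
                out[mask] += weights[mask, k:k + 1] * self.experts[e](x[mask])
        return out

layer = SparseMoELayer()
tokens = torch.randn(8, 512)
print(layer(tokens).shape)  # torch.Size([8, 512]); only 2 of 16 experts ran per token
```

The total parameter count grows with the number of experts, but per-token compute stays fixed at top-k experts, which is the source of the cost advantage the report highlights.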
The DeepSeek R1 series achieves a breakthrough in reasoning ability through reinforcement learning (RL).
R1-Zero skips the traditional large-scale supervised fine-tuning (SFT) stage and trains the base model directly with reinforcement learning, achieving capabilities comparable to OpenAI o1 and verifying RL's potential in large language models. Building on R1-Zero, R1 further optimizes the algorithm to address issues such as language consistency. Through low-level optimization using Nvidia's PTX instruction set, the R1 series improves cross-platform portability and can be adapted to domestic chips. R1's efficient inference and low cost unlock its potential in industrial applications, further promoting the popularization and scaling of AI.
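The report does not name the RL algorithm; DeepSeek's R1 paper describes GRPO (Group Relative Policy Optimization) with rule-based rewards. Below is a minimal sketch of the group-relative advantage signal that lets RL work without a learned critic or SFT labels. The exact-match reward rule and the sample answers are simplified stand-ins, not DeepSeek's actual reward design.

```python
import statistics

def reward(answer: str, reference: str) -> float:
    """Rule-based reward: 1.0 for an exact-match final answer, else 0.0."""
    return 1.0 if answer.strip() == reference.strip() else 0.0

def group_relative_advantages(answers, reference):
    """GRPO-style advantages: normalize each sampled answer's reward against
    its own group, so no learned value model (critic) is needed."""
    rewards = [reward(a, reference) for a in answers]
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# Example: 4 completions sampled from the policy for one math prompt.
samples = ["42", "41", "42", "forty-two"]
print(group_relative_advantages(samples, "42"))  # [1.0, -1.0, 1.0, -1.0]
# Positive advantages push the policy toward answers that pass the rule check;
# negative advantages push it away, with no human-labeled SFT data involved.
```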
The DeepSeek Janus-Pro model excels at both image understanding and generation within a unified architecture.
Janus-Pro uses two separate encoders to handle image understanding and image generation while sharing a single Transformer backbone, and adopts a three-stage training strategy to improve the model's adaptability to real-world scenarios, reporting better results than overseas counterparts such as DALL-E 3.
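A minimal sketch of the decoupled design the report describes: one encoder for understanding, one for generation, both feeding a shared Transformer backbone. All module types and sizes are illustrative stand-ins (a linear layer in place of a ViT, an embedding table in place of a VQ tokenizer), not Janus-Pro's actual components.

```python
import torch
import torch.nn as nn

class DualEncoderUnifiedModel(nn.Module):
    """Two task-specific encoders share one Transformer backbone."""
    def __init__(self, d_model=256):
        super().__init__()
        # Encoder 1: semantic features for image *understanding* tasks.
        self.understand_enc = nn.Linear(768, d_model)    # stand-in for a ViT
        # Encoder 2: discrete codebook tokens for image *generation* tasks.
        self.generate_enc = nn.Embedding(8192, d_model)  # stand-in for a VQ tokenizer
        # One shared Transformer serves both tasks.
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )

    def forward(self, vit_feats=None, vq_ids=None):
        if vit_feats is not None:                 # understanding path
            tokens = self.understand_enc(vit_feats)
        else:                                     # generation path
            tokens = self.generate_enc(vq_ids)
        return self.backbone(tokens)

model = DualEncoderUnifiedModel()
print(model(vit_feats=torch.randn(1, 16, 768)).shape)       # understanding
print(model(vq_ids=torch.randint(0, 8192, (1, 16))).shape)  # generation
```

Separating the two encoders avoids forcing one visual representation to serve conflicting objectives, while the shared backbone keeps the architecture unified.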
DeepSeek's industry impact will span three dimensions:
1) Data will shift from "scale-driven" to "quality-first";
2) Distillation will enable lightweight models to combine high performance with high efficiency, further promoting large-scale device-side deployment (see the sketch after this list);
3) Domestic and foreign giants alike will follow suit, potentially ushering in technological parity, with engineering capabilities and ecosystem construction remaining the key factors for enterprises to build competitive barriers.
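For point 2), here is a minimal sketch of knowledge distillation, assuming the standard soft-label KL objective: a small student model learns to match a large teacher's softened output distribution, which is how lightweight device-side models can retain much of a larger model's capability. The model sizes and temperature are illustrative, not any DeepSeek recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(32, 10)   # stand-in for a large pretrained model
student = nn.Linear(32, 10)   # smaller model intended for device-side use
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0                       # temperature softens the teacher's distribution

x = torch.randn(64, 32)       # a batch of inputs (unlabeled data suffices)
with torch.no_grad():
    teacher_logits = teacher(x)

# KL divergence between the softened teacher and student distributions.
opt.zero_grad()
loss = F.kl_div(
    F.log_softmax(student(x) / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * T * T
loss.backward()
opt.step()
print(f"distillation loss: {loss.item():.4f}")
```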
Risk Warning: Technological iterations may not meet expectations, and downstream commercialization may not meet expectations.