Caitong: DeepSeek's six consecutive open-source releases; maintains a positive rating on the computer industry
03/03/2025
GMT Eight
Caitong released a research report stating that it maintains a positive rating on the computer industry. During the five-day open source week "technical bombardment" from February 24th to 28th, DeepSeek released five major code repositories covering training, inference, communication, load balancing, and data acceleration. On the sixth day, DeepSeek unexpectedly released an overview of the DeepSeek-V3/R1 inference system, showing how cross-node parallelism, load balancing, and dynamic resource management deliver high-throughput, low-latency, and cost-effective inference, with a theoretical cost-profit margin of 545%.
Key points from Caitong:
DeepSeek Open Source Week: Higher throughput, lower latency, and ultimate cost-effectiveness
During the five-day open source week "technical bombardment", DeepSeek released five major code repositories covering training, inference, communication, load balancing, and data acceleration. On the first day, it released FlashMLA, an efficient MLA decoding kernel designed for the Hopper architecture that handles variable-length sequences efficiently, optimizes memory management, and squeezes the most performance out of the GPU. On the second day, it released DeepEP, which focuses on communication efficiency and high-throughput data transfer and is the first communication library with flexible GPU resource control custom-built for MoE models. On the third day, DeepSeek introduced DeepGEMM, an FP8 matrix-multiplication library supporting both dense and MoE models; its core logic runs to only about 300 lines of code yet targets the most frequent operation in AI computing, matrix multiplication, providing strong support for V3/R1 training and inference. On the fourth day, DeepSeek released a set of parallelism optimizations: DualPipe, a bidirectional pipeline-parallel algorithm that overlaps computation and communication in V3/R1 training, and EPLB, an expert-parallel load-balancing tool for V3/R1, together with an in-depth analysis of the computation-communication overlap mechanisms in the V3/R1 models. On the last day of the open source week, DeepSeek released the 3FS parallel file system, which targets the demands of AI training and inference workloads: it uses modern solid-state drives (SSDs) and RDMA networks to provide a shared storage layer, simplifying the development of distributed applications and accelerating data access across the DeepSeek platform.
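To make the DeepGEMM item above more concrete, here is a minimal PyTorch sketch of the underlying idea: quantize activations to FP8 with one scale per block, then multiply. It is only a conceptual reference, not DeepGEMM's actual API; the function names and the 128-element block size are illustrative assumptions, and the real library implements this as fused FP8 kernels on Hopper GPUs rather than the dequantize-then-matmul shortcut shown here.

```python
import torch

# Conceptual sketch of blockwise FP8 quantization plus a dequantize-and-multiply
# reference. NOT DeepGEMM's API; names and the 128-element block size are
# illustrative assumptions. Requires PyTorch >= 2.1 for the float8_e4m3fn dtype.

def quantize_fp8_blockwise(x: torch.Tensor, block: int = 128):
    """Quantize a 2-D bf16 tensor to FP8 (e4m3) with one scale per 1 x block tile."""
    m, k = x.shape
    assert k % block == 0, "inner dimension must be a multiple of the block size"
    x_blocks = x.view(m, k // block, block)
    # One scale per block so the block's max magnitude maps to FP8 e4m3's max (448).
    scales = x_blocks.abs().amax(dim=-1, keepdim=True).clamp(min=1e-4) / 448.0
    x_fp8 = (x_blocks / scales).to(torch.float8_e4m3fn)
    return x_fp8.view(m, k), scales.squeeze(-1)

def fp8_gemm_reference(x_fp8: torch.Tensor, x_scales: torch.Tensor,
                       w: torch.Tensor, block: int = 128) -> torch.Tensor:
    """Readable stand-in for a fused FP8 kernel: dequantize, then matmul in bf16."""
    m, k = x_fp8.shape
    x_deq = (x_fp8.view(m, k // block, block).to(torch.bfloat16)
             * x_scales.unsqueeze(-1).to(torch.bfloat16))
    return x_deq.view(m, k) @ w  # weight kept in bf16 here for simplicity

x = torch.randn(4, 256, dtype=torch.bfloat16)   # toy activations
w = torch.randn(256, 64, dtype=torch.bfloat16)  # toy weights
x_fp8, s = quantize_fp8_blockwise(x)
print(fp8_gemm_reference(x_fp8, s, w).shape)    # torch.Size([4, 64])
```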
One More Thing: the DeepSeek-V3/R1 inference system achieves a theoretical cost-profit margin of 545% for large-model inference
After the five-day "technical bombardment" of the open source week, DeepSeek unexpectedly released, on the sixth day, an overview of the DeepSeek-V3/R1 inference system, showing how cross-node parallelism, load balancing, and dynamic resource management deliver high-throughput, low-latency, and cost-effective inference, with a theoretical cost-profit margin of 545%. DeepSeek describes three load balancers: a Prefill Load Balancer, a Decode Load Balancer, and an Expert-Parallel Load Balancer. Each targets a different bottleneck, with the shared aim of distributing computation and communication loads evenly across GPUs and thereby raising overall system efficiency.
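To ground the headline number: 545% is a cost-profit margin, i.e. (theoretical revenue − cost) / cost, so it implies theoretical revenue of roughly 6.45 times the underlying compute cost. A minimal back-of-the-envelope sketch follows; the daily cost used is a hypothetical placeholder, not a figure disclosed by DeepSeek.

```python
# Back-of-the-envelope check of what a 545% theoretical cost-profit margin implies.
# Margin is defined here as (revenue - cost) / cost, so revenue = cost * (1 + margin).
# daily_cost is a hypothetical placeholder, not DeepSeek's disclosed number.

def implied_theoretical_revenue(daily_cost_usd: float, margin: float = 5.45) -> float:
    """Daily revenue required to reach the given cost-profit margin."""
    return daily_cost_usd * (1.0 + margin)

daily_cost = 100_000.0  # hypothetical daily GPU rental cost in USD
revenue = implied_theoretical_revenue(daily_cost)
print(f"Implied theoretical daily revenue: ${revenue:,.0f}")          # $645,000
print(f"Implied revenue-to-cost ratio: {revenue / daily_cost:.2f}x")  # 6.45x
```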
DeepSeek's six consecutive open-source releases showcase its geek style
The continuous technical releases during DeepSeek's open source week demonstrate the team's geek spirit and open-source philosophy. As its official announcement puts it: "There is no ivory tower here, only the spirit of pure garage entrepreneurship and community-driven innovation." The FlashMLA project even contains inline assembly, showing that the DeepSeek team works below the high-level CUDA layer at the GPU's underlying machine-code level, gaining finer-grained control over GPU parallel computation and memory access and extracting additional performance. This reflects not only the team's deep expertise in algorithms but also its relentless pursuit of engineering efficiency.
Risk warning: Risks of technological iteration falling short of expectations; risks of commercialization falling short of expectations; risks of policy support falling short of expectations; risks of global macroeconomic uncertainties.