DeepSeek is always full of excitement on the weekends!

GMT Eight · 02/03/2025
On March 1st, DeepSeek released two consecutive pieces of major news:

1. DeepSeek disclosed for the first time that the theoretical profit margin of its V3/R1 inference system is as high as 545%, demonstrating significant technological and cost advantages.

2. Luchen Technology announced the suspension of its DeepSeek API services, asking users to use up their account balances as soon as possible, with any unused portion fully refunded.

Together, the two announcements reveal the dual challenges DeepSeek faces between technological innovation and its business model.

DeepSeek's first disclosure of its theoretical profit margin

On March 1st, DeepSeek opened an official account on the Zhihu platform and published a technical article titled "Overview of the DeepSeek-V3/R1 Inference System". According to the article, the inference system is optimized for two goals: greater throughput and lower latency. To achieve both, DeepSeek adopted large-scale cross-node expert parallelism (EP). Although EP increases system complexity, it substantially improves performance. The article details how EP is used to enlarge batch sizes, hide communication time behind computation, and balance load across nodes, allowing the system to process more data while maintaining or improving response times.

[Figure in the original article: computation-communication overlap during the prefill stage.]

The decoding stage uses a similar but finer-grained strategy: the attention layer is further split into two steps, and a five-stage pipeline achieves smoother communication-computation overlap.

In terms of cost control, DeepSeek's performance is particularly outstanding. According to the officially disclosed data, the training cost of V3/R1 was only 5.576 million US dollars, less than 1/20 of that of OpenAI's GPT-4o.
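The overlap idea described above can be sketched in a few lines. This is a toy illustration of the scheduling pattern only, not DeepSeek's implementation: `compute` and `communicate` are hypothetical stand-ins (here just sleeps) for the real attention/MoE kernels and cross-node all-to-all transfers, and Python threads stand in for asynchronous GPU streams.

```python
import threading
import time

def compute(mb, log):
    time.sleep(0.02)               # stand-in for attention/MoE compute
    log.append(("compute", mb))

def communicate(mb, log):
    time.sleep(0.02)               # stand-in for cross-node all-to-all transfer
    log.append(("comm", mb))

def overlapped_schedule(micro_batches):
    """Run compute for micro-batch i while micro-batch i-1's
    communication is still in flight, so neither resource idles."""
    log = []
    in_flight = None               # communication thread of the previous batch
    for mb in micro_batches:
        compute(mb, log)           # compute this micro-batch...
        if in_flight is not None:
            in_flight.join()       # ...while the previous transfer completes
        in_flight = threading.Thread(target=communicate, args=(mb, log))
        in_flight.start()
    if in_flight is not None:
        in_flight.join()           # drain the final transfer
    return log
```

With N micro-batches this takes roughly N+1 step-times instead of the 2N a fully serial schedule would need; a real system pipelines at kernel granularity (the article describes a five-stage decode pipeline), but the scheduling idea is the same.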
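The 545% figure quoted above follows the usual margin-over-cost definition. A minimal sketch, with cost normalized to 100 (the numbers are illustrative, not DeepSeek's disclosed daily figures):

```python
def theoretical_margin(revenue, cost):
    """Profit margin relative to cost, in percent: (revenue - cost) / cost * 100."""
    return (revenue - cost) * 100.0 / cost

# With cost normalized to 100, theoretical revenue of 645 reproduces the
# disclosed 545% margin, i.e. theoretical revenue is about 6.45x cost.
print(theoretical_margin(645.0, 100.0))  # -> 545.0
```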
Its inference pricing is also highly competitive, at only 16 yuan per million tokens, nearly 70% lower than OpenAI's. On top of these optimizations, DeepSeek's theoretical profit margin reaches 545%, demonstrating its enormous potential in cost control and efficiency.

Luchen Technology suspends its DeepSeek API services

Despite DeepSeek's significant technological breakthroughs, the MaaS-based business model built on it faces serious challenges. On March 1st, Luchen Technology announced on its official WeChat account: "Dear users, Luchen Cloud will stop providing DeepSeek API services in one week. Please use up your balance as soon as possible; any unused balance will be fully refunded." That same afternoon, Luchen Technology CEO You Yang responded publicly to DeepSeek's disclosure of its theoretical profit margin. His main points of contention were:

1. Data scope: You Yang argues that the article aggregates token counts from DeepSeek's web pages, apps, and MaaS APIs, which cannot accurately reflect the true costs and usage of MaaS alone. MaaS is a ToB product, and its performance requirements and cost structure differ significantly from those of ToC web pages and apps.

2. MaaS performance and stability: during the Spring Festival period, DeepSeek's app and web pages frequently crashed, with delays exceeding 15 minutes, falling short of the low-latency, high-stability requirements of ToB customers. As a ToB service, MaaS needs to guarantee a first-token response within 2 seconds and per-token latency around 100 ms, which the current deployment struggles to achieve.

3. Sustainability of the business model: MaaS must run at full capacity around the clock and provision roughly 5 times actual demand in machine resources to absorb sudden traffic surges, driving costs up. The contradiction between heavy investment and thin margins makes the MaaS model difficult to operate profitably.

4. Actual contribution of the technological innovation: in AI infrastructure, DeepSeek relies on NVIDIA GPUs and existing techniques (such as the MoE architecture and prefill-decode separation) rather than proposing disruptive algorithms. Small and medium-sized cloud vendors tout 10x increases in inference speed but lack real profitable cases, making it hard to translate technological advantages into business success.

Source: WeChat public account "Wind Wendee". GMT Eight editor: Chen Qiuda.

Contact: contact@gmteight.com