Debon Securities: A Breakout Year for Model Distillation Technology, Accelerating AI Democratization

Date: 07/02/2025
Source: GMT Eight
Debon Securities released a research report stating that the progression from DeepSeek-R1 to s1 continues to demonstrate that 2025 will mark the starting point for the widespread adoption of large models. Edge-side AI applications may benefit from simultaneous, across-the-board gains in AI cost reduction and model capability; moreover, with the support of distillation technology, the Jevons paradox may be repeatedly validated, giving rise to more phenomenon-level, highly cost-effective small models. As edge deployment and applications advance, models will gradually shift from pre-training toward inference, and domestic computing power is expected to be revalued amid the surge in demand for inference computing power. Debon Securities' main points are as follows:

Costing only $50 yet performing comparably to o1 and R1: a breakout year for model distillation technology. According to TechCrunch, a new research paper by AI researchers from Stanford University, the University of Washington, and elsewhere, including Fei-Fei Li, shows that they trained an AI reasoning model, s1, for less than $50 (cloud computing costs only, excluding server, GPU, and other hardware investment).

1) Technological path: The paper points out that reasoning models can be distilled using relatively small datasets and supervised fine-tuning (SFT), in which an AI model is explicitly guided to mimic certain behaviors in the dataset. Specifically, the team built an "s1K" dataset of 1,000 carefully selected questions, each paired with a reasoning trace and answer distilled from Gemini Thinking Experimental. The team then performed supervised fine-tuning on a pre-trained model, training for 26 minutes on only 16 H100 GPUs. In addition, to improve answer accuracy, the team used a "budget forcing" technique to control test-time computation, either forcing the model's thinking process to terminate early or appending multiple "wait" instructions during s1's reasoning to extend thinking time and optimize performance (a minimal code sketch of both steps appears at the end of this section).

2) Test results: According to the research team's results, the s1-32B model outperformed o1-preview by up to 27% on competition math problems (MATH and AIME24), and its performance on AIME24 was nearly equivalent to that of the Gemini 2.0 Thinking API, demonstrating the effectiveness of the distillation process.

Low cost, open source, and distillation will significantly lower the threshold for AI model development and accelerate AI democratization. According to Geek Park, as early as January 2025 DeepSeek released the official version of its reasoning model DeepSeek-R1 under the MIT license, simultaneously open-sourcing the model weights and allowing users to train other models on its outputs, including through model distillation. DeepSeek actively encourages using R1 as a teacher model to distill smaller but still powerful models, and has open-sourced to the community six small models distilled from DeepSeek-R1's outputs, of which the 32B and 70B versions achieved performance comparable to OpenAI o1-mini across multiple capabilities.
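As an illustration of the distillation and budget-forcing workflow described under "Technological path" above, the following is a minimal sketch assuming the Hugging Face datasets / trl / transformers stack. The base model name, prompt template, "<think>" delimiters, and dataset fields are illustrative assumptions, not the research team's released code.

```python
# Minimal sketch of s1-style distillation: SFT on a small, teacher-distilled
# dataset, plus "budget forcing" at inference time. All names below that are
# not in the article (base model, template, delimiters) are assumptions.
import torch
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# --- 1) Distillation via supervised fine-tuning on ~1,000 curated examples ---
# Each record pairs a question with a reasoning trace and answer distilled
# from a stronger "teacher" reasoning model.
records = [
    {"question": "...", "reasoning": "...", "answer": "..."},
    # ... roughly 1,000 carefully selected items (an "s1K"-style dataset)
]

def to_text(example):
    # Fold the trace and answer into one training string; this template is an assumption.
    return {
        "text": (
            f"Question: {example['question']}\n"
            f"<think>{example['reasoning']}</think>\n"
            f"Answer: {example['answer']}"
        )
    }

train_dataset = Dataset.from_list(records).map(to_text)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-32B-Instruct",  # assumed pre-trained base model
    train_dataset=train_dataset,
    args=SFTConfig(output_dir="s1-sft", num_train_epochs=5,
                   per_device_train_batch_size=1),
)
trainer.train()

# --- 2) "Budget forcing" at test time ---
# If the model tries to close its reasoning early, strip the end-of-thinking
# marker and append "Wait" so it keeps thinking; once the wait budget is spent
# or the token budget runs out, force the marker and produce the final answer.
def budget_forced_generate(model, tokenizer, prompt, extra_waits=2,
                           max_think_tokens=2048):
    text = prompt + "\n<think>"
    waits_used = 0
    while True:
        inputs = tokenizer(text, return_tensors="pt").to(model.device)
        with torch.no_grad():
            out = model.generate(**inputs, max_new_tokens=max_think_tokens)
        text = tokenizer.decode(out[0], skip_special_tokens=True)
        if "</think>" in text and waits_used < extra_waits:
            # Premature stop: remove the closing marker and extend the trace.
            text = text.split("</think>")[0].rstrip() + "\nWait"
            waits_used += 1
            continue
        break
    if "</think>" not in text:
        # Thinking budget exhausted: force the closing marker and answer.
        text = text.rstrip() + "\n</think>\nFinal answer:"
        inputs = tokenizer(text, return_tensors="pt").to(model.device)
        with torch.no_grad():
            out = model.generate(**inputs, max_new_tokens=256)
        text = tokenizer.decode(out[0], skip_special_tokens=True)
    return text
```

The notable design point in this sketch is that nothing beyond standard supervised fine-tuning is needed on the training side; budget forcing is purely an inference-time intervention on when the reasoning trace is allowed to end.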
By combining data distilled from the Gemini Thinking Experimental model with ultra-low-cost training, Fei-Fei Li's team has shown that s1 can likewise achieve excellent model performance. This confirms that distillation is an important means of driving model miniaturization and commercialization; it may narrow the performance gap between open-source and closed-source models, thereby accelerating AI democratization and laying the foundation for an explosion of edge-side AI applications.

Investment recommendations:
1) Model distillation: Shenzhen Intelligent Precision Instrument (301512.SZ), TRS Information Technology (300229.SZ), SI-TECH Information Technology (300608.SZ), Dnake (300884.SZ), Beijing DeepGlint Technology (688207.SH), Beijing Ultrapower Software (300002.SZ), etc.;
2) AI applications: Beijing Kingsoft Office Software, Inc. (688111.SH), Weaver Network Technology (603039.SH), Beijing Seeyon Internet Software Corp. (688369.SH), Bonree Data Technology (688229.SH), Geovis Technology Co., Ltd. (688568.SH), KINGDEE INT'L (00268), Fujian Foxit Software Development Joint Stock (688095.SH), Richinfo Technology (300634.SZ), Wondershare Technology Group (300624.SZ), Easy Click Worldwide Network Technology (301171.SZ), Piesat Information Technology (688066.SH), etc.;
3) AI on the edge: Shenzhen Intellifusion Technologies (688343.SH), Olympic Circuit Technology (603920.SH), LENOVO GROUP (00992), Iflytek Co., Ltd. (002230.SZ), Espressif Systems (688018.SH), Shenzhen Bluetrum Technology (688332.SH), etc.;
4) AI computing power: Shenzhen Intellifusion Technologies (688343.SH), Sichuan Huafeng Technology (688629.SH), Hygon Information Technology (688041.SH), Dawning Information Industry (603019.SH), Cambricon (688256.SH), Digital China Group (000034.SZ), Inspur Electronic Information Industry (000977.SZ), Range Intelligent Computing Technology Group (300442.SZ), RunJian Co., Ltd. (002929.SZ), VNET Group, Inc. Sponsored ADR (VNET.US), etc.

Risk warning: upstream supply falling short of expectations, slower-than-expected adoption in the downstream AI industry, intensified midstream competition, international geopolitical risk, domestic and foreign macro interest rate risk, etc.

Contact: contact@gmteight.com