Caitong: As Scaling Law shifts toward the post-training and inference stages, focus on AI infrastructure.

Date: 22/02/2025
Source: GMT Eight
Caitong released a research report stating that Scaling Law, as a product of experimental science, is currently facing the depletion of data resources, and that the Transformer architecture has not been able to fully express the thinking mechanisms of the human brain. The firm believes a correct understanding of Scaling Law must account for its natural decay, the high variance in expectations of LLM capabilities, and the time needed for large-scale interdisciplinary engineering attempts. Through continued exploration of new paradigms, Scaling Law is moving toward the post-training and inference stages. The firm noted that DeepSeek-R1-Zero has achieved breakthrough innovation in its technical roadmap, pointing to one of the key directions for LLM Scaling Law: high-quality datasets and strong baseline models. On the investment side, the firm suggests focusing on AI infrastructure-related sectors such as CPUs/GPUs and servers, IDC vendors, and AI server liquid cooling and power supply.

Caitong's main points are as follows:

- Scaling Law: a product of experimental science. Scaling Law refers to the finding that the ultimate performance of large language models (LLMs) is determined mainly by the scale of computing power, model parameters, and training data, and is largely independent of the model's specific structure (such as the number of layers, depth, or width). As model size, data volume, and computing resources grow, model performance improves correspondingly, and this improvement follows a power-law relationship (see the illustrative sketch after these points). Scaling Law is a product of experimental science. In November 2022, the debut of ChatGPT shocked the industry and marked a major breakthrough for large models, and Scaling Law became the core guiding principle for scaling them up further. The industry generally believes that as long as larger parameter counts, more data, and more powerful computing capabilities are provided, model capability will continue to improve, potentially approaching or achieving Artificial General Intelligence (AGI).

- Challenges facing Scaling Law: high-quality data and algorithmic innovation. Current data resources are facing depletion, and the Transformer architecture has not been able to fully express the thinking mechanisms of the human brain. The limitations of current AI lie not only in insufficient data but also in low learning efficiency. True intelligence is not merely the accumulation of data but the compression and refinement of information, akin to acquiring deeper intelligence by distilling first principles.

- Exploring new paradigms: Scaling Law is shifting to the post-training and inference stages. The "slowdown" the industry is currently experiencing is an expected part of LLM Scaling Law. The firm believes a correct understanding of Scaling Law must account for its natural decay, the high variance in expectations of LLM capabilities, and the time required for large-scale interdisciplinary engineering attempts. Through continued exploration of new paradigms, Scaling Law is shifting toward the post-training and inference stages, with research finding that reinforcement learning (RL) and test-time compute also exhibit Scaling Law behavior.
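As an illustration of the power-law relationship described above, the minimal sketch below fits a scaling-law curve of the form L(C) = a * C^(-b) to a handful of hypothetical (compute, loss) points. The numbers, variable names, and the extrapolation target are illustrative assumptions, not figures from Caitong's report.

```python
import numpy as np

# Hypothetical (compute, loss) pairs -- illustrative only, not data from the report.
compute = np.array([1e18, 1e19, 1e20, 1e21, 1e22])   # training FLOPs
loss    = np.array([3.10, 2.65, 2.27, 1.94, 1.66])   # validation loss

# A scaling law of the form L(C) = a * C**(-b) is a straight line in log-log space,
# so the exponent can be estimated with an ordinary least-squares fit.
log_c, log_l = np.log(compute), np.log(loss)
slope, log_a = np.polyfit(log_c, log_l, 1)   # slope is negative: loss falls as compute grows
print(f"fitted exponent b: {-slope:.3f}, prefactor a: {np.exp(log_a):.3f}")

# Extrapolate: predicted loss if training compute is scaled up another 10x.
next_loss = np.exp(log_a) * (1e23) ** slope
print(f"predicted loss at 1e23 FLOPs: {next_loss:.2f}")
```

The same log-log fit is what makes Scaling Law predictive in practice: once the exponent is estimated from smaller runs, the expected return on additional compute, parameters, or data can be extrapolated before a larger training run is committed.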
- DeepSeek-R1-Zero has achieved breakthrough innovation in its technical roadmap, becoming the first large language model to completely abandon the supervised fine-tuning step and rely entirely on reinforcement learning for training, demonstrating the enormous potential of unsupervised or weakly supervised methods for enhancing model reasoning capabilities. High-performance AI reasoning models such as s1-32B, developed by the team led by Fei-Fei Li at a training cost of less than $50, and research such as LIMO from the Shanghai Jiao Tong University team, not only reveal the possibility of efficient, low-cost development paths but also point to one of the key directions for LLM Scaling Law: high-quality datasets and strong baseline models. Within this framework, the focus of research is shifting from simply pursuing larger datasets and more computing resources to optimizing data quality and uncovering the potential of existing models.

What will we scale next? Any new paradigm eventually reaches a bottleneck or marginal slowdown, so the current priority should be to exhaust the existing scaling directions before that bottleneck arrives while simultaneously searching for new segments of the scaling law. To quote Cameron R. Wolfe, PhD: "Whether scaling will continue is not a question. The true question is what we will scale next."

Investment recommendations: focus on AI infrastructure sectors.
- CPUs/GPUs and servers: Hygon Information Technology (688041.SH), Sharetronic Data Technology (300857.SZ), Cambricon Technologies (688256.SH), Inspur Electronic Information Industry (000977.SZ), Dawning Information Industry (603019.SH), Digital China Group (000034.SZ), Unisplendour Corporation (000938.SZ), etc.
- IDC manufacturers: Wangguo Data (09698), Shanghai AtHub (603881.SH), Guangdong Aofei Data Technology (300738.SZ), Range Intelligent Computing Technology Group (300442.SZ), etc.
- AI server liquid cooling and power supply: for liquid cooling, focus on Shenzhen Envicool Technology (002837.SZ), Guangzhou Goaland Energy Conservation Tech (300499.SZ), Guangdong Shenling Environmental Systems (301018.SZ), etc.; for power supply, focus on Shenzhen Honor Electronic (300870.SZ), Shenzhen Megmeet Electrical (002851.SZ), China Greatwall Technology Group (000066.SZ), etc.

Risk warning: technology iteration falling short of expectations; commercialization falling short of expectations; policy support falling short of expectations; global macroeconomic risks.

Contact: contact@gmteight.com