Open Source Securities: Domestic Scale-up/Scale-out Hardware Commercialization Speeds Up; Focus on Investment Opportunities in the AI Interconnect Industry

15:20 15/10/2025
GMT Eight
Computing power, storage power, and interconnect power together drive AI hardware capability; domestic interconnect power may become the focus of the current development phase.
Open Source Securities released a research report stating that the traditional computing architecture can no longer meet the demand for efficient, low-cost, large-scale collaborative AI training. Super nodes have become a trend, significantly boosting Scale-up hardware demand by raising the computing power of individual nodes. Meanwhile, building super-large AI clusters requires horizontal interconnection across numerous nodes, driving Scale-out hardware demand. As clusters grow larger, the electric-power bottleneck of a single region becomes apparent, and Scale-across solutions spanning multiple data centers are expected to be adopted gradually. Scale-up and Scale-out use different communication protocols: major vendors are developing proprietary protocols while smaller vendors push public ones, which are expected to be the future trend.

The main viewpoints of Open Source Securities are as follows:

AI hardware capability is driven by computing power, storage power, and interconnect power, and domestic interconnect power may become the focus of development. AI hardware capability rests on three points: (1) computing power, determined by GPU performance and quantity; (2) storage power, which involves storing and repeatedly accessing large KV-cache matrices, with high-bandwidth HBM as the mainstream solution; (3) interconnect power, which divides into Scale-up, Scale-out, and Scale-across scenarios, corresponding to high-speed communication and data transfer within a node, between nodes, and between data centers, respectively. As GPU computing power and HBM bandwidth improve, an interconnect bottleneck can leave AI data center nodes idle and waste GPU performance. Developing interconnect power will raise the overall efficiency of AI data centers and become a key lever for boosting their computational capability.
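The interconnect-bottleneck argument above can be sketched with a back-of-envelope model. The numbers and the worst-case assumption that communication is not overlapped with compute are ours for illustration, not figures from the report:

```python
# Illustrative sketch (hypothetical numbers): if interconnect bandwidth lags
# GPU throughput, GPUs idle while waiting for gradient synchronization, so a
# training step's useful fraction is compute_time / (compute_time + comm_time).

def cluster_efficiency(compute_s: float, grad_bytes: float, bw_bytes_per_s: float) -> float:
    """Fraction of each training step spent computing, assuming
    communication is not overlapped with compute (worst case)."""
    comm_s = grad_bytes / bw_bytes_per_s
    return compute_s / (compute_s + comm_s)

# Hypothetical node: 1 s of compute per step, 10 GB of gradients to sync.
fast = cluster_efficiency(1.0, 10e9, 100e9)   # 100 GB/s Scale-up fabric
slow = cluster_efficiency(1.0, 10e9, 10e9)    # 10 GB/s Scale-out link
print(f"fast fabric: {fast:.0%}, slow fabric: {slow:.0%}")
# -> fast fabric: 91%, slow fabric: 50%
```

Under these assumed numbers, a tenfold drop in fabric bandwidth roughly halves effective GPU utilization, which is the sense in which interconnect power can "waste GPU performance."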
Currently, domestic computing-power vendors are developing rapidly, and Huawei and Changxin are making progress on the storage side with HBM; the development of interconnect power will therefore become the next focus of domestic efforts.

Super nodes and large clusters are driving rapid expansion of the interconnect market, with both public and private protocols playing important roles. The traditional computing architecture can no longer meet the demand for efficient, low-cost, large-scale collaborative AI training, and super nodes have become a trend, significantly boosting Scale-up hardware demand. According to LightCounting, Scale-up switching chips have become the main demand in data centers and will keep growing; the global market is estimated to reach nearly 18 billion USD by 2030, a CAGR of about 28% over 2022-2030. Meanwhile, building super-large AI clusters requires horizontal interconnection between numerous nodes, driving Scale-out hardware demand. As clusters grow larger, the electric-power bottleneck of a single region becomes evident, and Scale-across solutions will gradually be adopted.

Scale-up and Scale-out use different communication protocols. Major vendors are developing proprietary protocols while smaller vendors promote public protocols; the latter are expected to be the future trend. Specifically, at the Scale-up level, NVIDIA NVLink, AMD Infinity Fabric (UALink), and Huawei UB-Mesh represent proprietary protocols, while Broadcom's SUE and the industry-standard PCIe represent public protocols. At the Scale-out level, NVIDIA InfiniBand is proprietary, while Broadcom promotes RoCEv2 on top of standard Ethernet. In addition, many overseas manufacturers are jointly advancing the Ultra Ethernet Consortium, a new force in Scale-out.
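The cited LightCounting figures can be sanity-checked with the standard CAGR formula: an ~18 billion USD market in 2030 growing at ~28% per year over the eight years from 2022 implies a 2022 base of roughly 18 / 1.28^8 ≈ 2.5 billion USD (our arithmetic, not a figure from the report):

```python
# Implied 2022 base from the 2030 forecast and CAGR:
# base = final / (1 + cagr) ** years
base_2022 = 18e9 / 1.28 ** 8
print(f"implied 2022 market: ${base_2022 / 1e9:.1f}B")
# -> implied 2022 market: $2.5B
```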
The domestic self-sufficiency rate in interconnect hardware is extremely low, making it potentially the next high-growth area for domestic substitution. Interconnect hardware mainly involves switch chips and signal-quality-improvement chips (e.g., retimers), where domestic self-sufficiency is very low: Broadcom and Marvell, for example, hold over 90% of the global commercial switch chip market. Many domestic manufacturers have now completed productization and are gradually moving toward commercialization. For example, Shudu Technology's self-developed PCIe 5.0 switch chips are already in mass production and in use by customers, and Centec's Arctic series for large-scale data centers and cloud services began customer sampling tests at the end of 2023. Companies in interconnect hardware are thus moving quickly from productization to commercialization, and the vast room for domestic substitution points to gradual benefits.

Investment Recommendations: beneficiaries of PCIe hardware: Vantone Neo Development Group (600246.SH) (Shudu Technology) and Montage Technology (688008.SH); beneficiaries of Ethernet hardware: Suzhou Centec Communications (688702.SH), ZTE Corporation (000063.SZ), and Motorcomm Electronic Technology (688515.SH).

Risk Warning: AI data center construction falling short of expectations; product iteration and R&D falling short of expectations.