Orient: CXL solution optimizes AI storage architecture; leading manufacturers expected to accelerate adoption.

09:35 18/03/2026
GMT Eight
The industry believes that the software and hardware supporting CXL memory pooling have steadily matured, and leading companies are accelerating their deployments.
Orient released a research report stating that CXL memory pooling can significantly improve storage system efficiency and is likely to reshape the memory hardware mix of future AI computing facilities. The firm believes the software and hardware surrounding CXL memory pooling have steadily matured and that leading companies are accelerating their deployments. As major manufacturers keep using memory pooling to improve AI inference efficiency, adoption of CXL memory pooling is expected to expand alongside inference demand, benefiting the industry chain.

Orient's main points are as follows:

The CXL solution can improve storage system efficiency, and future AI storage architecture is expected to be reshaped. Some investors still underestimate how much AI inference will drive memory capacity expansion and storage architecture optimization. CXL (Compute Express Link) memory pooling supports unified addressing, scheduling, and transparent access to memory resources across CPUs, GPUs, and other accelerators, pooling memory so that large-model training and inference can run at larger scale and higher concurrency. On capacity, AI inference storage needs, such as context caches and model weights, keep growing, while server memory upgrades remain constrained by DIMM slot counts and per-module capacity. The current storage architecture is also inefficient: model parameters and activations are shuttled from HBM to DRAM and on to SSD, and the large bandwidth gaps between tiers, combined with the lack of a unified memory-semantic direct-attach protocol, amplify latency, waste bandwidth, and reduce throughput (a minimal load/store sketch appears below). Workloads also differ widely in their balance of compute and memory, so static resource allocation tends either to strand compute or to hit memory bottlenecks. Against these problems, CXL memory pooling is expected to expand the memory space available to AI computing facilities and enable more flexible resource allocation, strengthening model training and inference; by optimizing memory configuration, CXL technology is also expected to significantly reduce data center total cost of ownership (TCO).

The CXL solution's hardware and software are gradually maturing, and top-tier manufacturers are advancing its application. Some investors believe memory pooling solutions remain immature; Orient believes the surrounding software and hardware have steadily improved and that leading companies are accelerating their deployments. On the interconnect side, the CXL 4.0 specification was released in November 2025 with a data rate of 128 GT/s, double that of CXL 3.0 (see the back-of-the-envelope bandwidth calculation below).
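To make the memory-semantic, transparent-access point concrete, here is a minimal sketch of user-space load/store access to CXL-attached memory. It assumes the expander has been configured in devdax mode at a hypothetical path /dev/dax0.0; in many real deployments the kernel instead onlines the expander as a CPU-less NUMA node, so applications need no changes at all.

```python
# Minimal sketch: load/store access to CXL-attached memory exposed as a
# DAX character device. The device path and size below are assumptions
# for illustration, not a vendor-specific interface.
import mmap
import os

DAX_PATH = "/dev/dax0.0"  # assumption: expander configured in devdax mode
POOL_SIZE = 1 << 30       # map 1 GiB of the pooled region

fd = os.open(DAX_PATH, os.O_RDWR)
try:
    # Map the CXL memory straight into this process's address space.
    # Subsequent reads and writes are ordinary CPU loads and stores over
    # CXL.mem -- no block-I/O path and no page-cache copy in between.
    pool = mmap.mmap(fd, POOL_SIZE, mmap.MAP_SHARED,
                     mmap.PROT_READ | mmap.PROT_WRITE)
    pool[:5] = b"hello"   # plain store instructions
    print(pool[:5])       # plain load instructions
    pool.close()
finally:
    os.close(fd)
```

Because every CPU or accelerator that can reach the pool maps the same physical memory, capacity can be granted to whichever node currently needs it rather than being fixed per server, which is exactly the flexibility the static-allocation critique above calls for.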
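The cited data rates translate into rough per-link bandwidth as follows; this counts raw signaling only, and the flit/protocol overhead defined by the spec would lower effective throughput somewhat.

```python
# Back-of-the-envelope bandwidth for the data rates cited above.
# At the 1b/1b flit encoding used at these rates, 1 GT/s per lane is
# roughly 1 Gbit/s per lane of raw throughput.
for gen, gts in [("CXL 3.0", 64), ("CXL 4.0", 128)]:
    lanes = 16
    gbytes = gts * lanes / 8
    print(f"{gen}: {gts} GT/s x{lanes} lanes ~ {gbytes:.0f} GB/s per direction")
# CXL 3.0: 64 GT/s x16 lanes ~ 128 GB/s per direction
# CXL 4.0: 128 GT/s x16 lanes ~ 256 GB/s per direction
```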
NVIDIA is expected to keep building out its CXL position: in September 2025 it acquired Enfabrica's core team and a license to its technology, and according to NVIDIA's website the NVIDIA Vera CPU supports the CXL protocol. Domestic server manufacturers have launched CXL memory pooling solutions as well. At its 2025 Cloud Summit, Alibaba Cloud announced the world's first PolarDB database dedicated server based on CXL 2.0 switch technology; in December 2025, Inspur Electronic Information Industry launched a CXL memory expansion solution for its MetaBrain servers, built on the MetaBrain NF5280G7 with a base configuration of 24 local DRAM DIMMs plus built-in CXL memory expansion cards. As the supporting software and hardware mature and top-tier manufacturers push applications forward, CXL penetration is expected to rise quickly: TechInsights forecasts that CXL's share of server DRAM will grow from nearly zero in 2024 to about 15% in 2030. CXL support is expected to become standard in servers, and the industry ecosystem should mature faster as a result.

The CXL solution continues to evolve to fit AI inference demands. In March 2026, KOS, Inspur's MetaBrain server operating system, launched MantaKV, a storage-transmission integrated KVCache management system built on CXL memory pooling. MantaKV stores the large volumes of KVCache generated by prefill (P) nodes in the CXL shared memory pool, where decode (D) nodes consume them directly without retransmission; the cache becomes a globally available persistent cache (with no offload to the P nodes' local SSDs), consolidating two independent transfers into a single write, removing redundant transmission and improving inference efficiency (a schematic sketch of this flow closes this note). Also in March 2026, Peking University, working with Alibaba Cloud and others, first proposed using CXL memory pooling to store Engrams, integrating a CXL-based Engram memory pool into the SGLang framework and achieving end-to-end performance close to local DRAM, a scalable and cost-effective storage option for bringing Engrams into large language models.

Orient believes that as major manufacturers keep improving AI inference efficiency through memory pooling, adoption of CXL memory pooling will continue to expand with inference demand, benefiting the industry chain.

Risk warning: AI deployment falling short of expectations; technology iteration slower than expected; localization progress slower than expected.
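As referenced above, the following is a schematic sketch of the single-write, shared-read KVCache flow the report attributes to MantaKV. A POSIX shared-memory segment stands in for the CXL shared pool, and all names and shapes are illustrative assumptions, not Inspur's actual API.

```python
# Schematic sketch of a "single write, shared read" KVCache flow over a
# shared memory pool. POSIX shared memory emulates the CXL pool; the
# segment name and tensor shape below are hypothetical.
import numpy as np
from multiprocessing import shared_memory

POOL_NAME = "cxl_kvcache_pool"          # hypothetical pool segment name
KV_SHAPE = (32, 2, 1024, 128)           # layers x (K,V) x tokens x head_dim
KV_BYTES = int(np.prod(KV_SHAPE)) * 2   # fp16

def prefill_node(prompt_kv: np.ndarray) -> str:
    """P node: write the KVCache once into the shared pool."""
    shm = shared_memory.SharedMemory(name=POOL_NAME, create=True, size=KV_BYTES)
    dst = np.ndarray(KV_SHAPE, dtype=np.float16, buffer=shm.buf)
    dst[...] = prompt_kv                # the single write
    shm.close()
    return POOL_NAME                    # D nodes receive only this handle

def decode_node(handle: str) -> np.ndarray:
    """D node: map the same pool and read the cache in place --
    no second transfer and no staging through a local SSD."""
    shm = shared_memory.SharedMemory(name=handle)
    kv = np.ndarray(KV_SHAPE, dtype=np.float16, buffer=shm.buf).copy()
    shm.close()
    shm.unlink()                        # demo cleanup; a real pool persists
    return kv

handle = prefill_node(np.ones(KV_SHAPE, dtype=np.float16))
kv = decode_node(handle)
print(kv.shape, kv.dtype)
```

The consolidation the report describes is visible in the flow: without a shared pool, the P node would emit the cache once and a second transfer (or an SSD staging hop) would deliver it to the D node; here the P node writes once and the D node maps the same bytes.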