GF Securities: MRDIMM and CXL Expand AI Server Memory; Core Beneficiaries in the Industry Chain Merit Attention

GMT Eight | 10:17, 29/10/2025
MRDIMM and CXL form a "near-end high bandwidth + far-end large capacity" tiered combination that expands AI server memory supply and enables elastic scaling at lower TCO.
GF SEC released a research report stating that in high-concurrency, long-context inference scenarios, MRDIMM and CXL form a "near-end high bandwidth + far-end large capacity" tiered combination that expands AI server memory supply and enables elastic scaling at lower TCO. CXL 3.1 delivers significant performance gains for KVCache and is particularly suited to high-concurrency, ultra-long-context workloads. Attention to core beneficiaries in the industry chain is recommended.

Key points from GF SEC:

MRDIMM and CXL expand AI server memory

How to expand AI server memory capacity while simultaneously improving and balancing performance has become a core challenge in current architecture evolution. Three challenges stand out: (1) HBM is costly and capacity-limited; (2) memory requirements differ markedly across applications and workloads, so both memory shortage and overprovisioning must be avoided; (3) the memory capacity expandable per CPU socket has become a bottleneck, requiring new memory devices and architectures. MRDIMM and CXL form a "near-end high bandwidth + far-end large capacity" tiered combination that expands AI server memory supply and enables elastic scaling at lower TCO.

MRDIMM as the near-end memory base for KVCache-intensive inference

MRDIMM delivers deterministic gains in the KVCache scenario of large-model inference: higher concurrency, longer contexts, lower end-to-end latency, and markedly better CPU-GPU memory layout and resource utilization. Specifically: (1) Bandwidth increase: according to the report "Next-Gen AI App Server Performance with CXL 3.1 Tier Memory and MRDIMM Gen2 Solution," MRDIMM Gen2 supports speeds up to 12800 MT/s and improves bandwidth by 2.3x over DDR5 RDIMM under AI workloads (a back-of-the-envelope peak-bandwidth comparison follows this section); faster memory access significantly reduces KVCache read/write latency and supports high-throughput inference. (2) Capacity expansion: a single module supports 64/96/128 GB, enabling longer contexts and more parallel sessions (see the KVCache sizing sketch below). (3) Decoupled memory design: MRDIMM's high bandwidth and large capacity suit CPU-side KVCache offloading. Intel Xeon 6 "Granite Rapids," with its 12-channel memory controller, can fully unleash MRDIMM's bandwidth potential, effectively relieving GPU memory pressure and facilitating multi-session scheduling and cross-session KVCache reuse, balancing latency and cost.

CXL provides far-end/pooled expansion and clear TCO advantages in KVCache-intensive inference

CXL 3.1 delivers significant performance gains for KVCache and is particularly suited to high-concurrency, ultra-long-context workloads. Specifically: (1) Memory pooling and expansion: memory can be pooled across CPUs/GPUs/accelerators, elastically offloading part of the KVCache from expensive GPU memory to CXL devices and expanding effective capacity to the terabyte level without adding GPU cost. (2) Low-latency access: CXL access latency can approach that of CPU DRAM, so KVCache placed in CXL maintains near-real-time decoding performance even under heavy load. (3) Decoupled KVCache architecture: in ByteDance's LLM serving stack, offloading KVCache to CXL increased batch size by 30%, cut GPU requirements by 87%, and raised GPU utilization in the prefill phase by 7.5x. (4) Tiered memory management: CXL supports hot/cold tiering, dynamically placing KVCache by access frequency; hot KV stays in GPU/CPU DRAM, while warm/cold data migrates to the CXL pool (a minimal placement sketch follows below).
As an example of the TCO advantage, using the DeepSeek-1.73B quantized model, a single-CPU configuration (CPU0 + 128 GB) plus a 128 GB CXL extension delivers prompt and decode throughput similar to a dual-CPU configuration (CPU0/CPU1 with 128 GB each), but with fewer processors, yielding a clear TCO advantage.

Risk warning

AI industry development and demand falling short of expectations; AI server shipments falling short of expectations; domestic manufacturers' technology and product progress falling short of expectations.