GF Securities: MRDIMM and CXL expand AI server memory; core beneficiaries in the industry chain merit attention
MRDIMM and CXL form a layered "near-end high bandwidth + far-end large capacity" collaboration that expands AI server memory supply and enables elastic scaling at a lower TCO.
GF Securities released a research report stating that in high-concurrency, long-context inference-intensive scenarios, MRDIMM and CXL form a layered "near-end high bandwidth + far-end large capacity" collaboration that expands AI server memory supply and enables elastic scaling at a lower TCO. CXL 3.1 delivers significant performance gains for KVCache and is particularly well suited to high-concurrency, ultra-long-context workloads. The firm recommends paying attention to core beneficiaries in the industry chain.
Key points from GF Securities:
MRDIMM and CXL expand AI server memory
How to expand AI server memory capacity and improve performance at the same time, while balancing the two, has become a core challenge in current architecture evolution. Three challenges stand out: (1) HBM is costly and limited in capacity; (2) memory requirements differ significantly across applications and workloads, so both memory shortage and overprovisioning must be avoided; (3) the expandable memory capacity per CPU socket has become a bottleneck, requiring new memory devices and architectures for expansion. MRDIMM and CXL form a layered "near-end high bandwidth + far-end large capacity" collaboration that expands AI server memory supply and enables elastic scaling at a lower TCO; the sizing sketch below illustrates why capacity becomes the bottleneck.
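To see why per-socket capacity becomes the bottleneck, a rough sizing sketch helps: the KV cache grows linearly with context length and with the number of concurrent sessions. The model shape below is an illustrative Llama-70B-like assumption, not a figure from the report:

```python
# Rough KV cache sizing for a grouped-query-attention transformer. The model
# shape below is an illustrative Llama-70B-like assumption, not a figure
# taken from the report.
N_LAYERS = 80      # decoder layers
N_KV_HEADS = 8     # KV heads per layer (grouped-query attention)
HEAD_DIM = 128     # dimension per head
BYTES = 2          # fp16/bf16 element size

per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES   # x2 for K and V
per_seq_gib = per_token * 32_768 / 2**30                   # 32k-token context
print(f"{per_token // 1024} KiB per token")                # -> 320 KiB
print(f"{per_seq_gib:.1f} GiB per 32k-token sequence")     # -> 10.0 GiB
print(f"{64 * per_seq_gib:.0f} GiB for 64 concurrent sessions")  # -> 640 GiB
```

Hundreds of GiB of KV state on a single node is far beyond GPU HBM budgets, which is precisely the gap the near-end (MRDIMM) and far-end (CXL) tiers are meant to fill.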
MRDIMM as the near-end memory base for KVCache-intensive inference
MRDIMM delivers deterministic gains in KVCache scenarios for large-model inference: higher concurrency, longer contexts, lower end-to-end latency, and markedly better CPU-GPU memory layout and resource utilization. Specifically: (1) Bandwidth increase: according to the report "Next-Gen AI App Server Performance with CXL3.1 Tier Memory and MRDIMMGen2 solution," MRDIMM Gen2 supports a maximum speed of 12800 MT/s, improving bandwidth by 2.3x over DDR5 RDIMM under AI loads; faster memory access significantly reduces KVCache read/write latency and supports high-throughput inference (see the bandwidth estimate below). (2) Capacity expansion: a single module supports 64/96/128GB, enabling longer contexts and more parallel sessions. (3) Decoupled memory design: MRDIMM's high bandwidth and large capacity suit CPU-side KVCache offloading; Intel Xeon 6 "Granite Rapids," with its 12-channel memory controller, can fully exploit MRDIMM's bandwidth potential, effectively relieving GPU memory pressure and facilitating multi-session scheduling and cross-session KVCache reuse to balance latency and cost.
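For intuition on the bandwidth figures above, peak theoretical throughput can be estimated as channels × transfer rate × bus width per transfer. A minimal sketch; only the 12800 MT/s rate and the 12-channel memory controller come from the report, while the 64-bit data bus per channel is a standard DDR5-generation assumption used here for illustration:

```python
# Back-of-the-envelope peak memory bandwidth for a 12-channel MRDIMM Gen2
# configuration. Only the 12800 MT/s rate and the 12-channel Xeon 6 memory
# controller come from the report; the 64-bit data bus per channel is a
# standard DDR5-style assumption used for illustration.
CHANNELS = 12        # Intel Xeon 6 "Granite Rapids" memory channels
RATE_MT_S = 12_800   # MRDIMM Gen2 peak transfer rate, megatransfers/s
BUS_BYTES = 8        # 64-bit data bus per channel -> 8 bytes per transfer

peak_gb_s = CHANNELS * RATE_MT_S * BUS_BYTES / 1_000  # MT/s * bytes = MB/s
print(f"Peak theoretical bandwidth: {peak_gb_s:,.1f} GB/s")  # -> 1,228.8 GB/s
```

For comparison, the same 12 channels at a typical DDR5-6400 RDIMM rate would peak near 614 GB/s; the report's 2.3x figure is measured under AI load rather than derived from peak rates alone.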
CXL provides far-end/pooled expansion with significant TCO advantages for KVCache-intensive inference
CXL 3.1 delivers significant performance gains for KVCache and is particularly well suited to high-concurrency, ultra-long-context workloads. Specifically: (1) Memory pooling and expansion: memory is pooled across CPUs/GPUs/accelerators, and part of the KVCache is elastically offloaded from expensive GPU memory to CXL devices, expanding effective capacity to the terabyte level without adding GPU cost. (2) Low-latency access: CXL access latency can approach that of CPU DRAM, so KVCache placed on CXL maintains near-real-time decoding performance even under heavy load. (3) Decoupled KVCache architecture: in ByteDance's LLM serving stack, offloading KVCache to CXL increases batch size by 30%, reduces GPU requirements by 87%, and raises GPU utilization in the prefill phase by 7.5x. (4) Tiered memory management: CXL supports hot/cold tiering, dynamically placing KVCache by access frequency; hot KV stays in GPU/CPU DRAM while warm/cold data migrates to the CXL pool (see the tiering sketch below). For example, with the DeepSeek-1.73B quantized model, a single-CPU configuration (CPU0 + 128GB) plus a 128GB CXL extension matches a dual-CPU configuration (CPU0/CPU1 with 128GB each) in prompt and decode throughput while using fewer processors, yielding a significant TCO advantage.
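To make the hot/cold tiering in point (4) concrete, here is a minimal sketch of frequency-based KVCache placement across GPU HBM, CPU DRAM, and a CXL pool. The tier names, capacities, thresholds, and the KVCacheTiering API are hypothetical illustrations, not ByteDance's serving stack or any vendor's CXL interface:

```python
# Minimal sketch of frequency-based KVCache tiering across GPU HBM, CPU DRAM
# (e.g. MRDIMM), and a CXL memory pool. Tier names, capacities, and access
# thresholds are hypothetical illustrations, not figures from the report.
from dataclasses import dataclass, field

@dataclass
class Tier:
    name: str
    capacity_gb: float
    used_gb: float = 0.0

    def has_room(self, size_gb: float) -> bool:
        return self.used_gb + size_gb <= self.capacity_gb

@dataclass
class KVCacheTiering:
    # Ordered hot -> cold: GPU HBM, near-end CPU DRAM, far-end CXL pool.
    tiers: list = field(default_factory=lambda: [
        Tier("gpu_hbm", 80.0),
        Tier("cpu_dram", 512.0),
        Tier("cxl_pool", 4096.0),
    ])

    def place(self, size_gb: float, accesses_per_s: float) -> str:
        """Place a KV block by access frequency: hot blocks stay near the
        GPU, cold blocks go to the CXL pool; fall through when a tier is
        full."""
        preferred = 0 if accesses_per_s > 100 else 1 if accesses_per_s > 1 else 2
        for tier in self.tiers[preferred:]:
            if tier.has_room(size_gb):
                tier.used_gb += size_gb
                return tier.name
        raise MemoryError("all tiers full; evict cold KV blocks first")

cache = KVCacheTiering()
print(cache.place(2.0, accesses_per_s=500.0))  # hot session  -> gpu_hbm
print(cache.place(2.0, accesses_per_s=0.2))    # idle session -> cxl_pool
```

In a real serving stack the placement signal would be richer (recency, session priority, prefill vs decode phase), but the fall-through structure captures why a large far-end tier raises effective batch size without adding GPUs.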
Risk warning
Development of the AI industry and demand falling short of expectations; AI server shipments falling short of expectations; domestic manufacturers' technology and product progress falling short of expectations.