NVIDIA Corporation's (NVDA.US) Blackwell platform and ASIC chip upgrades are expected to boost the liquid cooling penetration rate to over 20% by 2025.

23/09/2024
GMT Eight
According to the latest TrendForce survey, the penetration rate of liquid cooling solutions is expected to grow significantly, from around 10% in 2024 to over 20% in 2025, driven by NVIDIA's new Blackwell platform, expected to begin shipping in the fourth quarter of 2024. Growing global awareness of ESG (environmental, social, and corporate governance) and the accelerating build-out of AI servers by CSPs (cloud service providers) are expected to push cooling solutions from air cooling toward liquid cooling.

NVIDIA (NVDA.US) remains the leading AI solution provider in the global AI server market in 2024. In the GPU AI server segment, NVIDIA holds a commanding lead with a market share of close to 90%, while second-ranked AMD (AMD.US) has only about 8%.

TrendForce observes that Blackwell shipment volumes this year are relatively small, mainly because the supply chain is still completing final product testing and verification, including ongoing optimization of high-speed transmission and cooling design. The new platform's power consumption is high, and the GB200 full-rack solution in particular requires better cooling efficiency, which is expected to lift the penetration rate of liquid cooling solutions. However, liquid-cooled servers still account for a small share of the existing server ecosystem, and ODMs must climb a learning curve to resolve issues such as coolant leakage and poor cooling efficiency.

TrendForce estimates that by 2025 the Blackwell platform's share of high-end GPUs will exceed 80%, driving power supply manufacturers and the cooling industry to compete in the AI liquid cooling market and forming a new competitive landscape.
Large CSPs accelerate the deployment of AI servers, with Google actively deploying liquid cooling solutions

In recent years, large American cloud service providers such as Google, AWS, and Microsoft have been accelerating the deployment of AI servers, mainly using NVIDIA GPUs and self-developed ASICs. According to TrendForce, the thermal design power (TDP) of the NVIDIA GB200 NVL72 rack reaches about 140kW, requiring a liquid cooling solution, expected mainly in the Liquid-to-Air (L2A) form. Other Blackwell server architectures, such as HGX and MGX, have lower density and mainly adopt air cooling.

Among the cloud service providers' self-developed AI ASICs, Google's TPU uses not only air cooling but also liquid cooling, making Google the most proactive American provider in adopting liquid cooling solutions, with BOYD and Cooler Master as its main cold plate suppliers. In Mainland China, Alibaba is the most active in expanding liquid-cooled data centers, while other cloud service providers mainly use air cooling for their self-developed AI ASICs.

TrendForce points out that cloud service providers will designate key component suppliers for the liquid cooling solution of GB200 racks. Currently, the main cold plate suppliers are QIHONG and Cooler Master; the manifold suppliers are Cooler Master and Shuang Hong; and the coolant distribution unit suppliers are Vertiv and Delta Electronics. As for the key components for preventing leakage, quick disconnects are mainly sourced from CPC, Parker Hannifin, Danfoss, and Staubli, with other suppliers such as JMJ and Fushida in the verification stage. These suppliers are expected to join the ranks of quick disconnect suppliers in the first half of 2025, helping to gradually ease the current supply shortage.

Contact: contact@gmteight.com