1.1 million tokens per second! Microsoft Corporation (MSFT.US) and NVIDIA Corporation (NVDA.US) join forces to break the AI inference record.

19:10 04/11/2025
GMT Eight
Microsoft announced that its Azure ND GB300 v6 virtual machines set a new industry record for inference speed, reaching 1.1 million tokens per second on Meta's Llama 2 70B model.
Microsoft Corporation (MSFT.US) announced that its Azure ND GB300 v6 virtual machines have set a new industry record for inference speed, reaching 1.1 million tokens per second on Meta's Llama 2 70B model. The virtual machines are built on NVIDIA Corporation's (NVDA.US) Blackwell Ultra GPUs, specifically the NVIDIA GB300 NVL72 system, which combines 72 NVIDIA Blackwell Ultra GPUs and 36 NVIDIA Grace CPUs in a single rack-scale system. They are optimized for inference workloads, with 50% more GPU memory and a 16% higher thermal design power (TDP).

Microsoft CEO Satya Nadella stated on social media, "This achievement is the result of our long-term collaboration with NVIDIA Corporation and expertise in running AI at scale."

To validate the performance gain, Microsoft ran the Llama 2 70B model at FP4 precision on 18 ND GB300 v6 virtual machines within a single NVIDIA GB300 NVL72 domain, using NVIDIA TensorRT-LLM as the inference engine. Microsoft stated, "A single rack of NVL72 Azure ND GB300v6 achieved a total inference speed of 1.1 million tokens per second." The new record surpasses Microsoft's previous mark of 865,000 tokens per second on an NVIDIA GB200 NVL72 rack.

Commenting on the result, Russ Fellows, Vice President of Labs at Signal65, remarked, "This milestone not only surpasses the barrier of one million tokens per second but also achieves it on a platform that can meet the dynamic usage and data governance needs of modern enterprises." He added that the Azure ND GB300 delivers 27% higher inference performance than the previous-generation NVIDIA GB200 while raising the power specification by only 17%.
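
The figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not anything published by Microsoft or Signal65: the constants are the rounded numbers reported in this article, and the function name `summarize` is hypothetical. The per-GPU and per-VM values are simple averages that assume the load was spread evenly across the rack, and the throughput-per-power figure assumes the quoted 17% power specification increase is a fair proxy for actual power draw.

```python
# Rounded figures as reported in the article.
GB300_TOKENS_PER_SEC = 1_100_000  # one GB300 NVL72 rack (new record)
GB200_TOKENS_PER_SEC = 865_000    # one GB200 NVL72 rack (previous record)
GPUS_PER_RACK = 72                # Blackwell Ultra GPUs in one NVL72 system
VMS_PER_RACK = 18                 # ND GB300 v6 VMs used in the test
POWER_INCREASE = 0.17             # quoted rise in power specification


def summarize() -> dict:
    """Derive averages and relative gains from the reported figures."""
    speedup = GB300_TOKENS_PER_SEC / GB200_TOKENS_PER_SEC
    return {
        # Assumes GPUs are split evenly across the virtual machines.
        "gpus_per_vm": GPUS_PER_RACK // VMS_PER_RACK,
        # Averages only; real per-device throughput may vary.
        "tokens_per_gpu": GB300_TOKENS_PER_SEC / GPUS_PER_RACK,
        "tokens_per_vm": GB300_TOKENS_PER_SEC / VMS_PER_RACK,
        # Generation-over-generation throughput gain, in percent.
        "throughput_gain_pct": (speedup - 1) * 100,
        # Gain in throughput per unit of specified power, in percent.
        "throughput_per_power_gain_pct": (speedup / (1 + POWER_INCREASE) - 1) * 100,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions, the throughput gain works out to roughly 27%, consistent with the figure attributed to Signal65, and the rack averages a little over 15,000 tokens per second per GPU.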