1.1 million tokens per second! Microsoft (MSFT.US) and NVIDIA (NVDA.US) join forces to set a new AI inference record
According to the Securities Times app, Microsoft (MSFT.US) announced that its Azure ND GB300 v6 virtual machine has set a new industry record for inference speed, reaching 1.1 million tokens per second on Meta's Llama 2 70B model. The Azure ND GB300 v6 virtual machine reportedly runs on NVIDIA's Blackwell Ultra GPUs, specifically the NVIDIA GB300 NVL72 system, which combines 72 NVIDIA Blackwell Ultra GPUs and 36 NVIDIA Grace CPUs in a single rack-scale configuration. The virtual machine is optimized for inference workloads, with 50% more GPU memory and a 16% higher Thermal Design Power (TDP).
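For a sense of scale, the reported aggregate figure can be turned into a rough per-GPU average. The sketch below uses only the two numbers quoted above (1.1 million tokens per second and 72 GPUs) and assumes the load is spread evenly across all GPUs, which the report does not state; the function name `per_gpu_throughput` is illustrative.

```python
# Figures quoted in the report; the even split across GPUs is an assumption.
TOTAL_TOKENS_PER_SECOND = 1_100_000  # aggregate throughput of the whole system
GPU_COUNT = 72                       # Blackwell Ultra GPUs in one GB300 NVL72


def per_gpu_throughput(total_tokens_per_second: float, gpu_count: int) -> float:
    """Return the average tokens per second per GPU, assuming an even split."""
    if gpu_count <= 0:
        raise ValueError("gpu_count must be positive")
    return total_tokens_per_second / gpu_count


if __name__ == "__main__":
    average = per_gpu_throughput(TOTAL_TOKENS_PER_SECOND, GPU_COUNT)
    print(f"{average:,.0f} tokens/s per GPU")
```

Under that assumption, the average works out to roughly 15,000 tokens per second per GPU.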

