KNOWLEDGE ATLAS (02513) launches GLM-5.1 high-speed API with an output speed of 400 tokens/s, claiming a new global speed record for large-model APIs
On May 22, KNOWLEDGE ATLAS (02513) announced that it has opened the high-speed version of its GLM-5.1 API (GLM-5.1-highspeed) to selected enterprise customers. The model's output speed reaches 400 tokens/s, which the company says exceeds that of any other large-model API worldwide. It describes this as the first time a domestic large model has combined flagship-level capability with extremely low latency.
According to the company, the high-speed version overturns the industry assumption that "fast means small": it does not trade model quality for response speed. The advantage is most pronounced in speed-sensitive scenarios such as coding. A Coding Agent task often requires dozens of rounds of model calls, so per-call latency accumulates across the whole task. The high-speed version is said to deliver "instant question-answering", removing the long cumulative wait of multi-round calls and letting the model act as a real-time collaborative partner.
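To make the multi-round effect concrete, the following back-of-the-envelope sketch estimates the total time spent generating output over an agent task. Only the 400 tokens/s figure comes from the announcement; the round count, the tokens per round, and the 40 tokens/s baseline are illustrative assumptions, and the estimate ignores prompt processing, network latency, and tool execution time.

```python
def decode_seconds(rounds: int, tokens_per_round: int, tokens_per_second: float) -> float:
    """Total time spent generating output tokens across all rounds of a task."""
    return rounds * tokens_per_round / tokens_per_second


ROUNDS = 30              # assumption: "dozens of rounds" of model calls
TOKENS_PER_ROUND = 800   # assumption: average output length per call
HIGH_SPEED_TPS = 400.0   # from the announcement
BASELINE_TPS = 40.0      # assumption: a hypothetical slower API for comparison

fast = decode_seconds(ROUNDS, TOKENS_PER_ROUND, HIGH_SPEED_TPS)
slow = decode_seconds(ROUNDS, TOKENS_PER_ROUND, BASELINE_TPS)

print(f"high-speed: {fast:.0f} s, baseline: {slow:.0f} s")
```

Under these assumed numbers, output generation for the whole task takes about one minute at 400 tokens/s versus about ten minutes at the hypothetical baseline; real workloads will differ.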
According to the test data cited in the announcement, code generation efficiency improves by about 10 times, with the model understanding engineering context and outputting solutions at the same time. In 3D scene modeling, text input and scene updates can interact in real time. The model can also instantly generate tools and interfaces that match a user's requirements, which the company presents as the prototype of a new kind of operating system.
The API was jointly developed by the KNOWLEDGE ATLAS GLM team and the TileRT team, with optimization at three layers: rewriting the core inference path to raise single-card throughput; reducing tail latency through dynamic batching and KV cache scheduling; and optimizing cluster and network coordination so that 400 tokens/s remains stably available. The core breakthrough is the TileRT engine, which eliminates redundant overhead through compile-time static orchestration and tile-level microtask scheduling, bringing performance close to the physical limit of the hardware.
The GLM-5.1 high-speed version is currently available on the KNOWLEDGE ATLAS MaaS platform for latency-sensitive scenarios such as AI programming, real-time interaction, business decision-making, and real-time speech. KNOWLEDGE ATLAS says it will continue to optimize the inference engine, expand coverage of the high-speed service, and provide enterprises with low-latency, high-intelligence, production-grade AI capabilities, with the aim of consolidating the position of domestic large models at the forefront of global technology.