Qwen officially open-sources FlashQLA, reducing the computational cost of the attention layer in both training and inference.

On April 29th, the Qwen team announced the official open-source release of FlashQLA, a high-performance linear attention operator library built on TileLang. FlashQLA fuses and optimizes the forward and backward computations of the GDN Chunked Prefill operators, achieving a 2-3x forward speedup and a 2x backward speedup over the FLA Triton kernels on NVIDIA Hopper across multiple scenarios. This significantly improves efficiency in pre-training and in on-device agentic inference.
The Qwen team stated that since the release of Qwen3-Next, the Gated Delta Network (GDN) has served as the main attention layer across the Qwen series, from Qwen3-Next-80B-A3B through the subsequent Qwen3.5/Qwen3.6 series. As model scale has grown to 397A17B, 122A10B, 35B, and 27B, GDN's overhead in end-to-end training and inference has become significant.
The core highlights of this release include:

- Gate-driven automatic intra-card sequence parallelism. Exploiting the exponential decay property of the GDN gate, FlashQLA automatically enables sequence parallelism within a single GPU under tensor parallelism (TP), long-sequence, and small-head-count scenarios, raising GPU SM utilization.
- Hardware-friendly algebraic rewriting. The forward and backward passes of GDN Chunked Prefill are rewritten to reduce the load on Tensor Cores, CUDA Cores, and SFUs without affecting numerical precision.
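To make the chunked-prefill idea concrete, below is a minimal NumPy sketch of a gated linear attention recurrence computed two ways: step by step, and chunk by chunk. This uses a simplified scalar decay gate, not the exact GDN delta rule or FlashQLA's kernels; all function names and shapes here are illustrative assumptions. The point is that the gate's multiplicative decay lets each chunk be expressed as a decayed contribution of the carried state plus an intra-chunk causal term, which is what makes chunk-level (and hence sequence-parallel) computation possible.

```python
import numpy as np

def gla_recurrent(q, k, v, g):
    """Reference recurrence: S_t = g_t * S_{t-1} + k_t v_t^T,  o_t = q_t @ S_t.
    q, k, v: (T, d) arrays; g: (T,) per-step decay gates in (0, 1)."""
    T, d = q.shape
    S = np.zeros((d, d))          # running state: decayed sum of outer products
    out = np.zeros((T, d))
    for t in range(T):
        S = g[t] * S + np.outer(k[t], v[t])
        out[t] = q[t] @ S
    return out

def gla_chunked(q, k, v, g, C=16):
    """Same recurrence, processed in chunks of length C (chunked-prefill style)."""
    T, d = q.shape
    S = np.zeros((d, d))          # state carried across chunks
    out = np.zeros((T, d))
    for s in range(0, T, C):
        e = min(s + C, T)
        qc, kc, vc, gc = q[s:e], k[s:e], v[s:e], g[s:e]
        b = np.cumprod(gc)        # b[i] = cumulative decay from chunk start to i
        # inter-chunk term: carried state, decayed by b[i] at each position
        out[s:e] = (b[:, None] * qc) @ S
        # intra-chunk term: causal attention with relative decay b[i] / b[j]
        A = (qc * b[:, None]) @ (kc / b[:, None]).T
        A *= np.tril(np.ones((e - s, e - s)))     # keep only j <= i
        out[s:e] += A @ vc
        # carry the state to the next chunk
        S = b[-1] * S + ((b[-1] / b)[:, None] * kc).T @ vc
    return out
```

Both functions compute identical outputs; the chunked form replaces T small rank-1 updates with a few dense matrix multiplies per chunk, which is far friendlier to Tensor Cores. (Real kernels compute the decay ratios in log space to avoid underflow for long chunks, a detail omitted here.)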