Supports 33 Languages! Tencent Hunyuan Releases an Extreme-Compression Quantized Translation Model

16:51 29/04/2026
GMT Eight
Tencent releases Hy-MT1.5-1.8B-1.25bit, an extreme-compression quantized translation model.
On April 29, Tencent Hunyuan launched Hy-MT1.5-1.8B-1.25bit, an extreme-compression quantized translation model that shrinks a large translation model supporting 33 languages down to 440MB. It requires no internet connection, can be downloaded and run locally on a phone, and delivers translation quality that surpasses Google Translate.

The model is based on the Hunyuan translation model Hy-MT1.5, whose results are comparable to commercial translation systems. Hy-MT1.5 is a dedicated translation model built by the Tencent Hunyuan team, natively supporting 33 languages, 5 dialect and ethnic-minority languages, and 1,056 translation directions. From everyday Chinese-English translation to French, Japanese, Arabic, and Russian, and even minority languages such as Tibetan and Mongolian, it handles them all with ease.

With only 1.8B parameters, Hy-MT1.5 matches the translation quality of commercial translation APIs and 235B-scale large models. In rigorous benchmark evaluations, its translation quality not only surpasses mainstream systems such as Google Translate, it also demonstrates that a lightweight model, efficiently optimized, can deliver impressive translation capability.

Extreme quantized compression: fitting the model onto a phone

Quantization, in simple terms, means storing model parameters at a lower bit width instead of the original 16-bit format. It is similar to compressing a high-resolution photo into a thumbnail: the file size shrinks dramatically, but the content is still clearly visible. Tencent offers two extreme quantization schemes, 2-bit and 1.25-bit, targeting different classes of smartphones.

Figure: Quality scores of models of different sizes on the FLORES-200 bilingual translation task.
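To make the idea concrete, here is a minimal sketch of low-bit quantization, not Tencent's actual method: each 16-bit float weight is replaced by a 2-bit index into a small set of levels (the four levels used here match those of the 2-bit scheme described below) plus one shared scale. The simple per-tensor scale is an illustrative assumption.

```python
import numpy as np

LEVELS = np.array([-1.5, -0.5, 0.5, 1.5])  # four levels -> 2 bits per weight

def quantize_2bit(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Store each weight as a 2-bit index into LEVELS times a shared scale.
    The per-tensor mean-absolute scale is an illustrative assumption."""
    scale = np.abs(weights).mean()
    # nearest-level assignment for each scaled weight
    idx = np.abs(weights[..., None] / scale - LEVELS).argmin(axis=-1)
    return idx.astype(np.uint8), scale

def dequantize(idx: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct approximate weights from indices and the scale."""
    return LEVELS[idx] * scale

w = np.array([0.8, -0.1, -1.2, 0.3])
idx, s = quantize_2bit(w)
w_hat = dequantize(idx, s)   # lossy approximation of w
```

Four weights now cost 8 bits of indices plus one float for the scale, an 8x reduction from 16-bit storage (ignoring the amortized scale), at the price of rounding error visible in `w_hat`.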
2-bit model: balancing performance and quality (for mid-to-high-end devices)

The 2-bit model uses industry-leading stretched elastic quantization (SEQ) to quantize model parameters to the levels {-1.5, -0.5, 0.5, 1.5}, combined with quantization-aware distillation. While compressing the model to 574MB, it delivers nearly lossless translation quality, surpassing large models of over 100GB. On mobile devices that support Arm SME2 technology, the 2-bit model runs inference even faster and more efficiently.

1.25-bit model: extreme compression with Sherry (for all devices)

To push lightweighting to the limit, Tencent introduced the 1.25-bit model based on Sherry (Sparse Efficient Ternary Value Quantization). The scheme has been accepted at ACL 2026, a top NLP conference. The core of Sherry is a fine-grained sparsity strategy: in every group of four parameters, the three most important are stored at 1 bit each and the remaining one is set to zero, averaging 1.25 bits per parameter. Combined with the STQ kernel Tencent designed specifically for mobile CPUs, the scheme maps cleanly onto SIMD instruction sets. As a result, the original 3.3GB model is compressed to 440MB, runs comfortably in the background, and lets even memory-constrained budget phones perform high-quality offline translation smoothly.

This open-source release includes not only the model weights but also a practical Tencent Hunyuan translation demo, specially adapted for a background look-up mode: whether reading email or browsing the web, Hunyuan translation is always at hand. No network connection and no subscription are required; everything is processed locally, no personal information is collected or uploaded, and a single download works forever.
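The 1.25-bit budget described above can be sketched with one storage layout that is consistent with the article's arithmetic, though the layout itself is an assumption, not the published Sherry format: per group of four weights, a 2-bit index records which weight is zeroed and the other three keep one sign bit each, giving (2 + 3) / 4 = 1.25 bits per parameter.

```python
import numpy as np

def encode_group(w4: np.ndarray) -> tuple[int, np.ndarray]:
    """Encode 4 weights into 5 bits: a 2-bit index of the dropped
    (smallest-magnitude) weight plus 1 sign bit for each of the
    other three. Layout is an illustrative assumption."""
    drop = int(np.abs(w4).argmin())          # which weight becomes zero (2 bits)
    signs = (w4 >= 0).astype(np.uint8)       # 1-bit sign per weight
    keep = np.delete(signs, drop)            # 3 kept sign bits
    return drop, keep

def decode_group(drop: int, keep: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Reconstruct the group: +/-scale for kept weights, 0 for the dropped one.
    A real scheme would also store a learned scale per block."""
    out = np.where(keep == 1, scale, -scale).astype(float)
    return np.insert(out, drop, 0.0)

drop, keep = encode_group(np.array([0.7, -0.05, -0.9, 0.4]))
bits_per_param = (2 + keep.size) / 4         # = 1.25
w_hat = decode_group(drop, keep)
```

The point of the sketch is the accounting: five bits cover four weights, which is where the 1.25-bit average comes from, and both the sign tests and the zero insertion are branch-free array operations of the kind that vectorize well on SIMD hardware.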