Tencent Hunyuan Open-Sources Its OCR Model: Only 1B Parameters, with State-of-the-Art Results on Multiple Core Capabilities

14:45 25/11/2025
GMT Eight
On November 25, Tencent Hunyuan released HunyuanOCR, a new open-source model with only 1B parameters. Built on Hunyuan's native multimodal architecture, it achieves industry-leading results in multiple OCR applications.
On November 25, Tencent Hunyuan released HunyuanOCR, a new open-source model with only 1B parameters. Built on the Hunyuan native multimodal architecture, it achieves state-of-the-art (SOTA) scores in multiple industry OCR applications.
According to the introduction, HunyuanOCR is highly usable, small, and easy to deploy. Thanks to the end-to-end design philosophy of the Hunyuan native multimodal large model, every function needs only a single forward pass to reach its best result, which makes it more efficient and convenient than the cascaded solutions common in the industry, and highly cost-effective.
The HunyuanOCR expert model consists mainly of three parts: a native-resolution visual encoder, an adaptive visual adapter, and a lightweight Hunyuan language model. Unlike other open-source OCR expert models or systems, HunyuanOCR adopts a fully end-to-end paradigm for both training and inference. Trained on large-scale, high-quality, application-oriented data combined with online reinforcement learning, the model demonstrates very robust end-to-end inference.
Notably, HunyuanOCR achieves SOTA results on multiple core capabilities. In complex document parsing, it takes the top score of 94.1 on the OmniDocBench evaluation, surpassing Google's Gemini3-pro and other leading models. In text detection and recognition, it significantly outperforms comparable open-source and commercial OCR models on a benchmark covering 9 major application scenarios (documents, artistic fonts, street views, handwriting, advertisements, receipts, screenshots, games, and videos). On the OCRBench leaderboard, it scores 860 points in total with only 1B total parameters, a SOTA result among models with fewer than 3B total parameters, general visual understanding models included.
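To make the three-part design and the "single forward pass" idea concrete, here is a minimal toy sketch in Python. It is not HunyuanOCR's code or API: the function names, the list-of-integers "image", and the `stride` parameter are all hypothetical stand-ins chosen only to show how an encoder, an adapter, and a language model could be composed into one call. A cascaded system, by contrast, would typically chain separately built stages such as text detection and text recognition.

```python
from typing import List


def visual_encoder(image: List[List[int]]) -> List[int]:
    """Hypothetical stand-in for the encoder: one token per image row, no resizing."""
    return [sum(row) for row in image]


def visual_adapter(tokens: List[int], stride: int = 2) -> List[int]:
    """Hypothetical stand-in for the adapter: merge neighbouring tokens to shorten the sequence."""
    return [sum(tokens[i:i + stride]) for i in range(0, len(tokens), stride)]


def language_model(tokens: List[int], prompt: str) -> str:
    """Hypothetical stand-in for the language model: turn visual tokens plus a prompt into text."""
    return f"{prompt}: " + " ".join(str(t) for t in tokens)


def end_to_end_ocr(image: List[List[int]], prompt: str) -> str:
    """One call runs all three parts in sequence, with no separate detection or recognition stage."""
    return language_model(visual_adapter(visual_encoder(image)), prompt)


if __name__ == "__main__":
    toy_image = [[1, 2], [3, 4], [5, 6]]
    print(end_to_end_ocr(toy_image, "parse document"))
```

In this sketch the task is selected by the prompt alone, which is one plausible reading of how a single model could serve several OCR functions; the actual prompting scheme is not described in the announcement.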