Shanxi Securities: First national-level systematic planning document released; data labeling industry set for rapid development
21/01/2025
GMT Eight
Shanxi Securities released a research report stating that, with the introduction of the first national-level systematic planning document, the data labeling industry is expected to develop rapidly and significantly improve the quality of data supply, thereby easing the shortage of high-quality data that is holding back the domestic artificial intelligence industry. The report recommends focusing on vendors with technological advantages in data labeling and experience deploying it in real-world scenarios, as well as server manufacturers that supply the computing resources needed to run data labeling tasks.
Event: On January 13, four departments including the National Development and Reform Commission and the National Data Administration jointly issued the "Implementation Opinions on Promoting High-quality Development of the Data Labeling Industry". The document proposes that by 2027 the industry's level of specialization and intelligence and its capacity for technological innovation will be significantly improved, and that the industry's scale will grow substantially, with a compound annual growth rate of over 20%.
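To give a sense of what the growth target implies, the short sketch below compounds a 20% annual rate over one to three years. The three-year horizon (2025 through 2027) and the function name `growth_multiple` are assumptions made for illustration; the document as summarized here does not state a base year, and 20% is the floor of the target rather than a forecast.

```python
def growth_multiple(annual_rate: float, years: int) -> float:
    """Cumulative growth multiple after compounding annual_rate for the given number of years."""
    return (1.0 + annual_rate) ** years


# Assumption: a horizon of up to three years (2025 through 2027);
# the source gives the 20% floor but not the base year.
for years in (1, 2, 3):
    multiple = growth_multiple(0.20, years)
    print(f"{years} year(s) at 20%: {multiple:.3f}x the starting scale")
```

Under these assumptions, sustaining the 20% floor for three years would leave the industry at roughly 1.7 times its starting scale.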
Key points from Shanxi Securities:
The release of the first national-level systematic planning document signals that the data labeling industry is entering a period of rapid growth.
The "Implementation Opinions" provide comprehensive guidance on labeling needs, technological innovation, and ecosystem development, including:
1) Labeling needs: Tap public data labeling needs in areas such as intelligent manufacturing and information services, and strengthen enterprise data labeling in key industries such as transportation, healthcare, and finance;
2) Technological innovation: Accelerate breakthroughs in key technologies such as cross-domain and cross-modal semantic alignment and 4D labeling, and strengthen the research and development of independent and controllable intelligent labeling tools;
3) Ecosystem development: Promote the coordinated development of data collection, labeling, and AI application industry chains.
The domestic data labeling market is currently fragmented. Participants include technology giants such as Baidu (Baidu Zhongce) that have built their own data labeling platforms, professional data service providers such as Beijing Haitian Ruisheng Science Technology Ltd. and Yunce Data, and a large number of small and medium-sized crowdsourced data labeling companies. The quality of labeling and the skill of practitioners vary widely. The "Implementation Opinions" aim to weed out non-compliant enterprises by establishing unified industry standards, to promote standardization across the industry, and, looking ahead, to support mergers and reorganizations that cultivate a group of leading enterprises.
Data labeling improves the quality of data supply and promotes the development of the domestic artificial intelligence industry.
The shortage of high-quality Chinese-language corpora is becoming increasingly prominent. According to the White Paper on Large-scale Model Training Data released by Ali Research Institute in May 2024, English accounts for 59.8% of the content on global websites, while Chinese accounts for only 1.3%. Chinese-language corpora are significantly under-digitized and under-represented online, and many high-quality Chinese corpora that reflect Chinese values cannot be publicly accessed because of copyright, privacy, and other restrictions.
Data labeling is a key technology for improving data quality. As national-level policy and other measures strongly promote the development of the data labeling industry, the quality of data supply should improve, thereby enhancing the competitiveness of China's artificial intelligence industry.
Investment advice: Focus on vendors with technological advantages in data labeling and experience deploying it in real-world scenarios, including Beijing Haitian Ruisheng Science Technology Ltd. (688787.SH), Iflytek Co., Ltd. (002230.SZ), Yimpuk Technology, and Cloudwalk Group (688327.SH), among others. Also pay attention to server manufacturers that supply computing resources for running data labeling tasks, including Inspur Electronic Information Industry (000977.SZ) and Dawning Information Industry (603019.SH), among others.
Risk warning: Policy implementation may fall short of expectations, the development of data labeling technology may fall short of expectations, and industry competition may intensify.