China Securities Co., Ltd.: Multimodal AI and world models are reshaping the business logic of multiple industries.
Looking to the future, the development of native multimodal and world modeling technologies is expected to reshape the landscape of downstream industries such as marketing, film and television, and gaming.
China Securities Co., Ltd. said in a research report that vendors at the forefront of global multimodal technology, such as Google and Kwai, have focused their top models on solving the problems of character consistency and physical logic. Kwai Ling's monthly active users have climbed into the millions and its subscription revenue has grown, marking the shift of multimodal tools from entertainment to productivity. On the application side, AI comics have followed short films as the new growth trend. Platforms such as ByteDance are driving higher-quality content through generous incentives, accelerating the use of AI to adapt IP into film and television, which could open up new market opportunities and reshape the logic of advertising and game asset production. Looking ahead, the evolution of native multimodal and world model technologies is expected to reshape the landscape of downstream industries such as marketing, film, and gaming.
Key points from China Securities Co., Ltd. are as follows:
Google, a relative leader in global multimodal technology, has built strong barriers in long-context understanding and native audio-visual integration with its Veo, Gemini, and Nanobanana model series. Domestic companies such as Kwai Ling, MiniMax Hailuo, Alibaba Tongyiwanshi, and KNOWLEDGE ATLAS have likewise tackled industrial production problems such as character inconsistency, breakdowns in physical logic, and uncontrollable storyboarding through architectural and technical innovation, accelerating the commercialization of multimodal technology.
Hailuo AI: The Hailuo 2.3 series, which MiniMax (00100) updated on October 28, focuses on physical stability and full-modality collaboration, addressing the breakdown of physics under large, dynamic camera movements. The model realistically simulates lighting direction, light-and-shadow transitions, and physical collision logic under large-motion instructions, and is notably stable in complex body movements such as fine grasping and finger crossing. Hailuo Media Agent further packages video, voice, and language models into a single agent that supports natural-language collaboration on an infinite canvas. Users only need to enter a simple business idea, and the Agent can autonomously write the script, render the video, and add the audio.
Ling AI: On December 1, Kwai Ling (01024) released the industry's first large video model to integrate multiple creative tasks into a unified engine. Built on the multimodal visual language concept, the model combines features such as video generation from reference images, content addition and deletion, and style redrawing, addressing the fragmentation of functions in earlier creative workflows. According to Kwai's internal testing data, Ling's o1 model achieved a win-loss ratio of 247% in image reference tasks and 230% in instruction transformation tasks, standing out in its understanding of complex creative intent. The Ling Video 2.6 model further improves audio synchronization and motion control, supporting natural language dialogue and sound effects output...
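Percentages above 100% such as these are easiest to interpret as ratios of wins to losses in head-to-head comparisons rather than as a share of comparisons won. Assuming that reading, and ignoring ties (the report gives no detail on either point), the implied share of decided comparisons won can be sketched as follows. The function name is illustrative.

```python
def implied_win_share(win_loss_ratio_pct: float) -> float:
    """Convert a wins-to-losses ratio (in percent) into the share of
    decided comparisons won. Ties are ignored, which is an assumption."""
    if win_loss_ratio_pct < 0:
        raise ValueError("ratio must be non-negative")
    ratio = win_loss_ratio_pct / 100.0  # e.g. 247% -> 2.47 wins per loss
    return ratio / (1.0 + ratio)        # wins / (wins + losses)


for task, pct in [("image reference", 247.0),
                  ("instruction transformation", 230.0)]:
    print(f"{task}: {implied_win_share(pct):.1%} of decided comparisons won")
```

On this reading, the two figures correspond to winning roughly 71% and 70% of decided comparisons respectively.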
KNOWLEDGE ATLAS: In cooperation with Huawei, KNOWLEDGE ATLAS released the GLM-Image image generation model, becoming the first company to complete full model training on domestically produced chips. The model uses a hybrid architecture that pairs autoregressive understanding with a decoder stage, balancing macro-level understanding with fine detail, particularly in knowledge-intensive scenarios. In complex scenes, especially intricate visual designs, it resolves the technical problem of misaligned rendering of Chinese characters. GLM-Image produces coherent output at resolutions from 1024x1024 to 2048x2048. API calls cost only 0.1 yuan per image, which underscores the model's cost-effectiveness.
On the industry side, breakthroughs in model capability have driven both community spread and commercialization. The "action control" function in Kwai Ling 2.6 helped videos of dancing pets and similar content go viral globally, attracting more consumer users and converting directly into subscription revenue. According to exclusive research data from Late, Ling AI's monthly active users surpassed 12 million in January 2026. As of January 20, 2026, the number of paying users on the Ling app had increased by 350% from the previous month, and daily revenue in January was up by about 30% (Ling's single-month revenue exceeded 20 million in December 2025). In terms of revenue breakdown, Ling is expected to generate revenue of 140 million in 2025, with professional producers contributing nearly 70% of the total. This is consistent with Kwai's strategy of focusing on P-end users (self-media video creators, advertising professionals, and so on), and it indicates that multimodal AI tools have moved beyond entertainment to become productivity essentials for professionals in the film and advertising industries, forming an initial closed commercial loop.
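The revenue figures above can be cross-checked with simple arithmetic. The sketch below assumes that the roughly 30% daily uplift is measured against December 2025's average daily revenue, treats "exceeded 20 million" and "nearly 70%" as round numbers, and leaves the currency unspecified because the report does not name it. All variable names are illustrative.

```python
DEC_2025_REVENUE_M = 20.0   # "exceeded 20 million", treated here as a floor
DAYS_IN_DECEMBER = 31
DAYS_IN_JANUARY = 31
JAN_DAILY_UPLIFT = 0.30     # "about 30%", assumed relative to December's daily average
FY2025_REVENUE_M = 140.0    # expected full-year 2025 revenue
PRO_SHARE = 0.70            # "nearly 70%" from professional producers

# Average daily revenue in December, then the implied January daily figure.
dec_daily = DEC_2025_REVENUE_M / DAYS_IN_DECEMBER
jan_daily = dec_daily * (1 + JAN_DAILY_UPLIFT)

# Full-month January run rate if the uplift held for all 31 days.
jan_run_rate = jan_daily * DAYS_IN_JANUARY

# Portion of expected 2025 revenue attributable to professional producers.
pro_revenue = FY2025_REVENUE_M * PRO_SHARE

print(f"December daily average: {dec_daily:.2f} million")
print(f"Implied January daily:  {jan_daily:.2f} million")
print(f"January run rate:       {jan_run_rate:.1f} million")
print(f"Professional producers: {pro_revenue:.0f} million of 2025 revenue")
```

Under these assumptions, January would run at about 26 million for the full month, and professional producers would account for roughly 98 million of the expected 2025 total.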
AI comics have become the next application scenario for video generation after short films, with top platforms such as ByteDance driving higher-quality content through aggressive incentive policies. For example, the Douyin Short Film Copyright Center launched the "Comics Creation Incentive Program" on December 16, 2025, offering a 15% subsidy on technical costs to institutions that use the Doubao large model to produce comics. Douyin has also raised its incentives for comic creation: S+ comics receive a base incentive of 5,000 yuan/min, a single series receives a base incentive of 500,000-750,000 yuan, and revenue sharing for top works reaches as much as 30,000 yuan/min. The platform has additionally opened up a library of over 60,000 high-quality IP novels and fully subsidizes adaptation costs. Data shows that in September 2025, Red Fruit Short Films had approximately 236 million monthly active users, surpassing Bilibili and Youku and approaching Mango TV. In terms of market size, short films are expected to exceed 100 billion and comics 20 billion this year, indicating substantial potential. From generating video material for short films to creating complete comics, AI technology is driving a new industrial ecosystem that is gradually producing commercial value.
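To give a sense of scale for the incentive figures, the sketch below works out what they would imply if a series' base incentive were simply the per-minute base rate multiplied by its running time. That relationship is an assumption for illustration only: the report does not describe how Douyin computes the per-series amount, and the function names and the sample technical cost are hypothetical.

```python
BASE_RATE_YUAN_PER_MIN = 5_000        # base incentive for S+ comics
SERIES_BASE_RANGE_YUAN = (500_000, 750_000)
TOP_SHARE_YUAN_PER_MIN = 30_000       # highest revenue sharing for top works
TECH_SUBSIDY_RATE = 0.15              # subsidy on technical costs


def implied_minutes(series_base_yuan: float,
                    rate_yuan_per_min: float = BASE_RATE_YUAN_PER_MIN) -> float:
    """Running time implied if series base = per-minute rate x minutes (assumed)."""
    return series_base_yuan / rate_yuan_per_min


def tech_subsidy(technical_cost_yuan: float) -> float:
    """Subsidy on a given technical cost at the stated 15% rate."""
    return technical_cost_yuan * TECH_SUBSIDY_RATE


low, high = (implied_minutes(v) for v in SERIES_BASE_RANGE_YUAN)
print(f"Implied series length: {low:.0f} to {high:.0f} minutes")
print(f"Top revenue share vs. base rate: "
      f"{TOP_SHARE_YUAN_PER_MIN / BASE_RATE_YUAN_PER_MIN:.0f}x")
# Hypothetical technical cost of 200,000 yuan, chosen only as an example.
print(f"Subsidy on 200,000 yuan of technical cost: {tech_subsidy(200_000):,.0f} yuan")
```

If the assumed relationship holds, the per-series range corresponds to about 100 to 150 minutes of content at the base rate, and the top revenue-sharing rate is six times the base rate.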
Looking ahead, multimodal technology is expected to evolve towards native multimodal processing of video, audio, images, and text, as well as towards the development of world models that possess physical knowledge and logical reasoning. The former emphasizes that AI can process various modalities under a unified framework, while the latter implies that AI can predict what will happen in the next frame based on the current frame, similar to the human brain.
In terms of application scenarios, native multimodal and world models will reshape the business logic of multiple industries. In search and marketing, for example, the shift from SEO to GEO is already underway and may evolve towards generative visual search, where users can not only search for images but also receive AI-generated customized videos in response. In entertainment, short films and comics are growing rapidly, and combining novel IP with AI video production can accelerate the adaptation of IP into visual media. The gaming industry is also being heavily influenced by generative AI, with top companies applying AI to assist in producing art assets. With the support of world models, game engines that generate content in real time may become feasible, enabling open-world metaverse experiences similar to those in "Ready Player One".
Risk alerts:
AI industry commercialization may not meet expectations; market competition risks; geopolitical risks.