48 Hours Witness AI’s New Battlefield: Alibaba Joins, Tencent Open Sources, Manycore Tech Lists — Has The World Model Reached Its ChatGPT Moment?

21:02 21/04/2026
GMT Eight
Alibaba Cloud launched HappyOyster, Tencent released open‑source HY‑World 2.0, and Manycore Tech (0068.HK) listed in Hong Kong within 48 hours, signaling China’s rapid push into world model commercialization. Manycore Tech’s IPO was oversubscribed 1,591 times; the stock surged 144% on debut to HK$18.6, lifting its market cap above HK$30 billion.

China’s AI sector saw three consequential developments within a 48‑hour span from April 16 to April 17, 2026. Alibaba Cloud introduced the world model HappyOyster, Tencent released the open‑source Hunyuan3D 2.0 (HY‑World 2.0), and Manycore Tech (0068.HK) made its debut on the Hong Kong Stock Exchange as a publicly listed spatial intelligence company. The near‑simultaneous timing of these moves signals a shift in the world model domain from exploratory research toward a commercialization inflection point.

When Google DeepMind unveiled Genie 3 in August 2025, the industry was still debating the definition of a “world model.” By April 2026, at least a dozen major global players had entered the world model or simulator arena, including Google, OpenAI, Meta, NVIDIA, Manycore Tech, and Runway. Manycore Tech’s IPO drew notable market enthusiasm, in step with the strategic investments and product launches from Alibaba and Tencent: its Hong Kong public offering was oversubscribed 1,591 times and the international tranche 14.46 times, the stock rose 144% on debut to close at HK$18.6, and its market capitalization exceeded HK$30 billion. These developments raise a central question: could world models become the next phenomenon comparable to ChatGPT?

Alibaba Cloud’s ATH Innovation Division (Token Hub) positioned HappyOyster as a “world simulator” rather than a conventional video generator. The product’s principal capabilities include a roaming mode that supports one minute of continuous real‑time movement through a generated scene and a director mode that supports sequences exceeding three minutes at 480p/720p resolution. HappyOyster follows a native multimodal and long‑sequence modeling approach and aligns with the generative video paradigm exemplified by Google’s Genie 3, while extending interactive duration. The launch occurred less than a month after Alibaba Cloud announced a target of achieving more than US$100 billion in cloud plus AI revenue within five years. Alibaba Cloud’s Q3 fiscal 2026 revenue rose 36% year‑over‑year, and AI‑related product revenue sustained triple‑digit growth for ten consecutive quarters. The HappyOyster release thus represents both a technical demonstration and a strategic pivot from a large‑model arms race toward world model infrastructure.

On the same day, Tencent’s Hunyuan team published Hunyuan3D 2.0 (HY‑World 2.0) as an open‑source 3D world model compatible with game engines such as Unity. Tencent’s approach contrasts with Alibaba’s closed‑source strategy: HY‑World 2.0 emphasizes exportability by producing editable 3D asset files in mesh, 3D Gaussian Splatting (3DGS), and point cloud formats that can be used for secondary editing rather than merely for video playback. This design directly addresses engineering needs in game development and film previsualization and targets B2B scenarios that require editable assets.

Manycore Tech (0068.HK) listed on the Hong Kong Stock Exchange on April 17, 2026. The company’s prospectus reported RMB 820 million in revenue for 2025, a gross margin of 82.2%, and adjusted net profit of RMB 57.1 million, marking a transition from loss to profitability. Manycore Tech has articulated a commercial flywheel linking spatial editing tools, spatial data, and spatial large models, with core products including SpatialLM, SpatialGen, and SpatialVerse. Founder and CEO Huang Xiaohuang stated that the company spent 15 years assembling what it describes as the world’s largest physically accurate spatial dataset, a competitive moat that is difficult to replicate in the short term. The IPO attracted cornerstone investors such as Taikang Life, Sunshine Life, GF Fund, Redwood, and Mirae Asset, with total cornerstone commitments of HK$455 million.
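The reported figures can be cross‑checked with simple arithmetic. The Python sketch below is illustrative only: the variable names are hypothetical, and the implied offer price is a back‑calculation from the rounded 144% debut gain and the HK$18.6 close, not a number taken from the prospectus.

```python
# Inputs as reported in the article.
REVENUE_2025_RMB_M = 820.0      # 2025 revenue, RMB million
GROSS_MARGIN = 0.822            # 82.2% gross margin
ADJ_NET_PROFIT_RMB_M = 57.1     # adjusted net profit, RMB million
DEBUT_CLOSE_HKD = 18.6          # first-day closing price, HK$
DEBUT_GAIN = 1.44               # 144% first-day rise (rounded)


def implied_offer_price(close: float, gain: float) -> float:
    """Back out the offer price from the close and the fractional gain."""
    return close / (1.0 + gain)


def margin(profit: float, revenue: float) -> float:
    """Profit as a fraction of revenue."""
    return profit / revenue


# Derived values (assumptions: inputs are exact; in practice they are rounded).
gross_profit_rmb_m = REVENUE_2025_RMB_M * GROSS_MARGIN
adj_net_margin = margin(ADJ_NET_PROFIT_RMB_M, REVENUE_2025_RMB_M)
implied_offer_hkd = implied_offer_price(DEBUT_CLOSE_HKD, DEBUT_GAIN)

print(f"Gross profit: RMB {gross_profit_rmb_m:.1f} million")
print(f"Adjusted net margin: {adj_net_margin:.1%}")
print(f"Implied offer price: HK${implied_offer_hkd:.2f}")
```

Under these inputs the adjusted net margin works out to roughly 7% and the implied offer price to about HK$7.62, which suggests profitability is still thin relative to the 82.2% gross margin.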

Across the industry, three distinct technical schools have emerged. One school treats the world as a sequence of videos and relies on generative video techniques to simulate world dynamics; representatives include Google Genie 3, OpenAI Sora, Alibaba HappyOyster, and Runway GWM‑1. This approach typically employs Diffusion Transformer (DiT) or autoregressive (AR) Transformer architectures, prioritizes temporal coherence and visual fidelity, and aims for long‑duration generation. Its advantage lies in intuitive outputs and a clear path to content creation monetization, while its limitation is a tendency to capture surface phenomena rather than deep physical understanding, which constrains applications such as robotics training.

A second school emphasizes abstract predictive representations and causal structure rather than pixel‑level reconstruction. Meta’s V‑JEPA 2, from the research program led by Yann LeCun, exemplifies this direction: its joint embedding predictive architecture (JEPA) performs prediction in latent space and stresses causal reasoning and interpretability. This route aligns more closely with human cognitive models and suits decision and planning tasks while demanding relatively less compute. Its drawback is the lack of immediately intuitive outputs for content production, which limits near‑term commercial use cases.

A third school centers on spatial intelligence and three‑dimensional understanding. Representatives include World Labs (founded by Stanford professor Fei‑Fei Li), Tencent HY‑World 2.0, NVIDIA Omniverse, and Manycore Tech. This perspective holds that a genuine world model must comprehend three‑dimensional space and object relationships and produce editable, exportable 3D assets rather than only video. Techniques such as 3DGS and neural radiance fields (NeRF) are common, with emphasis on geometric stability, spatial consistency, and integration with existing game and robotics workflows. The spatial intelligence route is directly applicable to game development and robotic simulation and offers the fastest path to engineering deployment, though it faces challenges in real‑time interactivity and computational complexity.

When assessed across technical metrics, product maturity, and commercialization progress, the leading tier comprises organizations that have released usable products with clear technical specifications and real‑time interaction capabilities, including Google’s Genie 3, Alibaba’s HappyOyster, Tencent’s HY‑World 2.0, and Manycore Tech. A second tier includes entities with distinctive technical approaches but narrower application focus, such as Meta’s V‑JEPA 2, Tesla FSD, and NVIDIA Omniverse. A third tier contains teams still in early validation stages, including Baidu Wenxin, ByteDance Doubao, Runway GWM‑1, World Labs, and OpenAI Sora.

The U.S. retains advantages in foundational research, compute infrastructure, and large‑scale data resources. Innovations such as JEPA and AR‑Transformer variants originated in U.S. laboratories, NVIDIA GPUs and Google TPUs provide compute advantages, and platforms like YouTube and Instagram supply extensive training data. Chinese firms demonstrate strengths in engineering execution, application integration, and capital market progress; Alibaba and Tencent produced comparable products within months of Genie 3’s debut, and Manycore Tech’s listing indicates progress in capital formation for spatial intelligence. Nonetheless, foundational architectural innovation remains concentrated in the United States, and whether follower‑plus‑engineering strategies will secure long‑term leadership is unresolved.

Commercialization faces several obstacles. Inference cost is a primary concern: real‑time interaction requires substantial compute, with both Genie 3 and HappyOyster generating video at 24 frames per second. How providers will absorb GPU costs and which enterprise customers will pay for such capabilities are open questions. On the consumer side, the value proposition for ordinary users is not yet self‑evident. Manycore Tech’s vertical focus on scenarios such as home renovation design, monetized through SaaS and API offerings, demonstrates commercial viability in targeted domains, but whether such vertical models can scale to general‑purpose world models remains uncertain. Strategic choices between open source and closed source will also shape outcomes: Meta and Tencent have opted for open releases to cultivate ecosystems, while Google and Alibaba have chosen closed approaches to protect core technology and build commercial moats. Geopolitical factors, including export controls on high‑end GPUs and cross‑border data restrictions, further complicate global adoption.
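To give a sense of scale for the inference cost concern, the short Python sketch below converts the 24 frames per second figure into a per‑frame time budget and a frame count per session. The function names are hypothetical, and the session lengths (one minute for roaming, three minutes as a lower bound for director mode) are taken from the HappyOyster description earlier in the article; actual serving costs depend on hardware and model details that are not public.

```python
FPS = 24  # generation rate cited for Genie 3 and HappyOyster


def frame_budget_ms(fps: int) -> float:
    """Maximum time available to produce one frame at a given frame rate."""
    return 1000.0 / fps


def frames_needed(seconds: int, fps: int) -> int:
    """Total frames that must be generated for a session of this length."""
    return seconds * fps


# Session lengths from the article: 60 s roaming, 180 s (or more) director mode.
for label, seconds in [("roaming (1 min)", 60), ("director (3 min)", 180)]:
    total = frames_needed(seconds, FPS)
    print(f"{label}: {total} frames, {frame_budget_ms(FPS):.1f} ms per frame")
```

At 24 frames per second each frame must be produced in under about 42 milliseconds, and a single one‑minute roaming session requires 1,440 generated frames per user, which is why per‑session GPU cost is an open question.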

Three plausible scenarios frame the near‑term future. In an optimistic outcome, a killer application emerges by 2028, enabling world models to become foundational AI infrastructure integrated with embodied intelligence and robotics. In a neutral outcome, multiple technical approaches coexist, each serving distinct domains, and world models become important components within the AI toolkit without supplanting large language models. In a pessimistic outcome, technical barriers and slow commercialization lead to consolidation and a redefinition of the concept toward pragmatic combinations of video generation, 3D modeling, and reinforcement learning.

Ultimately, the decisive contest will be over ecosystems rather than purely over technology. Google leverages DeepMind research and TPU compute to build a dual moat of technology and infrastructure. Alibaba pursues an integrated cloud‑plus‑model strategy anchored in enterprise relationships. Tencent’s open‑source approach aims to attract developers and secure footholds in gaming and film ecosystems. Manycore Tech relies on 15 years of spatial data accumulation and demonstrated profitability to position itself as a spatial intelligence infrastructure provider. Each strategy has merit, but long‑term success will favor the organization that constructs the most complete ecosystem, one that combines technical leadership, scenario deployment, developer engagement, capital support, and favorable policy alignment.

The 48 hours in April 2026 may be recorded as a pivotal moment in AI’s evolution. The coordinated moves by Alibaba, Tencent, and Manycore Tech mark a shift of world models from laboratory experiments toward industrial infrastructure. The timing of a true “ChatGPT moment” for world models will depend on the pace of technical breakthroughs and, critically, on the depth of ecosystem construction. Over the next two to three years, the industry may witness the emergence of benchmark cases and the exit of some participants. One certainty remains: control of spatial understanding will be central to control of physical AI’s future, and Chinese companies have positioned themselves at the forefront of this race.