Soochow: How far are we from true embodied artificial intelligence?

date
09/08/2025
GMT Eight
Future embodied-intelligence models will continue to evolve along three dimensions: modality expansion, reasoning mechanisms, and data composition.
According to a Soochow research report, large models will continue to evolve along three dimensions: modality expansion, reasoning mechanisms, and data composition. Mainstream models currently focus on three modalities (vision, language, and action), and the next stage is likely to add sensory channels such as touch and temperature. Architectures such as Cosmos attempt to give robots "imagination" through state prediction, closing a perception-modeling-decision loop to build a more realistic "world model" and strengthen robots' environmental modeling and reasoning. On the data side, training on a blend of simulated and real data is becoming the mainstream approach, and high-standard, scalable training grounds are becoming key infrastructure for a general-purpose robot training system.

Soochow's main views are as follows:

1. Why do humanoid robots need highly intelligent large models?

Although the humanoid form has long been feasible from an engineering standpoint, the key to industrialization is moving beyond the limits of traditional industrial robots, which are "rigid in control and weak in generalization," and improving robots' ability to understand and adapt to uncertainty. Industrial robots mainly run on deterministic control logic and lack perception, decision-making, and feedback capabilities, which leads to heavy dependence on system integration, high costs, and poor versatility. Humanoid robots, by contrast, aim to be "general-purpose intelligent agents" that cover the full perception-reasoning-execution chain. They must rely on large models for multimodal understanding and generalization in order to adapt to complex tasks and dynamic environments.
The rise of multimodal large models has given humanoid robots a "primitive brain," starting the evolution of intelligence from 0 to 1, and a data flywheel continues to improve model capability and product performance. Overall intelligence, however, is still at an early L2 stage, and the path to generalized intelligence faces challenges in modeling methods, data scale, and training paradigms. Highly intelligent large models will be a core variable on the path to general-purpose humanoid robots.

2. How have robot large models progressed in architecture and data?

The rapid evolution of robot large models stems mainly from breakthroughs in architecture and data. From the early SayCan language planning model, to RT-1 with end-to-end action output, to PaLM-E and RT-2 folding multimodal perception into a unified model space, large models have gradually acquired the full chain of "seeing images, understanding tasks, and generating actions." In 2024, π0 introduced an action expert model with an action output frequency of 50 Hz, and in 2025 Helix adopted a parallel fast-slow brain architecture that pushed the control frequency to 200 Hz, significantly improving the smoothness and responsiveness of robot operation. On the data side, a structured system has formed around three types of data: internet data, simulation data, and real-robot action data. The first two supply pre-training scale and generalized scenarios, while the third directly strengthens the model's practical ability in the physical world. Real-robot data collection depends heavily on high-precision motion capture equipment; the precision of optical motion capture suits centralized training grounds, making it a core data source for training physical-world models.
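To make the control-frequency figures above concrete, the following is a minimal Python sketch, not any vendor's implementation. It converts a control rate into a per-action time budget and simulates a toy fast-slow loop in which a slow planner refreshes a goal while a fast controller acts on every tick. The names `period_ms` and `DualRateLoop` and the 10 Hz planner rate are hypothetical, chosen only for illustration. The two rates are interleaved in one loop here, whereas a real system would run them in parallel.

```python
from dataclasses import dataclass


def period_ms(rate_hz: float) -> float:
    """Time budget per action, in milliseconds, at a given control rate."""
    return 1000.0 / rate_hz


@dataclass
class DualRateLoop:
    """Toy fast-slow loop: a slow planner sets goals, a fast controller acts."""

    fast_hz: int = 200  # control rate cited in the text for Helix
    slow_hz: int = 10   # ASSUMPTION: illustrative planner rate, not from the report

    def run(self, seconds: int) -> dict:
        # Number of fast-controller ticks between two planner updates.
        ticks_per_plan = self.fast_hz // self.slow_hz
        plans = 0
        actions = 0
        goal = None
        for tick in range(self.fast_hz * seconds):
            if tick % ticks_per_plan == 0:
                # Stand-in for the slow model producing a new high-level goal.
                goal = f"goal-{plans}"
                plans += 1
            # Stand-in for the fast controller emitting one action
            # conditioned on the most recent goal.
            if goal is not None:
                actions += 1
        return {"actions": actions, "plans": plans}


if __name__ == "__main__":
    print(period_ms(50))         # per-action budget at 50 Hz
    print(period_ms(200))        # per-action budget at 200 Hz
    print(DualRateLoop().run(1))
```

At 50 Hz each action has a 20 ms budget, and at 200 Hz the budget shrinks to 5 ms, which is why the slower reasoning model cannot sit in the inner control loop and is decoupled from the fast controller.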
The mainstream training paradigm is iterating quickly from "low-quality pre-training plus high-quality fine-tuning" toward a stage of "optimizing structure from accumulated data."

3. What are the future directions for large models?

Looking ahead, large models will continue to evolve along three dimensions: modality expansion, reasoning mechanisms, and data composition. Mainstream models currently focus on three modalities (vision, language, and action), and the next stage is likely to add sensory channels such as touch and temperature. Architectures such as Cosmos attempt to give robots "imagination" through state prediction, closing a perception-modeling-decision loop to build a more realistic "world model" and strengthen robots' environmental modeling and reasoning. On the data side, training on a blend of simulated and real data is becoming the mainstream approach, and high-standard, scalable training grounds are becoming key infrastructure for a general-purpose robot training system.

Investment Recommendations

On the model side, the report suggests watching [Galaxy Universal (Tier 1 Company)], [Xingdong Era (Tier 1 Company)], and [Zhiyuan Robotics (Tier 1 Company)]; in data collection, [Qingtong Vision (Tier 1 Company)], [LUSTER LightTech (688400.SH)], and [Orbbec Light (688322.SH)]; and in data training, [Miracle Automation Engineering (002009.SZ)].

Risk Warning

Progress in large model technology may be slower than expected, access to high-quality data may be limited, and demand for humanoid robots may fall short of expectations.