China Securities Co., Ltd.: OpenAI o1 significantly improves logical capability, with a sharp increase in reasoning-side computing power consumption.
20/09/2024
GMT Eight
China Securities Co., Ltd. reports that OpenAI has released a new reasoning model, o1, with deep-thinking capabilities. The model spends more time thinking before responding to complex problems, can improve and adjust its strategies, and performs well on hard problems in science, coding, and mathematics. o1 integrates tree-structured thinking with reinforcement learning to achieve deep exploration of reasoning paths. The model also exhibits a scaling law on the reasoning side: the longer it spends reasoning, the stronger its ability to handle complex problems becomes. Through continuous tree search and repeated self-play, o1 demonstrates human-like logical thinking potential. However, because of this repeated self-play during reasoning, computing power consumption on the reasoning side will increase significantly.
OpenAI's o1 model has deep-thinking capabilities and performs well on complex problems. Since ChatGPT went viral across social networks, the large-model industry has entered a stage of rapid development. While the fundamental capabilities of models have improved significantly, the Transformer architecture still imposes limitations, especially on complex logical reasoning. Prompt engineering emerged as one remedy: well-designed prompts can significantly enhance the reasoning capabilities of large models. OpenAI's new o1 reasoning model spends more time thinking before responding to complex problems, and shows excellent performance in improving and adjusting its strategies on hard problems in science, coding, and mathematics.
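To make the prompt-engineering point concrete, the sketch below sends a chain-of-thought style prompt through the OpenAI Python SDK. The model name and the exact prompt wording are illustrative assumptions, not details from the report.

```python
from openai import OpenAI  # assumes the official OpenAI Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "A train travels 120 km in 1.5 hours. What is its average speed?"

# A plain prompt asks for the answer directly; a chain-of-thought prompt
# asks the model to write out intermediate reasoning steps first, which is
# the prompt-engineering technique credited with boosting reasoning ability.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical choice; any chat model would do
    messages=[{
        "role": "user",
        "content": question + "\nLet's think step by step before answering.",
    }],
)
print(response.choices[0].message.content)
```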
OpenAI's o1 model integrates tree-structured thinking with reinforcement learning to deeply explore reasoning paths. Chains of thought and trees of thought use intermediate reasoning steps to give large models complex reasoning capabilities, enabling a language model to evaluate its own intermediate thoughts during a rigorous reasoning process. Reinforcement learning is a core research area in artificial intelligence, in which an agent learns through repeated interactions with its environment so as to maximize reward. o1 deeply integrates the two, achieving self-training over thinking trees: by continuously optimizing the structure of its thinking trees with breadth-first and depth-first search, it explores reasoning paths in depth, as sketched below.
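The following is a minimal sketch of a breadth-first search over intermediate thoughts, in the spirit of the tree-of-thought approach described above. The `propose` and `evaluate` callables stand in for model calls that generate candidate next thoughts and score partial reasoning chains; both are hypothetical placeholders, not OpenAI's actual implementation.

```python
from typing import Callable, List, Tuple

def tree_of_thoughts_bfs(
    problem: str,
    propose: Callable[[str], List[str]],  # hypothetical: proposes next thoughts
    evaluate: Callable[[str], float],     # hypothetical: scores a partial chain
    beam_width: int = 3,
    depth: int = 4,
) -> str:
    """Breadth-first search over chains of intermediate thoughts.

    At each level, every surviving chain is extended with candidate next
    thoughts, the extended chains are scored, and only the top `beam_width`
    survive -- a simplified version of the tree search the report describes.
    """
    frontier: List[Tuple[str, float]] = [(problem, 0.0)]
    for _ in range(depth):
        candidates: List[Tuple[str, float]] = []
        for chain, _ in frontier:
            for thought in propose(chain):
                extended = chain + "\n" + thought
                candidates.append((extended, evaluate(extended)))
        # keep only the highest-scoring partial chains (pruning the tree)
        frontier = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return frontier[0][0]  # best complete chain of thought
```

Reinforcement learning would enter by training the proposal and evaluation policies against reward signals from solved problems; the search loop itself is unchanged.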
As on the training side, a scaling law exists on the reasoning side, and computing power demand is gradually shifting toward reasoning. Past studies have confirmed a scaling law on the training side: models become smarter as training compute grows. OpenAI's o1 reveals an analogous scaling law on the reasoning side: the longer the model spends reasoning, the stronger its ability to handle complex problems becomes. Through continuous tree search and repeated self-play during reasoning, o1 shows human-like logical thinking potential. As compute demand in the reasoning process grows, the overall computing power consumption of large models is gradually shifting from the training side to the reasoning side, which will be a crucial long-term driver of overall compute demand.
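One simple way to see a reasoning-side scaling law is repeated sampling with majority voting: accuracy rises as more inference compute is spent, while cost grows linearly with the number of samples. The independence assumption and the per-sample accuracy below are illustrative choices, not figures from the report.

```python
import math

def majority_vote_accuracy(p_correct: float, k: int) -> float:
    """Probability that a majority vote over k independent samples is correct,
    assuming each sample is right with probability p_correct (binomial model).
    More inference compute (larger k) -> higher accuracy, at k times the cost."""
    need = k // 2 + 1  # votes required for a strict majority (use odd k)
    return sum(math.comb(k, i) * p_correct**i * (1 - p_correct)**(k - i)
               for i in range(need, k + 1))

for k in (1, 5, 25, 125):
    print(f"{k:>3} samples -> accuracy {majority_vote_accuracy(0.6, k):.3f}")
```

Under this toy model with a per-sample accuracy of 0.6, majority voting climbs from 0.600 with one sample to roughly 0.85 with 25 samples, while inference cost grows 25-fold, illustrating why reasoning-side compute consumption rises so sharply.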