CITIC SEC: OpenAI has released its "Strawberry" model, o1, significantly improving general reasoning capability.

Date: 17/09/2024
GMT Eight
CITIC SEC released a research report stating that in the early hours of September 13, Beijing time, OpenAI launched o1, a new series of AI reasoning models. The series introduces large-scale reinforcement learning algorithms that represent the model's thinking as a chain of thought, significantly improving its general reasoning ability and alignment. According to OpenAI's official evaluations, o1 not only outperforms GPT-4o on most reasoning tasks, but is even on par with human experts on some reasoning-intensive benchmarks. With this improvement in reasoning performance, the computational power required by the o1 series has increased significantly: o1-preview is priced at approximately 12 times GPT-4o, and subsequent cost reductions are worth watching.

From an investment perspective, although the pure-text output of the o1 series limits its application scenarios, CITIC SEC believes that blockbuster AI applications are expected to be unlocked first in high-value scenarios such as scientific research and programming, with software and internet companies likely to benefit first. Beyond application-side opportunities, hardware demand is also expected to rise continuously as multimodal technology advances. CITIC SEC remains optimistic about AI computational power, especially the additional inference-side compute opportunities brought by the gradual maturing of the commercial side.

CITIC SEC's main points are as follows:

Events: In the early hours of September 13, Beijing time, global AI industry leader OpenAI officially launched o1, a new series of AI reasoning large models aimed at solving complex tasks. According to OpenAI's official tweet, the o1 series will include o1, o1-preview, and o1-mini.
o1-preview is already available: ChatGPT Plus and Team users, as well as developers at API usage tier 5 (more than $1,000 in API payments), have gained access to the model, while enterprise and educational users are expected to gain access from September 16 onwards. Consistent with earlier reports from media such as The Information and The Medium, the o1 model performs as expected in functionality, reasoning, and performance.

Model mechanism and evaluation results: reinforcement learning significantly enhances coding, mathematics, and reasoning capabilities. According to OpenAI's technical blog, the o1 model introduces large-scale reinforcement learning algorithms during training, strengthening its ability to perform complex reasoning tasks. Reinforcement learning, which originated in the 1960s, aims to enable an agent to achieve long-term, globally optimal returns in a complex environment through reward mechanisms; it is the core technology behind Google's Go-playing program AlphaGo. With this technique applied, the output process of the o1 series differs markedly from that of the GPT series: before answering, an o1 model first spends up to 20-30 seconds generating a longer chain of thought, breaking a complex task down into subtasks and producing the final result only after summarizing the subtask results, rather than generating output immediately as GPT-series models do. According to OpenAI's official website, o1 performs significantly better than GPT-4o on most reasoning tasks such as science tests, mathematics, and programming, and is even on par with human experts on some reasoning-intensive benchmarks. For example, on a qualifying exam for the International Mathematical Olympiad, o1 correctly answered 83% of the questions versus only 13% for GPT-4o; on a PhD-level science quiz, both o1 and o1-preview outperformed human experts and GPT-4o.
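The decompose-then-summarize pattern described above can be sketched in a few lines. This is purely illustrative, not OpenAI's implementation: the function names and the toy task are invented for the example.

```python
# Illustrative sketch of the described reasoning pattern: break a complex
# task into subtasks, solve each one, then summarize the partial results
# into a final answer -- instead of emitting an answer in a single step.
def solve_with_reasoning(task, decompose, solve, summarize):
    subtasks = decompose(task)               # break the task down
    partials = [solve(s) for s in subtasks]  # work each subtask
    return summarize(partials)               # combine into the final answer

# Toy usage: sum a list by splitting it in half first.
total = solve_with_reasoning(
    [1, 2, 3, 4],
    decompose=lambda xs: [xs[:2], xs[2:]],
    solve=sum,
    summarize=sum,
)
print(total)  # → 10
```

The extra intermediate work is what consumes the 20-30 seconds of "thinking" time noted above.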
Market positioning: enhanced safety and reasoning capabilities are expected to unlock applications, while model costs still need optimizing. According to OpenAI's official technical blog, the chain of thought can effectively improve the model's safety and alignment: 1) the chain of thought makes the model's thinking explicitly visible; 2) integrating behavior policies into the reasoning model's chain of thought can efficiently and robustly teach it human values. We believe the main contradiction in the current AI industry is the scarcity of blockbuster applications caused by insufficient reasoning ability and high costs. With reasoning ability significantly improved while safety is maintained, o1 is expected to gradually unlock applications. However, given the high cost of its huge computational requirements and its pure-text output, we expect that in the short term o1's applications will remain concentrated in specific high-value productivity scenarios such as programming and scientific research. OpenAI has also introduced o1-mini, a cheaper reasoning model better suited to programming. According to OpenAI's official website, o1-preview is priced at $15 per million input tokens and $60 per million output tokens; o1-mini at $3 per million input tokens and $12 per million output tokens; and GPT-4o at $1.25 per million input tokens and $5 per million output tokens. GitHub Copilot's Team and Enterprise versions are currently priced at $4 and $21 per month respectively, and we believe cost optimization of the o1 model is worth watching going forward.

Trend outlook: the reasoning stage is expected to follow a scaling law, and the value of multiple models working together is worth considering.
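The "approximately 12 times" cost gap can be checked directly from the per-million-token prices quoted above. A minimal sketch, assuming a hypothetical request of 2,000 input and 1,000 output tokens (the token counts are invented for illustration; the prices are those cited in the report):

```python
# USD per 1M tokens: (input, output), as quoted in the report above.
PRICES = {
    "o1-preview": (15.00, 60.00),
    "o1-mini": (3.00, 12.00),
    "gpt-4o": (1.25, 5.00),
}

def request_cost(model, input_tokens, output_tokens):
    """Cost in USD of one request at the quoted per-million-token rates."""
    p_in, p_out = PRICES[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

cost_o1 = request_cost("o1-preview", 2_000, 1_000)
cost_4o = request_cost("gpt-4o", 2_000, 1_000)
print(f"o1-preview: ${cost_o1:.4f}")              # → $0.0900
print(f"gpt-4o:     ${cost_4o:.4f}")              # → $0.0075
print(f"ratio:      {cost_o1 / cost_4o:.0f}x")    # → 12x
```

Because both the input and output rates are exactly 12 times GPT-4o's, the ratio holds at 12x regardless of the token mix; note it also ignores the extra chain-of-thought tokens o1 bills as output, which would widen the gap in practice.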
During the development of the o1 model, OpenAI found that giving the model more reasoning time effectively improves its performance, i.e. the reasoning stage follows a scaling law of its own. This finding suggests that inference-side computational demand is expected to grow vigorously. At the same time, the current access restrictions on the o1 series indirectly confirm the strong demand for model compute: according to OpenAI's official website, o1-preview usage is currently limited to 30 messages per week and o1-mini to 50 messages per week. In addition, we believe multi-model collaboration at the application level also deserves attention. According to "Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models" (Jinliang Lu, Ziliang Pang, Min Xiao, et al.), multi-model collaboration offers advantages such as overall performance improvement, enhanced multi-task processing, improved computational efficiency, fewer errors and hallucinations, knowledge sharing, and capability transfer. Multi-model collaboration can occur between large models, or between large and small models.
Multi-model collaboration strategies can be categorized as merging, ensembling, and cooperation, with cooperation being the most compatible and flexible approach across different models; it can create a more comprehensive and efficient AI system and thus holds considerable potential.

Risk factors: underdevelopment of core AI technology; continued tightening of policy regulation in the technology sector; policy regulation related to private data; global macroeconomic recovery falling short of expectations; macroeconomic fluctuations leading to lower-than-expected IT spending by European and American enterprises; potential ethical, moral, and user-privacy risks of AI; data-leakage and information-security risks for enterprises; and continuously intensifying industry competition.

Investment strategy: this round of o1-series updates still focuses on the underlying algorithm level, especially the reasoning ability of large language models. From a technical perspective, the basic capabilities of the o1 series have been significantly improved with the support of large-scale reinforcement learning, demonstrating not only the remaining room for iteration along the path of scaling model size and training compute, but also indicating that a scaling law may continue to hold in the inference stage, driving a substantial increase in inference-side computing demand. On the application side, although the o1 series is limited to text output, restricting its application scenarios, we believe that with the progress in general reasoning ability brought by stronger underlying algorithms, blockbuster AI applications are expected to be unlocked first in high-value scenarios such as scientific research and programming, with software and internet companies expected to benefit first.
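Of the three strategies named in the survey, ensembling is the simplest to sketch: several models answer the same prompt and a vote picks the final output. A minimal illustration, with the model answers invented for the example:

```python
# Hedged sketch of the "ensembling" collaboration strategy: collect
# candidate answers from several models and return the majority vote.
# The answers below are hypothetical; in practice each would come from
# a different model queried with the same prompt.
from collections import Counter

def majority_vote(answers):
    """Return the most common answer among the collaborating models."""
    winner, _count = Counter(answers).most_common(1)[0]
    return winner

# Hypothetical outputs from three models for the same question:
answers = ["42", "42", "41"]
print(majority_vote(answers))  # → 42
```

Merging (combining model weights) and cooperation (models playing distinct roles, e.g. a small model drafting and a large model verifying) require model-specific machinery, which is why the survey singles out cooperation as the most flexible across heterogeneous models.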
Beyond investment opportunities on the application side, hardware demand will inevitably continue to rise with the advancement of multimodal technologies. We remain bullish on AI computing power, especially the additional inference-side compute opportunities brought by the gradual maturing of the commercial side.

Contact: contact@gmteight.com