OpenAI has integrated its teams and plans to release a new voice model in the first quarter, paving the way for the launch of screenless AI personal devices.
OpenAI believes its current ChatGPT voice model lags behind its text models in accuracy and speed. The new model will support natural emotional expression, real-time conversation, and interruption handling. The company's hardware goal is a consumer-grade device operated through natural voice commands, and it also plans to launch screenless devices such as smart glasses and speakers.
OpenAI is optimizing its audio AI models in preparation for its planned speech-driven personal devices.
According to The Information on January 1st, OpenAI has integrated engineering, product, and research efforts over the past two months to focus on overcoming the technological bottlenecks in audio interaction, with the goal of creating a consumer device that can be operated through natural speech commands.
Internally, the company's researchers believe the current ChatGPT speech model lags behind the text models in accuracy and response speed, and that the two are built on different underlying architectures.
Reportedly, the new speech model will offer more natural emotional expression and real-time conversation, including the ability to handle interruptions mid-dialogue, a key capability that current models lack. It is planned for release in the first quarter of 2026.
According to sources cited in the report, OpenAI also plans to launch a series of screenless devices, including smart glasses and smart speakers, positioning the devices as "collaborative partners" for users rather than just app gateways.
However, before launching consumer AI hardware driven by voice commands, OpenAI must first change user habits.
Team integration focuses on screenless interaction
According to reports, OpenAI's current speech models and text models are built on different architectures, so responses are slower and of lower quality when users interact with ChatGPT by voice than by text.
To address this, OpenAI has consolidated the relevant teams over the past two months.
At the organizational level, Kundan Kumar, a speech researcher who joined from Character.AI this summer, leads the audio AI project.
Product research director Ben Newhouse is restructuring the audio AI infrastructure, with multimodal ChatGPT product manager Jackie Shannon also involved.
According to sources cited in the report, the new audio model architecture can generate more accurate and substantive responses, support real-time conversation with users, and better handle complex situations such as mid-conversation interruptions.
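The report does not explain how interruption handling, often called "barge-in," would work under the hood. As a rough, hypothetical sketch of the general pattern only, and not OpenAI's implementation, the loop below streams a spoken reply while a voice-activity monitor watches for the user to start talking; the moment speech is detected, playback is cancelled and the turn is handed back. The function names and simulated audio chunks are invented for illustration; a real system would wire them to actual microphone input, voice-activity detection, and the model's streaming output.

```python
import asyncio

# Hypothetical sketch of "barge-in" handling in a real-time voice loop.
# The simulated audio below stands in for real mic input, voice-activity
# detection (VAD), and the model's streaming text-to-speech output.

async def speak_response(chunks, interrupted: asyncio.Event):
    """Stream the assistant's spoken reply; stop immediately if interrupted."""
    for chunk in chunks:
        if interrupted.is_set():
            print("[assistant] interrupted, stopping playback")
            return
        print(f"[assistant] playing: {chunk!r}")
        await asyncio.sleep(0.3)  # stands in for the playback time of one chunk
    print("[assistant] finished reply")

async def monitor_user_speech(interrupted: asyncio.Event, speak_after: float):
    """Stand-in for a VAD loop: flag an interruption when the user talks."""
    await asyncio.sleep(speak_after)  # simulated moment the user starts talking
    print("[user] starts speaking")
    interrupted.set()

async def conversation_turn():
    interrupted = asyncio.Event()
    reply = ["Sure,", "the weather", "today is", "sunny with", "light wind."]
    # Speaking and listening run concurrently, not in strict turns.
    await asyncio.gather(
        speak_response(reply, interrupted),
        monitor_user_speech(interrupted, speak_after=0.8),
    )
    if interrupted.is_set():
        print("[system] handing the turn back to the user; listening resumes")

asyncio.run(conversation_turn())
```

The essential design point is that listening and speaking run concurrently rather than in alternating turns, which is what lets an assistant yield instantly when the user cuts in.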
In terms of hardware form factor, OpenAI's judgment is similar to that of Alphabet (GOOGL.US), Amazon (AMZN.US), Meta (META.US), and Apple (AAPL.US): existing mainstream devices are not suited to future AI interaction.
The OpenAI team hopes users will interact with devices by speaking rather than by looking at screens, believing that voice is the most natural, instinctive way for humans to communicate.
In addition, former Apple Inc. design chief Jony Ive, who is collaborating with OpenAI on hardware development, emphasizes that screenless design is not only more natural but also helps prevent user addiction. He stated in a May interview:
Even if the intent is harmless, if a product has negative consequences, responsibility must be taken. This sense of responsibility drives my current work.
Cultivating user habits is the key challenge
The main obstacle OpenAI faces is user behavior.
According to reports, most ChatGPT users have not developed the habit of interacting by voice, whether because the audio models' quality has been inadequate or because they are unaware of the feature.
To launch AI devices built around audio, the company must first get users accustomed to talking to its AI products.
Previous reports indicate that OpenAI spent nearly $6.5 billion in 2025 to acquire Ive's startup io and has been advancing multiple workstreams in parallel, including the supply chain, industrial design, and model development. The first device is expected to take at least another year to debut.
This timeline means OpenAI needs to improve the existing ChatGPT speech features ahead of launch, both to build a user base and to validate the practicality of audio interaction in everyday scenarios.
This article is republished from Wall Street View; Author: Bao Yilong; GMTEight editor: Chen Xiaoyi.