Guotai Junan: Dou Bao's real-time voice large model launched, benefiting the deployment of AI software and hardware scenarios.
21/01/2025
GMT Eight
Guotai Junan released a research report stating that Dou Bao's real-time voice large model has been launched. The model can understand emotions, access the internet in real time, control tone and intonation, and be interrupted at any time. The addition of emotional voice interaction will significantly benefit AI software and hardware applications by lowering barriers to use, improving efficiency, and enhancing the user experience, and it will substantially help AI software and hardware scenarios reach real-world deployment.
Event: On January 20th, Dou Bao's real-time voice large model was officially launched and made available to all users in the Dou Bao app.
Guotai Junan's main points are as follows:
On the input side, the model understands the emotion in the user's speech and can access the internet in real time. On the output side, it offers powerful voice control, sounds highly human-like, responds with ultra-low latency, and can be interrupted at any time.
Its human-level voice conversation capabilities are demonstrated in the following aspects: 1) personalized emotional responses: the model understands both what users say and how they feel, and responds in an appropriate tone; 2) powerful voice control and rich emotional expression: it can follow complex instructions, speak in different tones, emotions, and states, tell stories, speak in dialects and accents, and even sing; 3) a balance between intelligence and expressiveness: the model's delivery is very close to that of a real person, including human-like filler words and pauses for thought. The model can also access the internet in real time, allowing it to retrieve the latest information and give precise, up-to-date answers to time-sensitive questions; 4) a smooth interactive experience with ultra-low latency: it remains accurate and natural at low system latency, responds promptly to voice interruptions, and reliably detects when the user has finished speaking.
Overall satisfaction with Dou Bao's real-time voice large model is significantly higher than with GPT-4o, especially for naturalness of voice tone and richness of emotion.
The Dou Bao team had dozens of external testers evaluate the models along multiple dimensions, including human-likeness, usefulness, emotional intelligence, call stability, and conversational fluency. The overall satisfaction score (out of 5) was 4.36 for Dou Bao's real-time voice large model, compared with 3.18 for GPT-4o. More than half of the testers gave the Dou Bao model the highest rating. The model showed a clear advantage in understanding and expressing emotion, particularly on the question of whether it sounded like an AI, where it was very rarely identified as one.
The addition of emotional voice interaction will significantly benefit AI software and hardware applications by lowering barriers to use, improving efficiency, and enhancing the user experience. Application scenarios include the following:
1) Emotional companionship and smart education applications. Recommended: Kingnet Network (002517.SZ). Beneficiaries include Hubei Century Network Technology Inc. (300494.SZ), Southern Publishing and Media (601900.SH), Astro-century Education & Technology (300654.SZ), and Kunlun Tech (300418.SZ);
2) AI companion hardware such as AI toys and AI pets, which can help hardware manufacturers and IP owners expand demand. Recommended: Shanghai Film (601595.SH). Beneficiaries include Zhejiang Jinke Tom Culture Industry (300459.SZ);
3) AI glasses, AI headphones, AI speakers, and other everyday productivity products, which will benefit from improved interaction methods. Beneficiaries include XIAOMI-W (01810).
Risk Warning: AI applications may be deployed more slowly than expected; commercialization may progress more slowly than expected; the core businesses of related companies may come under earnings pressure.