Xiaohongshu's Hi Lab team proposes a reinforcement learning training method that significantly reduces the average length of model thinking.

date
19/06/2025
On June 19, the technical team of Xiaohongshu (Little Red Book) published an article stating that deep thinking models have significantly improved their reasoning ability through Test-Time Scaling, but that they also produce a large amount of redundant and ineffective thinking. To address this, Xiaohongshu's Hi Lab team proposed a reinforcement learning training method called "Think When You Need", which gives the model a dynamic chain-of-thought (CoT) capability and significantly reduces the average thinking length without degrading the final results. According to the team, experimental results show that the approach applies broadly to both reasoning and non-reasoning tasks. The team also observed a phenomenon: on the same task, the smarter the model, the shorter the thinking it requires. This runs counter to the behavior of current deep thinking models but is consistent with human cognition.