Latest News
Alibaba has open-sourced Qwen3, the new generation of its Tongyi Qianwen general-purpose large model. With only about one-third of the parameter count of DeepSeek-R1, Alibaba announced a significant cost reduction while claiming performance that surpasses R1, OpenAI-o1, and other leading models. Qwen3 is a "hybrid reasoning model" that integrates "fast thinking" and "slow thinking" into a single model, greatly saving compute. Qwen3 reportedly adopts a Mixture-of-Experts (MoE) architecture with 235B total parameters, of which only 22B are activated per inference. It was pre-trained on 36T tokens, and multiple rounds of reinforcement learning in the post-training phase seamlessly integrate the non-thinking mode into the thinking model. Qwen3 shows significantly enhanced reasoning, instruction following, tool invocation, and multilingual capabilities. Alongside the performance gains, deployment cost has dropped sharply: the full version of Qwen3 can be deployed on just 4 NVIDIA H20 GPUs, with memory consumption only one-third that of models with similar performance. (Sina Technology)
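The fast/slow switch in this "hybrid reasoning" design is exposed to callers as a per-request flag rather than as two separate models. Below is a minimal sketch of what that looks like, assuming the Hugging Face transformers chat-template interface and the enable_thinking flag from Qwen3's published usage; the prompt and generation settings are illustrative.

```python
# Sketch: toggling Qwen3's hybrid "thinking" mode via Hugging Face
# transformers. The model ID and enable_thinking flag follow Qwen3's
# published usage; treat exact names as assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-235B-A22B"  # full 235B MoE version (22B active params)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native dtype
    device_map="auto",    # shard across available GPUs (e.g. 4x H20)
)

messages = [{"role": "user", "content": "How many primes are below 100?"}]

# enable_thinking=True -> "slow thinking": the model emits a reasoning
# trace before its final answer. Set False for fast, direct responses.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=1024)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[-1]:],
    skip_special_tokens=True,
))
```

Because both modes live in one set of weights, switching between a quick answer and a full reasoning trace is just a template flag, with no model swap or redeployment.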