Dan Bin responded to investors' "doubts": the emergence of DeepSeek will increase global computing power demand rather than decrease it.
12/02/2025
GMT Eight
On February 12th, Dongfang Port publicly responded to investors' inquiries. Dan Bin believes that DeepSeek's achievements will increase the global demand for AI computing power rather than reduce it. The market's biggest misconception is to pit algorithms, computing power, and data against one another; in reality, the three are complementary. AI applications in China and the US will bring a variety of investment opportunities, while the business models of large-model companies will continue to face challenges: only by keeping their models consistently in the lead can they sustain the user base and pricing advantage needed to offset high upfront exploration costs, and that is becoming ever more difficult.
At the turn of the year, DeepSeek, the AI team backed by a Chinese quantitative fund, released the V3 base model and the R1 reasoning model in quick succession, shocking the world with performance comparable to OpenAI's strongest models at an inference cost an order of magnitude lower.
Dongfang Port received many inquiries from investors, most of them centering on three questions:
1) The Chinese team, despite being constrained in computing power, still developed globally leading large AI models. Does this indicate that future progress in AI will not depend on computing power?
2) The DeepSeek team optimized GPU usage by programming the PTX instruction set directly. Does this mean the CUDA moat can be bypassed, allowing unrestricted use of domestically produced chips in the future?
3) What investment opportunities and risks will the cost reduction and democratization of Chinese models bring?
Regarding the first question, Dongfang Port's view is that DeepSeek's achievements will increase the global demand for AI computing power rather than reduce it.
First, the market's biggest misconception is to set algorithms, computing power, and data fundamentally against one another, mistakenly believing that algorithmic progress competes with and substitutes for computing power and data. In fact, the three stand in a complementary, mutually reinforcing relationship.
Across the past 70 years of AI development, all three elements have had to advance together; whenever one was constrained, AI stagnated. The first AI wave stalled on algorithmic deficiencies, and the second on insufficient computing power. Now, thanks to simultaneous advances in algorithms, computing power, and big data in the internet era, the third wave has made unprecedented leaps.
By the same token, progress in any one of the three raises the value of the other two, just as a father's career success creates more opportunities for his children's or his wife's ventures. For example, if an inefficient algorithm once let a single chip serve only 10 users for a given use case, an algorithmic improvement may let that same chip serve 100. If the chip's price does not rise, its value has increased tenfold, not fallen. And when a product delivers ten times the value at the same price, demand rises; that is basic economics.
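As a back-of-the-envelope check (a minimal sketch; the chip price and user counts are illustrative figures, not market data):

```python
# Back-of-the-envelope: what a 10x algorithmic efficiency gain does to one chip.
chip_price = 10_000                      # illustrative price per chip (USD)
users_before, users_after = 10, 100      # users served per chip, before/after

cost_per_user_before = chip_price / users_before   # 1000.0
cost_per_user_after = chip_price / users_after     # 100.0

# At an unchanged chip price, each chip now delivers 10x the service,
# so the effective cost of serving one user falls 10x.
print(cost_per_user_before / cost_per_user_after)  # 10.0
```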
The reason the market wrongly pits algorithms against computing power may stem from the current China-US rivalry. When Chinese model companies achieve engineering and algorithmic breakthroughs under limited computing resources, market psychology tends to project the "China-US competition" onto a supposed contest between algorithms and computing power. Add the mystique of "mysterious Eastern power," and Wall Street readily amplifies the "surprise."
Secondly, the "cost reduction and democratization" of mature AI models has been the dominant trend of the past two years. DeepSeek's low-cost offering at the start of 2025, in the role of "follower," marks its entry into this wave of application popularization. Reducing the cost of mature models is a different undertaking from exploring frontier models: becoming the leading model of the AI era still requires enormous computing power and resources, which is precisely the ambition of many giants besides OpenAI.
The development of any technology generally follows an "innovate, follow, cut costs" pattern. "Explorers" at the frontier invest heavily in experimentation until they find a workable technical route and commercialize it; then a large number of "followers" replicate the product along the explorer's path and optimize it further on cost. Those cost optimizations in turn flow back to the explorer, so the two sides learn from and benefit each other. The same logic applies in familiar pairings: innovative drugs and generics, Tesla and Chinese electric vehicles, TSMC and other semiconductor foundries, and now the large-model field.
Currently, across most large-model capabilities (chatbots, real-time multimodal models, logical reasoning models, etc.), OpenAI temporarily plays the explorer, with North America's four major model families (Gemini, Claude, xAI's Grok, Llama) close behind. Behind the North American companies come China's internet giants (ByteDance, Alibaba's Qwen, Baidu's Ernie, Tencent's Hunyuan) and a cohort of model startups (DeepSeek, Zhipu's GLM, MiniMax's Hailuo, Moonshot AI's Kimi, etc.); outside China and the US there are few followers.
The chart below illustrates the pace of cost reduction in China and the US along the two tracks OpenAI opened with GPT-4 and o1. Since GPT-4's launch in April 2023, followers have cut the cost of equivalent performance by roughly 1,000x, three orders of magnitude, in 18 months; since the o1 series launched in September 2024, the follower DeepSeek R1 cut costs 27x, over an order of magnitude, in three months, and the follower Gemini 2.0 Flash Thinking cut them 100x, two orders of magnitude, over the same period. That is why we call "cost reduction and democratization" the most significant trend of the AI era today, and DeepSeek is no exception. Yet people remain so absorbed in the shock of DeepSeek that Google's even more dramatic cost reduction goes largely undiscussed.
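To put the two tracks on a comparable footing, here is a rough annualization of the figures above (the factors and timespans are those cited; the constant-rate extrapolation is purely illustrative):

```python
import math

# (cost-reduction factor, elapsed years) from the figures cited above.
tracks = {
    "GPT-4 followers":           (1000, 1.5),
    "DeepSeek R1 vs. o1":        (27, 0.25),
    "Gemini 2.0 Flash Thinking": (100, 0.25),
}

for name, (factor, years) in tracks.items():
    orders = math.log10(factor)  # orders of magnitude of cost reduction
    print(f"{name}: {orders:.1f} orders in {years:g} yr "
          f"(~{orders / years:.1f} orders/yr if sustained)")
```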
The reason followers can close in on explorers so quickly is not that they have matched the explorers' frontier algorithms or computing power. Rather, followers turn already-validated technology into cheap, widely available products instead of reinventing the wheel, which is why they learn from explorers far more than they compete with them.
Hence the toil of innovation belongs to the innovators, and the toil of cost reduction belongs to the followers. This is also why DeepSeek's cost reduction came as such a surprise: its speed and magnitude exceeded ordinary observers' imagination, but in the AI era this is the norm.

As noted above, a follower can undercut an explorer's costs by several orders of magnitude. There are many levers for doing so, detailed in analyses of the DeepSeek technical reports. Beyond engineering innovation, data distillation, and the natural decline of computing costs over time, the biggest gap between explorer and follower is the "cost of exploration," just as the cost gap between new drugs and generics lies in experimentation and clinical trials. And like its counterparts in the United States, DeepSeek is unwilling merely to follow; it aims for the frontier of the era, where the exploration costs it must bear will be far greater than today's.

Furthermore, as AI costs fall sharply, the inference demand generated by popularized AI applications becomes the main arena for computing power. In our annual review we compared the inference costs of the o1 model: at an output price of $55 per million tokens, using reasoning models inside Agent applications was nearly impossible. Yet within a month, competitors' engineering optimizations cut reasoning-model costs by 100x. The long-anticipated Agent application ecosystem is approaching at full speed.
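A quick sense of what that 100x means for an Agent workload (a minimal sketch; the daily token volume is an assumed figure for illustration):

```python
# Illustrative Agent economics at the o1-era output price vs. after a 100x cut.
price_per_million_tokens = 55.0      # USD per million output tokens, as cited above
tokens_per_day = 10_000_000          # assumed daily output of one busy Agent workload

cost_before = tokens_per_day / 1e6 * price_per_million_tokens
cost_after = cost_before / 100       # after the ~100x engineering-driven reduction

print(f"${cost_before:,.0f}/day -> ${cost_after:,.2f}/day")  # $550/day -> $5.50/day
```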
DeepSeek has revived interest in a concept: the Jevons Paradox, the economic phenomenon in which improving the efficiency of a resource's use increases rather than decreases its total consumption. Jevons first described it for coal in the 19th century: after Watt's improved steam engine raised coal efficiency (cutting coal consumption per unit of power by about 75%), coal-fired engines spread through factories, railways, and ships, accelerating total coal consumption and pushing coal prices up. The same happens when cars become more fuel-efficient (less fuel per kilometer) yet total mileage and fuel consumption rise, and when LED efficiency leads to longer lighting hours and more installed fixtures, raising total electricity use. Before a technology is fully diffused, falling unit resource consumption drives total resource consumption up. The same will hold for AI models, because the AI era has only just begun.
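A minimal sketch of the mechanism with a constant-elasticity demand curve (the 4x efficiency gain matches the 75% reduction above; the elasticity value is an assumption chosen for illustration):

```python
# Jevons paradox: demand for useful work scales as (effective price)^(-elasticity).
efficiency_gain = 4.0   # 75% less coal per unit of power => 4x efficiency
elasticity = 1.5        # assumed price elasticity of demand; > 1 is the Jevons regime

work_before = 100.0                                       # units of useful work demanded
work_after = work_before * efficiency_gain ** elasticity  # price of work fell 4x

coal_before = work_before                  # resource per unit of work normalized to 1
coal_after = work_after / efficiency_gain  # each unit of work now needs 1/4 the coal

print(f"useful work: {work_before:.0f} -> {work_after:.0f}")  # 100 -> 800
print(f"coal burned: {coal_before:.0f} -> {coal_after:.0f}")  # 100 -> 200
```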
We can revisit the notion of "per capita computing power." If AI is destined to permeate every industry and reach the world's 8 billion people, then against today's global AI computing deployment of roughly 4,500 ExaFLOPS, per capita computing power is only about 0.6 TOPS, which leaves unprecedented room to grow: a single autonomous-driving chip requires over 500 TOPS, and the AI performance of Tesla's latest FSD chip, AI5, is expected to exceed 1,500 TOPS. Total consumption of AI computing resources therefore still has enormous growth potential, provided the efficiency of resource utilization keeps improving substantially.
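Checking that per-capita figure (the deployment and population numbers are those cited above; FLOPS and TOPS are treated interchangeably here for the rough comparison):

```python
# Per-capita computing power, using the figures cited above.
global_ai_compute_flops = 4500 * 1e18   # 4,500 ExaFLOPS, in operations per second
population = 8e9

per_capita_tops = global_ai_compute_flops / population / 1e12
print(f"{per_capita_tops:.2f} TOPS per person")              # ~0.56, i.e. about 0.6

# One autonomous-driving chip (>500 TOPS) is nearly three orders of magnitude more.
print(f"{500 / per_capita_tops:,.0f}x one person's share")   # ~889x
```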
In fact, since DeepSeek's release we have seen prices in the spot market for rented computing power (a small share of the market relative to long-term contracts) rise rapidly, as many AI application companies begin testing DeepSeek models (the figure below shows SemiAnalysis data on Amazon GPU rental prices), producing a computing-power squeeze. DeepSeek's own website has suffered frequent outages and refused responses as its user count surged to 40 million (Doubao, by comparison, has about 60 million). Meanwhile Microsoft, Meta, Google, and Amazon, reporting earnings this month, have once again raised their 2025 capital expenditure on AI infrastructure to prepare for the coming inference application market.
On the second question, Dongfang Port's view is that CUDA has not been circumvented; rather, its moat has been strengthened.
The DeepSeek V3 paper describes how, to squeeze more efficiency out of NVIDIA chips, the team did not settle for CUDA's high-level language but dropped down to the PTX instruction level, reworking how communication tasks are allocated across the H800's streaming multiprocessors and thereby improving the efficiency and stability of chip-to-chip communication. Many readers took this to mean that DeepSeek did not use CUDA software at all but reprogrammed the GPU in PTX assembly, and that the team could therefore bypass CUDA and reproduce its training on other vendors' chips using assembly. This is a serious misunderstanding.
First, what is PTX? NVIDIA chips serve a wide range of top-level applications, including gaming graphics, autonomous driving, large language models, and scientific simulation. Each GPU-accelerated task in these areas needs supporting software libraries, for example OptiX for ray-tracing acceleration in games or TensorRT-LLM for accelerating large-language-model inference. Meanwhile, the underlying hardware keeps being redesigned, from the earlier Pascal and Volta architectures to today's familiar Ampere, Hopper, and Blackwell, with changes in process technology, computing precision, and instruction-set complexity. With software and hardware both iterating constantly, compatibility becomes a problem: developers worry whether software written today will still run on future chip architectures. NVIDIA's answer is a dedicated "intermediate representation," PTX, that connects software to hardware. However the two sides evolve, code need only be translated through PTX to target different GPU architectures and generate the corresponding machine code. It is like trade between China and Europe, where merchants speak different languages: with a translator fluent in both as intermediary, no Chinese merchant needs to learn every European language, and everyone communicates through the translator.
PTX plays the role of this "universal translation layer" in computing: CUDA's high-level language is translated into the intermediate representation, which is then lowered into the SASS machine code that NVIDIA cards actually execute (this final stage is proprietary). To give developers finer-grained control over GPU hardware, NVIDIA exposes PTX for editing: developers can write CUDA code and also tune the PTX layer directly to optimize how code executes on different GPU architectures. The process resembles a CEO (the CUDA code) assigning tasks to a marketing director (PTX), who refines them and hands them out to individual salespeople (the SM streaming multiprocessors). If the CEO finds the director's allocation unreasonable, he can step in and reassign tasks directly to improve parallel efficiency.
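To make the "editable PTX layer" concrete, here is a minimal sketch using PyCUDA (assuming an NVIDIA GPU and the pycuda and numpy packages; the kernel is a toy, not DeepSeek's code): the kernel body is ordinary CUDA C++, but one instruction is written as inline PTX, the same kind of low-level intervention described above, and everything still compiles through NVIDIA's own toolchain.

```python
# Toy example: inline PTX inside a CUDA kernel, driven from Python via PyCUDA.
import numpy as np
import pycuda.autoinit                     # creates a CUDA context on the default GPU
import pycuda.driver as drv
from pycuda.compiler import SourceModule

mod = SourceModule(r"""
__global__ void add_one(int *x)
{
    int v = x[threadIdx.x];
    // Inline PTX: the increment is expressed at the PTX level instead of in C++.
    asm volatile("add.s32 %0, %0, 1;" : "+r"(v));
    x[threadIdx.x] = v;
}
""")

add_one = mod.get_function("add_one")
data = np.arange(8, dtype=np.int32)
add_one(drv.InOut(data), block=(8, 1, 1), grid=(1, 1))
print(data)  # [1 2 3 4 5 6 7 8]
```

The point of the sketch is that the PTX fragment lives inside a kernel compiled by NVIDIA's toolchain; nothing here leaves the CUDA ecosystem, let alone runs on a non-NVIDIA GPU.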
Thus DeepSeek's use of PTX (Parallel Thread Execution) for task optimization is itself permitted by the "editability" of the NVIDIA architecture. NVIDIA routinely absorbs developers' innovative PTX engineering methods and folds them back into its official CUDA operators, which is exactly the benefit of this feedback loop for the CUDA ecosystem. Chips from AMD, Huawei, and Cambricon also have an intermediate representation layer (IR code), but their IR is not open for developers to edit.

Once these principles are understood, it becomes clear that DeepSeek's use of PTX to optimize hardware task execution does not bypass CUDA; it strengthens and feeds the CUDA ecosystem.
First, PTX is part of the CUDA architecture. CUDA refers not only to software; it encompasses PTX and the underlying hardware architecture, as its full name, "Compute Unified Device Architecture," suggests. It is this tightly integrated software-hardware stack that lets CUDA maintain compatibility and optimization capability through the rapid iteration of GPU computing. PTX is essentially an intermediate representation (IR), simply another way of expressing CUDA code.
Second, PTX can only be parsed and executed by NVIDIA GPUs. Editing PTX instructions lets users develop and optimize at a lower level within the CUDA ecosystem, exploiting NVIDIA's GPU hardware more efficiently rather than bypassing or escaping its architectural constraints. The PTX instruction set is designed specifically for NVIDIA GPUs; it does not fit other vendors' GPUs or computing architectures and cannot be ported directly to non-NVIDIA chips.
Furthermore, DeepSeek can edit PTX only because NVIDIA has opened up instruction-level optimization of PTX, whereas the intermediate representations of other chips (such as Huawei Ascend, AMD GPUs, and Google TPUs) are far less open to developers, who usually cannot edit the underlying instruction layer directly.
In short, there are two main paths to bypassing CUDA completely. One is to rebuild, at the high-level-language layer, an entire stack of GPU acceleration libraries and development frameworks spanning many industries, which demands enormous time, resources, and ecosystem support. The other is to compile CUDA code into an IR other than PTX so it can target other vendors' GPU hardware, which runs into compatibility and optimization limits: AMD, for example, uses the HIP toolchain to migrate CUDA code to its GPUs, yet performance losses and adaptation costs remain. It is like running Windows on a Mac: technically feasible, but performance, compatibility, and experience are usually worse than in the native environment. Beyond these two paths, there are hardly any better alternatives.
Regarding the third question, Dongfang Port's view is that a wide range of investment opportunities will emerge in Chinese and American AI applications, while the business models of large-model companies will continue to face challenges.
In a single month, DeepSeek single-handedly gave the Chinese public a lesson in "AI literacy," while matching or even surpassing most American models in capability and inference cost. Its most important contribution is demonstrating an efficient method: take a large model whose reasoning ability was trained through reinforcement learning, use it for distillation to generate sample data containing "chains of thought," and then directly supervise the fine-tuning of small models on that data. Compared with applying reinforcement learning directly to small models, this reproduces the large model's reasoning ability far more effectively. After R1's release, companies and universities worldwide therefore raced to replicate the recipe, fine-tuning small models on chain-of-thought data, so reasoning ability spread quickly beyond the DeepSeek ecosystem and the democratization of reasoning models accelerated sharply. The AI application opportunities we see in the US will accordingly also take root broadly in the Chinese market.
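A minimal sketch of that distillation recipe (all function names here are hypothetical stand-ins for illustration, not DeepSeek's actual pipeline; a real implementation would plug in a genuine teacher-model API and a training framework):

```python
# Hypothetical sketch of chain-of-thought distillation: a reasoning-capable
# teacher generates (question, chain of thought, answer) triples, and a small
# student model is supervised-fine-tuned to reproduce them.

def teacher_generate(question: str) -> dict:
    """Stand-in for a large RL-trained reasoning model (an R1-class teacher)."""
    chain_of_thought = f"Step 1 ... Step N of reasoning about: {question}"
    answer = "final answer"
    return {"question": question, "cot": chain_of_thought, "answer": answer}

def build_sft_dataset(questions: list[str]) -> list[dict]:
    # Each example trains the student to emit the reasoning trace *before*
    # the answer; that ordering is what transfers the reasoning behavior.
    return [
        {"prompt": ex["question"], "target": ex["cot"] + "\n" + ex["answer"]}
        for ex in map(teacher_generate, questions)
    ]

def supervised_finetune(student, dataset: list[dict]) -> None:
    """Stand-in for an ordinary SFT loop (cross-entropy on prompt -> target)."""
    for example in dataset:
        student.train_step(example["prompt"], example["target"])

if __name__ == "__main__":
    dataset = build_sft_dataset(["Why can a follower undercut an explorer's costs?"])
    print(dataset[0]["target"])
```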
The one caveat is that the computing-power gap between China and the US may keep widening as export controls tighten, for example through a ban on NVIDIA's H20 chip. Although models like DeepSeek have been adapted to domestic chips, domestic chips still lag in architecture, software acceleration libraries, and cluster capability, which may degrade the quality of inference services behind AI products. When large numbers of users hit a variety of AI applications at once, inference latency and server congestion may become the norm.
Shortly after R1's release, OpenAI released the o3 model on schedule and offered a free trial. o3's capabilities are a qualitative leap beyond o1, and OpenAI has for now kept its seat as "leader." But in the game between explorers and followers, if the explorer's pace of innovation cannot outrun the followers' pace of cheap replication, the explorer's upfront costs will never be recouped and its business model cannot close the loop. If followers are blocked from replicating by patent barriers or network effects, or if the explorer keeps innovating and stays ahead, the explorer can hold pricing power over frontier products while cutting prices on the previous generation that followers have caught up with, keeping the business model sound, much like TSMC's strategy in process technology. In large models, however, where there are neither network effects nor patent protection, OpenAI and the other aspiring leaders can sustain a competitive advantage only by staying perpetually ahead in model innovation to cover their heavy exploration costs. That is becoming ever harder.
These are the main viewpoints of Dongfang Port on these three issues.
2025 is destined to be a year of high market volatility. Yet after sorting through the details, we must return to the main theme of investing.
Against the backdrop of the AI era, the wheels of time are visibly accelerating. At the same time, we should see that alongside the high volatility, the US stock market in 2025 is expected to receive over $2 trillion in inflows, supporting valuations and stability. Share buybacks are expected to reach $1 trillion, boosting investor confidence by reducing shares outstanding and lifting earnings per share (EPS), especially as tech giants keep expanding their buyback programs.
Total dividends from S&P 500 companies are expected to reach $600 billion; their stability and predictability attract long-term investors, particularly retirement funds and 401(k) accounts.
In addition, retirement funds and long-term investment accounts are expected to contribute over $400 billion in inflows, which typically go into passively managed vehicles such as S&P 500 ETFs, providing the market with stable liquidity.