Guolian Minsheng Securities: In the Short Term, Watch Large Model Manufacturers' Price Increases and the Marginal Improvement in Demand Under Token "Inflation"

Date: 20:59 22/02/2026
GMT Eight
"Token Inflation" does not refer to the increase in the price of the token itself, but rather the structural increase in token consumption per unit time and per user.
Guolian Minsheng Securities released a research report stating that cloud computing is increasingly a business of "selling resources," while large model manufacturers are shifting to "selling Token fuel + selling outcomes." The price increase of the KNOWLEDGE ATLAS (02513) GLM Coding Plan reflects a change in the industry's pricing logic: when inference consumption becomes a production input, model manufacturers have the opportunity to convert "scarcity of computing power" into gross profit and cash flow through tiered pricing and subscription products.

Guolian Minsheng Securities' main points are as follows:

Event: On February 12, KNOWLEDGE ATLAS announced through official channels that it would raise the subscription price of the GLM Coding Plan by "at least 30%." Overseas cloud providers had also raised prices earlier this month: Google Cloud by up to 100% in North America, with simultaneous increases in Europe and Asia, and AWS by about 15%. Overall, rising Token demand not only benefits cloud computing power but also gives model manufacturers pricing power.

Subverting the traditional free path of the Internet

The typical path of traditional Internet software is to offer free services in exchange for user scale and then monetize through advertising, membership subscriptions, value-added services, and transaction commissions. The underlying reason is extremely low marginal cost: the cost of one more user or one more click is diluted by economies of scale in bandwidth and storage, so marginal cost approaches zero. The cloud computing era saw similar "free or low-cost first, expand later" strategies, but the billing units for cloud services quickly became CPU, storage, bandwidth, and request counts, and customers became accustomed to "pay as you go."
Cloud services can charge because they deliver clearly defined resources and SLAs (service level agreements between service providers and customers). While the industry is still in a "model price war," the KNOWLEDGE ATLAS price increase indicates that the unit of measurement is shifting from traffic (DAU/usage duration) to Tokens (inference consumption), and that Token consumption is becoming essential in a growing number of scenarios.

Change in the era of large models: Token becomes "measurable production material," no longer "free traffic"

Large models have turned services such as dialogue, coding, and content generation, which appear to be delivered by software vendors, into online inference services that rely heavily on computing power. For model manufacturers, every answer consumes GPU, memory, bandwidth, and electricity; for users, every request for the model to "think a little longer, write a longer piece of code, or run a more complex task" means more Token consumption, so the Token naturally becomes the new unit of measurement.

KNOWLEDGE ATLAS previously faced a temporary shortage of computing power due to user growth and put the Coding Plan on "limited release," forming a very typical supply-demand chain: demand grows significantly in the short term → resources hit rigid constraints (resulting in restrictions/limits) → prices rise. When congestion and resource shortages occur at peak times, a price increase lets model manufacturers screen demand; it is friendlier to users than "indiscriminate restrictions" and also protects the user experience. In addition, costs for model manufacturers remain closely tied to GPU supply, utilization, and inference optimization, so price increases and more rational tiered pricing can pull them out of the trap of "the larger the scale, the greater the losses," which helps improve the quality of gross profit and cash flow.
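The "the larger the scale, the greater the losses" trap can be illustrated with a little arithmetic. The sketch below is only an illustration under stated assumptions: the `Plan` class, the prices, the serving cost, and the usage figures are all hypothetical and do not come from the report or from any vendor's actual pricing.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """Hypothetical flat-rate subscription (all figures are made up)."""
    monthly_price: float            # price per user per month
    cost_per_million_tokens: float  # vendor's serving cost per 1M Tokens


def break_even_tokens(plan: Plan) -> float:
    """Tokens per user per month at which revenue equals serving cost."""
    return plan.monthly_price / plan.cost_per_million_tokens * 1_000_000


def monthly_gross_profit(plan: Plan, users: int, tokens_per_user: float) -> float:
    """Revenue minus Token serving cost for the whole user base."""
    revenue = users * plan.monthly_price
    cost = users * tokens_per_user / 1_000_000 * plan.cost_per_million_tokens
    return revenue - cost


# Assumed figures: price 20, serving cost 0.5 per 1M Tokens,
# heavy coding users burning 60M Tokens a month.
old_plan = Plan(monthly_price=20.0, cost_per_million_tokens=0.5)
new_plan = Plan(monthly_price=26.0, cost_per_million_tokens=0.5)  # +30% price

for users in (1_000, 10_000):
    # Above break-even usage, the loss grows in proportion to the user count.
    print(users, monthly_gross_profit(old_plan, users, 60_000_000))

print(break_even_tokens(old_plan), break_even_tokens(new_plan))
```

Under these assumed numbers a flat plan loses money on every heavy user, so adding users deepens the loss; a 30% price increase raises the break-even usage per user but does not by itself cover the heaviest users, which is one way to read the report's emphasis on tiered pricing.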
Token demand is "inflationary"

"Token inflation" does not mean that the Token itself is becoming more expensive, but that Token consumption per unit time and per user is rising structurally. There are several reasons for the surge in Token demand:

- From "question and answer" to "working": As models have improved, users are no longer satisfied with simple answers and instead ask models to refactor code, rewrite files, generate documents, and run tests. Programming scenarios are by nature "long context, multi-round iteration, and large output," which means heavy Token consumption. KNOWLEDGE ATLAS has confirmed that developers rely on its models for coding support, leading to rapid growth in Token consumption.
- From "single round" to "multi-round Agent": KNOWLEDGE ATLAS positions GLM-5 as a new-generation model for Coding and Agent scenarios. On February 12, MiniMax-WP (00100) also officially launched its latest flagship programming model, M2.5, described as the world's first production-level model designed natively for Agent scenarios; its programming and agent performance is benchmarked directly against Claude Opus 4.6. Agents plan, retrieve, execute, reflect, and call the model multiple times, so Token consumption accumulates step by step.
- Increasing intensity of reasoning: More "deep thinking" and longer reasoning chains significantly increase Token consumption in both the output and the intermediate process. For developers, this often means higher success rates and less rework, making users more willing to "burn more Tokens for efficiency."

This means that the Token is not the near-zero-marginal-cost "traffic" of the traditional Internet era, but an essential "fuel" for production tasks.

Investment recommendation

Cloud computing is increasingly a business of "selling resources," while large model manufacturers are shifting to "selling Token fuel + selling outcomes."
The price increase of the KNOWLEDGE ATLAS GLM Coding Plan reflects a change in the industry's pricing logic: when inference consumption becomes a production input, model manufacturers have the opportunity to convert "scarcity of computing power" into gross profit and cash flow through tiered pricing and subscription products. It is recommended to:

- Watch cloud providers and computing infrastructure: AI-driven IT spending and infrastructure investment are still in an upward cycle, and the cloud side will benefit from continued growth in "complementary consumption" such as GPU computing power, storage, and network I/O.
- Monitor large model manufacturers: Those that can sustain subscription retention and enterprise seat expansion in high-ROI scenarios such as coding, Agents, and enterprise processes, and that can reliably convert "Token usage" into delivered value in the form of saved manpower, time, and rework, will be able to weather open-source competition and price wars.
- Track security governance and runtime protection tools: As companies embed AI into workflows, risks such as data leakage and agents exceeding their authorization will make "AI security platforms/governance platforms" an essential layer.
- In the short term, watch price increases and the marginal improvement in demand (Token "inflation"); in the medium term, track renewals and expansion driven by enterprise seat retention; in the long term, be optimistic about the "AI firewall" market created by the spread of governance tools.

Risk warning

Technological change is uncertain; industry competition is intensifying.
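As a rough illustration of the earlier point that multi-round Agents accumulate Token consumption step by step, the sketch below counts Tokens for a toy agent loop. It assumes a simplified model in which every call re-reads the full accumulated context and nothing is cached; the function name `agent_token_usage` and all Token counts are hypothetical and are not taken from the report or from any vendor's billing rules.

```python
def agent_token_usage(
    prompt_tokens: int,
    steps: int,
    output_per_step: int,
    tool_result_per_step: int = 0,
) -> int:
    """Total Tokens (input + output) for a toy multi-step agent run.

    Simplifying assumption: each model call re-sends the whole history,
    and the history grows by the model's output plus any tool result.
    """
    context = prompt_tokens
    total_input = 0
    total_output = 0
    for _ in range(steps):
        total_input += context           # full history is read again
        total_output += output_per_step  # Tokens generated in this step
        context += output_per_step + tool_result_per_step
    return total_input + total_output


# Hypothetical single question-and-answer round.
single_round = agent_token_usage(prompt_tokens=2_000, steps=1, output_per_step=500)

# Hypothetical 10-step agent run: same prompt, each step also appends
# a 1,000-Token tool result (for example, test output) to the context.
agent_run = agent_token_usage(
    prompt_tokens=2_000, steps=10, output_per_step=500, tool_result_per_step=1_000
)

print(single_round)              # 2500
print(agent_run)                 # 92500
print(agent_run / single_round)  # 37.0
```

Under these assumptions, ten steps cost 37 times as many Tokens as one round rather than ten times, because the growing context is re-read at every step. Real systems may reduce this with caching or context trimming, which this sketch deliberately ignores.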