
China’s Quiet Advantage: Why AI Tokens Are the New Oil

The US is playing the wrong game, and it's time to upgrade our decrepit electrical infrastructure.

By Paul DiMaggio | April 21, 2026 | 7 min read

If you want to understand where the AI race is really being won, stop staring at model leaderboards and start looking at something much more basic: the tokens.

Tokens are the fuel of modern AI systems, and China is rapidly becoming the world’s low‑cost, high‑volume producer of that fuel. In a world shifting from simple chatbots to AI agents that carry out entire tasks, that advantage matters more every month.

What AI Tokens Actually Are (And Why They Matter)

Under the hood, large language models (LLMs) don’t process text the way humans do; they process tokens - small chunks of text that might be a word, part of a word, or even a punctuation mark, each mapped to an integer ID in the model’s vocabulary.

Every prompt you send and every answer you receive is translated into long sequences of these tokens. When a model generates a response, it is essentially predicting the next token over and over again until it completes the output.
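To make this concrete, here is a toy sketch in Python. The vocabulary, its IDs, and the `TOY_NEXT` lookup table are all made up for illustration; production tokenizers use learned schemes such as byte-pair encoding with vocabularies of tens of thousands of entries or more, and a real model predicts the next token from the whole sequence rather than from the last token alone. What the sketch does show is the shape of the process: text becomes a list of integer IDs, and generation appends one predicted ID at a time until it reaches an end condition.

```python
# Toy illustration: a hypothetical, hand-picked vocabulary with made-up IDs.
TOY_VOCAB = {
    "Token": 41, "s": 7, " are": 12, " the": 3,
    " fuel": 58, " of": 5, " AI": 23, ".": 2,
}
ID_TO_PIECE = {token_id: piece for piece, token_id in TOY_VOCAB.items()}
MAX_PIECE_LEN = max(len(piece) for piece in TOY_VOCAB)


def encode(text: str) -> list[int]:
    """Greedy longest-match tokenization: at each position, take the longest
    vocabulary piece that matches and emit its integer ID."""
    ids = []
    i = 0
    while i < len(text):
        for length in range(min(MAX_PIECE_LEN, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                i += length
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids


def decode(ids: list[int]) -> str:
    return "".join(ID_TO_PIECE[token_id] for token_id in ids)


# Hypothetical stand-in for a model: maps the last token ID to the next one.
TOY_NEXT = {41: 7, 7: 12, 12: 3, 3: 58, 58: 5, 5: 23, 23: 2}
END_ID = 2  # treat "." as the end of the output


def generate(prompt_ids: list[int], max_new_tokens: int = 20) -> list[int]:
    """Autoregressive loop: predict one token, append it, repeat."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        next_id = TOY_NEXT.get(ids[-1], END_ID)
        ids.append(next_id)
        if next_id == END_ID:
            break
    return ids


print(encode("Tokens are the fuel of AI."))  # [41, 7, 12, 3, 58, 5, 23, 2]
print(decode(generate(encode("Token"))))     # Tokens are the fuel of AI.
```

Even in this toy, a 26-character sentence becomes 8 tokens, and every generated token is one more trip through the model.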

Historically, this was mostly about question‑and‑answer chatbots: ask a question, get an answer, move on. Now, agentic AI is changing that dynamic. Instead of single exchanges, agents execute full workflows on your behalf - think “handle our entire month‑end financial close and reporting workflow” instead of “find me flights.”

That difference is not just qualitative; it is quantitative:

A simple chatbot interaction might consume thousands of tokens.
A fully automated agent running a complex task can burn through orders of magnitude more - searching, reasoning, calling tools, and iterating as it goes.

The result: AI tokens are becoming a scarce, high‑value commodity in their own right, much like oil in the industrial era.

The Numbers: China’s Token Surge

The scale of China’s token output is already startling. In a single week in February, Chinese AI models collectively delivered about 4.12 trillion tokens, while US models produced roughly 2.94 trillion.

That’s not a trivial edge; it’s a sign that more and more of the world’s AI “fuel” is being generated in China. And it’s not just about volume - it’s about price.

Two leading Chinese AI labs, MiniMax and Moonshot, charge on the order of 2–3 US dollars per million output tokens for their models. By contrast, a prominent US model like Anthropic’s Claude Sonnet 4.5 comes in at around 15 US dollars per million output tokens, roughly a sixfold difference.

Put simply:

Same unit of AI “fuel”
One provider charging 2-3 dollars
Another charging about 15 dollars
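The gap compounds with volume. The sketch below applies the per-million prices quoted above to a hypothetical monthly volume of 500 million output tokens; the volume is an assumption chosen for illustration, and the Chinese price uses the midpoint of the 2–3 dollar range.

```python
def output_cost_usd(output_tokens: int, usd_per_million: float) -> float:
    """Cost of generating `output_tokens` at a per-million-token price."""
    return output_tokens / 1_000_000 * usd_per_million


# Prices quoted in the text (USD per million output tokens).
CHINESE_LOW, CHINESE_HIGH = 2.0, 3.0
CHINESE_MID = (CHINESE_LOW + CHINESE_HIGH) / 2  # 2.5
SONNET_4_5 = 15.0

MONTHLY_OUTPUT_TOKENS = 500_000_000  # hypothetical volume

print(output_cost_usd(MONTHLY_OUTPUT_TOKENS, CHINESE_MID))  # 1250.0
print(output_cost_usd(MONTHLY_OUTPUT_TOKENS, SONNET_4_5))   # 7500.0

# Price ratio across the quoted range: 5.0x to 7.5x, 6.0x at the midpoint.
print(SONNET_4_5 / CHINESE_HIGH, SONNET_4_5 / CHINESE_MID, SONNET_4_5 / CHINESE_LOW)
```

The ratio is the same at any volume; what changes with scale is whether the absolute difference is a rounding error or a major line item.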

For cost‑sensitive builders, that gap is impossible to ignore.

The Gold Rush for Cheap Tokens

That pricing has triggered something of a gold rush: startups in Silicon Valley and beyond are actively seeking out cheap Chinese tokens to power their products.

The logic is straightforward:

Agentic AI is far more token‑hungry than earlier chatbots.
Token usage is directly tied to cost of goods sold for AI-native products.
If you can cut token cost by 5-10x, your runway and pricing power both improve dramatically.

You can already see this in practice. Airbnb’s CEO Brian Chesky has openly said the company uses Chinese large language models, including those from DeepSeek and Alibaba’s Qwen, in part because they’re cheaper and easier for engineers to fine‑tune.

It’s not just big tech; early‑stage teams are making the same calculation on a much tighter budget. If your agent needs to run thousands of background calls a day, your choice of model is effectively a choice of wholesale token supplier - and the Chinese suppliers are undercutting the market.

Why Chinese Tokens Are So Cheap

The cost advantage isn’t coming from nowhere. There are two main structural drivers behind China’s cheaper tokens.

Lower electricity costs: Running large-scale AI infrastructure is, at bottom, an energy problem. Electricity in China is meaningfully cheaper than in the US, and that directly reduces the operating cost per generated token.
More compute‑efficient model architectures: Many Chinese models lean on a mixture‑of‑experts (MoE) architecture, in which a small router network sends each token to a few specialized sub-networks (“experts”) instead of through the whole model. Because only a fraction of the model’s parameters is active for any given token, these models can deliver similar performance using substantially less compute per token.
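The routing idea can be sketched in a few lines of Python. This is a generic top-k gating scheme (one common MoE variant, in which the gate weights are renormalized over the selected experts), not the architecture of any particular Chinese model; the experts here are trivial scaling functions and the router weights are hand-picked so the result is easy to follow. The point is the bookkeeping: the layer holds eight experts, but only two of them do any work for a given token.

```python
import math
from typing import Callable, List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_weights, experts, top_k=2):
    """Send one token vector `x` through only the top_k highest-scoring experts.

    router_weights: one weight vector per expert; router score = dot(w, x).
    experts: callables mapping a vector to a vector of the same size.
    Returns the gate-weighted sum of the chosen experts' outputs and their indices.
    """
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in router_weights]
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])  # renormalize over chosen experts
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        y = experts[i](x)  # the unchosen experts are never evaluated
        out = [o + gate * y_j for o, y_j in zip(out, y)]
    return out, chosen


NUM_EXPERTS = 8
calls = [0] * NUM_EXPERTS  # how many times each expert actually ran


def make_expert(idx: int, scale: float) -> Callable[[Vector], Vector]:
    # Toy expert: scales its input. Real experts are feed-forward networks.
    def expert(v: Vector) -> Vector:
        calls[idx] += 1
        return [scale * v_j for v_j in v]
    return expert


experts = [make_expert(i, float(i + 1)) for i in range(NUM_EXPERTS)]
# Hand-picked router weights: expert i scores 0.1 * i for the token below.
router_weights = [[0.1 * i, 0.0, 0.0, 0.0] for i in range(NUM_EXPERTS)]

out, chosen = moe_forward([1.0, 0.0, 0.0, 0.0], router_weights, experts, top_k=2)
print(chosen)      # [7, 6]
print(sum(calls))  # 2: only 2 of 8 experts did any work for this token
```

In a real MoE model the router is learned and this selection happens for every token at every MoE layer, but the saving comes from the same place: compute scales with the experts that run, not with the model's total parameter count.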

Interestingly, US chip export controls may have nudged China in this direction. With restricted access to the most advanced GPUs, Chinese labs were forced to squeeze more performance out of what they had, optimizing architectures for efficiency. That pressure contributed to models that are structurally cheaper to run on a per‑token basis.

Combine cheaper energy with more efficient models and you get a durable cost advantage that’s very hard to match just by tweaking pricing.

From Chatbots to Agents: Why Tokens Are the Bottleneck

The move from chatbots to agents is more than a UX upgrade; it fundamentally changes the economics of AI.

A chatbot interaction is bounded: the user asks a question, the model responds, and the session ends. An agentic workflow, by design, keeps going. It will:

Break tasks into sub‑tasks
Call tools and APIs
Query models repeatedly
Iterate based on intermediate results

Every one of those steps consumes tokens - often a large number of them.
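A rough way to see why is to count tokens in a simple agent loop where each model call re-reads the entire conversation so far, including earlier outputs and tool results. All the numbers below (prompt size, output per step, tool-result size, number of steps) are hypothetical, and techniques like prompt caching or context trimming would change the totals; the shape of the curve is what matters. Because the context grows every step, total tokens grow roughly with the square of the number of steps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelCall:
    input_tokens: int
    output_tokens: int


def simulate_agent_run(system_tokens: int, steps: int,
                       output_per_step: int, tool_result_per_step: int) -> List[ModelCall]:
    """Token usage for an agent whose every model call re-reads the full context."""
    context = system_tokens
    calls = []
    for _ in range(steps):
        calls.append(ModelCall(input_tokens=context, output_tokens=output_per_step))
        # The model's output and the tool's result join the context for the next call.
        context += output_per_step + tool_result_per_step
    return calls


def total_tokens(calls: List[ModelCall]) -> int:
    return sum(c.input_tokens + c.output_tokens for c in calls)


# Hypothetical sizes, chosen only for illustration.
chatbot = total_tokens(simulate_agent_run(1_000, steps=1, output_per_step=500,
                                          tool_result_per_step=0))
agent = total_tokens(simulate_agent_run(2_000, steps=30, output_per_step=500,
                                        tool_result_per_step=1_500))
print(chatbot, agent, agent // chatbot)  # 1500 945000 630
```

Under these assumptions, a single 30-step run bills about 630 times as many tokens as the one-shot chatbot exchange.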

That has two critical implications:

Demand for tokens scales faster than user growth. As users shift from simple queries to delegated tasks, average tokens per user can explode, even if your active user count grows slowly.
Token pricing becomes the key input to business models. If your agent is 10x more token‑intensive than a chatbot, then token cost is no longer a rounding error; it is the margin.
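A back-of-the-envelope margin calculation shows how this plays out. The sketch assumes a hypothetical 20-dollar monthly subscription, treats tokens as the only cost of goods sold, and prices every token at the output rates quoted earlier (input tokens are usually cheaper, so this overstates cost). A user who moves from chat-style usage to delegating tasks is modeled, again hypothetically, as consuming 10x the tokens.

```python
def gross_margin(price_per_user: float, tokens_per_user: int, usd_per_million: float) -> float:
    """Gross margin per user, assuming tokens are the only cost of goods sold."""
    token_cost = tokens_per_user / 1_000_000 * usd_per_million
    return (price_per_user - token_cost) / price_per_user


SUBSCRIPTION_USD = 20.0          # hypothetical monthly price per user
CHAT_TOKENS = 200_000            # hypothetical monthly tokens, chat-style use
AGENT_TOKENS = 10 * CHAT_TOKENS  # the same user delegating tasks to an agent

for label, tokens in [("chat", CHAT_TOKENS), ("agent", AGENT_TOKENS)]:
    for usd_per_million in (15.0, 2.5):
        margin = gross_margin(SUBSCRIPTION_USD, tokens, usd_per_million)
        print(f"{label:5s} @ ${usd_per_million:>4}/M tokens: margin {margin:.0%}")
```

Under these assumptions, the chat-style user is comfortably profitable at either price, but the agent-heavy user is served at a loss at 15 dollars per million tokens (a margin of -50%) and still leaves a 75% margin at 2.5 dollars.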

Jensen Huang at Nvidia has leaned into this, noting that agentic AI is inherently more token‑consuming and therefore more demanding on GPUs. The more tokens the world burns, the more compute everyone needs, and the more valuable efficient infrastructure becomes.

In that environment, the supplier who can deliver tokens cheapest - right now, increasingly, Chinese providers - wields outsized leverage over where and how AI gets built.

Strategic Concerns: Cheap Tokens, Expensive Dependencies

The rush toward Chinese tokens isn’t happening in a vacuum. Policymakers and commentators in the US are already voicing concerns along three axes:

Data and security: When US companies pipe data through Chinese LLMs or agentic systems, regulators have limited visibility into how that data is handled or where logs might reside.
Regulatory reach: The core IP - algorithms, weights, engineering teams - sits in China, out of reach of US oversight. That makes meaningful regulation difficult even if risks are identified.
Strategic dependency: As Chinese models become embedded in critical US industries - from consumer apps to enterprise workflows - China’s role as a provider of AI “fuel” becomes a strategic dependency in its own right.

At the same time, this isn’t only a story about risk. Investors in the US and Europe are frothing at the mouth over Chinese AI companies: MiniMax’s IPO generated strong foreign interest, and DeepSeek is reportedly pursuing new funding at a 10 billion dollar valuation.

Everyone sees the same thing: a rapidly scaling, low‑cost producer of the core commodity of the AI era.

What This Means for Builders and Businesses

If you’re building AI‑native products, China’s token advantage changes the calculus in several concrete ways.

Unit economics: Your gross margin and pricing flexibility are now heavily tied to your token supplier. A 5-10x cost difference per million tokens can be the difference between a viable product and one that never clears its cloud bill.
Architecture choices: Architecting for efficiency - caching, compression, smarter prompting, and MoE‑like routing - will matter even if you’re not running models yourself. The cheapest token is still the one you don’t have to generate.
Vendor risk: Leaning into Chinese LLMs can be a huge competitive advantage on cost and capability - but it also introduces regulatory and geopolitical exposure you can’t fully control.
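The simplest version of “the cheapest token is the one you don’t have to generate” is an exact-match response cache in front of whatever model you call. The sketch below is a minimal illustration, assuming a hypothetical `call_model(prompt) -> str` function standing in for a paid API; it only collapses whitespace differences, and it is only safe for requests where the same prompt should always get the same answer.

```python
from typing import Callable


class CachedModel:
    """Answer repeated prompts from memory instead of paying for new tokens."""

    def __init__(self, call_model: Callable[[str], str]) -> None:
        self._call_model = call_model  # hypothetical: any prompt -> completion function
        self._cache: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Collapse runs of whitespace so trivially different prompts share an entry.
        return " ".join(prompt.split())

    def complete(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self._cache:
            self.hits += 1  # no tokens generated, no tokens billed
            return self._cache[key]
        self.misses += 1
        result = self._call_model(prompt)
        self._cache[key] = result
        return result


def fake_model(prompt: str) -> str:
    # Stand-in for a billed model call.
    return f"answer to: {prompt}"


model = CachedModel(fake_model)
model.complete("Summarize Q3 revenue")
model.complete("Summarize   Q3 revenue")  # extra spaces: served from cache
model.complete("Summarize Q4 revenue")
print(model.hits, model.misses)  # 1 2
```

A production version would also need an eviction policy and some notion of staleness; the sketch leaves both out to keep the idea visible.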

The broader takeaway is simple: tokens have moved from being an implementation detail to a strategic resource. China spotted that early, optimized hard around it, and is now exporting that advantage at scale.

The question for anyone building in AI is no longer just “which model is best?” It’s “whose tokens am I depending on - and what does that choice imply for my costs, my users, and my long‑term leverage?”
