Mark Cuban's AI Token Tax: The Numbers Work, But the Unit Is Wrong

Mark Cuban's proposal is straightforward: a federal tax on AI providers of less than fifty cents per million tokens, levied at the provider level. The Year 1 revenue math works. At current volume, roughly 250 trillion tokens per day globally with U.S. providers handling perhaps 40 percent, a rate of fifty cents per million generates about $18 billion annually. Cuban projects that this grows thirty to a hundred times over a decade.

The stated policy goal is externality pricing. Cuban argues the tax would push providers to optimize tokenization, caching, and routing; reduce energy consumption; and generate a funding source for AI's external costs. This frames the tax as a market-based mechanism to internalize costs the industry currently doesn't pay.

This connects directly to Cuban's earlier warning. Last month, he posted a mock IPO "risk factors" section on Instagram, forecasting that within four years companies could redesign operations to replace human workers with AI systems and humanoid robots at scale. His point wasn't about the technology; it was about what follows. If fewer people work, governments lose income tax revenue, and authorities would likely introduce new taxes, such as robot or AI utilization taxes, to replace it. The token tax proposal is the follow-through on that warning.

The Flow Math: Does the Arithmetic Hold?

The Year 1 numbers check out. At 250 trillion tokens per day globally with U.S. providers handling 40 percent, a tax of fifty cents per million tokens generates roughly $18 billion a year, more than Cuban's $10 billion target. He has slack on the share assumption and still clears his stated bar. That part works.
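The Year 1 arithmetic can be reproduced in a few lines. This is a minimal sketch using only the figures quoted above (250 trillion tokens per day, a 40 percent U.S. share, fifty cents per million tokens); the function names and the 365-day year are illustrative assumptions, not part of the proposal.

```python
TOKENS_PER_MILLION = 1_000_000


def annual_token_tax_revenue(global_tokens_per_day, us_share,
                             rate_per_million_usd, days_per_year=365):
    """Annual revenue in USD from a flat per-token tax on U.S.-handled tokens."""
    taxable_tokens_per_day = global_tokens_per_day * us_share
    daily_revenue = taxable_tokens_per_day / TOKENS_PER_MILLION * rate_per_million_usd
    return daily_revenue * days_per_year


def share_needed_for_target(target_usd, global_tokens_per_day,
                            rate_per_million_usd, days_per_year=365):
    """U.S. share of global volume required to reach a revenue target."""
    revenue_at_full_share = annual_token_tax_revenue(
        global_tokens_per_day, 1.0, rate_per_million_usd, days_per_year
    )
    return target_usd / revenue_at_full_share


# Figures quoted in the article.
year_one_revenue = annual_token_tax_revenue(250e12, 0.40, 0.50)
break_even_share = share_needed_for_target(10e9, 250e12, 0.50)

print(f"Year 1 revenue: ${year_one_revenue / 1e9:.2f} billion")
print(f"U.S. share needed for $10 billion: {break_even_share:.1%}")
```

Under these inputs the revenue comes to about $18.25 billion, and a U.S. share of roughly 22 percent would be enough to reach the $10 billion target, which is the slack on the share assumption.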

But the 30x to 100x decade growth comes with no supporting volume assumptions. Revenue from a per-token levy tracks usage only if the number of billable tokens per unit of work stays fixed. Yet providers are already optimizing: prompt caching cuts billable input tokens to near zero, speculative decoding reduces billable forward passes, and larger vocabularies compress output. These are real efficiency gains that shrink the tax base without reducing underlying compute or energy. The revenue projection assumes token volume grows faster than optimization-driven reduction, but no evidence supports that gap.
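To see how much usage growth the projection implies, revenue growth can be split into underlying usage growth multiplied by whatever fraction of the tax base survives. The sketch below does that decomposition. The erosion figures (billable tokens per unit of work halved, 80 percent of volume staying with taxed providers) are hypothetical placeholders chosen for illustration, not estimates from Cuban or from any data.

```python
def required_usage_growth(target_revenue_multiple,
                          billable_token_ratio,
                          domestic_share_ratio):
    """Usage growth needed to hit a revenue multiple at a fixed per-token rate.

    billable_token_ratio: billable tokens per unit of work at the end of the
        period, relative to today (1.0 means no optimization).
    domestic_share_ratio: share of volume still served by taxed providers,
        relative to today (1.0 means no substitution).
    """
    surviving_base = billable_token_ratio * domestic_share_ratio
    return target_revenue_multiple / surviving_base


# Hypothetical erosion: tokens per unit of work halve, 20 percent of volume leaves.
low_case = required_usage_growth(30, billable_token_ratio=0.5, domestic_share_ratio=0.8)
high_case = required_usage_growth(100, billable_token_ratio=0.5, domestic_share_ratio=0.8)

print(f"Usage growth needed for 30x revenue: {low_case:.0f}x")
print(f"Usage growth needed for 100x revenue: {high_case:.0f}x")
```

With those placeholder values, 30x revenue needs about 75x underlying usage growth and 100x revenue needs about 250x. The exact numbers matter less than the structure: every point of base erosion has to be made up by extra volume.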

Then there's the price collapse compounding against a fixed per-token rate. The industry has seen a roughly 200x annual decline in per-token prices for two years running. A tax that's 5 percent of frontier pricing in Year 1 becomes 100 percent in Year 3 if Congress doesn't index downward. Either the tax collapses as revenue, or it becomes confiscatory at the low end and providers route around it. There's no stable equilibrium where this raises 30 to 100 times more revenue without becoming a different policy entirely. The base is also not territorially contained: ByteDance's Doubao alone consumes about 120 trillion tokens per day, and the tax does not apply to it. A U.S. per-token tax is structurally an import substitution subsidy for non-U.S. inference.
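The fixed-rate squeeze described above can be sketched numerically. The starting price of $10 per million tokens is an assumption, chosen only so that a fifty-cent tax equals 5 percent of price in Year 1; the annual decline factor is left as a parameter. A cumulative 20x price decline is enough to take the tax from 5 percent to 100 percent of price, and any faster decline gets there sooner.

```python
import math


def tax_share_of_price(tax_per_million, start_price_per_million,
                       annual_decline_factor, year):
    """Fixed tax as a fraction of the market price in a given year (Year 1 = start)."""
    price = start_price_per_million / annual_decline_factor ** (year - 1)
    return tax_per_million / price


def annual_decline_for_parity(start_share, years_elapsed):
    """Annual price-decline factor at which the tax equals the full price."""
    return (1 / start_share) ** (1 / years_elapsed)


TAX = 0.50           # USD per million tokens, fixed in nominal terms
START_PRICE = 10.00  # USD per million tokens, assumed so the tax starts at 5 percent

# Decline factor that moves the tax from 5 percent to 100 percent in two years.
factor = annual_decline_for_parity(0.05, years_elapsed=2)
print(f"Annual decline factor needed: {factor:.2f}x")

for year in (1, 2, 3):
    share = tax_share_of_price(TAX, START_PRICE, factor, year)
    print(f"Year {year}: tax is {share:.0%} of price")
```

The required factor is the square root of 20, a little under 4.5x per year, so the Year 3 outcome does not depend on the most aggressive price-decline estimates being right.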

The Structural Flaw: Token Is Not a Flow Unit

A token is internal model accounting, not a unit of work, energy, compute, or value. The same English sentence produces different token counts in GPT-class, Claude-class, and Gemini-class models. Mandarin runs roughly two to three times more tokens per equivalent content than English; source code runs 1.5 to 2 times more than prose; some low-resource languages run ten to fifteen times more. A flat per-token tax is therefore a tax on how a tokenizer happens to split text, not on energy, compute, or externalities.

If two providers serve the same query with the same energy footprint, but one provider's tokenizer emits 30 percent more tokens for the same text, that provider pays 30 percent more tax for delivering identical value. The tax discriminates between providers on a basis unrelated to the harm being taxed. This is bad tax design under any framework, but it is especially bad when the providers writing the tokenizers are the same parties paying the tax. The tax base is endogenous to the taxed entity: the moment tokens are taxed, providers have a reason to redesign them away.
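A small worked example of that discrimination, assuming a flat rate of fifty cents per million tokens. The two providers and their token counts are hypothetical; the only thing that differs between them is how the tokenizer splits the same text.

```python
RATE_PER_MILLION_USD = 0.50  # assumed flat rate


def token_tax(tokens, rate_per_million=RATE_PER_MILLION_USD):
    """Tax owed in USD on a given number of billable tokens."""
    return tokens / 1_000_000 * rate_per_million


# Hypothetical workload: identical text, identical energy footprint.
compact_tokens = 1_000_000_000               # provider with the compact tokenizer
verbose_tokens = compact_tokens * 130 // 100  # tokenizer emits 30 percent more tokens

compact_tax = token_tax(compact_tokens)
verbose_tax = token_tax(verbose_tokens)
tax_ratio = verbose_tax / compact_tax

print(f"Compact tokenizer: ${compact_tax:,.2f}")
print(f"Verbose tokenizer: ${verbose_tax:,.2f}")
print(f"Tax ratio for the same work: {tax_ratio:.2f}x")
```

The ratio depends only on token counts, so nothing about energy or compute enters the tax bill. The same calculation also shows the incentive: a provider that trims its token count trims its tax by the same proportion.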

Prompt caching cuts billable input tokens to near zero; speculative decoding reduces billable forward passes; larger vocabularies and byte-level encoding compress output. These are real efficiency gains that shrink revenue without touching the externality. Cuban projects 30x to 100x growth over a decade, but that requires token volume to outrun three forces at once: optimization-driven token reduction, substitution to non-taxed providers, and the secular ~200x annual decline in per-token prices, which turns a fixed per-token rate from a small surcharge into a dominant share of the price unless Congress indexes it downward.

ByteDance's Doubao alone consumes about 120 trillion tokens per day; Alibaba targets 15 to 20 trillion per day; Gulf providers and European sovereign-AI projects are coming online. A U.S. per-token tax on U.S. providers is structurally an import substitution subsidy for non-U.S. inference. American enterprise customers will route through foreign-domiciled API providers to avoid the tax. This creates arbitrage for foreign competitors while constraining domestic providers: a policy that raises revenue today but erodes the base tomorrow.

What to Watch: Policy Viability and Market Impact

Political viability hinges on a false premise. The proposal gains traction only if policymakers accept token count as a proxy for compute or energy, but no legislation reflects this framing, and for good reason: a token is a unit of internal accounting, not a unit of work, energy, or compute consumed. Without a direct link to actual externalities, the proposal lacks the technical credibility needed to move through Congress.

Big Tech will dismantle the revenue base within 12 to 18 months. Expect rapid engineering workarounds (prompt caching, speculative decoding, tokenizer redesign, and routing shifts to non-taxed providers) that shrink the tax base without reducing underlying activity. The revenue projection requires token volume to grow faster than optimization-driven token reduction, substitution to foreign providers like ByteDance's Doubao, and the secular ~200x annual decline in per-token prices. That's a fragile equilibrium.

The alternative is obvious, and Cuban avoids it. A compute-based tax (GPU-hours) or an energy-based tax would align with actual externalities, but his proposal explicitly rejects these harder units in favor of the easily gamed token count. This isn't an oversight; it's the core design flaw. Unless Congress indexes the rate downward, which it won't, the tax either becomes confiscatory at the low end or collapses as revenue within three years. The proposal either dies in committee or becomes a different policy entirely.
