First, the basics: what is a token
Before talking about money, there’s one word to understand: token. A token is the smallest unit an AI model works with. It’s not exactly a word, nor a letter. It’s a chunk of text roughly four characters long, about three quarters of a word in English. “Hello, world!” is around four tokens. A long conversation can be tens of thousands.
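To make the four-characters rule of thumb concrete, here is a minimal sketch in Python. The function name estimate_tokens and the constant CHARS_PER_TOKEN are my own illustrative names, and the estimate is only an approximation for English text: real tokenizers split text differently from model to model.

```python
import math

# Rule of thumb from the text: about four characters per token in English.
# Real tokenizers vary by model, so treat this as a rough estimate only.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate: character count divided by four, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


print(estimate_tokens("Hello, world!"))  # 13 characters -> 4
```

For exact counts you would use the tokenizer that belongs to the specific model, but for reasoning about cost an estimate like this is usually enough.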
Every time you write something to Claude, ChatGPT, or any model, you’re sending tokens. Every time the model responds, it generates more tokens. And each token has a real computational cost, measured in operations a machine has to execute.
The machine behind the answer
This is where the hardware comes in. When you ask an AI something, the answer doesn’t appear by magic. There’s a GPU processing it.
A CPU, the traditional processor in any computer, is designed to do a few things very fast and sequentially. It’s good at complex tasks that require logic, one after another. A GPU, on the other hand, was originally designed for video games: it has thousands of small cores that perform many simple operations in parallel. That architecture turns out to be perfect for AI, which needs to multiply enormous matrices of numbers millions of times per second.
A high-end GPU like the NVIDIA H100, the current standard in AI data centers, can process thousands of tokens per second. That sounds like a lot. The problem is it doesn’t serve a single user. It serves dozens, sometimes hundreds of users simultaneously, distributing its capacity among all of them. And when models are larger, or conversations longer, or a user is running an agent that makes multiple chained calls, consumption spikes.
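A deliberately naive sketch of that sharing, in Python. The throughput figure of 3,000 tokens per second is a hypothetical number chosen for illustration, not a measured H100 value, and the even split is an assumption: real serving systems batch requests together, so the relationship is not this linear.

```python
def per_user_throughput(gpu_tokens_per_second: float, concurrent_users: int) -> float:
    """Tokens per second each user gets if capacity is split evenly (a simplification)."""
    if concurrent_users < 1:
        raise ValueError("concurrent_users must be at least 1")
    return gpu_tokens_per_second / concurrent_users


# Hypothetical capacity, for illustration only.
GPU_TOKENS_PER_SECOND = 3000.0

for users in (1, 10, 100):
    print(users, per_user_throughput(GPU_TOKENS_PER_SECOND, users))
# 1 3000.0
# 10 300.0
# 100 30.0
```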
An AI agent doesn’t ask the model a single question. It asks several: one to plan, one to execute, one to review, another if there’s an error and it needs to retry. Each one consumes tokens. Each token consumes compute. And compute has a real cost, measured in electricity, hardware, and GPU time.
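Here is a small Python sketch of how those chained calls add up. The Call class and every token count below are hypothetical values I picked for illustration; the only point is that an agent run is a sum of several calls, each with its own input and output.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """One request to the model (all token counts here are made up)."""
    step: str
    input_tokens: int
    output_tokens: int


def total_tokens(calls: list[Call]) -> int:
    """Sum input and output tokens across every call in a run."""
    return sum(c.input_tokens + c.output_tokens for c in calls)


single_question = [Call("answer", 500, 300)]

# The agent's context tends to grow with each step, so later calls send more input.
agent_run = [
    Call("plan", 2000, 400),
    Call("execute", 3000, 1200),
    Call("review", 4500, 300),
    Call("retry", 5000, 1200),
]

print(total_tokens(single_question))  # 800
print(total_tokens(agent_run))        # 17600
```

With these invented numbers, one agent run consumes 22 times the tokens of a single question.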
I could go into the full problem of AI data centers here, the infrastructure that makes all of this possible and what it’s costing to build, but that deserves an article of its own. I’ll write it another time.
The business that doesn’t add up
For years, Anthropic, OpenAI, and GitHub have operated under a model that would raise eyebrows in any other industry: charging less than the service costs and absorbing the difference with investor capital.
The numbers are striking. According to an analysis published by Forbes, a user on Claude’s Max plan at $200 a month can consume up to $5,000 in real compute. In that case the subscription covers just 4% of the cost. The rest is paid by Anthropic with investor money.
GitHub Copilot operates on a similar logic. Under its current system, a user can consume up to eight times the value of their subscription in computational resources: someone paying $10 a month can be using $80 worth of compute. GitHub quietly absorbs the difference. This is not sustainable. The move to token-based billing is not a question of if, but of when. When a company spends up to eight times what a user pays in order to serve them, the model has an expiration date.
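The arithmetic behind both examples is simple enough to write down. This Python sketch uses only the figures quoted above ($200 against $5,000, and $10 against $80); the function names are mine, and both cases are the upper bounds from the text, not averages.

```python
def subscription_coverage(price: float, compute_cost: float) -> float:
    """Fraction of the compute cost that the subscription price covers."""
    return price / compute_cost


def provider_subsidy(price: float, compute_cost: float) -> float:
    """Amount the provider absorbs per user per month (never negative)."""
    return max(compute_cost - price, 0.0)


# Worst-case figures quoted in the text.
print(f"{subscription_coverage(200.0, 5000.0):.1%}")  # 4.0%
print(provider_subsidy(200.0, 5000.0))                # 4800.0

print(f"{subscription_coverage(10.0, 80.0):.1%}")     # 12.5%
print(provider_subsidy(10.0, 80.0))                   # 70.0
```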
The logic behind all of this was simple: grow first, charge later. Get users first, then figure out how to make the business profitable. It’s the same logic that Uber, Netflix, and many other digital platforms used in their early years.
The problem is that with AI, costs don’t go down with scale. If anything, they go up: every additional user generates more tokens, and every token needs GPU time.
What’s coming
The shift to token-based billing isn’t just a change in business model. It’s the moment when the real cost of AI starts becoming visible to the end user.
Until now, the subsidy hid that cost. A developer using Claude Code eight hours a day couldn’t see what they were actually consuming. A company deploying agents to automate processes couldn’t either. When that cost shows up on the bill, decisions will change. Cheaper models will be chosen for simple tasks. Prompts will be optimized to consume fewer tokens. People will question whether a given automation is actually worth what it costs.
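Once the bill is per token, that decision becomes a calculation. The Python sketch below compares the same request on two models; the model names and the prices per million tokens are entirely hypothetical, chosen only to show the shape of the trade-off, and real price lists will differ.

```python
# Hypothetical prices in USD per million tokens: (input, output).
# These are illustrative assumptions, not any provider's real price list.
PRICES = {
    "small-model": (1.00, 5.00),
    "large-model": (15.00, 75.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request under the hypothetical price table above."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# The same task (10,000 tokens in, 2,000 tokens out) on each model.
for model in PRICES:
    print(model, f"${request_cost(model, 10_000, 2_000):.2f}")
# small-model $0.02
# large-model $0.30
```

The same function also shows why shorter prompts matter: halving the input tokens lowers the input part of the cost in direct proportion.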
That’s healthy. But it’s also the end of an era in which using AI felt almost free because someone else was covering the difference.
The question that remains is whether, when the price is real, the value proposition will still hold up. For some use cases, almost certainly yes. For others, the answer will be less clear than it seemed when everything was subsidized.