This is the first of two articles on the costs of AI tokens and how companies can manage them. Keeping token costs under control and building a low-cost intelligence capability can create strategic advantage. Here we look at the need to measure the costs of AI against the outcomes that AI achieves. A companion article examines the challenges that this type of measurement presents.
As AI moves from pilot to production at scale, the cost of intelligence needs a new meter. The companies that pull ahead will not be those with the smallest or largest AI bill. Instead, they will be the ones that produce the highest return on AI, or RoAI.
The fast-rising cost of AI has made tokens—the basic units of data that AI models use and that AI providers charge by—a new center of focus in the C-suite and boardroom. The days of subscription-based software as a service are coming to a close. The top software and AI platforms have moved to metered charging, and the leading AI providers adopted a metered model from the start. They are now phasing out early subsidies, especially for the most advanced models.
CEOs need the ability to measure and optimize, workflow by workflow, the intelligence they buy, how much it costs, and what it creates. Given the power and cost of the technology, their job descriptions now include ensuring that the value of the outcomes exceeds the cost of the tokens used, which is often not the case today. Software development is where the meter was first applied. But the same economics are coming to customer service, sales, marketing, finance, and operations, as well as to the products and services that companies sell.
CEOs need to get their arms around the cost of AI and the returns that humans and AI together generate. The key is spending smart so that workflows improve organically over time as AI gets better, faster, and cheaper. They can start by adopting a new, more rigorous approach to measuring and monitoring cost and RoAI.
Token Costs Are Hard to Track
The danger lies less in the price of individual tokens than in the volume consumed. Per-token cost varies by AI provider and model, with newer or more advanced models typically costing more. But as more employees use AI, as more workflows become agentic, and as AI is built into more products and customer interactions, the total token bill scales up fast, and companies lose track of how much they are spending and on what. (See Exhibit 1.)
Early big AI users (mostly tech firms) actually encouraged “tokenmaxxing” (burning as many tokens as possible in an application), although the practice is now widely discouraged. AI providers are also working to reduce the cost of context, which rises fast when agentic systems resubmit the full, growing context of an assignment for each reasoning loop. When the context is rerun at every step, the cumulative number of tokens billed across a session rises roughly with the square of the session’s length: a session that feels twice as long can cost about four times as much. Providers now offer prompt caching and discounted batch processing to help contain these costs.
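The square-law effect can be checked with a few lines of arithmetic. The sketch below is a simplified model, not any provider's billing formula: it assumes every step adds a fixed number of new tokens, the whole accumulated context is resubmitted as input at each step, and no caching applies. The function name `session_tokens` and its parameters are hypothetical.

```python
def session_tokens(steps: int, tokens_per_step: int, base_context: int = 0) -> int:
    """Total input tokens billed when every step resubmits the full context.

    Simplifying assumptions: each step adds `tokens_per_step` new tokens,
    the entire context is billed again at every step, and nothing is cached.
    """
    total = 0
    context = base_context
    for _ in range(steps):
        context += tokens_per_step  # the context grows by this step's new tokens
        total += context            # and the whole context is billed again
    return total


short = session_tokens(steps=10, tokens_per_step=1_000)
long = session_tokens(steps=20, tokens_per_step=1_000)
print(short, long, round(long / short, 2))  # 55000 210000 3.82
```

With no base context the total is `tokens_per_step * n * (n + 1) / 2`, so doubling the number of steps multiplies the bill by close to four.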
Still, we estimate that a frontier model designed to handle complex assignments can cost roughly 5 to 25 times as much per token as a “simple” model doing a basic task. And total cost scales superlinearly, not in proportion to usage. IDC projects that the top 1,000 global companies will underestimate their AI infrastructure costs by as much as 30% through 2027.
Another level of complexity needs to be factored in. Token costs cannot simply be delegated to IT as a generic cost line, which is usually what happens today. Depending on their use, they belong in three separate financial categories: capital expense, operating expense, and cost of goods sold. (See Exhibit 2.) CEOs should ask their CFOs and CIOs to start allocating costs accordingly.
Building the Capability. When tokens are used to design agents, redesign workflows across functions, and (for companies that run their own AI inference) execute at scale, developing AI infrastructure is a capital expense. AI behaves like capitalized software, or like factory automation for knowledge work: a one-time investment that lowers operating cost or expands capacity over time.
Tokens Running Internal Work. Engineers coding, analysts modeling, marketers drafting, agents resolving service contacts: these are operating expenses, governed as a budget rather than as a capital allocation. You cannot put a token allocation on the balance sheet as an asset, but you can set a sensible budget per function and per workflow and hold managers accountable for the output.
Cost of Goods Sold. Tokens consumed inside a product that is shipped to customers represent cost of goods sold, and they can have a big impact on margins. Unlike traditional COGS for software as a service, which scales sublinearly, AI inference scales up with every customer interaction. It compresses gross margin directly. Our analysis indicates that AI-enabled software margins are resetting in the range of 65% to 80%, compared with margins for AI-native products of 50% to 65%. Burying the cost of inference in a generic hosting bucket, such as the IT budget, eliminates the ability to track margins on a product-by-product basis.
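Allocating token spend to the right line starts with tagging each charge at the source. The sketch below is illustrative only: the `TokenCharge` record, its fields, the workflow tags, and all dollar figures are hypothetical, and a real implementation would draw on provider usage exports and the company's own chart of accounts. It sums spend by category and keeps COGS broken out by product so that per-product gross margin stays visible.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

PNL_CATEGORIES = {"capex", "opex", "cogs"}


@dataclass
class TokenCharge:
    workflow: str           # hypothetical workflow tag
    category: str           # one of PNL_CATEGORIES
    product: Optional[str]  # product the tokens ran inside (required for COGS)
    cost: float             # billed cost in dollars


def allocate(charges):
    """Sum token cost by P&L category and, for COGS, by product."""
    by_category = defaultdict(float)
    cogs_by_product = defaultdict(float)
    for c in charges:
        if c.category not in PNL_CATEGORIES:
            raise ValueError(f"unknown category: {c.category}")
        if c.category == "cogs" and c.product is None:
            raise ValueError("COGS charges must name a product")
        by_category[c.category] += c.cost
        if c.category == "cogs":
            cogs_by_product[c.product] += c.cost
    return dict(by_category), dict(cogs_by_product)


def gross_margin(revenue: float, token_cogs: float, other_cogs: float = 0.0) -> float:
    """Gross margin as a fraction of revenue, with inference cost counted as COGS."""
    return (revenue - token_cogs - other_cogs) / revenue


charges = [
    TokenCharge("agent-design", "capex", None, 40_000.0),
    TokenCharge("analyst-modeling", "opex", None, 12_000.0),
    TokenCharge("in-product-assistant", "cogs", "ProductA", 30_000.0),
]
by_category, cogs_by_product = allocate(charges)
margin = gross_margin(200_000.0, cogs_by_product["ProductA"], other_cogs=20_000.0)
print(by_category)
print(f"ProductA gross margin: {margin:.0%}")  # ProductA gross margin: 75%
```

Rejecting COGS charges that carry no product is the design choice that prevents inference cost from drifting back into a generic hosting bucket.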
Tokens Need a Common Denominator
What’s missing in the current assessment of token costs is a common denominator that enables CEOs and their C-suite colleagues to assess returns. The denominator that matters is cost per outcome. Activity metrics are not enough: a coding agent that generates thousands of lines of code is not valuable in and of itself; it is valuable only if that code ships and reduces the total human effort required. Leaders need to know the cost per resolved ticket, accepted campaign asset, qualified lead, completed analysis, or successful customer interaction.
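The difference between an activity metric and an outcome metric shows up as soon as the same spend is divided by different counts. The sketch below assumes, for illustration, that cost per outcome includes both token and human cost, in line with the RoAI denominator; the function name and the ticket figures are hypothetical.

```python
def cost_per_outcome(token_cost: float, human_cost: float, outcomes: int) -> float:
    """Total workflow cost (tokens plus people) divided by completed outcomes."""
    if outcomes <= 0:
        return float("inf")  # spend with no outcomes has no defensible unit cost
    return (token_cost + human_cost) / outcomes


token_cost, human_cost = 3_000.0, 9_000.0
handled, resolved = 1_000, 600
print(cost_per_outcome(token_cost, human_cost, handled))   # 12.0 per ticket touched
print(cost_per_outcome(token_cost, human_cost, resolved))  # 20.0 per ticket resolved
```

Counting tickets touched makes the workflow look 40% cheaper than counting tickets actually resolved.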
What unifies all of the above—capital, opex, COGS—is a single measure of whether the spending on tokens is working:
$$\text{RoAI} = \frac{\text{Economic return}}{\text{Cost of human intelligence} + \text{Cost of tokens}}$$
The denominator deliberately includes human cost because, in nearly every productive workflow, people start the work, steer it, and sign off on it. Measuring this ratio is hard work in a metered, superlinear, subsidy-distorted market. But it is a CEO- and CFO-level question that must be answered if company leadership wants to manage a portfolio of intelligence investments rather than an IT line item. Cutting headcount to shrink the denominator shrinks the numerator faster. The discipline is to optimize the ratio, not either factor in isolation.
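A small worked example makes the point about optimizing the ratio. The figures below are hypothetical and chosen only to illustrate the case the text describes, in which a headcount cut lowers cost but erodes the economic return by more.

```python
def roai(economic_return: float, human_cost: float, token_cost: float) -> float:
    """Return on AI: economic return over the combined cost of people and tokens."""
    total_cost = human_cost + token_cost
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return economic_return / total_cost


# Hypothetical workflow, before and after a headcount cut that also erodes the return.
before = roai(economic_return=900_000, human_cost=200_000, token_cost=100_000)
after = roai(economic_return=500_000, human_cost=100_000, token_cost=100_000)
print(before, after)  # 3.0 2.5
```

Total cost falls by a third, yet RoAI drops from 3.0 to 2.5, because the return fell faster than the denominator.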
This formulation also guards against two mistakes that companies are already making with token costs. One is tokenmaxxing and the runaway bills that follow. At the other pole is minimization: clamping down on, or capping, token consumption. This mainly starves high-return work and biases the organization toward the narrow labor-substitution use cases that are easiest to measure and smallest in payoff.
Intelligent Controls for Tokens
CEOs can pull five levers to exercise intelligent control over token spending and return:
- Stop or Minimize. The cheapest token is the one never spent, and the reflex to turn every workflow into an agent is itself a cost-and-reliability defect. Deterministic problems, such as structured lookups, rules-based routing, and calculations, are solved more cheaply, faster, and more reliably by deterministic code and APIs than by an agentic AI loop. Do not send deterministic problems to a model. They are better addressed by software.
- Route. Equally, sending every task to the strongest model is a major budget drain. Match the task to the right level of intelligence. Use frontier models only where frontier reasoning changes the outcome. Let the system auto-route where it can.
- Cache. Reusable context, such as system prompts, policies, knowledge bases, and standards, can be cached and read back at a steep discount on major platforms such as Anthropic, Google Gemini, and OpenAI, which cite big savings for cached input. Task your direct reports with assessing which knowledge, policies, and workflow patterns can be reused rather than paid for every time.
- Govern. Google's DORA (DevOps Research and Assessment) research has found that AI does not by itself improve delivery performance; rather, it amplifies whatever organizational system already exists. The gain is also conditional on the work: research has shown that AI can produce material gains on simple greenfield tasks, but far smaller improvements on complex legacy code. Governance matters: assign every material workflow an owner, an outcome, a cost-per-outcome baseline, a P&L line, and a decision threshold. For each workflow, ask whether it should be scaled up, optimized, re-scoped, or stopped.
- Train. Better-trained employees produce higher RoAI because they understand what they are trying to achieve. Output is not outcome, gains are conditional on task complexity, and spending rises based on guesses or feelings unless it is tied to measured return. Move from tokenmaxxing to “right use” by increasing AI literacy across your organization.
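Two of these levers, Stop/Route and Govern, reduce naturally to simple rules. The sketch below is a hypothetical illustration rather than a prescribed policy: the task categories, the `Workflow` fields (taken from the governance list above), and especially the decision rule in `review` are assumptions that each company would set for itself.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical task categories treated as deterministic (the "Stop" lever).
DETERMINISTIC_TASKS = {"lookup", "rules_routing", "calculation"}


class Route(Enum):
    CODE = "deterministic code or API"  # no model call at all
    SMALL = "small model"
    FRONTIER = "frontier model"


def route_task(task_kind: str, needs_frontier_reasoning: bool) -> Route:
    """Send deterministic work to code; reserve frontier models for hard reasoning."""
    if task_kind in DETERMINISTIC_TASKS:
        return Route.CODE
    return Route.FRONTIER if needs_frontier_reasoning else Route.SMALL


@dataclass
class Workflow:
    name: str
    owner: str
    outcome: str                      # e.g. "resolved ticket"
    pnl_line: str                     # "capex", "opex", or "cogs"
    baseline_cost_per_outcome: float
    current_cost_per_outcome: float
    roai: float
    roai_threshold: float             # decision threshold set for this workflow


def review(w: Workflow) -> str:
    """Map a workflow to scale up / optimize / re-scope / stop (illustrative rule)."""
    cost_improving = w.current_cost_per_outcome <= w.baseline_cost_per_outcome
    if w.roai >= w.roai_threshold:
        # Clears the threshold: grow it if unit cost is falling, else tune it first.
        return "scale up" if cost_improving else "optimize"
    # Below the threshold: narrow the scope if unit cost is improving, else stop.
    return "re-scope" if cost_improving else "stop"


support = Workflow("support-agent", "Head of Service", "resolved ticket", "opex",
                   baseline_cost_per_outcome=20.0, current_cost_per_outcome=15.0,
                   roai=3.0, roai_threshold=1.5)
print(route_task("lookup", needs_frontier_reasoning=True))  # Route.CODE
print(review(support))  # scale up
```

Checking for deterministic work before choosing a model tier reflects the ordering of the levers: the cheapest token is the one never spent.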
The size of the AI budget is not important. Winners will have budgets that behave like portfolios, increasing where intelligence compounds P&L value and falling where it only compounds cost. The question for CEOs is not whether the enterprise can afford AI. It is whether the enterprise can demonstrate that the intelligence it buys is worth more than it costs.
The authors are grateful to these BCG colleagues for their ideas and input: Nicolas De Bellefonds, Abhinav Gupta, Steven Kok, Matthew Kropp, Vladimir Lukic, Clark O'Niell, Rohan Panjwani, and Vikram Srikumar.