AI & Business Strategy

Why Every Manager Needs to Understand Tokens

(And no, not the blockchain kind)

Not long ago, I was building a conversation flow for a customer support agent — the kind of AI assistant that sits in front of customers, understands their problem, and walks them toward a resolution. Designing it felt deceptively simple: map the questions, write the responses, handle the edge cases. But the moment we moved from a working prototype toward something that could handle real volume, a quieter set of questions surfaced. How much would each conversation cost? Would it stay fast under load? And could the agent actually “remember” enough of the conversation to be useful without forgetting where it started?

The hidden variable behind all three? Tokens.

Not crypto tokens. Not arcade tokens. The little units of text that every AI language model quietly counts, charges for, and is limited by. If your company is using AI in any form — a chatbot, a document summarizer, a coding assistant — tokens are the meter running in the background. And right now, most managers can’t read the meter.

Here’s the good news: you don’t need a computer science degree to understand them. You need about six minutes. Let’s go.

What Is a Token, Really?

When you type a sentence into an AI tool, it doesn’t read words the way you and I do. It breaks your text into small chunks called tokens.


A token is roughly three-quarters of a word, or about four characters in English. Some short words (“the”, “cat”) are a single token. Longer or unusual words get split into several. As a rough rule of thumb:

1 token ≈ 4 characters
100 tokens ≈ 75 words
1 page ≈ 500 tokens

So a one-page memo is about 500 tokens. A 20-page report is around 10,000 tokens. A typical novel is upwards of a hundred thousand.
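If you want to sanity-check these conversions yourself, here is a minimal sketch of the rule of thumb in Python. The function names are made up for illustration, and the ratios are only the rough English-language averages quoted above; a real tokenizer will give different counts depending on the model and the language.

```python
import math

# Rules of thumb from the text above; real tokenizers vary by model and language.
CHARS_PER_TOKEN = 4
TOKENS_PER_PAGE = 500
WORDS_PER_100_TOKENS = 75


def tokens_from_chars(n_chars: int) -> int:
    """Estimate tokens from a character count (about 4 characters per token)."""
    return math.ceil(n_chars / CHARS_PER_TOKEN)


def tokens_from_words(n_words: int) -> int:
    """Estimate tokens from a word count (about 75 words per 100 tokens)."""
    return math.ceil(n_words * 100 / WORDS_PER_100_TOKENS)


def tokens_from_pages(n_pages: int) -> int:
    """Estimate tokens from a page count (about 500 tokens per page)."""
    return n_pages * TOKENS_PER_PAGE


if __name__ == "__main__":
    print(tokens_from_pages(1))   # one-page memo
    print(tokens_from_pages(20))  # 20-page report
    print(tokens_from_words(75))  # 75 words
```

These estimates are good enough for budgeting conversations. For an invoice-grade number, count with the tokenizer of the specific model you are paying for.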

Why should you care about this strange unit of measurement? Because everything else about using AI — the cost, the speed, and the limits — is measured in tokens, not words, pages, or hours.

Think of tokens like the kilowatt-hours on your electricity bill. You don’t experience electricity in kilowatt-hours; you experience it as a lit room or a charged phone. But the meter only speaks in kilowatt-hours — and so does the bill. Tokens are the kilowatt-hours of AI.

The Two Kinds of Tokens (And Why One Costs More)

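Here’s where it gets interesting for anyone watching a budget. AI providers count two separate types of tokens:

Input tokens: everything you send to the model, including instructions, documents, and conversation history.
Output tokens: everything the model writes back.

You pay for both directions, but the answer is the expensive half.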

And here’s the part that surprises most people: output tokens cost significantly more than input tokens — typically anywhere from two to eight times more, depending on the model.

Why the gap? Without going deep into the engineering: the model can read your input in parallel, all at once, but it has to generate its answer one token at a time, sequentially. That sequential work is more expensive to run.

The practical takeaway for a manager is simple but powerful: a long-winded AI that rambles for 2,000 words costs you several times more than one that answers crisply in 400. Controlling the length of AI responses is one of the single most effective ways to control AI spend. “Be concise” isn’t just good style — it’s a cost lever.
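To see the lever in action, here is a rough sketch that prices the two answers side by side. The prices are hypothetical placeholders, not any provider’s real rates, and the 500-token prompt is an assumption; the 5x output premium was picked to sit inside the two-to-eight-times range mentioned above.

```python
# Hypothetical prices in dollars per million tokens. These are placeholders,
# not any provider's real rates.
INPUT_PRICE_PER_M = 1.00
OUTPUT_PRICE_PER_M = 5.00


def words_to_tokens(words: int) -> int:
    """Rule of thumb: about 100 tokens per 75 words."""
    return round(words * 100 / 75)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: each direction is billed at its own rate."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


prompt_tokens = 500  # assumed: roughly a one-page prompt
rambling = request_cost(prompt_tokens, words_to_tokens(2000))
concise = request_cost(prompt_tokens, words_to_tokens(400))

print(f"2,000-word answer: ${rambling:.4f}")
print(f"  400-word answer: ${concise:.4f}")
print(f"ratio: {rambling / concise:.1f}x")
```

Under these assumptions the long answer costs a little over four times as much as the short one. The exact ratio moves with the prices and the prompt size, but the direction does not.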

What This Actually Costs

Let’s put this in pricing terms, because abstractions don’t survive contact with a finance team. AI models are priced per million tokens, split by input and output. As of mid-2026, the range across the major providers is wide, and the spread between model tiers is enormous.

That spread matters. Reaching for the most powerful model by default is like flying every employee business class for a commute across town. Sometimes justified. Usually not.

Let me translate this into the scenario I actually faced. The support agent I was designing didn’t just answer one question and stop — it held a conversation. And conversations have a sneaky property: with every back-and-forth turn, the model re-reads the entire chat history to stay coherent.

By turn ten, you’re paying to re-read the entire chat on every single message.

So turn one might send 1,000 tokens of context. But by turn ten, the agent is re-sending the whole growing transcript on every single message — the original instructions, the customer’s problem, and every exchange since. The input tokens balloon as the conversation goes on. Multiply that across thousands of daily conversations, and a workflow that looked cheap in the demo becomes a real line item in production.
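The ballooning is easy to work out on paper. Here is a minimal sketch, assuming turn one sends 1,000 tokens (as above) and every later turn adds 200 tokens of new customer message and previous agent reply. The 200 is an illustrative guess, not a measured figure.

```python
def input_tokens_by_turn(turns: int, base_tokens: int, added_per_turn: int) -> list[int]:
    """Input tokens sent on each turn when the full history is re-sent every time."""
    return [base_tokens + turn * added_per_turn for turn in range(turns)]


# Assumed numbers: 1,000 tokens on turn one, 200 more tokens of history per turn.
per_turn = input_tokens_by_turn(turns=10, base_tokens=1_000, added_per_turn=200)

print(per_turn[0])    # input tokens on turn one
print(per_turn[-1])   # input tokens on turn ten
print(sum(per_turn))  # total input tokens billed across the conversation
```

With these numbers, turn ten sends 2,800 input tokens and the whole conversation bills 19,000, nineteen times what turn one alone suggested. Because each turn re-sends everything before it, the total grows roughly with the square of the conversation length.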

On a mid-tier model, a few thousand support conversations a day might cost a manageable amount. Push the same volume onto a frontier model and the bill can climb 10× — turning a trivial cost into a five-figure annual decision. Same task. Same output quality, often. Wildly different bill. The only difference is whether someone designing the flow knew how to read the meter.
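Here is a back-of-envelope version of that comparison. Every number in it is an assumption chosen for illustration: 2,000 conversations a day, 19,000 input and 2,000 output tokens per conversation, and two hypothetical price points exactly 10x apart. None of these are real provider rates.

```python
from dataclasses import dataclass


@dataclass
class ModelPrice:
    """Dollars per million tokens. All values used below are hypothetical."""
    input_per_m: float
    output_per_m: float


def annual_cost(price: ModelPrice, conversations_per_day: int,
                input_tokens: int, output_tokens: int, days: int = 365) -> float:
    """Yearly spend for a workload with a fixed token profile per conversation."""
    per_conversation = (input_tokens * price.input_per_m
                        + output_tokens * price.output_per_m) / 1_000_000
    return per_conversation * conversations_per_day * days


mid_tier = ModelPrice(input_per_m=0.25, output_per_m=1.00)
frontier = ModelPrice(input_per_m=2.50, output_per_m=10.00)

for name, price in [("mid-tier", mid_tier), ("frontier", frontier)]:
    total = annual_cost(price, conversations_per_day=2_000,
                        input_tokens=19_000, output_tokens=2_000)
    print(f"{name}: ${total:,.0f} per year")
```

Under those assumptions the same workload costs roughly $4,900 a year on the cheaper model and roughly $49,000 on the expensive one. Swap in your own volumes and your provider’s actual rates before showing this to anyone in finance.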

The Other Reason Tokens Matter: The Limit

Cost is one half of the story. The other half is the context window — and this one trips up even technical teams.

Every AI model can only “hold in mind” a certain number of tokens at once. This is its context window. Send more than that, and the model simply can’t see the overflow — it either refuses, truncates your document, or quietly forgets the earliest parts of a long conversation.

Anything past the window is invisible to the model — even if you sent it.

Modern models have grown enormous context windows — some now handle a million tokens or more, which is roughly a couple of thousand pages. But here’s the catch nobody mentions in the demo:

1. Bigger context costs more. Filling a huge window means sending huge input, and you pay for every token of it, on every single request.
2. More context isn’t always better answers. Stuffing an entire 500-page contract into the prompt to ask one question is often slower, pricier, and less accurate than extracting the relevant 3 pages first.

This is exactly the trap waiting in a long support conversation. The “just keep the whole transcript in context” instinct treats an unlimited-feeling tool as if it were free and infinite. It’s neither. At some point a sprawling chat either gets expensive, gets slower, or starts losing the thread — and the customer feels it.
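One common alternative to keeping the whole transcript is to cap the history at a token budget and keep only the most recent messages. The sketch below is one simple way to do that, not the only one and not necessarily what any particular product does. The function names are hypothetical, and it estimates tokens with the four-characters rule rather than a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate using the ~4 characters per token rule of thumb."""
    return -(-len(text) // 4)  # ceiling division


def trim_history(system_prompt: str, messages: list[str], max_tokens: int) -> list[str]:
    """Keep the instructions plus as many of the newest messages as fit the budget."""
    budget = max_tokens - estimate_tokens(system_prompt)
    kept: list[str] = []
    # Walk backwards from the newest message and stop at the first that does not fit.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    kept.reverse()  # restore chronological order
    return [system_prompt] + kept


if __name__ == "__main__":
    history = ["My order is late.", "Sorry to hear that. What is the order number?", "It is 12345."]
    print(trim_history("You are a support agent.", history, max_tokens=25))
```

The trade-off is the one described above: a tight budget keeps cost and latency flat, but the agent loses whatever was said before the cutoff. Teams often soften that by summarizing the older turns instead of dropping them outright.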

Five Things a Manager Should Actually Do With This

You don’t need to manage tokens yourself. You need to ask better questions and make sharper decisions. Here’s the practical layer:

1. Ask “which model are we using, and why?” If the answer is “the best one” with no further reasoning, you’ve found an easy cost saving. Match the model to the task: cheap models for simple, high-volume work; premium models only where the quality genuinely justifies it.
2. Treat verbose AI as a cost problem, not just a style problem. Shorter, sharper outputs save real money at scale. Build “be concise” into your prompts and products.
3. Watch the output-to-input ratio. A workflow that generates lots of text (reports, emails) is dominated by expensive output tokens. One that mostly reads and classifies is cheaper. Knowing which you’re running tells you where to optimize.
4. Question the “dump everything in” approach. Retrieving the relevant slice of a document is usually cheaper and more accurate than feeding the model everything. Ask whether your team is sending only what’s needed.
5. Measure cost per task, not just cost per month. A monthly bill tells you what you spent. Cost per task tells you whether your AI is a strategic investment or an accidental one. The first is a number; the second is a decision.
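Points 3 and 5 are both things a team can compute from a usage log. Here is a minimal sketch, assuming each request is logged as a task name plus its input and output token counts. The log format, the task names, and the prices are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical prices in dollars per million tokens (placeholders, not real rates).
INPUT_PRICE_PER_M = 1.00
OUTPUT_PRICE_PER_M = 5.00


def summarize_by_task(records):
    """records: iterable of (task_name, input_tokens, output_tokens), one per request."""
    totals = defaultdict(lambda: {"requests": 0, "input": 0, "output": 0})
    for task, input_tokens, output_tokens in records:
        totals[task]["requests"] += 1
        totals[task]["input"] += input_tokens
        totals[task]["output"] += output_tokens

    summary = {}
    for task, t in totals.items():
        cost = (t["input"] * INPUT_PRICE_PER_M
                + t["output"] * OUTPUT_PRICE_PER_M) / 1_000_000
        summary[task] = {
            "cost_per_task": cost / t["requests"],
            # None when a task logged no input tokens, to avoid dividing by zero.
            "output_to_input": t["output"] / t["input"] if t["input"] else None,
        }
    return summary


# Made-up usage log for illustration.
usage_log = [
    ("classify_ticket", 800, 20),
    ("classify_ticket", 1_200, 20),
    ("draft_reply", 1_000, 600),
]
for task, stats in summarize_by_task(usage_log).items():
    print(task, stats)
```

In this toy log, the drafting task has a far higher output-to-input ratio than the classification task, so that is where a “be concise” instruction would save the most.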

The Real Point

When I studied computer science earlier in my career, “tokens” were an abstract detail — something happening deep inside a system I’d never have to budget for. Now, working at the intersection of AI and business strategy — and after wrestling with a support agent that quietly racked up tokens with every conversational turn — I see them differently. Tokens are where the technology meets the P&L. They’re the unit where engineering choices quietly become financial outcomes.

You don’t need to know how a model splits a long word into pieces. But you do need to know that there’s a meter running, that it speaks in tokens, and that the difference between a thoughtful AI strategy and a runaway bill often comes down to whether someone in the room could read it.

The managers who’ll win with AI over the next few years aren’t the ones who can code. They’re the ones who understand just enough of how the machine charges to ask the right question at the right moment.

It usually starts with one word: tokens.