What is a token, and why does it affect AI cost?
A token is the chunk of text a model reads, predicts, and bills you for. It is not the same thing as a word, a character, or a sentence. That is why a short-looking prompt can still be expensive once it contains punctuation, code, JSON, long instructions, or a lot of retrieved context.
The safe answer is simple: count the real tokens in the real workload before you compare model prices. If you only count words, you will miss the part that actually lands on the invoice.
Quick answer
If you are choosing a model or trying to explain a surprise bill, start here: tokens are the billing unit and the context unit. More input tokens mean more text sent in. More output tokens mean more text generated. Both can move the price, and output often costs more per token than input.
The same paragraph can tokenise differently in different model families, so rules like “about 750 words per 1,000 tokens” are only rough guesses. They are fine for a back-of-envelope estimate; they are not fine for a budget.
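To see why the shortcuts are only rough guesses, here is a minimal sketch that applies the two common rules of thumb (about 4 characters per token, about 750 words per 1,000 tokens) to a line of prose and a line of JSON. It does not call a real tokenizer; the function names and sample strings are made up for illustration.

```python
def estimate_tokens_by_chars(text: str, chars_per_token: float = 4.0) -> int:
    """Rough estimate from the '1 token is about 4 characters' shortcut."""
    return round(len(text) / chars_per_token)


def estimate_tokens_by_words(text: str, words_per_token: float = 0.75) -> int:
    """Rough estimate from the '750 words per 1,000 tokens' shortcut."""
    return round(len(text.split()) / words_per_token)


# Illustrative samples only: one plain sentence, one compact JSON payload.
prose = "The quick brown fox jumps over the lazy dog near the river bank."
payload = '{"user_id":12345,"url":"https://example.com/a?b=1&c=2","tags":["x","y"]}'

for label, sample in (("prose", prose), ("json", payload)):
    by_chars = estimate_tokens_by_chars(sample)
    by_words = estimate_tokens_by_words(sample)
    print(f"{label}: chars rule = {by_chars}, words rule = {by_words}")
```

On the prose sample the two rules land close together. On the JSON sample they disagree wildly, because the payload has no spaces and so counts as a single "word". Neither number is a real token count; only the tokenizer for the model you are paying for can give you that.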
Editor’s Note: Most token-cost surprises are really prompt-design surprises. A sprawling system prompt, repeated instructions, and copied retrieval chunks can quietly become the most expensive part of the workflow.
Editor’s Note: The old “1 token ≈ 4 characters” shortcut is too blunt for code, tables, JSON, URLs, and mixed punctuation. It is a rough average, not a billing rule.
Editor’s Note: When a bill looks wrong, the first question is often not “did the provider overcharge?” It is “did our process create more tokens than we thought?”
What a token is
A token is a piece of text that a model’s tokenizer has decided to treat as a unit.
In practice, that means:
- short common words may be one token;
- punctuation can split a token;
- numbers often split in surprising ways;
- code, URLs, and JSON usually break into more pieces than plain prose;
- different model families do not always split the same string the same way.
Tokens matter because the model only sees tokens, not human word counts. That is why a model bill is usually tied to token volume, not page length.
Key terms
- Token: a chunk of text used for model input, output, and billing.
- Tokenization: the process of splitting text into tokens.
- Input tokens: tokens you send to the model.
- Output tokens: tokens the model generates back to you.
- Context window: the maximum amount of tokenised text the model can consider at once.
- System prompt: the instruction block that shapes how the model behaves before the user message begins.
Why token counts change
The same sentence can produce different token counts depending on the tokenizer family.
Here is one short example:
| Sample paragraph | cl100k_base (OpenAI-style) | Mistral 7B Instruct v0.3 |
|---|---|---|
| A 1,000-word note is not always about 1,000 tokens when it includes code, tables, quotes, URLs, or mixed punctuation. | 31 tokens | 39 tokens |
The difference is not random. With the Mistral tokenizer, the number 1,000 breaks into more pieces, and the word “URLs” also splits apart. With cl100k_base, the same string comes out shorter: 31 tokens against 39.
That is the point to remember: two teams can read the same paragraph and think they are budgeting for the same workload, while their tokenizers quietly disagree.
Where the extra tokens come from
Token counts usually rise when you add:
- long system instructions;
- retrieved chunks from search or RAG;
- code blocks, tables, or JSON;
- long quoted text;
- tool-call results and tool chatter;
- multi-turn history that gets repeated back into the next request.
For a quick check, count the tokens in the prompt after you have added everything you actually send to the model, not the tidied version in your head.
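One way to run that check is to count each part of the assembled request separately, so you can see which part dominates. The sketch below is an assumption-heavy illustration: `rough_count` is a placeholder (about 4 characters per token) that you would swap for your model's real tokenizer, and the part names and sample strings are hypothetical.

```python
from typing import Callable, Dict


def rough_count(text: str) -> int:
    """Placeholder counter (about 4 characters per token). Swap in a real tokenizer."""
    return round(len(text) / 4)


def token_breakdown(
    parts: Dict[str, str], count: Callable[[str], int] = rough_count
) -> Dict[str, int]:
    """Count each part of the assembled request separately, plus a total."""
    breakdown = {name: count(text) for name, text in parts.items()}
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Hypothetical request: everything that is actually sent, not just the user text.
request_parts = {
    "system_prompt": "You are a careful support assistant. " * 20,
    "retrieved_chunks": "Refund policy excerpt for the help centre. " * 60,
    "history": "user: hi\nassistant: hello, how can I help?\n" * 10,
    "tool_results": '{"order_id": 991, "status": "shipped"}' * 15,
    "user_message": "Where is my order?",
}

for name, tokens in token_breakdown(request_parts).items():
    print(f"{name:>18}: {tokens}")
```

In this made-up request the user's question is a tiny fraction of the total. That is the usual pattern: the text the user typed is rarely what drives the count.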
Why tokens affect cost
Claude pricing on Anthropic’s pricing page, checked on 2026-05-22, shows the pattern clearly (USD per million tokens, MTok):
| Model | Input | Output | Prompt caching write | Prompt caching read |
|---|---|---|---|---|
| Opus 4.7 | $5 / MTok | $25 / MTok | $6.25 / MTok | $0.50 / MTok |
| Sonnet 4.6 | $3 / MTok | $15 / MTok | $3.75 / MTok | $0.30 / MTok |
| Haiku 4.5 | $1 / MTok | $5 / MTok | $1.25 / MTok | $0.10 / MTok |
The shape is the important bit:
- output costs more than input: five times as much per token on every row above;
- cheaper models still charge by token, so long outputs still add up;
- prompt caching changes the economics only when the same text is reused enough to justify the write step;
- batch processing can halve token prices for work that does not need immediate responses.
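The caching point can be sanity-checked with a few lines of arithmetic. This is a minimal sketch using the Sonnet 4.6 rates from the table above. It assumes the first request pays the cache write rate and every later request hits the cache at the read rate; cache expiry and minimum cacheable sizes are ignored, so treat it as a planning aid rather than a billing model.

```python
def prefix_cost_usd(prefix_tokens, requests, input_rate, write_rate, read_rate):
    """Return (uncached, cached) USD cost of a shared prefix sent `requests` times.

    Rates are USD per million tokens. Assumes one cache write on the first
    request and a cache read on every request after that.
    """
    uncached = requests * prefix_tokens * input_rate / 1_000_000
    cached = prefix_tokens * (write_rate + (requests - 1) * read_rate) / 1_000_000
    return uncached, cached


# Sonnet 4.6 rates from the table above (USD per MTok).
INPUT, CACHE_WRITE, CACHE_READ = 3.00, 3.75, 0.30

for n in (1, 2, 10):
    uncached, cached = prefix_cost_usd(10_000, n, INPUT, CACHE_WRITE, CACHE_READ)
    print(f"{n:>2} requests: uncached ${uncached:.4f}, cached ${cached:.4f}")
```

With a single request, caching costs more because you pay the write premium and never read it back. Under these assumptions the saving appears from the second request and grows with reuse.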
Worked example: same tokens, two model choices
Assumptions:
- 8,000 input tokens;
- 2,000 output tokens;
- no caching;
- no batch discount;
- USD pricing from the 2026-05-22 Claude pricing page.
On Sonnet 4.6:
- input cost = 8,000 × $3 / 1,000,000 = $0.024
- output cost = 2,000 × $15 / 1,000,000 = $0.030
- total = $0.054
On Haiku 4.5:
- input cost = 8,000 × $1 / 1,000,000 = $0.008
- output cost = 2,000 × $5 / 1,000,000 = $0.010
- total = $0.018
The same token count costs three times as much on Sonnet 4.6 as on Haiku 4.5 in this example. That does not mean Haiku is always the right answer. It means token count and model choice both matter, and the output side can move the bill faster than people expect: here 2,000 output tokens cost more than 8,000 input tokens on both models.
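The arithmetic above can be reproduced in a few lines, which makes it easy to rerun with your own token counts. This is a sketch under the same assumptions as the worked example (no caching, no batch discount); the dictionary and function names are illustrative, and the rates are the ones quoted from the 2026-05-22 pricing table.

```python
# USD per million tokens, copied from the pricing table in this article.
RATES_USD_PER_MTOK = {
    "Sonnet 4.6": {"input": 3.0, "output": 15.0},
    "Haiku 4.5": {"input": 1.0, "output": 5.0},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, with no caching and no batch discount."""
    rates = RATES_USD_PER_MTOK[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


for model in RATES_USD_PER_MTOK:
    print(f"{model}: ${request_cost(model, 8_000, 2_000):.3f}")

# Which hurts more on Sonnet 4.6: doubling the input or doubling the output?
print(f"double input:  ${request_cost('Sonnet 4.6', 16_000, 2_000):.3f}")
print(f"double output: ${request_cost('Sonnet 4.6', 8_000, 4_000):.3f}")
```

The last two lines show the output effect directly: adding 2,000 output tokens ($0.084 total) costs more than adding 8,000 input tokens ($0.078 total).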
Where teams get surprised
The usual surprise points are boring, which is exactly why they keep happening:
- A helpful-looking system prompt keeps growing.
- Retrieved documents get pasted in because they were “only a few chunks”.
- The model answers in more detail than the team planned for.
- A retry loop turns one call into three.
- A tool call returns a wall of JSON that gets sent back into the next turn.
If you are seeing a bigger bill than expected, the fastest win is often to trim the prompt and cap the output before you change models.
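Two of those surprises, repeated history and retries, are easy to underestimate because they multiply rather than add. The sketch below is a simplified model, assuming each turn adds a fixed number of new tokens and that the whole conversation so far is resent with every request; the function name and numbers are illustrative.

```python
def cumulative_input_tokens(turn_tokens, resend_history=True):
    """Total input tokens across a conversation.

    turn_tokens: new tokens added by each turn (user text, tool output, prior reply).
    With resend_history=True, every request carries all earlier turns again.
    """
    total = 0
    running = 0
    for new in turn_tokens:
        running += new
        total += running if resend_history else new
    return total


turns = [500] * 10  # ten turns, 500 new tokens each

print(cumulative_input_tokens(turns, resend_history=False))  # 5000
print(cumulative_input_tokens(turns))                        # 27500

# A retry loop that turns one call into three multiplies the whole figure.
print(cumulative_input_tokens(turns) * 3)                    # 82500
```

Ten turns of 500 new tokens look like 5,000 tokens of content, but with the history resent each time the input side comes to 27,500 tokens, before any retries.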
How to sense-check a provider bill
Use this checklist before you blame the pricing page:
- Count the real input tokens in the prompt you actually send.
- Split reusable instructions from one-off user text.
- Estimate output as a range, not a single number.
- Check whether caching is available and whether reuse is high enough to pay back.
- Check whether batch mode is allowed for the workload.
- Compare the full workload, not just the headline input rate.
Cost-planning formula
Estimated cost = input tokens × input rate + output tokens × output rate + cache adjustments + batch adjustment
That is a planning formula, not a promise. The real bill can still move if the prompt changes, the output gets longer, or the provider counts tokens differently than you expected.
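As a sketch, the planning formula can be written as a small function and run over a low and a high output estimate rather than a single number. Several things here are assumptions for illustration: the `Rates` class and parameter names are made up, cached tokens are treated as a subset of the input tokens, and the batch multiplier is applied to the whole subtotal. How caching and batch discounts actually combine depends on the provider, so check the published pricing page before relying on it.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """USD per million tokens."""
    input_rate: float
    output_rate: float
    cache_write_rate: float
    cache_read_rate: float


def estimate_cost(rates, input_tokens, output_tokens,
                  cache_write_tokens=0, cache_read_tokens=0,
                  batch_multiplier=1.0):
    """Planning estimate in USD. Cached tokens are counted as part of input_tokens."""
    uncached = input_tokens - cache_write_tokens - cache_read_tokens
    if uncached < 0:
        raise ValueError("cached tokens cannot exceed input tokens")
    subtotal = (
        uncached * rates.input_rate
        + cache_write_tokens * rates.cache_write_rate
        + cache_read_tokens * rates.cache_read_rate
        + output_tokens * rates.output_rate
    ) / 1_000_000
    # Assumption: the batch discount scales the whole subtotal.
    return subtotal * batch_multiplier


# Sonnet 4.6 rates from the pricing table in this article.
sonnet = Rates(3.0, 15.0, 3.75, 0.30)

low = estimate_cost(sonnet, 8_000, 1_000)
high = estimate_cost(sonnet, 8_000, 4_000)
print(f"per request: ${low:.3f} to ${high:.3f}")
print(f"batched, 2,000 output tokens: ${estimate_cost(sonnet, 8_000, 2_000, batch_multiplier=0.5):.3f}")
```

Quoting the result as a range keeps the uncertainty about output length visible in the budget instead of hiding it inside one number.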
What this page cannot tell you
This page cannot tell you your actual bill.
It cannot tell you:
- how long your real prompt is;
- how many tokens the model will emit;
- whether your account tier has a different rate;
- whether your workflow qualifies for batch pricing;
- whether prompt caching will pay back on your reuse pattern.
It can only show you the shape of the problem and the questions that matter before you compare vendors.
Global applicability
This article is global. There is no UK, GB, or Northern Ireland split to apply here.
The useful caution is the same everywhere: token counting is model-specific, pricing changes, and the published provider page is the thing to check before you budget or buy.
Methodology and sources
Check date: 2026-05-22
What was checked:
- OpenAI’s “How to count tokens with tiktoken” cookbook page for tokenizer concepts and the pricing reminder that token counts affect API cost.
- The cl100k_base tokenizer through the current tiktoken library for the OpenAI-style token count in the example paragraph.
- The mistralai/Mistral-7B-Instruct-v0.3 tokenizer for the second-family comparison.
- Anthropic’s current pricing page for input, output, prompt caching, and batch pricing.
What the example uses:
- the same sample paragraph under two tokenizer families;
- no caching or batch discount in the worked example;
- USD pricing only;
- MTok means million tokens.
Assumptions and limits:
- token counts are model-family specific;
- provider billing can change;
- the worked cost example is illustrative, not a quote;
- output length is treated separately because it often changes the bill more than people expect.
Change log
- 2026-05-22: first draft built from the llm-editor-approved launch slice brief, with current tokenizer checks, current Anthropic pricing, a worked cost example, and reader-facing caveats.
Source list
- OpenAI Cookbook: How to count tokens with tiktoken — https://cookbook.openai.com/examples/how_to_count_tokens_with_tiktoken
- Anthropic pricing page — https://www.anthropic.com/pricing
- Mistral tokenizer repository — https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3