What is a token (and why does it affect your AI budget)?

If you’ve ever looked at an LLM API invoice and wondered why a relatively short prompt cost significantly more than expected, you aren’t alone. The answer lies in a fundamental unit of measurement called the token.

In the world of Large Language Models (LLMs), tokens are the “currency” of computation. Unlike humans who read words or characters, AI models process text as chunks of varying lengths. Whether you are an engineer optimizing for latency or a founder managing a monthly cloud budget, understanding how these chunks work—and why they fluctuate—is critical to preventing cost surprises.

What is a token?

At its simplest, a token is a piece of a word. While it is common to hear the shorthand “1,000 tokens $\approx$ 750 words,” this is a dangerous approximation for precision work.

Tokenization is the process of breaking raw text into these manageable units. A single token might be:

A whole word (e.g., apple)
A part of a complex word (e.g., tokenization → token + ization)
Punctuation or whitespace (e.g., , or \n)
Individual characters in a string of code or emojis
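
To make the splitting concrete, here is a minimal sketch of a greedy longest-match tokenizer. It is a toy under stated assumptions: `TOY_VOCAB` and `toy_tokenize` are hypothetical names, the vocabulary is hand-picked, and production tokenizers (for example BPE-based ones) learn far larger vocabularies and apply different matching rules.

```python
# Hypothetical hand-picked vocabulary; real tokenizers learn tens of thousands of entries.
TOY_VOCAB = {"token", "ization", "apple", " apple", ",", "\n"}


def toy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match split; unknown characters fall back to one token each."""
    max_len = max(len(piece) for piece in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


print(toy_tokenize("tokenization", TOY_VOCAB))     # ['token', 'ization']
print(toy_tokenize("apple, apple\n", TOY_VOCAB))   # ['apple', ',', ' apple', '\n']
```

Note that the second `apple` carries its leading space inside the token, which is one reason word counts and token counts drift apart.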

The “Hidden” Variance: Why 750 words is a lie

The number of tokens generated for the exact same string varies depending on the model’s specific tokenizer architecture. This variance is where many teams lose track of their actual usage.

| Text Fragment | OpenAI (cl100k_base) | Anthropic (Claude-style) | Why it differs |
| :--- | :---: | :---: | :--- |
| The quick brown fox. | 5 tokens | 5 tokens | Common English structure. |
| JSON: {"id": 123} | ~8 tokens | ~10 tokens | Spacing and symbols (:, {, ") are split differently across BPE/SentencePiece models. |
| 🚀✨ | 1-2 tokens | 3-4 tokens | Emoji tokenization varies wildly by vocabulary size. |

Why token counts affect cost

AI providers generally do not charge by the character or the word; they charge per 1,000 or per 1 million tokens. Crucially, there is almost always a price difference between Input Tokens (the prompt you send) and Output Tokens (the response the model generates). Output tokens are typically more expensive because the model generates them one at a time (autoregressive generation), whereas the whole input can be processed in a single parallel pass.

The Three Cost Drivers

Prompt Volume: Larger system prompts, RAG (Retrieval-Augmented Generation) chunks, and tool-calling histories increase your input token count.
Output Verbosity: Asking a model to “be detailed” or “write a long essay” directly inflates the number of output tokens generated, hitting your wallet harder than any other variable.
The “Context Window” Tax: In a multi-turn conversation, each new request resends the entire chat history as input, so every turn costs more than the last. The growth continues until the history reaches the model’s context window limit and has to be truncated or summarized.
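
The compounding effect of chat history can be sketched with a few lines of arithmetic. This is an illustration under simplifying assumptions: every turn has the same message sizes, nothing is cached or truncated, and the function name and token figures are hypothetical.

```python
def input_tokens_per_turn(system_tokens, user_tokens, reply_tokens, turns):
    """Return the input tokens billed on each turn when the full history is resent."""
    billed = []
    history = system_tokens
    for _ in range(turns):
        history += user_tokens    # the new user message joins the history
        billed.append(history)    # the whole history is sent as input this turn
        history += reply_tokens   # the model's reply is resent on the next turn
    return billed


billed = input_tokens_per_turn(system_tokens=500, user_tokens=100, reply_tokens=300, turns=5)
print(billed)       # [600, 1000, 1400, 1800, 2200]
print(sum(billed))  # 7000
```

In this sketch the five user messages total only 500 tokens, yet 7,000 input tokens are billed, because each turn pays again for everything that came before it.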

Where teams get surprised (and how to avoid it)

Cost spikes rarely come from simple text queries. They come from “structural” tokens:

The JSON Trap: If you use JSON mode or structured outputs, the structural overhead (braces, quotes, keys) adds significant token weight that isn’t present in plain prose.
Code and Markdown: Indentation (runs of spaces or tabs) and complex markdown tables can consume many tokens on whitespace and delimiters alone.
The RAG Expansion: If your retrieval system pulls in 5 large documents to answer one question, you might be paying for 20,000 input tokens just to get a 100-token answer.
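
One way to contain the RAG expansion is to cap retrieved context at a fixed token budget. The sketch below is one possible approach, not a prescribed method: `Chunk` and `select_within_budget` are hypothetical names, the relevance scores are made up, and the per-chunk token counts are assumed to come from your provider's tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    tokens: int    # assumed to be precomputed with the target model's tokenizer
    score: float   # retrieval relevance, higher is better (illustrative values)


def select_within_budget(chunks: list[Chunk], max_input_tokens: int) -> list[Chunk]:
    """Keep the highest-scoring chunks whose combined token count fits the budget."""
    selected = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if used + chunk.tokens <= max_input_tokens:
            selected.append(chunk)
            used += chunk.tokens
    return selected


# Five 4,000-token documents: 20,000 input tokens if all of them are sent.
retrieved = [
    Chunk("doc A", tokens=4000, score=0.91),
    Chunk("doc B", tokens=4000, score=0.84),
    Chunk("doc C", tokens=4000, score=0.52),
    Chunk("doc D", tokens=4000, score=0.47),
    Chunk("doc E", tokens=4000, score=0.31),
]
kept = select_within_budget(retrieved, max_input_tokens=8000)
print([c.text for c in kept], sum(c.tokens for c in kept))  # ['doc A', 'doc B'] 8000
```

The budget value is a product decision; the point is that the cap is enforced before the request is sent rather than discovered on the invoice.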

How to sense-check a provider bill

When reviewing an invoice or usage dashboard, don’t just look at the total price. Check these three fields:

Prompt Tokens (Input): Are these much higher than expected? You might have “context leakage” where old data isn’t being cleared.
Completion Tokens (Output): Is this spiking? Your models might be getting too “chatty.”
Cache Hits/Writes: If using providers like Anthropic or OpenAI with prompt caching, look for how much usage is attributed to cache hits—this should significantly lower your input costs.
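
A quick way to run these checks is to aggregate exported usage records and look at two ratios. This is a sketch with assumed field names: `prompt_tokens`, `completion_tokens`, and `cached_tokens` are hypothetical keys, and it assumes cached tokens are counted as part of the prompt tokens. Providers name and count these fields differently, so map them to your own export first.

```python
def summarize_usage(records):
    """Aggregate per-request usage dicts into totals and two sanity ratios."""
    prompt = sum(r["prompt_tokens"] for r in records)
    completion = sum(r["completion_tokens"] for r in records)
    cached = sum(r.get("cached_tokens", 0) for r in records)
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        # A ratio far above your norm hints at history or retrieval bloat.
        "input_to_output_ratio": prompt / completion if completion else None,
        # Share of input tokens served from cache (assumes cached is a subset of prompt).
        "cache_hit_share": cached / prompt if prompt else 0.0,
    }


records = [
    {"prompt_tokens": 3000, "completion_tokens": 500, "cached_tokens": 0},
    {"prompt_tokens": 9000, "completion_tokens": 500, "cached_tokens": 6000},
]
summary = summarize_usage(records)
print(summary["input_to_output_ratio"])  # 12.0
print(summary["cache_hit_share"])        # 0.5
```

Tracking these two numbers week over week makes a sudden jump in input volume or a drop in cache hits visible before the total price does.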

A Worked Example: The “Summary” Task

Assumptions:

Task: Summarizing a 2,000-word article.
Input (Article + Instructions): ~3,000 tokens.
Output (The Summary): ~500 tokens.
Model: GPT-4o class ($5.00 per 1M input / $15.00 per 1M output).

Calculation:

Input Cost: $(3{,}000 / 1{,}000{,}000) \times \$5.00 = \$0.015$
Output Cost: $(500 / 1{,}000{,}000) \times \$15.00 = \$0.0075$
Total Cost per Summary: $\$0.015 + \$0.0075 = \$0.0225$

If you run this summary task 100,000 times a month, your bill is $\$2{,}250$. A failure to optimize the input (e.g., by accidentally including old chat history) could easily double that.
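
The same arithmetic can be reproduced in code, which makes it easy to test other scenarios. This sketch uses the prices assumed in the worked example (USD 5.00 and USD 15.00 per 1M tokens); `request_cost` is a hypothetical helper, and the 6,000-token case is an added what-if, not a measured figure.

```python
from decimal import Decimal

# Prices taken from the worked example's assumptions, in USD per 1M tokens.
INPUT_PRICE_PER_M = Decimal("5.00")
OUTPUT_PRICE_PER_M = Decimal("15.00")
MILLION = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of one request in USD, billing input and output at different rates."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / MILLION


per_summary = request_cost(3000, 500)
monthly = per_summary * 100_000
print(f"per summary: {per_summary:.4f} USD")   # per summary: 0.0225 USD
print(f"monthly:     {monthly:,.2f} USD")      # monthly:     2,250.00 USD

# What-if: stale chat history doubles the input to 6,000 tokens per request.
bloated_monthly = request_cost(6000, 500) * 100_000
print(f"bloated:     {bloated_monthly:,.2f} USD")  # bloated:     3,750.00 USD
```

Under these assumptions input accounts for two-thirds of the cost, so doubling the input alone raises the monthly bill from 2,250 to 3,750 USD; the bill itself doubles once the input reaches 7,500 tokens per request.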

Methodology and Sources

Data Checked: June 21, 2026
Sources: OpenAI API Pricing Documentation, Anthropic Claude API Pricing Guide, Open-source Tokenizer comparisons.

Change log

v1.0: Initial draft based on editorial brief.