Prompt length, output length and why AI bills surprise teams

The typical team does not think about token cost until the first invoice arrives. By then, the damage is already baked into product decisions: a verbose system prompt, retrieved chunks that get sent in full every time, output length limits set to “as much as possible,” and conversation history that grows without bound.

The short answer is: you pay for every token you send and every token the model sends back. Most overruns come from prompt bloat — system prompts, retrieved context and repeated instructions — not from the user’s question itself.

If you only remember one thing: for many production workloads, the input side of a chat API call costs more than the output side, even though output tokens are priced higher. The prompt carries system instructions, retrieval context, conversation history and tool output before the user even asks anything.

TL;DR

LLM pricing is roughly: cost = (input tokens × input price per token) + (output tokens × output price per token).

Output tokens usually cost several times more per token than input tokens (4× at the GPT-4.1 rates used below). But if your prompt is 10,000 tokens and your answer is 500 tokens, the input cost still dominates the total.
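A minimal Python sketch of the per-call arithmetic, using the GPT-4.1 rates assumed in the worked example below ($2.00/M input, $8.00/M output). The function names are illustrative. It also computes the break-even prompt length: input costs more than output once the prompt is longer than the output length multiplied by the output-to-input price ratio, which is 2,000 tokens for a 500-token answer at these rates.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> tuple[float, float]:
    """Return (input_cost, output_cost) in dollars for a single call."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost, output_cost


def break_even_prompt_tokens(output_tokens: int,
                             input_price_per_m: float,
                             output_price_per_m: float) -> float:
    """Prompt length at which input and output cost the same for one call."""
    return output_tokens * output_price_per_m / input_price_per_m


# Assumed GPT-4.1 rates: $2.00/M input, $8.00/M output.
inp, out = call_cost(10_000, 500, 2.00, 8.00)
print(f"input ${inp:.4f}, output ${out:.4f}")     # input $0.0200, output $0.0040
print(break_even_prompt_tokens(500, 2.00, 8.00))  # 2000.0
```

So a 10,000-token prompt costs five times as much as its 500-token answer, despite the 4× higher output price.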

Common cost multipliers include:

Retrieved chunks sent in full: every token of every chunk is billed, even if the model only uses one sentence.
Conversation history: every previous turn is re-sent on every new turn.
Verbose system prompts: charged on every call, not just the first one.
Tool or function call traces: the full input and output of each tool call are included in the next call.
Long or missing output limits: if you set no limit, the model can generate up to its maximum output length (which varies by model and provider), and you pay for every token it generates.

Where the cost hides

The hidden cost of system prompts

A 2,000-token system prompt sent 10,000 times per month at $2.00/M input tokens costs $40/month before any user query is processed. That is the baseline — it disappears into the API dashboard as “chat completions” without a separate line item.
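The same arithmetic as a small Python sketch (the function name is illustrative), comparing the 2,000-token system prompt above with one trimmed to 500 tokens, the threshold used in the cost-control list on this page.

```python
def monthly_input_cost(tokens_per_call: int, calls_per_month: int,
                       input_price_per_m: float) -> float:
    """Dollars per month for a block of tokens re-sent on every call."""
    return tokens_per_call * calls_per_month * input_price_per_m / 1_000_000


# Figures from the paragraph above: 10,000 calls at $2.00/M input.
print(monthly_input_cost(2_000, 10_000, 2.00))  # 40.0  (current system prompt)
print(monthly_input_cost(500, 10_000, 2.00))    # 10.0  (trimmed to 500 tokens)
```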

The hidden cost of retrieved context

RAG pipelines often retrieve the top 5 chunks at ~500 tokens each (2,500 tokens total) and append them to every user query. At 10,000 queries/month, that is 25 million input tokens. At $2.00/M, that is $50/month before user input or output.
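Retrieval cost scales with both the number of chunks and their size. A minimal sketch; the second configuration (2 chunks of ~250 tokens) is a hypothetical alternative for comparison, not a recommendation.

```python
def retrieval_monthly_cost(top_k: int, tokens_per_chunk: int,
                           calls_per_month: int, input_price_per_m: float) -> float:
    """Dollars per month spent on retrieved chunks appended to every query."""
    tokens_per_call = top_k * tokens_per_chunk
    return tokens_per_call * calls_per_month * input_price_per_m / 1_000_000


# Figures from the paragraph above: top 5 chunks of ~500 tokens, 10,000 queries, $2.00/M.
print(retrieval_monthly_cost(5, 500, 10_000, 2.00))  # 50.0
# Hypothetical tighter retrieval: top 2 chunks of ~250 tokens each.
print(retrieval_monthly_cost(2, 250, 10_000, 2.00))  # 10.0
```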

The hidden cost of conversation history

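A conversation that runs 10 turns at an average of 500 tokens per turn (user + model) means the 10th call carries ~4,500 tokens of history plus the current prompt. Summed over all 10 calls, that conversation re-sends 22,500 tokens of history. Over 1,000 such conversations (10,000 calls), history alone adds 22.5 million input tokens, or $45/month at $2.00/M: enough to roughly double the bill of the lean scenario below.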
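Because every call re-sends all earlier turns, history per call grows linearly and history per conversation grows roughly with the square of the number of turns. This sketch assumes a flat 500 tokens per turn, as in the paragraph above, and a hypothetical helper name. It shows that 4,500 tokens is the history on the 10th call, while the average across all 10 calls is 2,250.

```python
def history_tokens_per_call(turns: int, tokens_per_turn: int) -> list[int]:
    """History re-sent on each call of one conversation (call 1 carries none)."""
    return [tokens_per_turn * previous for previous in range(turns)]


history = history_tokens_per_call(turns=10, tokens_per_turn=500)
print(history[-1])                  # 4500   (history on the 10th call)
print(sum(history))                 # 22500  (history tokens per conversation)
print(sum(history) / len(history))  # 2250.0 (average per call)

# 1,000 conversations of 10 turns = 10,000 calls, at an assumed $2.00/M input.
monthly_history_cost = 1_000 * sum(history) * 2.00 / 1_000_000
print(monthly_history_cost)         # 45.0
```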

Worked example: 3 scenarios compared

Assume GPT-4.1 pricing: $2.00/M input, $8.00/M output, 10,000 monthly calls, 200-token average user input, 500-token average output.

Scenario A — Lean:

Prompt: 200 tokens (no system prompt, no retrieval)
Input: 200 × 10,000 × $2/M = $4; output: 500 × 10,000 × $8/M = $40; total: $44/month

Scenario B — Typical:

Prompt: 200 user + 1,000-token system prompt + 2,500-token retrieval = 3,700 tokens
Input: 3,700 × 10,000 × $2/M = $74; output: 500 × 10,000 × $8/M = $40; total: $114/month

Scenario C — Bloated:

Prompt: 3,700 (as above) + 4,500 tokens of history (the amount carried on the 10th turn, assumed for every call as a worst case) = 8,200 tokens
Input: 8,200 × 10,000 × $2/M = $164; output: 500 × 10,000 × $8/M = $40; total: $204/month

The output cost is the same in all three scenarios. The input cost grows 41× between A and C ($4 to $164), taking the total from $44 to $204.
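The three scenarios can be reproduced in a few lines of Python by building each prompt from its components. All figures are the assumptions stated above; nothing here is measured.

```python
# Assumptions from the worked example: GPT-4.1 rates, 10,000 calls, 500 output tokens.
INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 8.00
CALLS_PER_MONTH = 10_000
OUTPUT_TOKENS = 500

# Prompt components per call, in tokens, for each scenario.
scenarios = {
    "A": {"user": 200},
    "B": {"user": 200, "system": 1_000, "retrieval": 2_500},
    "C": {"user": 200, "system": 1_000, "retrieval": 2_500, "history": 4_500},
}

input_costs: dict[str, float] = {}
totals: dict[str, float] = {}
for name, parts in scenarios.items():
    prompt_tokens = sum(parts.values())
    input_cost = prompt_tokens * CALLS_PER_MONTH * INPUT_PRICE_PER_M / 1_000_000
    output_cost = OUTPUT_TOKENS * CALLS_PER_MONTH * OUTPUT_PRICE_PER_M / 1_000_000
    input_costs[name] = input_cost
    totals[name] = input_cost + output_cost
    print(f"{name}: {prompt_tokens} prompt tokens, input ${input_cost:.2f}, "
          f"output ${output_cost:.2f}, total ${totals[name]:.2f}")

print(f"Input cost ratio C/A: {input_costs['C'] / input_costs['A']:.0f}x")
```

It prints totals of $44.00, $114.00 and $204.00 and an input ratio of 41x. Keeping the components separate makes it easy to see which one to cut first.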

Practical ways to control cost

Trim your system prompt. Measure it in tokens. If it is over 500 tokens, ask which instructions the model genuinely needs on every call and which can be cut or kept in the product docs instead.
Chunk your retrieval context. Do not send entire documents. Send the paragraph that answers the query.
Summarise conversation history. Instead of appending every previous turn, send a one-paragraph summary of what has been established.
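Set a max_tokens limit (the parameter name varies by API). Decide how long each response needs to be instead of letting the model decide. The limit cuts the response off rather than making it more concise, so state the length you want in the prompt as well.
Use caching for repeated prompts. If the same system prompt and context appear at the start of many calls, provider prompt caching can bill those repeated tokens at a discount. Caches generally match on an identical prompt prefix, so put the stable content first.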
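A simpler alternative to summarising history is to drop the oldest turns once the history exceeds a token budget. The sketch below does that; `estimate_tokens` and `trim_history` are hypothetical helpers, and the 4-characters-per-token estimate is only a rough rule of thumb for English text, so use your provider's tokenizer for real counts.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate (~4 characters per token for English). Not a real tokenizer."""
    return max(1, len(text) // 4)


def trim_history(turns: list[str], budget_tokens: int) -> list[str]:
    """Keep the most recent turns whose estimated total fits within budget_tokens."""
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):      # walk from newest to oldest
        cost = estimate_tokens(turn)
        if used + cost > budget_tokens:
            break                     # stop at the first turn that does not fit
        kept.append(turn)
        used += cost
    kept.reverse()                    # restore chronological order
    return kept


# Hypothetical history: ten turns of 2,000 characters (~500 estimated tokens) each.
history = [f"turn {i}: " + "x" * (2_000 - len(f"turn {i}: ")) for i in range(1, 11)]
recent = trim_history(history, budget_tokens=1_500)
print([t.split(":")[0] for t in recent])  # ['turn 8', 'turn 9', 'turn 10']
```

Trimming loses older facts outright, which is why a summary is often preferred for long conversations; the two approaches can also be combined.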

Formula block

Monthly input cost = (average prompt tokens × calls per month × input price per 1M tokens) / 1,000,000

Monthly output cost = (average output tokens × calls per month × output price per 1M tokens) / 1,000,000

Total cost = input cost + output cost

That formula works for planning. Your actual bill will also reflect cache discounts, batch discounts and any retried requests.
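Cache discounts can be folded into the input formula by billing the reusable prefix at a blended rate. A minimal sketch, assuming a $0.50/M cached-input rate and a 90% hit rate (both assumptions for illustration, not quoted figures) applied to Scenario B's 1,000-token system prompt; retrieved chunks change per query, so they are treated as uncached. Real caches have rules this ignores, such as minimum cacheable lengths, cache lifetimes and, on some providers, a surcharge for writing to the cache.

```python
def monthly_input_cost_with_cache(prompt_tokens: int,
                                  cacheable_prefix_tokens: int,
                                  calls_per_month: int,
                                  input_price_per_m: float,
                                  cached_price_per_m: float,
                                  cache_hit_rate: float) -> float:
    """Monthly input cost when a fixed prompt prefix is sometimes served from cache.

    cacheable_prefix_tokens: identical leading tokens on every call (e.g. system prompt).
    cache_hit_rate: fraction of calls on which that prefix is billed at the cached rate.
    """
    other_tokens = prompt_tokens - cacheable_prefix_tokens
    # Blend the cached and full price for the prefix according to the hit rate.
    prefix_price = (cache_hit_rate * cached_price_per_m
                    + (1 - cache_hit_rate) * input_price_per_m)
    per_call = other_tokens * input_price_per_m + cacheable_prefix_tokens * prefix_price
    return per_call * calls_per_month / 1_000_000


# Scenario B: 3,700-token prompt, only the 1,000-token system prompt is reusable.
# The $0.50/M cached rate and 90% hit rate are assumptions, not provider figures.
with_cache = monthly_input_cost_with_cache(3_700, 1_000, 10_000, 2.00, 0.50, 0.9)
without_cache = monthly_input_cost_with_cache(3_700, 1_000, 10_000, 2.00, 0.50, 0.0)
print(f"with cache ${with_cache:.2f}, without ${without_cache:.2f}")
```

With these assumptions it prints $60.50 against $74.00 without caching, so the saving depends heavily on how much of each prompt is a stable prefix.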

What this page cannot tell you

This page cannot tell you your exact billing scenario. It cannot measure your average prompt length, tell you which of your output tokens are waste, or tell you whether your provider’s cache implementation will actually reduce your costs.

Methodology and sources

Check date: 2026-05-25

What was checked: OpenAI GPT-4.1 pricing page for current input/output rates; provider documentation on token billing conventions.

Worked-example assumptions: All examples use GPT-4.1 pricing. No caching or batch discounts applied. Average token counts are illustrative.

Assumptions and limits:

Pricing changes over time. Re-check before budgeting.
Different providers bill input and output at different rates.
Tool calls generated by the model are billed as output tokens, and tool results sent back to the model are billed as input tokens on the following call.
Conversation history costs depend on truncation strategy and summarisation approach.
Jurisdiction: Global. Pricing is in USD. The worked examples use OpenAI GPT-4.1 rates; the sources also cover Anthropic, Google and Mistral, all major international providers.

Source list

OpenAI pricing page — https://openai.com/api/pricing/ (accessed 2026-05-25)
Anthropic pricing — https://www.anthropic.com/pricing (accessed 2026-05-25)
Google Gemini pricing — https://cloud.google.com/vertex-ai/generative-ai/pricing (accessed 2026-05-25)
Mistral pricing — https://mistral.ai/products/la-plateforme#pricing (accessed 2026-05-25)

Trust Stack

Last checked: 2026-05-25
Corrections: Contact us to report errors

Change log

2026-05-28: Full editorial review against 16-gate checklist: added third Editor’s Note, Trust Stack, jurisdiction label, source access dates, fixed frontmatter labels, and corrected Editor’s Note format from blockquote to aside cards.
2026-05-27: Added direct source URLs to all named providers and services; added Change Log section. Content unchanged.
2026-05-25: First published.