Prompt length, output length and why AI bills surprise teams
The typical team does not think about token cost until the first invoice arrives. By then, the damage is already baked into product decisions: a verbose system prompt, retrieved chunks that get sent in full every time, output length limits set to “as much as possible,” and conversation history that grows without bound.
The short answer is: you pay for every token you send and every token the model sends back. Most overruns come from prompt bloat — system prompts, retrieved context and repeated instructions — not from the user’s question itself.
If you only remember one thing: for many workloads the input side of a chat API call costs more in total than the output side, even though each output token is priced higher. The prompt carries system instructions, retrieval context, conversation history and tool output before the user even asks anything.
Editor’s Note: Teams often celebrate “we output 8,000 tokens!” without noticing the prompt was 40,000 tokens. The output is the visible cost. The input is the silent one.
Editor’s Note: A verbose system prompt charged on every request is like running a water tap 24/7 and being surprised by the bill.
Quick answer {#quick-answer}
LLM pricing is roughly: cost = (input tokens × input price per token) + (output tokens × output price per token).
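Output tokens are commonly 3-5× more expensive per token than input tokens (4× at the GPT-4.1 rates used in the worked example). But if your prompt is 10,000 tokens and your answer is 500 tokens, the input cost still dominates the total: at a 4× ratio, those 500 output tokens cost the same as only 2,000 input tokens.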
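The per-call arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not a billing tool: `call_cost` is a hypothetical helper, and the $2.00/M and $8.00/M rates are the illustrative GPT-4.1 figures from the worked example, not a quote for your provider.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> tuple[float, float]:
    """Return (input_cost, output_cost) in dollars for a single API call."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost, output_cost


# Illustrative prices only: $2.00/M input, $8.00/M output (output is 4x input).
input_cost, output_cost = call_cost(10_000, 500, 2.00, 8.00)
input_share = input_cost / (input_cost + output_cost)
print(f"input ${input_cost:.4f}, output ${output_cost:.4f}, input share {input_share:.0%}")
```

With these assumed numbers the prompt accounts for roughly five sixths of the call's cost, despite the higher per-token output price.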
Common cost multipliers include:
- Retrieved chunks sent in full: each chunk token counts, even if the model only uses one sentence.
- Conversation history: every previous turn is re-sent on every new turn.
- Verbose system prompts: charged on every call, not just the first one.
- Tool or function call traces: the full input and output of each tool use are included in the next call.
- Long default output limits: default output caps vary by provider and model and can run to thousands of tokens; if you set no explicit limit, you pay for whatever the model generates up to that cap.
Where the cost hides {#where-the-cost-hides}
The hidden cost of system prompts {#the-hidden-cost-of-system-prompts}
A 2,000-token system prompt sent 10,000 times per month at $2.00/M input tokens costs $40/month before any user query is processed. That is the baseline — it disappears into the API dashboard as “chat completions” without a separate line item.
The hidden cost of retrieved context {#the-hidden-cost-of-retrieved-context}
RAG pipelines often retrieve the top 5 chunks at ~500 tokens each (2,500 tokens total) and append them to every user query. At 10,000 queries/month, that is 25 million input tokens. At $2.00/M, that is $50/month before user input or output.
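To see how the retrieval bill scales with the number of chunks, here is a small sketch under the same assumptions as the paragraph above (500-token chunks, 10,000 queries per month, $2.00/M input). `retrieval_monthly_cost` is a hypothetical helper name introduced for illustration.

```python
def retrieval_monthly_cost(chunks: int, tokens_per_chunk: int,
                           queries_per_month: int, input_price_per_m: float) -> float:
    """Dollars per month spent on retrieved context alone."""
    tokens = chunks * tokens_per_chunk * queries_per_month
    return tokens * input_price_per_m / 1_000_000


# Compare top-1, top-3 and top-5 retrieval at the assumed chunk size and price.
for k in (1, 3, 5):
    cost = retrieval_monthly_cost(k, 500, 10_000, 2.00)
    print(f"top-{k}: ${cost:.2f}/month")
```

The cost is linear in the number of chunks, so dropping from five chunks to three cuts this line of the bill by 40% under these assumptions.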
The hidden cost of conversation history {#the-hidden-cost-of-conversation-history}
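A conversation that runs 10 turns at an average of 500 tokens per turn (user + model) means the 10th call carries ~4,500 tokens of history plus the current prompt. Because every call re-sends everything before it, the history adds up across the conversation: 0 + 500 + ... + 4,500 = 22,500 tokens of re-sent history, against only 5,000 tokens of new content. Over 1,000 such conversations, the history cost alone can double the monthly bill.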
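The growth of re-sent history can be checked with a short sketch. It assumes every earlier turn is re-sent in full with no truncation or summarisation, and it uses the illustrative $2.00/M input price; `history_tokens_per_call` is a hypothetical helper.

```python
def history_tokens_per_call(turns: int, tokens_per_turn: int) -> list[int]:
    """History carried by each call when every earlier turn is re-sent in full."""
    return [(n - 1) * tokens_per_turn for n in range(1, turns + 1)]


per_call = history_tokens_per_call(turns=10, tokens_per_turn=500)
total_history = sum(per_call)  # history tokens billed over one whole conversation
# 1,000 conversations per month at an assumed $2.00 per 1M input tokens.
monthly_cost = total_history * 1_000 * 2.00 / 1_000_000

print(per_call[-1], total_history, monthly_cost)  # 4500 22500 45.0
```

The per-call history grows linearly, so the total billed over a conversation grows roughly with the square of the number of turns. That is why long conversations are disproportionately expensive.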
Worked example: 3 scenarios compared {#worked-example-3-scenarios-compared}
Assume GPT-4.1 pricing: $2.00/M input, $8.00/M output, 10,000 monthly calls, 200-token average user input, 500-token average output.
Scenario A — Lean:
- Prompt: 200 tokens (no system prompt, no retrieval)
- Total: 200 × 10,000 × $2/M = $4 input + 500 × 10,000 × $8/M = $40 output = $44/month
Scenario B — Typical:
- Prompt: 200 user + 1,000-token system prompt + 2,500-token retrieval = 3,700 tokens
- Total: 3,700 × 10,000 × $2/M = $74 input + 500 × 10,000 × $8/M = $40 output = $114/month
Scenario C — Bloated:
- Prompt: 3,700 (as above) + 4,500 tokens of history (the amount carried by the 10th turn of a conversation, used here as a worst case for every call) = 8,200 tokens
- Total: 8,200 × 10,000 × $2/M = $164 input + 500 × 10,000 × $8/M = $40 output = $204/month
The output cost is the same in all three scenarios. The input cost grows 41× between A and C.
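The three scenarios can be reproduced with the sketch below. All constants are the stated assumptions of this worked example (10,000 calls, 500 output tokens, $2.00/M input, $8.00/M output); `monthly_cost` and the scenario labels are names introduced here for illustration.

```python
CALLS = 10_000
INPUT_PRICE, OUTPUT_PRICE = 2.00, 8.00  # dollars per 1M tokens, as assumed above
OUTPUT_TOKENS = 500

# Average prompt tokens per call in each scenario.
scenarios = {
    "A (lean)": 200,
    "B (typical)": 200 + 1_000 + 2_500,
    "C (bloated)": 200 + 1_000 + 2_500 + 4_500,
}


def monthly_cost(prompt_tokens: int) -> tuple[float, float]:
    """Return (input_cost, output_cost) in dollars per month."""
    input_cost = prompt_tokens * CALLS * INPUT_PRICE / 1_000_000
    output_cost = OUTPUT_TOKENS * CALLS * OUTPUT_PRICE / 1_000_000
    return input_cost, output_cost


results = {name: monthly_cost(tokens) for name, tokens in scenarios.items()}
for name, (input_cost, output_cost) in results.items():
    total = input_cost + output_cost
    print(f"{name}: input ${input_cost:.0f} + output ${output_cost:.0f} = ${total:.0f}")
```

Swapping in your own measured prompt sizes and your provider's current prices gives a first-pass estimate for your workload.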
Practical ways to control cost {#practical-ways-to-control-cost}
- Trim your system prompt. Measure it in tokens. If it is over 500 tokens, ask what can be cut, moved into developer-facing examples, or left in the product docs.
- Chunk your retrieval context. Do not send entire documents. Send the paragraph that answers the query.
- Summarise conversation history. Instead of appending every previous turn, send a one-paragraph summary of what has been established.
- Set a max_tokens limit. Know your output needs before each call instead of letting the model decide.
- Use caching for repeated prompts. If the same system prompt and context appear across many calls, prompt caching can reduce cost.
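One simple way to bound history cost is to keep only the most recent turns that fit a token budget. Note that this is truncation, a cruder alternative to the summarisation suggested in the list, shown here because it needs no model call. `trim_history` and `rough_count` are hypothetical helpers, and `rough_count` is a deliberately crude stand-in: real token counts come from your provider's tokenizer and will differ.

```python
from typing import Callable


def trim_history(turns: list[str], budget: int,
                 count_tokens: Callable[[str], int]) -> list[str]:
    """Keep the most recent turns whose combined token count fits the budget."""
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(turn)
        if used + cost > budget:
            break  # this turn and everything older is dropped
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def rough_count(text: str) -> int:
    """Assumption: one token per whitespace-separated word (not a real tokenizer)."""
    return len(text.split())


history = [
    "first question here",
    "a long first answer with many words",
    "second question",
    "short answer",
]
trimmed = trim_history(history, budget=5, count_tokens=rough_count)
print(trimmed)  # ['second question', 'short answer']
```

Passing the counter in as a parameter keeps the sketch independent of any particular tokenizer library, so a real counting function can be substituted later.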
Formula block {#formula-block}
Monthly input cost = (average prompt tokens × calls per month × input price per 1M tokens) / 1,000,000
Monthly output cost = (average output tokens × calls per month × output price per 1M tokens) / 1,000,000
Total cost = input cost + output cost
That formula works for planning. Your actual bill will also reflect factors it ignores, such as cache hits, batch discounts and retries triggered by rate limits.
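As a rough sketch, the planning formula can be extended with optional adjustment factors. Everything beyond the basic formula is an assumption for illustration: `cached_fraction`, `cached_price_ratio` and `retry_rate` are hypothetical parameters, providers define cache pricing differently, and the sketch assumes every retried call is billed in full.

```python
def adjusted_monthly_cost(prompt_tokens: int, output_tokens: int, calls: int,
                          input_price_per_m: float, output_price_per_m: float,
                          cached_fraction: float = 0.0,
                          cached_price_ratio: float = 1.0,
                          retry_rate: float = 0.0) -> float:
    """Planning formula with optional, hypothetical adjustments.

    cached_fraction: share of prompt tokens billed at the cached rate.
    cached_price_ratio: cached rate as a fraction of the normal input price.
    retry_rate: extra calls per original call caused by retries (assumed fully billed).
    """
    effective_calls = calls * (1 + retry_rate)
    blended_input_price = input_price_per_m * (
        (1 - cached_fraction) + cached_fraction * cached_price_ratio
    )
    input_cost = prompt_tokens * effective_calls * blended_input_price / 1_000_000
    output_cost = output_tokens * effective_calls * output_price_per_m / 1_000_000
    return input_cost + output_cost


# With the defaults, this reduces to the plain planning formula.
baseline = adjusted_monthly_cost(3_700, 500, 10_000, 2.00, 8.00)
# Same workload with an assumed 5% of calls retried.
with_retries = adjusted_monthly_cost(3_700, 500, 10_000, 2.00, 8.00, retry_rate=0.05)
print(f"baseline ${baseline:.2f}, with retries ${with_retries:.2f}")
```

Treat the adjustment values as placeholders until you have measured them from your own usage data and checked your provider's current pricing terms.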
What this page cannot tell you {#what-this-page-cannot-tell-you}
This page cannot tell you your exact billing scenario. It cannot measure your average prompt length, tell you which of your output tokens are waste, or tell you whether your provider’s cache implementation will actually reduce your costs.
Methodology and sources {#methodology-and-sources}
Check date: 2026-05-25
What was checked: OpenAI GPT-4.1 pricing page for current input/output rates; provider documentation on token billing conventions.
Worked-example assumptions: All examples use GPT-4.1 pricing. No caching or batch discounts applied. Average token counts are illustrative.
Assumptions and limits:
- Pricing changes over time. Re-check before budgeting.
- Different providers bill input and output at different rates.
- Some providers include tool call outputs in the output token count.
- Conversation history costs depend on truncation strategy and summarisation approach.
Source list {#source-list}
- OpenAI pricing page — https://openai.com/api/pricing/
- Anthropic pricing — https://www.anthropic.com/pricing
- Google Gemini pricing — https://cloud.google.com/vertex-ai/generative-ai/pricing
- Mistral pricing — https://mistral.ai/products/la-plateforme#pricing
Related guides {#related-guides}
- What is a token, and why does it affect AI cost?
- Prompt caching explained: when repeated context becomes cheaper
- API model pricing: input, output, cache and batch costs
- The hidden cost of retries, fallbacks and validation loops
Change Log
- 2026-05-27: Added direct source URLs to all named providers and services; added Change Log section. Content unchanged.