Prompt caching explained: when repeated context becomes cheaper

TL;DR

Prompt caching lets LLM providers reuse the already-processed prefix of a previous request, which reduces latency and compute cost when the same context is sent repeatedly. Because the computed state for that prefix is stored, the API can skip recomputing stable blocks such as system instructions.
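To make the "reuse processed prefixes" idea concrete, here is a toy sketch of why the ordering of a prompt matters. It is not any provider's implementation: whitespace-split words stand in for tokens, and `shared_prefix_len` is a hypothetical helper invented for illustration. Real providers apply their own matching rules and minimum lengths.

```python
def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Count how many leading items two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Words stand in for tokens here (an assumption made for illustration only).
instructions = "You are a support assistant . Answer briefly .".split()
question_1 = "How do I reset my password ?".split()
question_2 = "Where is my invoice ?".split()

# Stable instructions first: the whole instruction block is a shared prefix.
stable_first = shared_prefix_len(instructions + question_1,
                                 instructions + question_2)

# Variable content first: the requests diverge at the very first item.
variable_first = shared_prefix_len(question_1 + instructions,
                                   question_2 + instructions)

print(stable_first)    # 9
print(variable_first)  # 0
```

In this sketch, only the layout that puts the stable block first leaves anything reusable between the two requests.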

Pricing and usage by provider

…

Change log

2026-06-22: Applied editorial review fixes: updated description, ensured heading IDs, and added Change Log section.

OpenAI

- Cache write: $1.25/M tokens (vs $2.00/M fresh input)
- Cache read: $0.3125/M tokens

Anthropic

- Cache write: $1.25/M tokens (vs $3.00/M fresh input for Sonnet 4.6)
- Cache read: $0.30/M tokens

Google Gemini

- Cache write: 50% of fresh input rate
- Cache read: 25% of fresh input rate
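As a rough worked example of how these rates add up, the sketch below compares input-token cost with and without caching. It takes the Anthropic figures listed above at face value ($3.00/M fresh, $1.25/M cache write, $0.30/M cache read). The workload (100 requests, a 10,000-token stable prefix, a 500-token variable suffix) is hypothetical, and the sketch assumes the cache entry stays alive for every request after the first.

```python
PER_MILLION = 1_000_000


def cost_without_cache(requests, prefix_tokens, suffix_tokens, fresh_rate):
    """Input cost in USD when every token is billed at the fresh rate."""
    return requests * (prefix_tokens + suffix_tokens) * fresh_rate / PER_MILLION


def cost_with_cache(requests, prefix_tokens, suffix_tokens,
                    fresh_rate, write_rate, read_rate):
    """Input cost in USD when the prefix is written once, then read.

    Assumes the cache never expires between requests (a simplification).
    """
    if requests == 0:
        return 0.0
    # First request: prefix billed at the write rate, suffix at the fresh rate.
    first = prefix_tokens * write_rate + suffix_tokens * fresh_rate
    # Later requests: prefix billed at the read rate, suffix at the fresh rate.
    later = (requests - 1) * (prefix_tokens * read_rate
                              + suffix_tokens * fresh_rate)
    return (first + later) / PER_MILLION


# Hypothetical workload; rates are in USD per million tokens.
baseline = cost_without_cache(100, 10_000, 500, fresh_rate=3.00)
cached = cost_with_cache(100, 10_000, 500,
                         fresh_rate=3.00, write_rate=1.25, read_rate=0.30)

print(f"without caching: ${baseline:.4f}")  # $3.1500
print(f"with caching:    ${cached:.4f}")    # $0.4595
```

Under these assumptions most of the saving comes from the 99 cache reads. A shorter prefix, fewer repeats, or a cache that expires between requests would narrow the gap.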

Assumptions and limits

- Cache pricing changes over time.
- Cache duration limits vary by provider.
- Some models within each provider's range do not support caching.

Source list

- OpenAI, prompt caching documentation: https://platform.openai.com/docs/guides/prompt-caching (accessed 2026-05-28)
- Anthropic, prompt caching guide: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching (accessed 2026-05-28)
- Google Gemini, caching API: https://ai.google.dev/gemini-api/docs/caching (accessed 2026-05-28)

Trust Stack

Last checked: 2026-05-28
Corrections: Contact us to report errors
