Reasoning models: what “thinking” modes change for cost and latency

Several major providers now offer reasoning models: models that generate intermediate reasoning steps before giving a final answer, which lets them spend more computation on harder questions.

The capability improvement is real for tasks that benefit from deliberation. But the costs are also real: reasoning models are slower, more expensive per call, and not always better.

TL;DR

Reasoning models help most on tasks that require multi-step logic, precise mathematics, complex instruction following, or hard search over a solution space. They help least on straightforward generation, translation, summarisation, or retrieval, where additional computation mainly adds latency and cost without improving output quality.

The practical decision is simple: use fast models for routine tasks and route complex ones to reasoning models. The hard part is defining the routing rule.

What reasoning models actually do

Standard language models produce an answer directly, generating one token after another without a separate deliberation phase. Reasoning models insert an intermediate phase in which the model generates reasoning steps (sometimes visible, sometimes not) and uses those steps to guide the final answer.

The key differences:

Chain-of-thought prompting asks a standard model to “think step by step” within its normal output. It works for many tasks and needs no special model or setting; the only added cost is the extra output tokens the reasoning consumes. The reasoning can be shallow, however.

Extended thinking modes (OpenAI o-series, DeepSeek R1, Gemini thinking mode, Anthropic extended thinking) allocate additional computation to reasoning. The model may generate hundreds or thousands of internal reasoning tokens before producing a visible answer. This costs more per call and takes longer, but can produce significantly better results on hard problems.

Visible vs hidden reasoning. Some providers show the full reasoning chain to the user, some show only a summary, and others keep it hidden. Visible reasoning helps with debugging and trust, but it can also expose intermediate assumptions.
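The chain-of-thought option needs nothing beyond prompt construction, which a minimal Python sketch can illustrate. It assumes a plain-text interface and an “Answer:” line convention invented here for illustration; `build_prompt` and `extract_final_answer` are hypothetical helpers, not part of any provider's SDK. Because the reasoning arrives as ordinary visible output, the application has to separate it from the answer itself.

```python
# Minimal sketch of chain-of-thought prompting with a standard model.
# The "Answer:" convention and both helpers are hypothetical illustrations.
from typing import Optional

def build_prompt(question: str, step_by_step: bool) -> str:
    """Return a direct prompt, or one that asks for visible step-by-step reasoning."""
    if not step_by_step:
        return f"{question}\nReply with the answer only."
    return (
        f"{question}\n"
        "Think step by step. When you are done, write the final answer "
        "on its own line, starting with 'Answer:'."
    )

def extract_final_answer(output: str) -> Optional[str]:
    """Return the text after the last 'Answer:' line, or None if there is none."""
    for line in reversed(output.strip().splitlines()):
        stripped = line.strip()
        if stripped.startswith("Answer:"):
            return stripped[len("Answer:"):].strip()
    return None

# A made-up model response, used only to demonstrate the parsing step.
sample_output = "There are 3 boxes with 4 apples each.\n3 * 4 = 12.\nAnswer: 12"
print(build_prompt("How many apples are in 3 boxes of 4?", step_by_step=True))
print(extract_final_answer(sample_output))
```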

Where reasoning models add value

The research and deployment evidence points to several categories where extended reasoning tends to improve outcomes:

Mathematics and logic. Problems that require multiple steps of deduction, equation solving, or constraint satisfaction.
Code generation and debugging. Complex programming tasks where the model needs to plan an approach, check its own work, and revise before answering.
Multi-step tool use. Tasks where the model must decide which tool to call, with what arguments, in what order, and how to handle intermediate results.
Content with strict formatting or structural rules. Outputs that must match a complex schema, follow a specific document structure, or respect nested constraints.
Hard classification or decision boundaries. When the difference between correct and incorrect depends on subtle distinctions the model would normally gloss over.

Where reasoning models add cost without benefit

For many everyday tasks, reasoning models are overkill:

Translation. Adding reasoning tokens to a translation task rarely improves quality and always increases latency.
Summarisation of straightforward content. A well-prompted standard model summarises as well as a reasoning model at a fraction of the cost.
Simple classification. “Is this email spam?” does not benefit from a 500-token internal debate.
General chat and creative writing. The extra deliberation can make outputs feel stilted or over-thought.
Retrieval tasks. If the answer is in the provided context, the model does not need to reason about it — it needs to extract it.

The cost and latency reality

Extended thinking can increase per-call costs by roughly 3-10x compared with a standard model from the same provider, depending on how many reasoning tokens a request triggers. Output token counts can multiply, and because thinking tokens are usually billed at the output-token rate, the cost compounds.
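The compounding is easier to see with numbers. The sketch below is a minimal Python illustration: `Pricing`, `call_cost`, and every price and token count are made-up assumptions, not any provider's rates. It bills thinking tokens at the output rate and uses the same per-token prices for both calls, so the whole difference comes from token volume.

```python
# Minimal sketch of per-call cost arithmetic. All prices and token counts are
# hypothetical placeholders; substitute your provider's current rates.
from dataclasses import dataclass

@dataclass
class Pricing:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens

def call_cost(p: Pricing, input_tokens: int, output_tokens: int, thinking_tokens: int = 0) -> float:
    """Cost of one call in USD, assuming thinking tokens are billed at the output rate."""
    billed_output = output_tokens + thinking_tokens
    return (input_tokens * p.input_per_m + billed_output * p.output_per_m) / 1_000_000

# Same per-token rates for both calls, so the difference comes only from thinking tokens.
rates = Pricing(input_per_m=1.0, output_per_m=4.0)

base = call_cost(rates, input_tokens=1_000, output_tokens=300)
with_thinking = call_cost(rates, input_tokens=1_000, output_tokens=300, thinking_tokens=2_000)
print(f"standard: ${base:.4f}  with thinking: ${with_thinking:.4f}  ratio: {with_thinking / base:.1f}x")
```

With these invented figures, 2,000 thinking tokens raise the cost of a single call from $0.0022 to $0.0102, about 4.6x, before any difference in per-token price is taken into account.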

Latency also increases significantly. A reasoning model may take 10-60 seconds on a hard question that a standard model answers in 2-3 seconds. For synchronous applications, that latency is often unacceptable.
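Whether that latency is acceptable is easiest to judge against an explicit budget. A minimal Python sketch, assuming you have already timed a batch of real requests: the sample latencies and the 5-second budget below are invented for illustration, and the check uses the 95th percentile rather than the average so that slow outliers count.

```python
# Minimal sketch: does a model fit a synchronous latency budget?
# The latency samples and the 5-second budget are invented for illustration;
# replace them with timings measured on your own requests.
import statistics

def p95(samples: list[float]) -> float:
    """95th percentile of the samples (inclusive interpolation via statistics.quantiles)."""
    return statistics.quantiles(samples, n=20, method="inclusive")[-1]

def fits_budget(samples: list[float], budget_s: float) -> bool:
    """True if the 95th-percentile latency is within the budget."""
    return p95(samples) <= budget_s

fast_latencies = [1.8, 2.1, 2.4, 2.0, 2.9, 2.2, 2.6, 1.9, 2.3, 2.5]
reasoning_latencies = [12.0, 18.5, 25.1, 9.8, 41.0, 15.2, 33.7, 22.4, 58.3, 19.9]

budget = 5.0  # seconds a user will wait in this hypothetical synchronous UI
for name, samples in [("fast", fast_latencies), ("reasoning", reasoning_latencies)]:
    print(f"{name}: p95={p95(samples):.1f}s fits={fits_budget(samples, budget)}")
```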

How to route between modes

The most practical approach is not to choose a single model. It is to route tasks:

Start with a fast, cheap model for all requests.
Use an escalation rule (based on task type, user role, input length, or confidence score) to send a request to a reasoning model when needed.
Measure the escalation rate and adjust the rule over time.
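One way to express such a rule is a minimal Python sketch like the one below. Everything in it (`Router`, `Request`, the model names, the task categories, the 8,000-token and 0.7 thresholds, the `analyst` role) is a hypothetical placeholder, not any provider's API. The task categories mirror the list of tasks where reasoning models add value, and the router counts its own decisions so the escalation rate can be tracked over time.

```python
# Minimal sketch of an escalation rule. Model names, task categories, and
# thresholds are hypothetical placeholders; tune them on your own workload.
from dataclasses import dataclass, field
from typing import Optional

FAST_MODEL = "fast-model"            # hypothetical identifier
REASONING_MODEL = "reasoning-model"  # hypothetical identifier

# Task types this guide lists as benefiting from extended reasoning.
REASONING_TASKS = {"math", "code", "multi_step_tool_use", "strict_format", "hard_classification"}

@dataclass
class Request:
    task_type: str
    input_tokens: int
    user_role: str = "standard"
    confidence: Optional[float] = None  # e.g. a score from a first fast-model pass, if available

@dataclass
class Router:
    max_fast_input_tokens: int = 8_000
    min_confidence: float = 0.7
    counts: dict[str, int] = field(default_factory=lambda: {FAST_MODEL: 0, REASONING_MODEL: 0})

    def route(self, req: Request) -> str:
        """Escalate if any rule fires; otherwise stay on the fast model."""
        escalate = (
            req.task_type in REASONING_TASKS
            or req.user_role == "analyst"  # hypothetical role that always gets deeper reasoning
            or req.input_tokens > self.max_fast_input_tokens
            or (req.confidence is not None and req.confidence < self.min_confidence)
        )
        model = REASONING_MODEL if escalate else FAST_MODEL
        self.counts[model] += 1
        return model

    def escalation_rate(self) -> float:
        """Share of routed requests sent to the reasoning model."""
        total = sum(self.counts.values())
        return self.counts[REASONING_MODEL] / total if total else 0.0

router = Router()
print(router.route(Request(task_type="translation", input_tokens=400)))             # fast-model
print(router.route(Request(task_type="math", input_tokens=200)))                    # reasoning-model
print(router.route(Request(task_type="summary", input_tokens=500, confidence=0.4)))  # reasoning-model
print(f"escalation rate: {router.escalation_rate():.0%}")                           # 67%
```

Keeping the rule as plain code makes every escalation explainable, and the thresholds become the values you adjust when the measured escalation rate drifts.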

Some providers now offer automatic routing: the service decides whether extended reasoning is worth the extra cost for each request. These systems are improving but are not yet transparent enough to trust without your own monitoring.

Practical decision check

Are your tasks helped by multi-step reasoning or not?
Have you measured the latency and cost difference on your actual workload?
Can you route hard tasks to a reasoning model while keeping routine tasks on a standard model?
Do you have a way to measure whether the reasoning model is actually improving outcomes?

If the answer to the first question is no, reasoning models are likely not worth the premium.
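The last question in the check can be answered with a small side-by-side evaluation: run both models on the same labelled tasks and compare accuracy against cost. A minimal Python sketch follows; the `Result` records and their values are invented for illustration, and accuracy is only one possible outcome measure (a rubric score or user-acceptance rate would slot in the same way).

```python
# Minimal sketch: compare a fast and a reasoning model on the same labelled tasks.
# The per-task records below are invented; collect real ones from your own evaluation set.
from dataclasses import dataclass

@dataclass
class Result:
    correct: bool
    cost_usd: float

def summarise(results: list[Result]) -> tuple[float, float]:
    """Return (accuracy, mean cost per call); assumes a non-empty list."""
    accuracy = sum(r.correct for r in results) / len(results)
    mean_cost = sum(r.cost_usd for r in results) / len(results)
    return accuracy, mean_cost

fast = [Result(True, 0.002), Result(False, 0.002), Result(True, 0.002), Result(False, 0.003)]
reasoning = [Result(True, 0.011), Result(True, 0.009), Result(True, 0.010), Result(False, 0.012)]

fast_acc, fast_cost = summarise(fast)
reas_acc, reas_cost = summarise(reasoning)
gain = reas_acc - fast_acc    # extra correct answers per call, on average
extra = reas_cost - fast_cost  # extra USD per call, on average
print(f"accuracy: {fast_acc:.0%} -> {reas_acc:.0%}  cost per call: ${fast_cost:.4f} -> ${reas_cost:.4f}")
if gain > 0:
    print(f"extra cost per additional correct answer: ${extra / gain:.4f}")
else:
    print("no accuracy gain; the reasoning model is not earning its premium here")
```

The final figure, dollars per additional correct answer, is the number to weigh against what a wrong answer actually costs in your application.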

Methodology

Data checked: 2026-05-24
Sources consulted: Provider reasoning-mode documentation and pricing for OpenAI o-series, DeepSeek R1, Gemini thinking mode, and Anthropic extended thinking; published evaluations and benchmark results; developer reports on real-world usage patterns
Assumptions: Reasoning model implementations and pricing change frequently. The category boundaries reflect patterns observed as of mid-2026; specific routing thresholds require workload-specific testing.
Limitations: This guide covers reasoning modes from the four named providers only. It does not cover open-weight reasoning models deployed locally or through third-party hosts. Provider implementations differ in detail; always consult current documentation for your specific model version.
Jurisdiction: Global. Reasoning model availability and pricing may vary by region.

Source list

OpenAI reasoning models — https://platform.openai.com/docs/guides/reasoning (accessed 2026-05-24)
DeepSeek R1 technical report — https://arxiv.org/abs/2501.12948 (accessed 2026-05-24)
Google Gemini thinking — https://ai.google.dev/gemini-api/docs/thinking-mode (accessed 2026-05-24)
Anthropic extended thinking documentation — https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking (accessed 2026-05-24)
Aider leaderboard (coding benchmark with reasoning models) — https://aider.chat/docs/leaderboards/ (accessed 2026-05-24)

Trust Stack

Last checked: 2026-05-28
Corrections: Contact us to report errors

Change log

2026-05-28: Editorial review. Restructured the Methodology section and added the Trust Stack.
2026-05-24: First published. Plain-English decision framework for reasoning model adoption.