Diagram comparing a fast language model path with a slower reasoning model path that uses extra steps before answering

What is a reasoning model?


A reasoning model is an AI model designed to spend extra computation on a problem before giving its final answer, especially for tasks that need planning, multi-step analysis, coding, maths or tool use. In practice, the term usually refers to language models that can "think" for longer, use intermediate steps, and sometimes expose a summary or trace of that process. The term is still emerging, so there is no single formal definition shared by every provider.

Reviewed by Jackie, Head of Learning & Development, Levellers · Last reviewed 8 June 2026

What this means

Most large language models predict the next token in a sequence. A reasoning model still does that, but it is tuned to take a more deliberate path on harder problems. Instead of giving the quickest plausible reply, it may spend more time breaking the task down, checking alternatives, calling tools, or reconsidering an earlier step.

That is why reasoning models are often described as models that use extra "test-time compute". In plain English, they spend more effort while answering. Some providers let you control that effort directly. Some offer a hybrid mode where the same model can answer quickly for simple requests or think longer for difficult ones. Some expose part of the reasoning process to developers. Others keep it mostly internal and only return the final answer or a short summary.

As of 2 June 2026, major model providers use slightly different language, but the family resemblance is clear. They describe models or modes that are better at complex problem solving because the system can allocate more inference time, more reasoning tokens, or a larger thinking budget before responding.

The key point for a leader is not the branding. It is that there is now a meaningful difference between models optimised for speed and models optimised for harder, more deliberate work. A reasoning model is usually slower and more expensive than a standard model, but it can handle ambiguity, tool use and multi-step tasks more reliably when the use case justifies it.

Why it matters

Reasoning models matter because many real business tasks are not one-shot question answering. They involve judgement across several steps. A model may need to inspect evidence, choose a route, call the right tool, compare alternatives, apply a policy, or debug a chain of issues. Standard fast models can help, but they often fail when the task requires patience rather than fluency.

For leaders, that changes workflow design. The right answer is often not "use the smartest model everywhere". It is to reserve reasoning models for the parts of work where extra deliberation earns its keep. That might be coding, planning, exception handling, root cause analysis, or agentic orchestration, while simpler drafting or classification work stays on a faster and cheaper model.

Reasoning models also matter for governance. They can improve difficult tasks, but they introduce trade-offs in latency, cost, observability and safety. A model that reasons for longer can consume more tokens, take longer to answer, and still be wrong. Procurement and operations therefore need to evaluate both capability and control, not just benchmark headlines.

How it works

A reasoning model works by giving itself more room to work through a problem before producing the final answer. There are several ways this is achieved.

One route is prompting. Early research showed that asking a model to produce intermediate steps, often called chain-of-thought prompting, could materially improve performance on complex reasoning tasks. That finding changed how people thought about language models. But prompting alone is not the whole story. Modern reasoning models are not simply ordinary models told to "think step by step". Many are trained or tuned so that multi-step reasoning, verification and self-correction emerge more naturally.

Another route is training signal. Research and vendor disclosures now point to approaches such as reinforcement learning, process supervision, and related techniques that reward better intermediate reasoning behaviour, not only correct final answers. The aim is to improve how a model searches through a problem, checks itself and adapts strategy when it gets stuck.

At inference time, reasoning models often expose some control over effort. A developer may choose low, medium or high reasoning effort, or set a thinking budget. That lets the same workflow trade speed and spend against deeper analysis. For an easy question, the model may use very little extra computation. For a hard coding task or planning problem, it may use many more internal steps before it writes the answer.
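
The effort control described above can be pictured with a small sketch. It is illustrative only: the effort names, the token budgets and the `choose_settings` function are assumptions made for this example, not any provider's actual API or limits.

```python
from dataclasses import dataclass

# Hypothetical effort levels and thinking budgets (in tokens).
# Real providers use their own names and limits; these are placeholders.
EFFORT_BUDGETS = {"low": 1_000, "medium": 8_000, "high": 32_000}


@dataclass
class RequestSettings:
    effort: str
    thinking_budget_tokens: int


def choose_settings(needs_tools: bool, steps_expected: int) -> RequestSettings:
    """Pick an effort level from two rough signals about task difficulty."""
    if steps_expected <= 1 and not needs_tools:
        effort = "low"       # easy question: little extra computation
    elif steps_expected <= 4:
        effort = "medium"
    else:
        effort = "high"      # hard coding or planning task
    return RequestSettings(effort, EFFORT_BUDGETS[effort])


easy = choose_settings(needs_tools=False, steps_expected=1)
hard = choose_settings(needs_tools=True, steps_expected=8)
print(easy)
print(hard)
```

In a real system the two input signals would come from your own task classification, and the chosen settings would be passed to whatever effort or budget parameter your provider documents.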

In some systems, this internal work appears as reasoning tokens or a visible thought process. In others, it is hidden and the user sees only the final answer or a provider-generated summary. That difference matters. Visible reasoning can help debugging and evaluation, but it is not a proof of correctness. A model can produce an elegant explanation and still be wrong. Conversely, a model can solve a task without showing you every internal step.

There are now at least two practical design patterns in the market. One is the dedicated reasoning model, a model family built primarily for harder multi-step work. The other is the hybrid model, where a single model can answer quickly in one mode and reason more deeply in another. For buyers, both patterns matter because they affect routing, controls and spend. A hybrid model can simplify architecture. A dedicated reasoning model can make routing clearer. Neither is automatically better.

Tool use is where reasoning models often show their value. If an AI assistant needs to search internal knowledge, call an API, inspect code, or run a calculation, it must decide what to do next, in what order, and when to stop. Reasoning models are generally better at this kind of sequencing than simple one-pass generation. That is why they appear often in agent workflows: not because they are magic, but because planning matters.
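
A minimal sketch of that sequencing loop is shown below. Everything in it is hypothetical: the two tools, the `scripted_planner` function (a stand-in for the model's decision about the next step) and the step limit are invented for illustration, and a real agent would call a model where the scripted planner sits.

```python
from typing import Callable

# Hypothetical tools: each takes a string argument and returns a string result.
TOOLS: dict[str, Callable[[str], str]] = {
    "search_kb": lambda query: f"policy text matching '{query}'",
    "calculate": lambda expr: str(sum(int(part) for part in expr.split("+"))),
}


def scripted_planner(history: list) -> tuple:
    """Stand-in for the model: returns (tool_name, argument) or ("stop", answer)."""
    if len(history) == 0:
        return "search_kb", "refund window"
    if len(history) == 1:
        return "calculate", "14+16"
    # The last history entry is (tool, argument, result).
    return "stop", f"Refund window is {history[-1][2]} days"


def run_agent(planner, max_steps: int = 5) -> str:
    """Ask the planner for the next step, run the tool, and repeat until it stops."""
    history = []
    for _ in range(max_steps):
        action, arg = planner(history)
        if action == "stop":
            return arg
        if action not in TOOLS:
            return "escalate: unknown tool requested"
        history.append((action, arg, TOOLS[action](arg)))
    # The step limit is a stopping rule: hand over rather than loop forever.
    return "escalate: step limit reached"


print(run_agent(scripted_planner))
```

The loop itself is simple. The difficult part is the planner's choice of what to do next and when to stop, which is the part a reasoning model is meant to handle better.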

However, there are hard limits. More reasoning does not guarantee truth. A model can reason from incomplete evidence, invent a missing fact, or overfit to a pattern in benchmarks that does not transfer to your work. It can also overthink simple tasks, adding latency and cost without much gain. And some recent research argues that current reasoning models still show brittle behaviour as problem complexity rises, which is a useful reminder not to mistake improved benchmark performance for general reliability.

There is also a safety angle. If reasoning traces are visible, they may help with monitoring, but they can also contain raw or unsuitable material for end users. If they are hidden, monitoring becomes harder. Some providers now expose safe summaries or encrypted reasoning artefacts instead of raw internal chains. That is useful, but it reinforces a wider point: a reasoning model is not just a stronger model. It is also a different operational object.

The practical mechanics therefore come down to four moving parts. First, how the model was trained to deliberate. Second, how much extra compute it can use at response time. Third, whether and how its intermediate reasoning is exposed. Fourth, whether it can use tools and maintain state effectively across a multi-step workflow. Those choices determine whether a reasoning model is merely interesting, or actually useful in production.

Examples

A support operations team uses an AI assistant for return exceptions. A fast model extracts customer details and policy clauses. A reasoning model then decides whether the case meets the policy, where the ambiguity lies, and whether a human reviewer should take over.

An engineering team uses a coding agent to inspect logs, read files, run tests and propose a patch. The value does not come only from code generation. It comes from the model's ability to choose the next step sensibly, keep track of what it has tried, and stop when a fix is not yet safe.

A finance team uses AI to investigate why a reconciliation failed. The reasoning model checks transaction groups, compares system timings, tests rival explanations and drafts a structured explanation for an analyst to verify.

A knowledge worker asks for a summary of a policy document and a list of obvious action points. That does not necessarily need a reasoning model. This is an important example too, because it shows where a faster model is often a better fit.

Common misunderstandings

One misunderstanding is that reasoning models "reason like humans". They do not. They are still statistical models. They may imitate a careful problem solving style, and that can be very useful, but it is not the same as human understanding.

Another is that they are always better. They are often better for complex tasks, but they are usually worse on speed and cost, and can be unnecessary for straightforward drafting, extraction or transformation work.

A third mistake is to assume that visible reasoning equals safety or truth. A clear explanation can be persuasive and wrong. In some cases, showing raw reasoning can also create policy or trust problems. It should be treated as a diagnostic aid, not a guarantee.

Finally, people often think reasoning is only about maths. In production, some of the most valuable uses are planning, tool choice, debugging, policy application and managing ambiguity.

Risks and boundaries

Reasoning models bring real trade-offs. They can consume more tokens, take longer to answer, and create more complex logs and traces. If cost or speed is critical, using them everywhere is often a mistake.
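
The token side of that trade-off is simple arithmetic, sketched below. The prices and token counts are made-up placeholders, and the function assumes reasoning tokens are billed at the same rate as output tokens, which you should check against your own provider's pricing.

```python
def request_cost(input_tokens, reasoning_tokens, answer_tokens,
                 price_in_per_million, price_out_per_million):
    """Estimate the cost of one request in currency units.

    Assumption: reasoning tokens are billed at the output rate.
    """
    billed_output = reasoning_tokens + answer_tokens
    total = (input_tokens * price_in_per_million
             + billed_output * price_out_per_million)
    return total / 1_000_000


# Same prompt and same visible answer length; only the hidden reasoning differs.
# All numbers here are hypothetical.
fast = request_cost(2_000, 0, 500, price_in_per_million=1.0, price_out_per_million=4.0)
slow = request_cost(2_000, 6_000, 500, price_in_per_million=1.0, price_out_per_million=4.0)

print(f"fast request: {fast:.4f}")
print(f"reasoning request: {slow:.4f}")
print(f"ratio: {slow / fast:.1f}x")
```

Under these placeholder numbers the visible answer is the same length in both cases, so the whole difference in cost comes from reasoning tokens the user may never see.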

They also have quality boundaries. Some problems benefit from more deliberate search. Others fail because the model lacks the needed facts, cannot access the right system, or is operating under vague policy. Extra reasoning cannot fix missing evidence. It can sometimes make a weak answer longer rather than better.

There are monitoring and safety boundaries too. Recent work suggests reasoning traces can help detect misbehaviour, but also that heavy pressure on those traces may make models hide intent. At the same time, some evaluation research argues current reasoning models remain brittle on certain hard tasks. In high-stakes use, they should still sit inside a workflow with testing, permission controls and human verification. This article is general information, not legal, safety or technical assurance advice.

What to do next

Begin by classifying your AI tasks by difficulty, not by department. Ask which tasks are simple generation, which need retrieval, and which need multi-step planning or exception handling. Reasoning models earn their place in the third group, not everywhere.

Then test routing. In many organisations, the best design is a mixed stack. Use a fast model for straightforward work. Escalate to a reasoning model only when the task is ambiguous, tool-heavy, or genuinely multi-step. That keeps spend under control without flattening quality.
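
A routing rule of this kind can start very small. The sketch below is one possible shape, with assumptions throughout: the `Task` fields, the thresholds and the tier names "fast" and "reasoning" are invented for illustration and would need tuning against your own workload.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    ambiguous: bool = False
    tool_calls_expected: int = 0
    steps_expected: int = 1


def route(task: Task) -> str:
    """Return a hypothetical model tier name: 'fast' or 'reasoning'."""
    tool_heavy = task.tool_calls_expected >= 2   # assumed threshold
    multi_step = task.steps_expected >= 3        # assumed threshold
    if task.ambiguous or tool_heavy or multi_step:
        return "reasoning"
    return "fast"


tasks = [
    Task("Summarise a policy document"),
    Task("Decide whether a return exception meets policy", ambiguous=True),
    Task("Investigate a failed reconciliation", tool_calls_expected=4, steps_expected=6),
]
for task in tasks:
    print(f"{route(task):9} <- {task.description}")
```

Keeping the rule explicit like this makes it easy to log which tier handled each request and to compare quality and spend before changing the thresholds.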

Next, evaluate like an operator. Measure not only final answer quality, but latency, token use, tool selection, consistency, and failure modes on tricky cases. If the model exposes reasoning traces or summaries, decide who can see them, how they are logged, and whether they create any additional risk.
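
As a rough sketch of what measuring more than final answer quality can look like, the example below summarises a handful of evaluation runs. The records, field names and values are fabricated placeholders for illustration only, not real results.

```python
from statistics import median

# Hypothetical evaluation records; in practice these come from your own test runs.
runs = [
    {"case": "refund-edge-1", "correct": True, "latency_s": 12.4, "tokens": 5200, "right_tool": True},
    {"case": "refund-edge-2", "correct": True, "latency_s": 3.1, "tokens": 900, "right_tool": True},
    {"case": "recon-gap-1", "correct": False, "latency_s": 20.0, "tokens": 9100, "right_tool": False},
    {"case": "patch-review-1", "correct": True, "latency_s": 8.5, "tokens": 4000, "right_tool": True},
]


def summarise(records):
    """Report accuracy alongside latency, token use, tool selection and failures."""
    n = len(records)
    return {
        "accuracy": sum(r["correct"] for r in records) / n,
        "median_latency_s": median(r["latency_s"] for r in records),
        "mean_tokens": sum(r["tokens"] for r in records) / n,
        "tool_selection_rate": sum(r["right_tool"] for r in records) / n,
        "failed_cases": [r["case"] for r in records if not r["correct"]],
    }


report = summarise(runs)
for key, value in report.items():
    print(f"{key}: {value}")
```

Listing the failed cases by name, not only the aggregate score, keeps attention on the tricky examples where failure modes show up.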

Finally, keep the workflow humble. A reasoning model should have limited permissions, clear stopping rules, and an escalation path to a human. The right mental model is not "the model thinks for us". It is "the model can do a more careful first pass on work that benefits from deliberate analysis".


FAQs

What is the difference between a reasoning model and a normal LLM?

A reasoning model is usually tuned to spend more inference time on harder tasks, often using intermediate steps, tool planning or self-checking before producing the answer. A standard model is usually optimised more heavily for speed and cost.

Do reasoning models always show their working?

No. Some providers expose raw reasoning, some provide summaries, and some mostly keep it internal. Even where steps are visible, they should not be treated as proof that the answer is correct.

Are reasoning models only useful for coding and maths?

No. They are also useful for planning, exception handling, multi-document analysis, tool use, and cases where the model has to choose between several possible routes.

Why are reasoning models slower and often more expensive?

Because they usually use more computation and more tokens while solving a task. That extra effort can improve difficult cases, but it also adds latency and cost.

Should I replace all my existing model calls with a reasoning model?

Usually not. Most organisations get better results by routing only the hard or ambiguous parts of work to a reasoning model and leaving simpler tasks on faster models.

Can a smaller model still do reasoning?

Sometimes, yes. Distillation, prompting and task design can all help smaller models handle some reasoning tasks. But the more demanding the work, the more the limits of size, context and training usually matter.

Sources

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (arXiv). Secondary, academic. Supports the historical background that explicit intermediate steps can improve performance on complex reasoning tasks.
