🧮 Reasoning Mode

Claude Extended Thinking

How Claude's extended thinking mode works, how to enable it in the API, how to set the token budget, and which tasks benefit most.

What Is Extended Thinking?

Extended thinking gives Claude a private reasoning scratchpad — a block of tokens it uses to think through a problem before producing its final answer. The reasoning phase is separate from the response: Claude thinks first (you can see this as a thinking content block), then responds.

You control how much thinking budget to allocate via the budget_tokens parameter. Claude uses as many tokens as it needs up to that limit, then writes the final answer. Extended thinking is available on Claude 3.7 Sonnet and later models, including the Claude Sonnet 4.x and Opus 4.x families.

Key distinction: Extended thinking is not the same as chain-of-thought prompting (where you ask Claude to show its work in the response). It's a native API feature where the reasoning happens in a separate, dedicated token budget, controlled by the API parameter — not prompt engineering.
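To make the distinction concrete, here is a sketch of both approaches written as plain Python dicts of keyword arguments for client.messages.create. The question text and budget values are illustrative. The model ID is copied from the SDK examples in this guide. Only the second request uses the thinking parameter.

```python
# Two ways to get step-by-step reasoning, written as messages.create() kwargs.
# The question and token values are illustrative only.

QUESTION = "Is 2,147,483,647 prime? Justify your answer."

# 1) Chain-of-thought prompting: the reasoning is requested in the prompt
#    and comes back inside the ordinary text response.
cot_request = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 4000,
    "messages": [{
        "role": "user",
        "content": QUESTION + " Think step by step and show your work before the final answer.",
    }],
}

# 2) Extended thinking: the prompt is unchanged; the API parameter turns on a
#    separate thinking phase with its own token budget.
thinking_request = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 16000,  # must be larger than budget_tokens
    "thinking": {"type": "enabled", "budget_tokens": 8000},
    "messages": [{"role": "user", "content": QUESTION}],
}
```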

How to Enable Extended Thinking

Python SDK

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 8000  # minimum 1,024; must be less than max_tokens
    },
    messages=[{
        "role": "user",
        "content": "Design the data model and API schema for a multi-tenant SaaS billing system that handles seat-based, usage-based, and hybrid pricing. Include edge cases."
    }]
)

# Response has multiple content blocks
for block in response.content:
    if block.type == "thinking":
        print("=== REASONING ===")
        print(block.thinking)
    elif block.type == "text":
        print("=== RESPONSE ===")
        print(block.text)

TypeScript SDK

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 16000,
  thinking: {
    type: "enabled",
    budget_tokens: 8000,
  },
  messages: [{
    role: "user",
    content: "Find the bug in this recursive algorithm and explain why it fails on edge cases..."
  }]
});

for (const block of response.content) {
  if (block.type === "thinking") {
    console.log("Reasoning:", block.thinking);
  } else if (block.type === "text") {
    console.log("Answer:", block.text);
  }
}

Setting the Right Budget

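The budget_tokens parameter is a maximum, not a fixed allocation. Claude uses only what it needs, and you are billed only for the thinking tokens it actually generates. If the budget is too low, Claude's reasoning is cut short on hard problems. If it is too high, Claude can think longer than the task warrants, which adds cost and latency.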

| Task type | Recommended budget | Why |
|---|---|---|
| Simple debugging / 1-step problem | 1,024–2,000 | Quick verification, not deep search |
| Algorithm / data structure design | 4,000–8,000 | Needs to explore multiple approaches |
| Complex architecture decisions | 8,000–16,000 | Trade-off analysis across many dimensions |
| Hard math / competitive programming | 16,000–32,000 | Exhaustive reasoning needed |
| Routine code generation | Don't use extended thinking | Normal mode is sufficient and significantly cheaper |
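If your application routes different kinds of requests, you can encode the table as a lookup. This is a minimal sketch. The category keys and the thinking_config helper are hypothetical names for illustration, not API values. The ranges mirror the budget table, and 1,024 is the API's minimum budget_tokens.

```python
from typing import Optional

# Hypothetical category keys for this sketch; they are not API values.
# Ranges mirror the budget table; 1,024 is the API minimum for budget_tokens.
RECOMMENDED_BUDGETS = {
    "simple_debugging": (1_024, 2_000),
    "algorithm_design": (4_000, 8_000),
    "architecture": (8_000, 16_000),
    "hard_math": (16_000, 32_000),
    "routine_codegen": None,  # normal mode: omit the thinking parameter entirely
}


def thinking_config(task: str, use_upper_bound: bool = False) -> Optional[dict]:
    """Return a `thinking` dict for a task category, or None to skip extended thinking."""
    budget_range = RECOMMENDED_BUDGETS[task]  # raises KeyError for unknown categories
    if budget_range is None:
        return None
    low, high = budget_range
    return {"type": "enabled", "budget_tokens": high if use_upper_bound else low}
```

One way to keep cost and latency in check is to start at the lower bound and raise it only when answers fall short. When the helper returns None, leave the thinking argument out of the request.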
Note on max_tokens: max_tokens must be larger than budget_tokens. If you set budget_tokens: 8000, set max_tokens to at least 9,000 (or much higher if you also want a long response). The thinking tokens are drawn from the max_tokens allocation before the response is written.
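A small guard can keep the max_tokens rule from being violated by accident. This is a sketch under the rules stated in the note: max_tokens covers both thinking and the answer, and the budget must be at least 1,024. The helper name thinking_request_params is hypothetical.

```python
MIN_BUDGET_TOKENS = 1_024  # smallest budget_tokens the API accepts


def thinking_request_params(budget_tokens: int, answer_tokens: int) -> dict:
    """Build max_tokens and thinking kwargs for messages.create().

    max_tokens covers both thinking and the final answer, so it is set to the
    thinking budget plus the room you want to leave for the answer.
    """
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(f"budget_tokens must be at least {MIN_BUDGET_TOKENS}")
    if answer_tokens <= 0:
        raise ValueError("answer_tokens must be positive so max_tokens > budget_tokens")
    return {
        "max_tokens": budget_tokens + answer_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


params = thinking_request_params(8_000, 1_000)  # the 8,000 / 9,000 case from the note
```

The result can be unpacked straight into a request, e.g. client.messages.create(model=..., messages=..., **params).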

When Extended Thinking Helps vs. Doesn't

✅ Worth enabling

  • Hard algorithmic problems (DP, graph traversal)
  • Debugging subtle concurrency bugs
  • Designing schemas with complex constraints
  • Security analysis with many attack vectors
  • Multi-step planning with dependencies
  • Math-heavy problems (proofs, optimizations)

❌ Overkill (use normal mode)

  • Writing docstrings or comments
  • Simple refactors (rename, extract function)
  • Generating boilerplate code
  • Answering factual API questions
  • Explaining well-understood concepts
  • Formatting / linting fixes

Cost of Extended Thinking

Thinking tokens are billed as output tokens, the most expensive token type. At Claude Sonnet 4.x rates (about $15 per million output tokens):

| Scenario | Tokens | Approx. cost |
|---|---|---|
| Normal 500-word response | ~600 output | ~$0.009 |
| + 2,000-token thinking budget (used) | +2,000 output | +$0.030 |
| + 8,000-token thinking budget (used) | +8,000 output | +$0.120 |
| + 32,000-token thinking budget (used) | +32,000 output | +$0.480 |
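Each row is plain arithmetic: tokens × price per token. This minimal sketch assumes the $15 per million output-token Sonnet rate used above. Prices change, so treat the constant as a placeholder. The function name is hypothetical.

```python
# Assumed Claude Sonnet output price in USD per million tokens; check current pricing.
OUTPUT_PRICE_PER_MTOK = 15.00


def output_cost_usd(output_tokens: int, price_per_mtok: float = OUTPUT_PRICE_PER_MTOK) -> float:
    """Cost of output tokens. Thinking tokens count as output, so include them here."""
    return output_tokens * price_per_mtok / 1_000_000


answer_only = output_cost_usd(600)             # normal ~600-token response
answer_plus_8k = output_cost_usd(600 + 8_000)  # response plus a fully used 8K budget
```

Only thinking tokens actually generated are billed, so these figures are upper bounds for a given budget.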

For most hard engineering problems, a 4,000–8,000 token budget is the sweet spot: meaningful improvement in reasoning quality without extreme cost. Reserve 32K budgets for genuinely hard math or competitive programming problems.

For current output token prices: prompt-pricing.vercel.app · claude-cost-calc.vercel.app

Reading the Thinking Block

The thinking content block is Claude's internal monologue: the reasoning it worked through before writing the final answer.

This trace is valuable for debugging: if the final answer is wrong, reading the thinking block often reveals where the reasoning went astray — which you can then correct with a follow-up message.
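When you are debugging a wrong answer, it helps to pull the reasoning out of the response and set it beside the answer. This is a minimal sketch that assumes the Python SDK block shapes used earlier (block.type, block.thinking, block.text). split_response is a hypothetical helper name. It also counts redacted_thinking blocks, which the API can return when reasoning is encrypted and which carry no readable text.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Tuple


def split_response(content: Iterable[Any]) -> Tuple[str, str, int]:
    """Split response.content into (reasoning, answer, redacted_block_count).

    Expects SDK-style blocks with a .type attribute plus .thinking or .text.
    """
    reasoning, answer, redacted = [], [], 0
    for block in content:
        if block.type == "thinking":
            reasoning.append(block.thinking)
        elif block.type == "redacted_thinking":
            redacted += 1  # encrypted by the API; contents are not readable
        elif block.type == "text":
            answer.append(block.text)
    return "\n\n".join(reasoning), "".join(answer), redacted


# Offline demo with stand-in blocks; in practice pass response.content.
demo_blocks = [
    SimpleNamespace(type="thinking", thinking="Check the base case: n == 0 never returns."),
    SimpleNamespace(type="text", text="The recursion lacks a base case for n == 0."),
]
reasoning, answer, redacted = split_response(demo_blocks)
```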

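Streaming: Extended thinking works with streaming (stream=True / .stream()). Thinking blocks stream before the text blocks, arriving as thinking_delta events. What you receive depends on the model. Claude 3.7 Sonnet returns the full thinking trace, while Claude 4 models return a summary of it, though you are still billed for the full thinking tokens. Check the Anthropic docs for the model you use.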
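Here is a sketch of consuming the stream with the Python SDK. It assumes client.messages.stream() and the thinking_delta / text_delta event types documented for extended thinking. collect_stream and stream_with_thinking are hypothetical helper names, and the model ID is the one used in the examples above. Parsing is kept separate from the network call so it can be exercised offline with stand-in events.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Tuple


def collect_stream(events: Iterable[Any]) -> Tuple[str, str]:
    """Accumulate thinking and text deltas from Messages API stream events."""
    thinking, text = [], []
    for event in events:
        if event.type != "content_block_delta":
            continue  # skip message_start, content_block_start/stop, SDK helper events
        if event.delta.type == "thinking_delta":
            thinking.append(event.delta.thinking)
        elif event.delta.type == "text_delta":
            text.append(event.delta.text)
        # signature_delta (the thinking block's signature) is ignored here
    return "".join(thinking), "".join(text)


def stream_with_thinking(client: Any, prompt: str, budget_tokens: int = 8_000) -> Tuple[str, str]:
    """Stream one request; `client` is assumed to be an anthropic.Anthropic() instance."""
    with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=budget_tokens + 8_000,  # leave room for the answer after thinking
        thinking={"type": "enabled", "budget_tokens": budget_tokens},
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        return collect_stream(stream)


# Offline demo with stand-in events shaped like the SDK's stream events.
demo_events = [
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(type="thinking_delta", thinking="Try n = 0. ")),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(type="thinking_delta", thinking="No base case.")),
    SimpleNamespace(type="content_block_stop"),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(type="text_delta", text="Add a base case for n == 0.")),
]
reasoning, answer = collect_stream(demo_events)
```

In a live app you would print each delta as it arrives instead of joining at the end. The accumulation shown here just keeps the sketch easy to check.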

Extended Thinking in Claude Code (CLI)

When using Claude Code as a CLI tool (not via API), extended thinking is managed automatically. Claude Code decides when to apply extended reasoning based on the complexity of your request — you don't set a budget manually. For the API/SDK use case, the thinking parameter gives you explicit control.

The sub-agents system also benefits from extended thinking: spawning a Plan sub-agent with extended thinking enabled produces more thorough architecture proposals before execution agents begin.

Frequently Asked Questions

What is Claude extended thinking?
Extended thinking is a mode where Claude gets a private reasoning scratchpad: tokens it uses to think through a problem before producing its final answer. The reasoning appears in a separate 'thinking' content block in the response. You set a token budget that caps how much Claude can think. Available on Claude 3.7 Sonnet and later models, including Claude Sonnet 4.x and Opus 4.x.
How do I enable extended thinking in the Claude API?
Pass a thinking parameter with type: "enabled" and a budget_tokens value of at least 1,024. max_tokens must be larger than budget_tokens. There is no hard 32,000 cap, but Anthropic recommends batch processing for budgets above 32K so long-running requests don't hit timeouts. The response includes content blocks of type thinking (the reasoning) and text (the final answer).
When should I use extended thinking vs normal mode?
Extended thinking is worth enabling for hard algorithmic problems, complex architecture decisions, debugging subtle logic bugs, and multi-step planning with dependencies. For routine code generation, simple refactors, answering factual questions, or writing docs — normal mode is sufficient and significantly cheaper.
How are thinking tokens priced?
Thinking tokens are billed as output tokens — the most expensive type (~$15/M for Claude Sonnet). A 10,000-token thinking budget used fully costs $0.15 on top of normal output. Set the budget to the minimum needed for your task. Most hard coding problems are well-served by a 4,000–8,000 token budget.
Can I see Claude's reasoning trace?
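Yes. The API response includes content blocks with type: "thinking" containing the reasoning trace: the full trace on Claude 3.7 Sonnet, a summary on Claude 4 models. You can read it to understand why Claude reached its conclusion, which is useful for debugging when the final answer is wrong. Note: each thinking block carries a signature. If you send thinking blocks back to the API, as you must when returning tool results during a tool-use turn, pass them back unmodified.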
Does extended thinking work in Claude Code (the CLI)?
Claude Code applies extended reasoning automatically for complex tasks. You don't set a budget manually in the CLI — Claude decides when deeper reasoning is warranted. For your own API/SDK applications, use the explicit thinking parameter to control it.
