What Is Extended Thinking?
Extended thinking gives Claude a private reasoning scratchpad — a block of tokens it uses to think through a problem before producing its final answer. The reasoning phase is separate from the response: Claude thinks first (you can see this as a thinking content block), then responds.
You control how much thinking budget to allocate via the budget_tokens parameter. Claude uses as many tokens as it needs up to that limit, then writes the final answer. Extended thinking is available on Claude 3.7 Sonnet and later Sonnet models, and on Claude Opus 4.x.
How to Enable Extended Thinking
Python SDK
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 8000,  # at least 1,024 and less than max_tokens
    },
    messages=[{
        "role": "user",
        "content": "Design the data model and API schema for a multi-tenant SaaS billing system that handles seat-based, usage-based, and hybrid pricing. Include edge cases.",
    }],
)

# The response contains separate thinking and text content blocks
for block in response.content:
    if block.type == "thinking":
        print("=== REASONING ===")
        print(block.thinking)
    elif block.type == "text":
        print("=== RESPONSE ===")
        print(block.text)
TypeScript SDK
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 16000,
  thinking: {
    type: "enabled",
    budget_tokens: 8000,
  },
  messages: [{
    role: "user",
    content: "Find the bug in this recursive algorithm and explain why it fails on edge cases..."
  }]
});

for (const block of response.content) {
  if (block.type === "thinking") {
    console.log("Reasoning:", block.thinking);
  } else if (block.type === "text") {
    console.log("Answer:", block.text);
  }
}
Setting the Right Budget
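The budget_tokens parameter is a maximum, not a fixed allocation: Claude uses only what it needs, and you are billed for the thinking tokens it actually generates, not for unused budget. Setting the budget too low forces Claude to cut its reasoning short; setting it too high lets easy problems run up long, expensive reasoning traces and can add latency.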
| Task type | Recommended budget | Why |
|---|---|---|
| Simple debugging / 1-step problem | 1,000–2,000 | Quick verification, not deep search |
| Algorithm / data structure design | 4,000–8,000 | Needs to explore multiple approaches |
| Complex architecture decisions | 8,000–16,000 | Trade-off analysis across many dimensions |
| Hard math / competitive programming | 16,000–32,000 | Exhaustive reasoning needed |
| Routine code generation | — | Don't use extended thinking |
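The table can be encoded as a small lookup so that callers pick a budget by task type instead of hard-coding numbers. This is only a sketch: the task-type keys and the `thinking_config` helper are hypothetical names, and the budgets are simply the upper end of each range in the table.

```python
# Hypothetical lookup built from the table above (upper end of each range).
# None means "leave extended thinking off".
BUDGET_BY_TASK = {
    "simple_debugging": 2_000,
    "algorithm_design": 8_000,
    "architecture": 16_000,
    "hard_math": 32_000,
    "routine_codegen": None,
}


def thinking_config(task_type):
    """Return a `thinking` dict for the request, or None to omit it."""
    budget = BUDGET_BY_TASK[task_type]  # raises KeyError on unknown task types
    if budget is None:
        return None
    return {"type": "enabled", "budget_tokens": budget}


print(thinking_config("algorithm_design"))
print(thinking_config("routine_codegen"))
```

Starting from the upper end of a range is a deliberate choice here, since the budget is a ceiling rather than a fixed spend; a cost-sensitive caller might prefer the lower end.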
max_tokens must be larger than budget_tokens. If you set budget_tokens: 8000, set max_tokens to at least 9,000 (or much higher if you also want a long response). The thinking tokens are drawn from the max_tokens allocation before the response is written.
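One way to keep the two limits consistent is to derive max_tokens from the thinking budget plus a reserve for the visible answer. The sketch below assumes a hypothetical helper, `request_limits`, and a hypothetical `answer_tokens` reserve; the 1,024 minimum is the figure quoted in this guide's FAQ.

```python
MIN_BUDGET_TOKENS = 1_024  # minimum quoted in this guide's FAQ


def request_limits(budget_tokens, answer_tokens=4_000):
    """Return max_tokens and thinking settings that leave room for the answer.

    answer_tokens is a hypothetical reserve for the visible response.
    """
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(f"budget_tokens must be at least {MIN_BUDGET_TOKENS}")
    if answer_tokens <= 0:
        raise ValueError("answer_tokens must be positive")
    return {
        # Thinking tokens come out of max_tokens, so add the two together.
        "max_tokens": budget_tokens + answer_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


limits = request_limits(8_000, answer_tokens=8_000)
print(limits["max_tokens"])  # 16000
```

Because the keys match the request parameter names, the returned dict can be unpacked into the call, for example `client.messages.create(model=..., messages=..., **limits)`.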
When Extended Thinking Helps vs. Doesn't
✅ Worth enabling
- Hard algorithmic problems (DP, graph traversal)
- Debugging subtle concurrency bugs
- Designing schemas with complex constraints
- Security analysis with many attack vectors
- Multi-step planning with dependencies
- Math-heavy problems (proofs, optimizations)
❌ Overkill (use normal mode)
- Writing docstrings or comments
- Simple refactors (rename, extract function)
- Generating boilerplate code
- Answering factual API questions
- Explaining well-understood concepts
- Formatting / linting fixes
Cost of Extended Thinking
Thinking tokens are billed as output tokens — the most expensive type. At Claude Sonnet 4.x rates:
| Scenario | Tokens | Approx. cost |
|---|---|---|
| Normal 500-word response | ~600 output | ~$0.009 |
| + 2,000-token thinking budget (used) | +2,000 output | +$0.030 |
| + 8,000-token thinking budget (used) | +8,000 output | +$0.120 |
| + 32,000-token thinking budget (used) | +32,000 output | +$0.480 |
For most hard engineering problems, a 4,000–8,000 token budget is the sweet spot: meaningful improvement in reasoning quality without extreme cost. Reserve 32K budgets for genuinely hard math or competitive programming problems.
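The arithmetic behind the table is easy to reproduce. The sketch below assumes a rate of $15 per million output tokens, which is the figure implied by the table (600 tokens ≈ $0.009); the constant and function names are illustrative, and the rate should be checked against current pricing.

```python
# Assumed rate: $15 per million output tokens, implied by the table above.
# Check current pricing before relying on this figure.
OUTPUT_USD_PER_MILLION = 15.0


def output_cost_usd(answer_tokens, thinking_tokens=0):
    """Estimate output cost, with thinking tokens billed as output tokens."""
    total_tokens = answer_tokens + thinking_tokens
    return total_tokens * OUTPUT_USD_PER_MILLION / 1_000_000


# Total cost of a ~600-token answer at each thinking level from the table.
for used in (0, 2_000, 8_000, 32_000):
    print(f"{used:>6} thinking tokens -> ${output_cost_usd(600, used):.3f}")
```

Note that this prints the total per request (answer plus thinking), whereas the table lists the thinking cost as an increment on top of the base response.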
For current output token prices: prompt-pricing.vercel.app · claude-cost-calc.vercel.app
Reading the Thinking Block
The thinking content block is Claude's internal monologue. It often includes:
- Restating the problem to clarify what's being asked
- Identifying edge cases and failure modes upfront
- Exploring multiple approaches and rejecting weaker ones
- Working through specific examples mentally
- Catching its own mistakes mid-reasoning and correcting
This trace is valuable for debugging: if the final answer is wrong, reading the thinking block often reveals where the reasoning went astray — which you can then correct with a follow-up message.
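For debugging, it can help to pull the reasoning and the answer apart so the trace can be logged or inspected separately. The following is a minimal sketch: `split_blocks` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for `response.content` so the example runs without an API call.

```python
from types import SimpleNamespace


def split_blocks(content):
    """Separate reasoning from answer text in a list of content blocks."""
    thinking_parts, text_parts = [], []
    for block in content:
        if block.type == "thinking":
            thinking_parts.append(block.thinking)
        elif block.type == "text":
            text_parts.append(block.text)
        # Any other block type is ignored in this sketch.
    return "\n".join(thinking_parts), "\n".join(text_parts)


# Stand-in for response.content (assumed shape: .type plus .thinking or .text).
fake_content = [
    SimpleNamespace(type="thinking", thinking="Check the empty-list case first."),
    SimpleNamespace(type="text", text="The loop fails on an empty list."),
]

reasoning, answer = split_blocks(fake_content)
print("REASONING:", reasoning)
print("ANSWER:", answer)
```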
Extended thinking also works with streaming (stream=True / .stream()). Thinking blocks stream before the text blocks. In some API configurations, thinking blocks are summarized rather than streamed verbatim; check the Anthropic docs for your SDK version.
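When streaming, the reasoning and the answer arrive as incremental deltas that have to be accumulated separately. The sketch below assumes the event shape described in Anthropic's streaming documentation (`content_block_delta` events whose delta is a `thinking_delta` or a `text_delta`); verify those names against your SDK version. `collect_stream` and `_delta` are hypothetical helpers, and the events are fakes so the example runs offline.

```python
from types import SimpleNamespace


def collect_stream(events):
    """Accumulate thinking and answer text from streamed delta events."""
    thinking, text = [], []
    for event in events:
        if event.type != "content_block_delta":
            continue  # skip start/stop and other event types
        if event.delta.type == "thinking_delta":
            thinking.append(event.delta.thinking)
        elif event.delta.type == "text_delta":
            text.append(event.delta.text)
    return "".join(thinking), "".join(text)


def _delta(kind, **fields):
    """Build a fake content_block_delta event for offline testing."""
    return SimpleNamespace(
        type="content_block_delta",
        delta=SimpleNamespace(type=kind, **fields),
    )


# Fake events in the order described above: thinking first, then text.
fake_events = [
    _delta("thinking_delta", thinking="Try n = 0. "),
    _delta("thinking_delta", thinking="Base case is missing."),
    SimpleNamespace(type="content_block_stop"),
    _delta("text_delta", text="Add a base case for n = 0."),
]

reasoning, answer = collect_stream(fake_events)
print(reasoning)
print(answer)
```

With the real SDK, the same function could presumably be passed the stream object from `with client.messages.stream(...) as stream:`, since iterating it yields events of this kind.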
Extended Thinking in Claude Code (CLI)
When using Claude Code as a CLI tool (not via API), extended thinking is managed automatically. Claude Code decides when to apply extended reasoning based on the complexity of your request — you don't set a budget manually. For the API/SDK use case, the thinking parameter gives you explicit control.
The sub-agents system also benefits from extended thinking: spawning a Plan sub-agent with extended thinking enabled produces more thorough architecture proposals before execution agents begin.
Frequently Asked Questions
To enable extended thinking, pass the thinking parameter with type: "enabled" and a budget_tokens value (at least 1,024 and less than max_tokens; the budgets in this guide go up to 32,000). The response includes content blocks of type thinking (the reasoning) and text (the final answer).

The response includes a content block of type: "thinking" containing the reasoning trace. You can read it to understand why Claude reached its conclusion, which is useful for debugging when the final answer is wrong. Note: thinking blocks cannot be edited or fabricated when sent back in later assistant turns; where the API expects them (for example, during tool use), pass them back unmodified.

Claude Code manages extended thinking automatically; through the API or SDK, use the thinking parameter to control it.