Question 1

What is Claude extended thinking?

Accepted Answer

Extended thinking is a mode where Claude gets a private reasoning scratchpad — a block of tokens it uses to think through a problem before producing its final answer. The thinking is shown in a separate 'thinking' content block in the API response. You set a 'thinking budget' (a maximum number of tokens for the reasoning phase). Claude uses as many as it needs up to that budget, then writes the final response. Extended thinking is available on Claude Sonnet 3.7+ and Claude Opus 4.x models.

Question 2

How do I enable extended thinking in the Claude API?

Accepted Answer

Pass a 'thinking' parameter with type 'enabled' and a 'budget_tokens' value when calling the API. The minimum budget is 1,024 tokens; the maximum is 32,000 tokens (as of 2026). Example: messages.create({ model: 'claude-sonnet-4-6', max_tokens: 16000, thinking: { type: 'enabled', budget_tokens: 10000 }, messages: [...] }). The response will include one or more content blocks with type 'thinking' containing the reasoning trace, followed by the text response.

Question 3

When should I use extended thinking vs normal mode?

Accepted Answer

Extended thinking is worth enabling for: (1) Hard math or algorithmic problems where step-by-step reasoning matters. (2) Complex multi-step planning tasks where getting the order of operations right is critical. (3) Debugging subtle logic errors where Claude needs to trace execution mentally. (4) Architecture decisions that require weighing many trade-offs. Normal mode is sufficient for: routine code generation, answering factual questions, simple refactors, explaining known concepts, writing documentation. The cost premium (thinking tokens billed as output tokens) makes it overkill for everyday tasks.

Question 4

How are thinking tokens priced?

Accepted Answer

Thinking tokens are billed as output tokens — the most expensive token type. At Claude Sonnet rates (~$15/M output tokens), a 10,000-token thinking budget that's fully used adds $0.15 per request on top of the normal output cost. For comparison, a normal response of 500 tokens costs $0.0075. So extended thinking with a large budget can be 10-20× more expensive per request than normal mode. Set the budget to the minimum you need for your task complexity, not the maximum. A 2,000–4,000 token budget handles most hard coding problems without the full 32K expense.

Question 5

Does extended thinking work in Claude Code (the CLI)?

Accepted Answer

Claude Code (the CLI/agentic tool) uses extended thinking automatically for certain complex tasks when operating in appropriate modes. You don't need to enable it manually in the CLI — Claude decides when extended reasoning is warranted based on task complexity. For direct API usage in your own applications, you control it explicitly via the 'thinking' parameter. The Anthropic SDK for Python and TypeScript both support the thinking parameter natively.

Question 6

Can I see Claude's reasoning trace?

Accepted Answer

Yes. When extended thinking is enabled, the API response includes content blocks with type 'thinking' that contain the full reasoning trace. This is Claude's internal monologue as it works through the problem — you can read it to understand why it reached its conclusion. Note: thinking blocks cannot be injected back into subsequent requests as assistant turns (they're output-only). Also, in some streaming configurations, thinking blocks may be summarized rather than shown verbatim, depending on API version and configuration.

Task type	Recommended budget	Why
Simple debugging / 1-step problem	1,000–2,000	Quick verification, not deep search
Algorithm / data structure design	4,000–8,000	Needs to explore multiple approaches
Complex architecture decisions	8,000–16,000	Trade-off analysis across many dimensions
Hard math / competitive programming	16,000–32,000	Exhaustive reasoning needed
Routine code generation	—	Don't use extended thinking

Scenario	Tokens	Approx. cost
Normal 500-word response	~600 output	~$0.009
+ 2,000-token thinking budget (used)	+2,000 output	+$0.030
+ 8,000-token thinking budget (used)	+8,000 output	+$0.120
+ 32,000-token thinking budget (used)	+32,000 output	+$0.480

Claude Extended Thinking

What Is Extended Thinking?

How to Enable Extended Thinking

Python SDK

TypeScript SDK

Setting the Right Budget

When Extended Thinking Helps vs. Doesn't

✅ Worth enabling

❌ Overkill (use normal mode)

Cost of Extended Thinking

Reading the Thinking Block

Extended Thinking in Claude Code (CLI)

Frequently Asked Questions

Explore More Claude Code Skills