Chapter 10

Chain-of-Thought Prompting

Planning and Reasoning

Introduction

When humans solve complex problems, we don't jump straight to answers; we think through steps, consider options, and reason our way to conclusions. Chain-of-Thought (CoT) prompting encourages language models to do the same: explicitly showing their reasoning process rather than producing answers directly.

This section explores how CoT dramatically improves reasoning quality, different techniques for eliciting chain-of-thought reasoning, and how to integrate CoT into agentic systems for more reliable planning and decision-making.

Core Insight: Making reasoning explicit doesn't just help us understand the model—it actually improves the model's reasoning quality. The act of "thinking out loud" enables better problem-solving.

What is Chain-of-Thought?

Chain-of-Thought prompting asks models to show intermediate reasoning steps before providing final answers. This simple technique can dramatically improve performance on reasoning tasks.

Standard vs. Chain-of-Thought

```python
# Standard prompting - direct answer
prompt_standard = """
Q: A store has 45 apples. They sell 23 and receive a new shipment of 17.
How many apples do they have?

A:
"""
# Model might answer: "39" (correct, but how did it get there?)

# Chain-of-Thought prompting - explicit reasoning
prompt_cot = """
Q: A store has 45 apples. They sell 23 and receive a new shipment of 17.
How many apples do they have?

A: Let me work through this step by step:
1. Starting apples: 45
2. After selling 23: 45 - 23 = 22 apples
3. After receiving shipment of 17: 22 + 17 = 39 apples

The store has 39 apples.
"""
# Model shows reasoning, making it verifiable and more reliable
```

Why CoT Works

| Mechanism | Explanation | Benefit |
|---|---|---|
| Decomposition | Breaks problem into smaller steps | Each step is easier to solve correctly |
| Intermediate states | Creates checkpoints in reasoning | Errors are caught and corrected earlier |
| Explicit context | Reasoning is visible to the model | Can reference earlier conclusions |
| Structured thinking | Imposes logical order | Reduces random jumps or omissions |
| Self-consistency | Steps must align | Contradictions become apparent |

When to Use CoT

  • Multi-step reasoning: Math, logic, planning
  • Complex analysis: Comparing options, weighing trade-offs
  • Debugging: Tracing through code or logic
  • Decision-making: Evaluating criteria systematically
  • Explanation tasks: Teaching or documenting

CoT is most valuable for complex reasoning tasks. For simple factual retrieval or creative generation, it may add unnecessary overhead.

CoT Prompting Techniques

There are several techniques for eliciting chain-of-thought reasoning:

1. Zero-Shot CoT

Simply ask the model to think step by step without examples:

```python
def zero_shot_cot(question: str) -> str:
    """Elicit reasoning with a simple instruction."""

    prompt = f"""{question}

Let's think through this step by step:"""

    response = llm.generate(prompt)
    return response

# Alternative trigger phrases:
triggers = [
    "Let's think step by step.",
    "Let me work through this carefully.",
    "Breaking this down:",
    "Let's reason about this:",
    "I'll solve this systematically:",
]
```

2. Few-Shot CoT

Provide examples that demonstrate the reasoning format you want:

```python
def few_shot_cot(question: str, examples: list[dict]) -> str:
    """Elicit reasoning by demonstrating with examples."""

    prompt_parts = []

    for ex in examples:
        prompt_parts.append(f"""Q: {ex['question']}

A: Let me think through this:
{ex['reasoning']}

Therefore, the answer is: {ex['answer']}
---""")

    prompt_parts.append(f"""Q: {question}

A: Let me think through this:""")

    prompt = "\n\n".join(prompt_parts)
    response = llm.generate(prompt)
    return response

# Example usage
examples = [
    {
        "question": "If a train travels 60 mph for 2.5 hours, how far does it go?",
        "reasoning": """
1. Distance = Speed × Time
2. Speed = 60 mph
3. Time = 2.5 hours
4. Distance = 60 × 2.5 = 150 miles""",
        "answer": "150 miles"
    },
    {
        "question": "A recipe calls for 3 cups of flour for 24 cookies. How much for 36 cookies?",
        "reasoning": """
1. Find flour per cookie: 3 cups / 24 cookies = 0.125 cups/cookie
2. For 36 cookies: 0.125 × 36 = 4.5 cups""",
        "answer": "4.5 cups of flour"
    }
]
```

3. Self-Consistency

Generate multiple reasoning chains and aggregate results:

```python
import asyncio
import re
from collections import Counter

async def self_consistent_cot(
    question: str,
    num_samples: int = 5,
    temperature: float = 0.7
) -> dict:
    """Generate multiple reasoning chains and find consensus."""

    prompt = f"""{question}

Let me think through this step by step:"""

    # Generate multiple samples
    tasks = [
        llm.generate_async(prompt, temperature=temperature)
        for _ in range(num_samples)
    ]

    responses = await asyncio.gather(*tasks)

    # Extract final answers from each chain
    answers = []
    reasoning_chains = []

    for response in responses:
        answer = extract_final_answer(response)
        answers.append(answer)
        reasoning_chains.append(response)

    # Find most common answer
    answer_counts = Counter(answers)
    most_common = answer_counts.most_common(1)[0]

    return {
        "answer": most_common[0],
        "confidence": most_common[1] / num_samples,
        "all_answers": answer_counts,
        "reasoning_chains": reasoning_chains
    }

def extract_final_answer(response: str) -> str:
    """Extract the final answer from a reasoning chain."""
    # Look for patterns like "the answer is", "therefore", etc.
    patterns = [
        r"the answer is[:\s]+(.+)",
        r"therefore[,:\s]+(.+)",
        r"so[,:\s]+(.+)$",
        r"= (\d+)",
    ]
    for pattern in patterns:
        match = re.search(pattern, response, re.IGNORECASE | re.MULTILINE)
        if match:
            return match.group(1).strip()
    return response.split("\n")[-1]  # Simple fallback
```

4. Least-to-Most Prompting

Decompose the problem explicitly, then solve subproblems in order:

```python
async def least_to_most(question: str) -> str:
    """Decompose into subproblems, solve in order."""

    # Step 1: Decompose
    decompose_prompt = f"""Break this question into simpler subquestions that,
when answered in order, will lead to the final answer.

Question: {question}

List the subquestions in order:"""

    subquestions = await llm.generate_async(decompose_prompt)
    subq_list = parse_numbered_list(subquestions)

    # Step 2: Solve each subquestion
    context = []
    for sq in subq_list:
        solve_prompt = f"""Given what we know:
{chr(10).join(context) if context else 'Starting fresh.'}

Answer this subquestion: {sq}

Think step by step:"""

        answer = await llm.generate_async(solve_prompt)
        context.append(f"Q: {sq}\nA: {answer}")

    # Step 3: Synthesize final answer
    final_prompt = f"""Based on the following subquestion answers:

{chr(10).join(context)}

Now answer the original question: {question}"""

    final_answer = await llm.generate_async(final_prompt)
    return final_answer
```
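The decomposition step relies on a `parse_numbered_list` helper that the snippet leaves undefined. A minimal sketch, assuming the model returns one subquestion per line numbered `1.` or `1)`:

```python
import re

def parse_numbered_list(text: str) -> list[str]:
    """Extract items from a numbered list like '1. First' / '2) Second'."""
    items = []
    for line in text.splitlines():
        # Match lines such as "1. ..." or "2) ..."
        match = re.match(r"\s*\d+[.)]\s+(.*)", line)
        if match:
            items.append(match.group(1).strip())
    return items
```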

Self-consistency is particularly powerful for math and logic problems where there's a single correct answer. For open-ended questions, it may surface multiple valid perspectives.

Structured Reasoning Formats

Beyond free-form reasoning, we can impose structure on the chain-of-thought to make it more systematic:

Numbered Steps

```python
NUMBERED_STEPS_TEMPLATE = """
Solve this problem using numbered steps:

Problem: {problem}

Solution:
Step 1: [First step]
Step 2: [Second step]
...
Step N: [Final step with answer]

Final Answer: [Clear statement of answer]
"""

def numbered_steps_cot(problem: str) -> str:
    prompt = NUMBERED_STEPS_TEMPLATE.format(problem=problem)
    return llm.generate(prompt)
```

Pros and Cons Analysis

```python
PROS_CONS_TEMPLATE = """
Analyze this decision:

Decision: {decision}

## Option A: {option_a}
Pros:
- [List advantages]

Cons:
- [List disadvantages]

## Option B: {option_b}
Pros:
- [List advantages]

Cons:
- [List disadvantages]

## Analysis
[Compare options systematically]

## Recommendation
[Final recommendation with justification]
"""

def pros_cons_analysis(decision: str, options: list[str]) -> str:
    prompt = PROS_CONS_TEMPLATE.format(
        decision=decision,
        option_a=options[0],
        option_b=options[1]
    )
    return llm.generate(prompt)
```

Hypothesis Testing

```python
HYPOTHESIS_TEMPLATE = """
Investigate this question:

Question: {question}

## Hypothesis 1
Statement: [Proposed answer]
Evidence For: [Supporting facts]
Evidence Against: [Contradicting facts]
Likelihood: [High/Medium/Low]

## Hypothesis 2
Statement: [Alternative answer]
Evidence For: [Supporting facts]
Evidence Against: [Contradicting facts]
Likelihood: [High/Medium/Low]

## Conclusion
[Most likely answer with reasoning]
"""

def hypothesis_testing(question: str) -> str:
    prompt = HYPOTHESIS_TEMPLATE.format(question=question)
    return llm.generate(prompt)
```

Decision Matrix

```python
DECISION_MATRIX_TEMPLATE = """
Evaluate these options against criteria:

Decision: {decision}
Options: {options}
Criteria: {criteria}

## Scoring Matrix
| Criterion | Weight | {option_headers} |
|-----------|--------|{option_columns}|
{criteria_rows}

## Weighted Scores
{score_calculations}

## Recommendation
[Best option with explanation]
"""

def decision_matrix(
    decision: str,
    options: list[str],
    criteria: list[dict]  # [{"name": "...", "weight": 0.3}, ...]
) -> str:
    # The template above illustrates the target output shape; here we
    # describe that shape in the prompt and let the model build the matrix.
    prompt = f"""Create a decision matrix to evaluate:

Decision: {decision}
Options: {", ".join(options)}
Criteria (with weights): {criteria}

For each criterion, score each option from 1-10.
Then calculate weighted scores and recommend the best option.

Think through each score carefully with brief justification."""

    return llm.generate(prompt)
```

CoT in Agentic Systems

Chain-of-thought reasoning is particularly valuable in agentic systems. Here's how to integrate CoT into agent workflows:

ReAct with Explicit Reasoning

```python
from anthropic import Anthropic

class ReActWithCoT:
    """ReAct agent with enhanced chain-of-thought."""

    def __init__(self, tools: list):
        self.tools = tools
        self.client = Anthropic()

    async def step(self, task: str, history: list[dict]) -> dict:
        """Single ReAct step with explicit reasoning."""

        history_text = self._format_history(history)

        prompt = f"""You are solving a task step by step.

Task: {task}

Previous steps:
{history_text}

Available tools: {[t.name for t in self.tools]}

Now reason about the next step:

## Current Understanding
[What do I know so far? What have I learned from previous steps?]

## Goal Analysis
[What am I trying to achieve? What's the gap between current state and goal?]

## Options Consideration
[What are my options for the next action? What are the trade-offs?]

## Decision
Thought: [Explicit reasoning about what to do next]
Action: [tool_name]
Action Input: [input for the tool]

OR if the task is complete:

## Decision
Thought: [Reasoning about why the task is complete]
Final Answer: [The complete answer]"""

        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2048,
            messages=[{"role": "user", "content": prompt}]
        )

        return self._parse_response(response.content[0].text)

    def _format_history(self, history: list[dict]) -> str:
        """Format previous steps for context."""
        if not history:
            return "No previous steps yet."

        lines = []
        for i, step in enumerate(history):
            lines.append(f"Step {i+1}:")
            lines.append(f"  Thought: {step.get('thought', 'N/A')}")
            if 'action' in step:
                lines.append(f"  Action: {step['action']}")
                lines.append(f"  Result: {step.get('result', 'N/A')}")
        return "\n".join(lines)
```
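`_parse_response` is left unimplemented above. One possible sketch, written as a standalone function with an illustrative name; the patterns assume the `Thought:`/`Action:`/`Final Answer:` labels used in the prompt:

```python
import re

def parse_react_response(text: str) -> dict:
    """Parse a ReAct-style response into a thought plus action or final answer.

    Assumes the Thought/Action/Action Input/Final Answer labels from the
    prompt above; adapt the patterns to your actual output format.
    """
    result: dict = {}
    thought = re.search(r"Thought:\s*(.+?)(?=\nAction:|\nFinal Answer:|$)", text, re.S)
    if thought:
        result["thought"] = thought.group(1).strip()

    final = re.search(r"Final Answer:\s*(.+)$", text, re.S)
    if final:
        result["final_answer"] = final.group(1).strip()
        return result

    action = re.search(r"Action:\s*(.+)", text)
    action_input = re.search(r"Action Input:\s*(.+)", text)
    if action:
        result["action"] = action.group(1).strip()
    if action_input:
        result["action_input"] = action_input.group(1).strip()
    return result
```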

Planning with CoT

```python
import json

async def plan_with_cot(goal: str, context: dict) -> list[dict]:
    """Create a plan using chain-of-thought reasoning."""

    prompt = f"""Create a plan to achieve this goal.

Goal: {goal}
Context: {json.dumps(context)}

## Understanding the Goal
[What exactly needs to be accomplished? What does success look like?]

## Analyzing Constraints
[What limitations or requirements must be respected?]

## Identifying Key Challenges
[What are the main obstacles or complexities?]

## Exploring Approaches
[What are different ways to achieve this goal?]

## Selecting Approach
[Which approach is best and why?]

## Detailed Plan
[Step-by-step plan with rationale for each step]

Return the plan as a JSON array:
[
    {{"step": 1, "action": "...", "rationale": "...", "dependencies": []}},
    ...
]"""

    response = await llm.generate_async(prompt)

    # Extract JSON from response
    plan = extract_json_from_reasoning(response)
    return plan
```
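`extract_json_from_reasoning` is assumed by this example. A minimal sketch that scans for the first bracket-balanced span that parses as JSON (it does not handle brackets inside string values):

```python
import json

def extract_json_from_reasoning(text: str) -> list[dict]:
    """Pull the first JSON array out of a response that mixes prose and JSON."""
    start = text.find("[")
    while start != -1:
        depth = 0
        for i in range(start, len(text)):
            if text[i] == "[":
                depth += 1
            elif text[i] == "]":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break
        # That span didn't parse; try the next opening bracket
        start = text.find("[", start + 1)
    raise ValueError("No JSON array found in response")
```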

Tool Selection with CoT

```python
async def select_tool_with_cot(
    task: str,
    available_tools: list[dict],
    context: str
) -> dict:
    """Select the best tool using explicit reasoning."""

    tools_desc = "\n".join(
        f"- {t['name']}: {t['description']}"
        for t in available_tools
    )

    prompt = f"""Select the best tool for this task.

Task: {task}
Context: {context}

Available tools:
{tools_desc}

## Task Analysis
[What does this task require? What kind of operation is needed?]

## Tool Evaluation
[For each potentially relevant tool, assess:
- Does it have the required capability?
- What are the limitations?
- How well does it fit?]

## Selection
Tool: [chosen tool name]
Rationale: [why this tool is the best choice]
Input: [what input to provide]
Expected Output: [what we expect to get back]"""

    response = await llm.generate_async(prompt)

    # Parse selection
    return parse_tool_selection(response)
```
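`parse_tool_selection` is another assumed helper. A minimal sketch keyed to the labeled fields in the prompt above:

```python
import re

def parse_tool_selection(text: str) -> dict:
    """Parse the Tool/Rationale/Input/Expected Output fields from a response."""
    fields = {
        "tool": r"Tool:\s*(.+)",
        "rationale": r"Rationale:\s*(.+)",
        "input": r"Input:\s*(.+)",
        "expected_output": r"Expected Output:\s*(.+)",
    }
    result = {}
    for key, pattern in fields.items():
        match = re.search(pattern, text)
        # Missing fields become None rather than raising
        result[key] = match.group(1).strip() if match else None
    return result
```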

Verbose CoT adds tokens and latency. Use it for critical decisions, not every operation. Consider a tiered approach: brief reasoning for simple decisions, full CoT for complex ones.
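A tiered approach can be as simple as a dispatcher that estimates question complexity and picks a reasoning depth. The thresholds and keyword heuristics below are illustrative placeholders, not a tuned policy:

```python
def tiered_reasoning_prompt(question: str) -> str:
    """Choose a reasoning depth from a crude complexity estimate.

    The heuristics (word count, trigger keywords) are illustrative;
    real systems might use a classifier or a cheap model call instead.
    """
    complex_markers = ("compare", "trade-off", "design", "why", "plan")
    words = len(question.split())

    if words < 12 and not any(m in question.lower() for m in complex_markers):
        # Simple lookup-style question: answer directly, no CoT overhead
        return question
    if words < 40:
        # Moderate question: brief reasoning
        return f"{question}\n\nThink briefly, then answer:"
    # Complex question: full chain-of-thought
    return f"{question}\n\nLet's think through this step by step:"
```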

Implementing CoT Reasoning

Let's build a complete CoT reasoning system for agents:

```python
from anthropic import Anthropic
from dataclasses import dataclass
from enum import Enum
from typing import Optional
import re

class ReasoningStyle(Enum):
    BRIEF = "brief"        # Quick reasoning for simple tasks
    STANDARD = "standard"  # Normal step-by-step
    THOROUGH = "thorough"  # Deep analysis
    SOCRATIC = "socratic"  # Self-questioning

@dataclass
class ReasoningStep:
    """A single step in the reasoning chain."""
    step_number: int
    thought: str
    conclusion: Optional[str] = None
    confidence: float = 1.0

@dataclass
class ReasoningChain:
    """Complete chain of reasoning."""
    question: str
    steps: list[ReasoningStep]
    final_answer: str
    total_confidence: float
    style_used: ReasoningStyle

class ChainOfThoughtReasoner:
    """
    Implements various chain-of-thought reasoning strategies.
    """

    def __init__(self, model: str = "claude-sonnet-4-20250514"):
        self.client = Anthropic()
        self.model = model

    def reason(
        self,
        question: str,
        style: ReasoningStyle = ReasoningStyle.STANDARD,
        context: str = ""
    ) -> ReasoningChain:
        """Perform chain-of-thought reasoning."""

        if style == ReasoningStyle.BRIEF:
            return self._brief_reasoning(question, context)
        elif style == ReasoningStyle.STANDARD:
            return self._standard_reasoning(question, context)
        elif style == ReasoningStyle.THOROUGH:
            return self._thorough_reasoning(question, context)
        else:  # ReasoningStyle.SOCRATIC
            return self._socratic_reasoning(question, context)

    def _brief_reasoning(
        self,
        question: str,
        context: str
    ) -> ReasoningChain:
        """Quick reasoning for simple questions."""

        prompt = f"""Answer this question with brief reasoning.

{f"Context: {context}" if context else ""}
Question: {question}

Think: [1-2 sentences of reasoning]
Answer: [concise answer]"""

        response = self.client.messages.create(
            model=self.model,
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}]
        )

        text = response.content[0].text
        thought, answer = self._parse_brief(text)

        return ReasoningChain(
            question=question,
            steps=[ReasoningStep(1, thought, answer)],
            final_answer=answer,
            total_confidence=0.9,
            style_used=ReasoningStyle.BRIEF
        )

    def _standard_reasoning(
        self,
        question: str,
        context: str
    ) -> ReasoningChain:
        """Standard step-by-step reasoning."""

        prompt = f"""Answer this question with step-by-step reasoning.

{f"Context: {context}" if context else ""}
Question: {question}

Let me think through this step by step:

Step 1: [First consideration]
Step 2: [Second consideration]
...

Therefore, the answer is: [final answer]"""

        response = self.client.messages.create(
            model=self.model,
            max_tokens=2048,
            messages=[{"role": "user", "content": prompt}]
        )

        text = response.content[0].text
        steps, answer = self._parse_standard(text)

        return ReasoningChain(
            question=question,
            steps=steps,
            final_answer=answer,
            total_confidence=0.85,
            style_used=ReasoningStyle.STANDARD
        )

    def _thorough_reasoning(
        self,
        question: str,
        context: str
    ) -> ReasoningChain:
        """Deep, comprehensive reasoning."""

        prompt = f"""Perform thorough analysis of this question.

{f"Context: {context}" if context else ""}
Question: {question}

## Understanding
[What is being asked? Any ambiguities to resolve?]

## Relevant Knowledge
[What facts, principles, or frameworks apply?]

## Analysis
[Systematic examination of the problem]

## Considerations
[Alternative perspectives, edge cases, limitations]

## Synthesis
[Bringing together insights]

## Conclusion
[Final answer with confidence level]"""

        response = self.client.messages.create(
            model=self.model,
            max_tokens=4096,
            messages=[{"role": "user", "content": prompt}]
        )

        text = response.content[0].text
        steps, answer = self._parse_thorough(text)

        return ReasoningChain(
            question=question,
            steps=steps,
            final_answer=answer,
            total_confidence=0.9,
            style_used=ReasoningStyle.THOROUGH
        )

    def _socratic_reasoning(
        self,
        question: str,
        context: str
    ) -> ReasoningChain:
        """Reasoning through self-questioning."""

        prompt = f"""Answer by asking and answering questions.

{f"Context: {context}" if context else ""}
Main Question: {question}

Q1: [First clarifying question]
A1: [Answer to Q1]

Q2: [Follow-up question based on A1]
A2: [Answer to Q2]

Q3: [Deeper question]
A3: [Answer to Q3]

...

Final Answer: [Answer to the main question]"""

        response = self.client.messages.create(
            model=self.model,
            max_tokens=2048,
            messages=[{"role": "user", "content": prompt}]
        )

        text = response.content[0].text
        steps, answer = self._parse_socratic(text)

        return ReasoningChain(
            question=question,
            steps=steps,
            final_answer=answer,
            total_confidence=0.88,
            style_used=ReasoningStyle.SOCRATIC
        )

    def _parse_brief(self, text: str) -> tuple[str, str]:
        """Parse brief reasoning format."""
        thought_match = re.search(r"Think:\s*(.+?)(?=Answer:|$)", text, re.S)
        answer_match = re.search(r"Answer:\s*(.+?)$", text, re.S)

        thought = thought_match.group(1).strip() if thought_match else text
        answer = answer_match.group(1).strip() if answer_match else text

        return thought, answer

    def _parse_standard(self, text: str) -> tuple[list[ReasoningStep], str]:
        """Parse standard step-by-step format."""
        steps = []
        step_pattern = r"Step (\d+):\s*(.+?)(?=Step \d+:|Therefore|$)"
        matches = re.findall(step_pattern, text, re.S)

        for num, content in matches:
            steps.append(ReasoningStep(
                step_number=int(num),
                thought=content.strip()
            ))

        answer_match = re.search(r"(?:Therefore|Thus|So),?\s*(?:the answer is:?)?\s*(.+?)$", text, re.S)
        answer = answer_match.group(1).strip() if answer_match else ""

        return steps, answer

    def _parse_thorough(self, text: str) -> tuple[list[ReasoningStep], str]:
        """Parse thorough analysis format."""
        sections = ["Understanding", "Relevant Knowledge", "Analysis",
                    "Considerations", "Synthesis", "Conclusion"]
        steps = []

        for i, section in enumerate(sections):
            pattern = rf"## {section}\n(.+?)(?=## |$)"
            match = re.search(pattern, text, re.S)
            if match:
                steps.append(ReasoningStep(
                    step_number=i + 1,
                    thought=match.group(1).strip()
                ))

        answer = steps[-1].thought if steps else text
        return steps, answer

    def _parse_socratic(self, text: str) -> tuple[list[ReasoningStep], str]:
        """Parse Socratic Q&A format."""
        steps = []
        qa_pattern = r"Q(\d+):\s*(.+?)\nA\d+:\s*(.+?)(?=Q\d+:|Final|$)"
        matches = re.findall(qa_pattern, text, re.S)

        for num, question, answer in matches:
            steps.append(ReasoningStep(
                step_number=int(num),
                thought=f"Q: {question.strip()}\nA: {answer.strip()}"
            ))

        answer_match = re.search(r"Final Answer:\s*(.+?)$", text, re.S)
        answer = answer_match.group(1).strip() if answer_match else ""

        return steps, answer
```

Usage Example

```python
# Create reasoner
reasoner = ChainOfThoughtReasoner()

# Brief reasoning for simple question
brief_result = reasoner.reason(
    "What's 15% of 80?",
    style=ReasoningStyle.BRIEF
)
print(f"Brief: {brief_result.final_answer}")

# Standard reasoning for moderate question
standard_result = reasoner.reason(
    "Should I use SQL or NoSQL for a social media app?",
    style=ReasoningStyle.STANDARD,
    context="Expected 1M users, need real-time feeds"
)
print(f"\nStandard reasoning ({len(standard_result.steps)} steps):")
for step in standard_result.steps:
    print(f"  Step {step.step_number}: {step.thought[:100]}...")
print(f"Answer: {standard_result.final_answer}")

# Thorough reasoning for complex question
thorough_result = reasoner.reason(
    "What's the best architecture for a multi-tenant SaaS platform?",
    style=ReasoningStyle.THOROUGH,
    context="Enterprise customers, strict data isolation requirements"
)
print(f"\nThorough analysis ({len(thorough_result.steps)} sections):")
print(f"Confidence: {thorough_result.total_confidence:.1%}")
```

Summary

Chain-of-Thought prompting makes reasoning explicit, improving both quality and interpretability. We covered:

  • What is CoT: Showing intermediate reasoning steps rather than jumping to answers
  • Prompting techniques: Zero-shot, few-shot, self-consistency, and least-to-most approaches
  • Structured formats: Numbered steps, pros/cons, hypothesis testing, and decision matrices
  • Agentic integration: Using CoT in ReAct loops, planning, and tool selection
  • Implementation: A flexible reasoner supporting multiple styles from brief to thorough

In the next section, we'll explore Tree of Thoughts—extending chain-of-thought to explore multiple reasoning paths in parallel.