Chapter 10: Planning and Reasoning

Self-Reflection and Correction

Introduction

Human experts don't just produce work—they review it, question their assumptions, and refine their outputs. This metacognitive ability—self-reflection—is crucial for quality work. For AI agents, building in similar reflection capabilities can dramatically improve output quality and reliability.

In this section, we'll explore how agents can evaluate their own reasoning, detect errors in their outputs, and iteratively improve their work through self-correction loops.

Core Insight: Reflection transforms agents from single-shot generators into iterative refiners. The ability to critique and improve one's own work is what separates adequate from excellent outputs.

Why Reflection Matters

Without reflection, agents produce outputs in a single pass. This works for simple tasks but fails for complex ones where the first attempt is rarely optimal.

Benefits of Self-Reflection

| Benefit | Description | Example |
| --- | --- | --- |
| Error Detection | Catch mistakes before the user sees them | Finding logical flaws in code |
| Quality Improvement | Iteratively refine outputs | Improving clarity of explanations |
| Consistency Checking | Ensure outputs align with requirements | Verifying all specs are addressed |
| Confidence Calibration | Identify uncertain or weak areas | Flagging assumptions that need validation |
| Learning Opportunity | Extract insights for future tasks | Noting patterns that worked well |

The Reflection Gap

Research shows that LLMs can often identify errors in outputs they themselves generated—if explicitly asked to review. This "reflection gap" means the model has latent evaluation capability that goes unused without explicit prompting.

```python
# Without reflection - single pass
def generate_code(task: str) -> str:
    response = llm.generate(f"Write code for: {task}")
    return response  # May contain bugs

# With reflection - self-review
def generate_code_with_reflection(task: str) -> str:
    # Generate initial code
    code = llm.generate(f"Write code for: {task}")

    # Reflect on the code
    review = llm.generate(f"""
    Review this code for bugs, edge cases, and improvements:

    {code}

    List any issues found.
    """)

    # If issues found, regenerate
    if "no issues" not in review.lower():
        code = llm.generate(f"""
        Fix the issues in this code:

        Original code:
        {code}

        Issues found:
        {review}

        Provide corrected code.
        """)

    return code
```

Types of Self-Reflection

Different types of reflection serve different purposes. Effective agents use multiple reflection strategies:

1. Output Verification

Check if the output meets the stated requirements:

```python
async def verify_output(
    task: str,
    output: str,
    requirements: list[str]
) -> dict:
    """Verify output against requirements."""

    prompt = f"""Verify if this output meets all requirements.

Task: {task}

Output:
{output}

Requirements:
{chr(10).join(f"- {r}" for r in requirements)}

For each requirement, indicate:
- MET: Requirement is fully satisfied
- PARTIAL: Partially satisfied, needs improvement
- NOT_MET: Not addressed or wrong

Return JSON:
{{
    "verification": [
        {{"requirement": "...", "status": "MET|PARTIAL|NOT_MET", "reason": "..."}}
    ],
    "overall_pass": true/false,
    "critical_issues": ["..."]
}}"""

    response = await llm.generate(prompt)
    return json.loads(response)
```

2. Reasoning Trace Review

Examine the reasoning process itself, not just the output:

```python
async def review_reasoning(
    question: str,
    reasoning_trace: str,
    conclusion: str
) -> dict:
    """Review the quality of reasoning."""

    prompt = f"""Review this reasoning process for logical validity.

Question: {question}

Reasoning:
{reasoning_trace}

Conclusion: {conclusion}

Check for:
1. Logical fallacies (non sequiturs, false dichotomies, etc.)
2. Unsupported assumptions
3. Missing steps in the argument
4. Contradictions
5. Valid but weak arguments

Return JSON:
{{
    "logic_valid": true/false,
    "issues": [
        {{"type": "fallacy|assumption|gap|contradiction|weak",
         "description": "...",
         "location": "where in reasoning"}}
    ],
    "confidence_in_conclusion": 0.0-1.0,
    "suggested_improvements": ["..."]
}}"""

    response = await llm.generate(prompt)
    return json.loads(response)
```

3. Consistency Checking

Ensure different parts of the output are consistent with each other:

```python
async def check_consistency(
    outputs: dict[str, str]
) -> dict:
    """Check consistency across multiple outputs."""

    prompt = f"""Check these outputs for internal consistency.

Outputs:
{json.dumps(outputs, indent=2)}

Look for:
1. Contradictory statements
2. Conflicting numbers or facts
3. Inconsistent terminology
4. Timeline inconsistencies

Return JSON:
{{
    "consistent": true/false,
    "conflicts": [
        {{"outputs": ["output1", "output2"],
         "conflict": "description of conflict",
         "suggested_resolution": "..."}}
    ]
}}"""

    response = await llm.generate(prompt)
    return json.loads(response)
```

4. Confidence Assessment

Evaluate how confident the agent should be in its output:

```python
async def assess_confidence(
    task: str,
    output: str,
    context: str = ""
) -> dict:
    """Assess confidence in the output."""

    prompt = f"""Assess confidence in this output.

Task: {task}
Context: {context}
Output: {output}

Consider:
1. How much of this is factual vs. inference?
2. What assumptions were made?
3. What information would increase confidence?
4. What could make this wrong?

Return JSON:
{{
    "overall_confidence": 0.0-1.0,
    "confidence_breakdown": {{
        "factual_accuracy": 0.0-1.0,
        "logical_validity": 0.0-1.0,
        "completeness": 0.0-1.0
    }},
    "assumptions": ["..."],
    "uncertainty_sources": ["..."],
    "would_help": ["what additional info would help"]
}}"""

    response = await llm.generate(prompt)
    return json.loads(response)
```

Different tasks need different reflection types. Code benefits from verification and consistency checks; analysis benefits from reasoning review and confidence assessment.
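
This heuristic can be made explicit with a small dispatcher that maps a task kind to a set of reflection passes. A minimal sketch; the task categories, mapping, and function name are illustrative assumptions, not fixed rules:

```python
from enum import Enum

class ReflectionType(Enum):
    VERIFICATION = "verification"
    REASONING = "reasoning"
    CONSISTENCY = "consistency"
    CONFIDENCE = "confidence"

# Illustrative profiles: which reflection passes suit which kind of task.
REFLECTION_PROFILES: dict[str, list[ReflectionType]] = {
    "code": [ReflectionType.VERIFICATION, ReflectionType.CONSISTENCY],
    "analysis": [ReflectionType.REASONING, ReflectionType.CONFIDENCE],
}

def select_reflections(task_kind: str) -> list[ReflectionType]:
    """Pick reflection passes for a task kind, defaulting to verification."""
    return REFLECTION_PROFILES.get(task_kind, [ReflectionType.VERIFICATION])
```

An agent could route on a cheap classification of the task before spending tokens on the more expensive reflection passes themselves.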

Critic and Verifier Patterns

A powerful pattern is separating the generator (produces output) from the critic (evaluates output). This creates productive tension that improves quality.

Generator-Critic Loop

```python
import json
from dataclasses import dataclass
from typing import Optional

from anthropic import Anthropic

@dataclass
class CriticFeedback:
    """Feedback from the critic."""
    approved: bool
    quality_score: float
    issues: list[str]
    suggestions: list[str]
    severity: str  # "minor", "moderate", "major"

@dataclass
class GeneratorOutput:
    """Output from the generator."""
    content: str
    iteration: int
    changes_made: list[str]

class GeneratorCriticLoop:
    """
    Iterative refinement through generator-critic interaction.
    """

    def __init__(
        self,
        max_iterations: int = 5,
        approval_threshold: float = 0.8
    ):
        self.max_iterations = max_iterations
        self.approval_threshold = approval_threshold
        self.client = Anthropic()

    async def generate(self, task: str) -> str:
        """Initial generation."""
        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[{"role": "user", "content": f"Complete this task:\n{task}"}]
        )
        return response.content[0].text

    async def critique(
        self,
        task: str,
        output: str,
        history: Optional[list[CriticFeedback]] = None
    ) -> CriticFeedback:
        """Critique the output."""

        history_context = ""
        if history:
            past_issues = chr(10).join(
                f"- {issue}" for fb in history for issue in fb.issues
            )
            history_context = f"""
Previous feedback that was already addressed:
{past_issues}

Do not repeat these issues if they were fixed.
"""

        prompt = f"""Critically evaluate this output for the given task.

Task: {task}
{history_context}
Output to evaluate:
{output}

Be thorough but fair. Look for:
1. Correctness - Is it factually/logically right?
2. Completeness - Does it fully address the task?
3. Quality - Is it well-structured and clear?
4. Edge cases - Are edge cases handled?

Return JSON:
{{
    "approved": true/false,
    "quality_score": 0.0-1.0,
    "issues": ["list of specific issues"],
    "suggestions": ["concrete improvement suggestions"],
    "severity": "minor|moderate|major"
}}"""

        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2048,
            messages=[{"role": "user", "content": prompt}]
        )

        result = json.loads(response.content[0].text)
        return CriticFeedback(
            approved=result["approved"],
            quality_score=result["quality_score"],
            issues=result["issues"],
            suggestions=result["suggestions"],
            severity=result["severity"]
        )

    async def refine(
        self,
        task: str,
        output: str,
        feedback: CriticFeedback
    ) -> str:
        """Refine output based on feedback."""

        prompt = f"""Improve this output based on feedback.

Original Task: {task}

Current Output:
{output}

Issues to Fix:
{chr(10).join(f"- {issue}" for issue in feedback.issues)}

Suggestions:
{chr(10).join(f"- {sug}" for sug in feedback.suggestions)}

Provide an improved version that addresses all issues."""

        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[{"role": "user", "content": prompt}]
        )

        return response.content[0].text

    async def run(self, task: str) -> tuple[str, list[CriticFeedback]]:
        """Run the full generator-critic loop."""

        output = await self.generate(task)
        feedback_history = []

        for iteration in range(self.max_iterations):
            # Get critique
            feedback = await self.critique(task, output, feedback_history)
            feedback_history.append(feedback)

            # Check if approved or quality already meets the threshold
            if feedback.approved or feedback.quality_score >= self.approval_threshold:
                print(f"Approved after {iteration + 1} iterations")
                break

            # Check if issues are minor
            if feedback.severity == "minor":
                print(f"Minor issues remaining, stopping at iteration {iteration + 1}")
                break

            # Refine
            output = await self.refine(task, output, feedback)

        return output, feedback_history
```

Multi-Critic Ensemble

Use multiple critics with different focuses for more thorough review:

```python
class CriticEnsemble:
    """Multiple specialized critics for comprehensive review."""

    def __init__(self):
        self.critics = {
            "correctness": self._correctness_critic,
            "clarity": self._clarity_critic,
            "completeness": self._completeness_critic,
            "style": self._style_critic,
        }

    async def _correctness_critic(self, task: str, output: str) -> dict:
        """Focus on factual and logical correctness."""
        prompt = f"""As a correctness reviewer, check this output.

Task: {task}
Output: {output}

Focus ONLY on:
- Factual accuracy
- Logical validity
- Technical correctness

Ignore style, formatting, or clarity issues.

Return JSON: {{"issues": [...], "severity": "none|minor|major"}}"""
        ...  # call the LLM with `prompt` and parse the JSON response

    async def _clarity_critic(self, task: str, output: str) -> dict:
        """Focus on clarity and understandability."""
        prompt = f"""As a clarity reviewer, check this output.

Task: {task}
Output: {output}

Focus ONLY on:
- Clear explanations
- Logical flow
- Appropriate detail level

Ignore correctness (assume it's correct).

Return JSON: {{"issues": [...], "severity": "none|minor|major"}}"""
        ...  # call the LLM with `prompt` and parse the JSON response

    async def _completeness_critic(self, task: str, output: str) -> dict:
        """Focus on completeness."""
        ...  # Similar pattern to the critics above

    async def _style_critic(self, task: str, output: str) -> dict:
        """Focus on style and formatting."""
        ...  # Similar pattern to the critics above

    async def full_review(self, task: str, output: str) -> dict:
        """Run all critics and aggregate feedback."""

        reviews = {}
        for name, critic in self.critics.items():
            reviews[name] = await critic(task, output)

        # Aggregate
        all_issues = []
        max_severity = "none"
        severity_order = {"none": 0, "minor": 1, "major": 2}

        for name, review in reviews.items():
            for issue in review["issues"]:
                all_issues.append({"critic": name, "issue": issue})

            if severity_order[review["severity"]] > severity_order[max_severity]:
                max_severity = review["severity"]

        return {
            "reviews": reviews,
            "all_issues": all_issues,
            "overall_severity": max_severity,
            "approved": max_severity == "none"
        }
```
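
The `full_review` loop above awaits each critic in turn. Because the critics are independent of one another, they can also run concurrently; here is a minimal sketch using `asyncio.gather` (the helper name and stub critic signatures are illustrative assumptions):

```python
import asyncio
from typing import Awaitable, Callable

# A critic takes (task, output) and returns a review dict.
Critic = Callable[[str, str], Awaitable[dict]]

async def run_critics_concurrently(
    critics: dict[str, Critic], task: str, output: str
) -> dict[str, dict]:
    """Run independent critics in parallel and key results by critic name."""
    names = list(critics)
    results = await asyncio.gather(*(critics[n](task, output) for n in names))
    return dict(zip(names, results))
```

With four critics this cuts the review stage's wall-clock time to roughly that of the slowest single critic, at the cost of making more simultaneous API calls.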

Self-Correction Strategies

When reflection reveals problems, agents need strategies to correct them. Here are key approaches:

1. Targeted Fixes

Address specific identified issues without regenerating everything:

```python
async def targeted_fix(
    output: str,
    issues: list[dict]
) -> str:
    """Apply targeted fixes to specific issues."""

    fixes_prompt = """Fix only the specific issues listed below.
Preserve everything else exactly as is.

Current output:
{output}

Issues to fix:
{issues}

Return the corrected output with minimal changes."""

    formatted_issues = "\n".join(
        f"- {i['location']}: {i['issue']}"
        for i in issues
    )

    response = await llm.generate(
        fixes_prompt.format(output=output, issues=formatted_issues)
    )

    return response
```

2. Regeneration with Constraints

When targeted fixes aren't enough, regenerate with explicit constraints:

```python
async def constrained_regeneration(
    task: str,
    failed_output: str,
    constraints: list[str]
) -> str:
    """Regenerate with explicit constraints from failed attempt."""

    prompt = f"""Complete this task while following these constraints.

Task: {task}

CONSTRAINTS (must follow):
{chr(10).join(f"- {c}" for c in constraints)}

Previous attempt had these problems (avoid them):
{failed_output[:500]}...

Generate a new solution that satisfies all constraints."""

    response = await llm.generate(prompt)
    return response
```

3. Decompose and Fix

Break the problem into parts, fix each independently:

```python
async def decompose_and_fix(
    output: str,
    issues: list[dict]
) -> str:
    """Decompose output into sections, fix each."""

    # Group issues by section
    sections = segment_output(output)  # Returns {section_id: text}
    issues_by_section = group_issues_by_section(issues)

    fixed_sections = {}

    for section_id, section_text in sections.items():
        section_issues = issues_by_section.get(section_id, [])

        if not section_issues:
            # No issues, keep as is
            fixed_sections[section_id] = section_text
        else:
            # Fix this section
            fixed = await targeted_fix(section_text, section_issues)
            fixed_sections[section_id] = fixed

    # Reassemble
    return reassemble_sections(fixed_sections)
```

4. Backtracking

When current approach is fundamentally flawed, backtrack to an earlier decision point:

```python
import time

class BacktrackingCorrector:
    """Maintain history for backtracking when needed."""

    def __init__(self):
        self.history: list[dict] = []

    def checkpoint(self, state: dict, decision: str) -> None:
        """Save a checkpoint."""
        self.history.append({
            "state": state.copy(),
            "decision": decision,
            "timestamp": time.time()
        })

    def backtrack(self, steps: int = 1) -> dict:
        """Backtrack to a previous state."""
        if steps > len(self.history):
            raise ValueError("Cannot backtrack that far")

        # Remove recent history
        for _ in range(steps):
            self.history.pop()

        # Return the last checkpoint state
        return self.history[-1]["state"] if self.history else {}

    def find_alternative(self, failed_decision: str) -> str:
        """Find alternative to a failed decision."""
        # Use LLM to suggest alternative approach
        prompt = f"""The decision "{failed_decision}" led to problems.

History of decisions:
{[h['decision'] for h in self.history]}

Suggest an alternative approach that avoids this problem."""

        return llm.generate(prompt)
```

Be careful with correction loops—they can oscillate or get stuck. Always have a maximum iteration limit and track if the same issues recur.
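
One way to detect a stuck loop is to compare each critique's issue set against the previous few iterations; here is a minimal sketch (the function name and window size are illustrative assumptions):

```python
def issues_recurring(issue_history: list[list[str]], window: int = 2) -> bool:
    """True if the latest issue set exactly repeats one of the previous `window` sets."""
    if len(issue_history) < 2:
        return False
    latest = set(issue_history[-1])
    recent = issue_history[-1 - window:-1]
    return any(latest == set(prev) for prev in recent)
```

A correction loop could break early, or switch strategy (e.g. backtrack, or regenerate with constraints) when this returns True, instead of burning the remaining iterations on the same complaints.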

Building Reflective Agents

Let's build a complete reflective agent that integrates the patterns we've discussed:

```python
from anthropic import Anthropic
from dataclasses import dataclass, field
from typing import Optional
from enum import Enum
import json

class ReflectionType(Enum):
    VERIFICATION = "verification"
    REASONING = "reasoning"
    CONSISTENCY = "consistency"
    CONFIDENCE = "confidence"

@dataclass
class ReflectionResult:
    """Result of a reflection pass."""
    reflection_type: ReflectionType
    passed: bool
    score: float
    issues: list[str]
    suggestions: list[str]
    metadata: dict = field(default_factory=dict)

@dataclass
class AgentOutput:
    """Agent output with reflection metadata."""
    content: str
    iterations: int
    reflections: list[ReflectionResult]
    final_confidence: float
    changes_log: list[str]

class ReflectiveAgent:
    """
    Agent with built-in self-reflection and correction.

    Uses multiple reflection types and iterative refinement
    to produce high-quality outputs.
    """

    def __init__(
        self,
        model: str = "claude-sonnet-4-20250514",
        max_iterations: int = 3,
        min_confidence: float = 0.8
    ):
        self.client = Anthropic()
        self.model = model
        self.max_iterations = max_iterations
        self.min_confidence = min_confidence

    async def execute(
        self,
        task: str,
        requirements: Optional[list[str]] = None,
        reflection_types: Optional[list[ReflectionType]] = None
    ) -> AgentOutput:
        """Execute task with reflection."""

        if reflection_types is None:
            reflection_types = [
                ReflectionType.VERIFICATION,
                ReflectionType.CONFIDENCE
            ]

        # Initial generation
        output = await self._generate(task)
        iterations = 1
        all_reflections = []
        changes_log = ["Initial generation"]

        while iterations < self.max_iterations:
            # Run reflections
            reflections = await self._reflect(
                task,
                output,
                requirements or [],
                reflection_types
            )
            all_reflections.extend(reflections)

            # Check if all reflections pass
            all_pass = all(r.passed for r in reflections)
            avg_score = sum(r.score for r in reflections) / len(reflections)

            if all_pass and avg_score >= self.min_confidence:
                break

            # Collect issues and refine
            all_issues = []
            all_suggestions = []
            for r in reflections:
                all_issues.extend(r.issues)
                all_suggestions.extend(r.suggestions)

            if not all_issues:
                break  # No specific issues to fix

            # Refine
            output = await self._refine(task, output, all_issues, all_suggestions)
            iterations += 1
            changes_log.append(f"Iteration {iterations}: Fixed {len(all_issues)} issues")

        # Final confidence assessment
        final_conf = await self._assess_final_confidence(task, output)

        return AgentOutput(
            content=output,
            iterations=iterations,
            reflections=all_reflections,
            final_confidence=final_conf,
            changes_log=changes_log
        )

    async def _generate(self, task: str) -> str:
        """Generate initial output."""
        response = self.client.messages.create(
            model=self.model,
            max_tokens=4096,
            messages=[{
                "role": "user",
                "content": f"Complete this task thoroughly:\n\n{task}"
            }]
        )
        return response.content[0].text

    async def _reflect(
        self,
        task: str,
        output: str,
        requirements: list[str],
        reflection_types: list[ReflectionType]
    ) -> list[ReflectionResult]:
        """Run specified reflection types."""

        results = []

        for rtype in reflection_types:
            if rtype == ReflectionType.VERIFICATION:
                result = await self._verify(task, output, requirements)
            elif rtype == ReflectionType.REASONING:
                result = await self._check_reasoning(task, output)
            elif rtype == ReflectionType.CONSISTENCY:
                result = await self._check_consistency(output)
            elif rtype == ReflectionType.CONFIDENCE:
                result = await self._assess_confidence(task, output)
            else:
                continue

            results.append(result)

        return results

    async def _verify(
        self,
        task: str,
        output: str,
        requirements: list[str]
    ) -> ReflectionResult:
        """Verify output against requirements."""

        prompt = f"""Verify this output against requirements.

Task: {task}
Requirements: {requirements or ['Complete the task fully']}

Output:
{output}

Check each requirement. Return JSON:
{{
    "all_met": true/false,
    "score": 0.0-1.0,
    "issues": ["unmet requirements"],
    "suggestions": ["how to fix"]
}}"""

        response = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}]
        )

        result = json.loads(response.content[0].text)

        return ReflectionResult(
            reflection_type=ReflectionType.VERIFICATION,
            passed=result["all_met"],
            score=result["score"],
            issues=result["issues"],
            suggestions=result["suggestions"]
        )

    async def _check_reasoning(self, task: str, output: str) -> ReflectionResult:
        """Check logical reasoning in output."""

        prompt = f"""Check the reasoning in this output.

Task: {task}
Output: {output}

Look for:
- Logical fallacies
- Unsupported claims
- Contradictions
- Missing steps

Return JSON:
{{
    "logic_valid": true/false,
    "score": 0.0-1.0,
    "issues": ["specific logic problems"],
    "suggestions": ["how to fix"]
}}"""

        response = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}]
        )

        result = json.loads(response.content[0].text)

        return ReflectionResult(
            reflection_type=ReflectionType.REASONING,
            passed=result["logic_valid"],
            score=result["score"],
            issues=result["issues"],
            suggestions=result["suggestions"]
        )

    async def _check_consistency(self, output: str) -> ReflectionResult:
        """Check internal consistency."""

        prompt = f"""Check this output for internal consistency.

Output:
{output}

Look for:
- Contradictory statements
- Inconsistent facts or numbers
- Conflicting recommendations

Return JSON:
{{
    "consistent": true/false,
    "score": 0.0-1.0,
    "issues": ["inconsistencies found"],
    "suggestions": ["how to resolve"]
}}"""

        response = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}]
        )

        result = json.loads(response.content[0].text)

        return ReflectionResult(
            reflection_type=ReflectionType.CONSISTENCY,
            passed=result["consistent"],
            score=result["score"],
            issues=result["issues"],
            suggestions=result["suggestions"]
        )

    async def _assess_confidence(self, task: str, output: str) -> ReflectionResult:
        """Assess confidence in output."""

        prompt = f"""Assess confidence in this output.

Task: {task}
Output: {output}

Consider:
- How certain are the claims?
- What assumptions were made?
- What could make this wrong?

Return JSON:
{{
    "confident": true/false,
    "score": 0.0-1.0,
    "issues": ["uncertainty sources"],
    "suggestions": ["how to increase confidence"]
}}"""

        response = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}]
        )

        result = json.loads(response.content[0].text)

        return ReflectionResult(
            reflection_type=ReflectionType.CONFIDENCE,
            passed=result["confident"],
            score=result["score"],
            issues=result["issues"],
            suggestions=result["suggestions"]
        )

    async def _refine(
        self,
        task: str,
        output: str,
        issues: list[str],
        suggestions: list[str]
    ) -> str:
        """Refine output based on reflection feedback."""

        prompt = f"""Improve this output based on feedback.

Original Task: {task}

Current Output:
{output}

Issues Found:
{chr(10).join(f"- {i}" for i in issues)}

Suggestions:
{chr(10).join(f"- {s}" for s in suggestions)}

Provide an improved version addressing all issues."""

        response = self.client.messages.create(
            model=self.model,
            max_tokens=4096,
            messages=[{"role": "user", "content": prompt}]
        )

        return response.content[0].text

    async def _assess_final_confidence(self, task: str, output: str) -> float:
        """Get final confidence score."""

        result = await self._assess_confidence(task, output)
        return result.score
```

Usage Example

```python
import asyncio

async def main():
    agent = ReflectiveAgent(
        max_iterations=3,
        min_confidence=0.85
    )

    result = await agent.execute(
        task="Explain the trade-offs between SQL and NoSQL databases",
        requirements=[
            "Cover at least 3 major differences",
            "Provide specific use case recommendations",
            "Be balanced and objective"
        ],
        reflection_types=[
            ReflectionType.VERIFICATION,
            ReflectionType.REASONING,
            ReflectionType.CONSISTENCY,
            ReflectionType.CONFIDENCE
        ]
    )

    print("Final Output:")
    print(result.content)
    print(f"\nIterations: {result.iterations}")
    print(f"Final Confidence: {result.final_confidence:.2%}")
    print("\nChanges Log:")
    for change in result.changes_log:
        print(f"  - {change}")

asyncio.run(main())
```

Summary

Self-reflection and correction transform agents from single-pass generators into iterative refiners. We covered:

  • Why reflection matters: Catches errors, improves quality, and builds confidence in outputs
  • Reflection types: Verification, reasoning review, consistency checking, and confidence assessment
  • Critic patterns: Generator-critic loops and multi-critic ensembles for thorough review
  • Correction strategies: Targeted fixes, constrained regeneration, decomposition, and backtracking
  • Complete implementation: A ReflectiveAgent class integrating multiple reflection types

In the next section, we'll explore Chain-of-Thought prompting—a technique for making agent reasoning explicit and more reliable.