Introduction
Human experts don't just produce work—they review it, question their assumptions, and refine their outputs. This metacognitive ability—self-reflection—is crucial for quality work. For AI agents, building in similar reflection capabilities can dramatically improve output quality and reliability.
In this section, we'll explore how agents can evaluate their own reasoning, detect errors in their outputs, and iteratively improve their work through self-correction loops.
Core Insight: Reflection transforms agents from single-shot generators into iterative refiners. The ability to critique and improve one's own work is what separates adequate outputs from excellent ones.
Why Reflection Matters
Without reflection, agents produce outputs in a single pass. This works for simple tasks but fails for complex ones where the first attempt is rarely optimal.
Benefits of Self-Reflection
| Benefit | Description | Example |
|---|---|---|
| Error Detection | Catch mistakes before user sees them | Finding logical flaws in code |
| Quality Improvement | Iteratively refine outputs | Improving clarity of explanations |
| Consistency Checking | Ensure outputs align with requirements | Verifying all specs are addressed |
| Confidence Calibration | Identify uncertain or weak areas | Flagging assumptions that need validation |
| Learning Opportunity | Extract insights for future tasks | Noting patterns that worked well |
The Reflection Gap
Research shows that LLMs can often identify errors in outputs they themselves generated—if explicitly asked to review. This "reflection gap" means the model has latent evaluation capability that goes unused without explicit prompting.
```python
# Without reflection - single pass
def generate_code(task: str) -> str:
    response = llm.generate(f"Write code for: {task}")
    return response  # May contain bugs

# With reflection - self-review
def generate_code_with_reflection(task: str) -> str:
    # Generate initial code
    code = llm.generate(f"Write code for: {task}")

    # Reflect on the code
    review = llm.generate(f"""
Review this code for bugs, edge cases, and improvements:

{code}

List any issues found.
""")

    # If issues found, regenerate
    if "no issues" not in review.lower():
        code = llm.generate(f"""
Fix the issues in this code:

Original code:
{code}

Issues found:
{review}

Provide corrected code.
""")

    return code
```

Types of Self-Reflection
Different types of reflection serve different purposes. Effective agents use multiple reflection strategies:
1. Output Verification
Check if the output meets the stated requirements:
```python
async def verify_output(
    task: str,
    output: str,
    requirements: list[str]
) -> dict:
    """Verify output against requirements."""

    prompt = f"""Verify if this output meets all requirements.

Task: {task}

Output:
{output}

Requirements:
{chr(10).join(f"- {r}" for r in requirements)}

For each requirement, indicate:
- MET: Requirement is fully satisfied
- PARTIAL: Partially satisfied, needs improvement
- NOT_MET: Not addressed or wrong

Return JSON:
{{
  "verification": [
    {{"requirement": "...", "status": "MET|PARTIAL|NOT_MET", "reason": "..."}}
  ],
  "overall_pass": true/false,
  "critical_issues": ["..."]
}}"""

    response = await llm.generate(prompt)
    return json.loads(response)
```

2. Reasoning Trace Review
Examine the reasoning process itself, not just the output:
```python
async def review_reasoning(
    question: str,
    reasoning_trace: str,
    conclusion: str
) -> dict:
    """Review the quality of reasoning."""

    prompt = f"""Review this reasoning process for logical validity.

Question: {question}

Reasoning:
{reasoning_trace}

Conclusion: {conclusion}

Check for:
1. Logical fallacies (non sequiturs, false dichotomies, etc.)
2. Unsupported assumptions
3. Missing steps in the argument
4. Contradictions
5. Valid but weak arguments

Return JSON:
{{
  "logic_valid": true/false,
  "issues": [
    {{"type": "fallacy|assumption|gap|contradiction|weak",
      "description": "...",
      "location": "where in reasoning"}}
  ],
  "confidence_in_conclusion": 0.0-1.0,
  "suggested_improvements": ["..."]
}}"""

    response = await llm.generate(prompt)
    return json.loads(response)
```

3. Consistency Checking
Ensure different parts of the output are consistent with each other:
```python
async def check_consistency(
    outputs: dict[str, str]
) -> dict:
    """Check consistency across multiple outputs."""

    prompt = f"""Check these outputs for internal consistency.

Outputs:
{json.dumps(outputs, indent=2)}

Look for:
1. Contradictory statements
2. Conflicting numbers or facts
3. Inconsistent terminology
4. Timeline inconsistencies

Return JSON:
{{
  "consistent": true/false,
  "conflicts": [
    {{"outputs": ["output1", "output2"],
      "conflict": "description of conflict",
      "suggested_resolution": "..."}}
  ]
}}"""

    response = await llm.generate(prompt)
    return json.loads(response)
```

4. Confidence Assessment
Evaluate how confident the agent should be in its output:
```python
async def assess_confidence(
    task: str,
    output: str,
    context: str = ""
) -> dict:
    """Assess confidence in the output."""

    prompt = f"""Assess confidence in this output.

Task: {task}
Context: {context}
Output: {output}

Consider:
1. How much of this is factual vs. inference?
2. What assumptions were made?
3. What information would increase confidence?
4. What could make this wrong?

Return JSON:
{{
  "overall_confidence": 0.0-1.0,
  "confidence_breakdown": {{
    "factual_accuracy": 0.0-1.0,
    "logical_validity": 0.0-1.0,
    "completeness": 0.0-1.0
  }},
  "assumptions": ["..."],
  "uncertainty_sources": ["..."],
  "would_help": ["what additional info would help"]
}}"""

    response = await llm.generate(prompt)
    return json.loads(response)
```

Critic and Verifier Patterns
A powerful pattern is separating the generator (produces output) from the critic (evaluates output). This creates productive tension that improves quality.
Generator-Critic Loop
```python
import json
from dataclasses import dataclass
from typing import Optional

from anthropic import Anthropic

@dataclass
class CriticFeedback:
    """Feedback from the critic."""
    approved: bool
    issues: list[str]
    suggestions: list[str]
    severity: str  # "minor", "moderate", "major"

@dataclass
class GeneratorOutput:
    """Output from the generator."""
    content: str
    iteration: int
    changes_made: list[str]

class GeneratorCriticLoop:
    """
    Iterative refinement through generator-critic interaction.
    """

    def __init__(
        self,
        max_iterations: int = 5,
        approval_threshold: float = 0.8
    ):
        self.max_iterations = max_iterations
        self.approval_threshold = approval_threshold
        self.client = Anthropic()

    async def generate(self, task: str) -> str:
        """Initial generation."""
        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[{"role": "user", "content": f"Complete this task:\n{task}"}]
        )
        return response.content[0].text

    async def critique(
        self,
        task: str,
        output: str,
        history: Optional[list[CriticFeedback]] = None
    ) -> CriticFeedback:
        """Critique the output."""

        history_context = ""
        if history:
            history_context = f"""
Previous feedback that was already addressed:
{chr(10).join(f"- {issue}" for fb in history for issue in fb.issues)}

Do not repeat these issues if they were fixed.
"""

        prompt = f"""Critically evaluate this output for the given task.

Task: {task}
{history_context}
Output to evaluate:
{output}

Be thorough but fair. Look for:
1. Correctness - Is it factually/logically right?
2. Completeness - Does it fully address the task?
3. Quality - Is it well-structured and clear?
4. Edge cases - Are edge cases handled?

Return JSON:
{{
  "approved": true/false,
  "quality_score": 0.0-1.0,
  "issues": ["list of specific issues"],
  "suggestions": ["concrete improvement suggestions"],
  "severity": "minor|moderate|major"
}}"""

        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2048,
            messages=[{"role": "user", "content": prompt}]
        )

        result = json.loads(response.content[0].text)
        return CriticFeedback(
            approved=result["approved"],
            issues=result["issues"],
            suggestions=result["suggestions"],
            severity=result["severity"]
        )

    async def refine(
        self,
        task: str,
        output: str,
        feedback: CriticFeedback
    ) -> str:
        """Refine output based on feedback."""

        prompt = f"""Improve this output based on feedback.

Original Task: {task}

Current Output:
{output}

Issues to Fix:
{chr(10).join(f"- {issue}" for issue in feedback.issues)}

Suggestions:
{chr(10).join(f"- {sug}" for sug in feedback.suggestions)}

Provide an improved version that addresses all issues."""

        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[{"role": "user", "content": prompt}]
        )

        return response.content[0].text

    async def run(self, task: str) -> tuple[str, list[CriticFeedback]]:
        """Run the full generator-critic loop."""

        output = await self.generate(task)
        feedback_history = []

        for iteration in range(self.max_iterations):
            # Get critique
            feedback = await self.critique(task, output, feedback_history)
            feedback_history.append(feedback)

            # Check if approved
            if feedback.approved:
                print(f"Approved after {iteration + 1} iterations")
                break

            # Check if issues are minor
            if feedback.severity == "minor":
                print(f"Minor issues remaining, stopping at iteration {iteration + 1}")
                break

            # Refine
            output = await self.refine(task, output, feedback)

        return output, feedback_history
```

Multi-Critic Ensemble
Use multiple critics with different focuses for more thorough review:
```python
class CriticEnsemble:
    """Multiple specialized critics for comprehensive review."""

    def __init__(self):
        self.critics = {
            "correctness": self._correctness_critic,
            "clarity": self._clarity_critic,
            "completeness": self._completeness_critic,
            "style": self._style_critic,
        }

    async def _correctness_critic(self, task: str, output: str) -> dict:
        """Focus on factual and logical correctness."""
        prompt = f"""As a correctness reviewer, check this output.

Task: {task}
Output: {output}

Focus ONLY on:
- Factual accuracy
- Logical validity
- Technical correctness

Ignore style, formatting, or clarity issues.

Return JSON: {{"issues": [...], "severity": "none|minor|major"}}"""
        response = await llm.generate(prompt)
        return json.loads(response)

    async def _clarity_critic(self, task: str, output: str) -> dict:
        """Focus on clarity and understandability."""
        prompt = f"""As a clarity reviewer, check this output.

Task: {task}
Output: {output}

Focus ONLY on:
- Clear explanations
- Logical flow
- Appropriate detail level

Ignore correctness (assume it's correct).

Return JSON: {{"issues": [...], "severity": "none|minor|major"}}"""
        response = await llm.generate(prompt)
        return json.loads(response)

    async def _completeness_critic(self, task: str, output: str) -> dict:
        """Focus on completeness. Same prompt pattern as above; stubbed here."""
        return {"issues": [], "severity": "none"}

    async def _style_critic(self, task: str, output: str) -> dict:
        """Focus on style and formatting. Same prompt pattern as above; stubbed here."""
        return {"issues": [], "severity": "none"}

    async def full_review(self, task: str, output: str) -> dict:
        """Run all critics and aggregate feedback."""

        reviews = {}
        for name, critic in self.critics.items():
            reviews[name] = await critic(task, output)

        # Aggregate
        all_issues = []
        max_severity = "none"
        severity_order = {"none": 0, "minor": 1, "major": 2}

        for name, review in reviews.items():
            for issue in review["issues"]:
                all_issues.append({"critic": name, "issue": issue})

            if severity_order[review["severity"]] > severity_order[max_severity]:
                max_severity = review["severity"]

        return {
            "reviews": reviews,
            "all_issues": all_issues,
            "overall_severity": max_severity,
            "approved": max_severity == "none"
        }
```

Self-Correction Strategies
When reflection reveals problems, agents need strategies to correct them. Here are key approaches:
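Choosing among these strategies can itself be made mechanical. A minimal sketch of a severity-based dispatcher (the function name, threshold, and strategy labels are illustrative, keyed to the `minor`/`moderate`/`major` critic severities used earlier):

```python
# Hypothetical router: map critic severity and retry count to one of the
# correction strategies described below. Thresholds are illustrative.
def choose_correction_strategy(severity: str, attempt: int, max_targeted: int = 2) -> str:
    """Pick a correction strategy from critic severity and retry count."""
    if severity == "minor":
        return "targeted_fix"              # small, localized edits suffice
    if severity == "moderate" and attempt < max_targeted:
        return "targeted_fix"              # try cheap fixes first
    if severity == "moderate":
        return "constrained_regeneration"  # cheap fixes exhausted, start over
    return "backtrack"                     # major: the approach itself is flawed
```

Escalating from cheap to expensive corrections keeps token costs down while still covering the case where the whole approach must be abandoned.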
1. Targeted Fixes
Address specific identified issues without regenerating everything:
```python
async def targeted_fix(
    output: str,
    issues: list[dict]
) -> str:
    """Apply targeted fixes to specific issues."""

    fixes_prompt = """Fix only the specific issues listed below.
Preserve everything else exactly as is.

Current output:
{output}

Issues to fix:
{issues}

Return the corrected output with minimal changes."""

    formatted_issues = "\n".join(
        f"- {i['location']}: {i['issue']}"
        for i in issues
    )

    response = await llm.generate(
        fixes_prompt.format(output=output, issues=formatted_issues)
    )

    return response
```

2. Regeneration with Constraints
When targeted fixes aren't enough, regenerate with explicit constraints:
```python
async def constrained_regeneration(
    task: str,
    failed_output: str,
    constraints: list[str]
) -> str:
    """Regenerate with explicit constraints from failed attempt."""

    prompt = f"""Complete this task while following these constraints.

Task: {task}

CONSTRAINTS (must follow):
{chr(10).join(f"- {c}" for c in constraints)}

Previous attempt had these problems (avoid them):
{failed_output[:500]}...

Generate a new solution that satisfies all constraints."""

    response = await llm.generate(prompt)
    return response
```

3. Decompose and Fix
Break the problem into parts and fix each independently:
```python
async def decompose_and_fix(
    output: str,
    issues: list[dict]
) -> str:
    """Decompose output into sections, fix each."""

    # Group issues by section
    sections = segment_output(output)  # Returns {section_id: text}
    issues_by_section = group_issues_by_section(issues)

    fixed_sections = {}

    for section_id, section_text in sections.items():
        section_issues = issues_by_section.get(section_id, [])

        if not section_issues:
            # No issues, keep as is
            fixed_sections[section_id] = section_text
        else:
            # Fix this section
            fixed = await targeted_fix(section_text, section_issues)
            fixed_sections[section_id] = fixed

    # Reassemble
    return reassemble_sections(fixed_sections)
```

4. Backtracking
When the current approach is fundamentally flawed, backtrack to an earlier decision point:
```python
import time

class BacktrackingCorrector:
    """Maintain history for backtracking when needed."""

    def __init__(self):
        self.history: list[dict] = []

    def checkpoint(self, state: dict, decision: str) -> None:
        """Save a checkpoint."""
        self.history.append({
            "state": state.copy(),
            "decision": decision,
            "timestamp": time.time()
        })

    def backtrack(self, steps: int = 1) -> dict:
        """Backtrack to a previous state."""
        if steps > len(self.history):
            raise ValueError("Cannot backtrack that far")

        # Remove recent history
        for _ in range(steps):
            self.history.pop()

        # Return the last checkpoint state
        return self.history[-1]["state"] if self.history else {}

    def find_alternative(self, failed_decision: str) -> str:
        """Find alternative to a failed decision."""
        # Use LLM to suggest alternative approach
        prompt = f"""The decision "{failed_decision}" led to problems.

History of decisions:
{[h["decision"] for h in self.history]}

Suggest an alternative approach that avoids this problem."""

        return llm.generate(prompt)
```

Building Reflective Agents
Let's build a complete reflective agent that integrates the patterns we've discussed:
```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

from anthropic import Anthropic

class ReflectionType(Enum):
    VERIFICATION = "verification"
    REASONING = "reasoning"
    CONSISTENCY = "consistency"
    CONFIDENCE = "confidence"

@dataclass
class ReflectionResult:
    """Result of a reflection pass."""
    reflection_type: ReflectionType
    passed: bool
    score: float
    issues: list[str]
    suggestions: list[str]
    metadata: dict = field(default_factory=dict)

@dataclass
class AgentOutput:
    """Agent output with reflection metadata."""
    content: str
    iterations: int
    reflections: list[ReflectionResult]
    final_confidence: float
    changes_log: list[str]

class ReflectiveAgent:
    """
    Agent with built-in self-reflection and correction.

    Uses multiple reflection types and iterative refinement
    to produce high-quality outputs.
    """

    def __init__(
        self,
        model: str = "claude-sonnet-4-20250514",
        max_iterations: int = 3,
        min_confidence: float = 0.8
    ):
        self.client = Anthropic()
        self.model = model
        self.max_iterations = max_iterations
        self.min_confidence = min_confidence

    async def execute(
        self,
        task: str,
        requirements: Optional[list[str]] = None,
        reflection_types: Optional[list[ReflectionType]] = None
    ) -> AgentOutput:
        """Execute task with reflection."""

        if reflection_types is None:
            reflection_types = [
                ReflectionType.VERIFICATION,
                ReflectionType.CONFIDENCE
            ]

        # Initial generation
        output = await self._generate(task)
        iterations = 1
        all_reflections = []
        changes_log = ["Initial generation"]

        while iterations < self.max_iterations:
            # Run reflections
            reflections = await self._reflect(
                task,
                output,
                requirements or [],
                reflection_types
            )
            all_reflections.extend(reflections)

            # Check if all reflections pass
            all_pass = all(r.passed for r in reflections)
            avg_score = sum(r.score for r in reflections) / len(reflections)

            if all_pass and avg_score >= self.min_confidence:
                break

            # Collect issues and refine
            all_issues = []
            all_suggestions = []
            for r in reflections:
                all_issues.extend(r.issues)
                all_suggestions.extend(r.suggestions)

            if not all_issues:
                break  # No specific issues to fix

            # Refine
            output = await self._refine(task, output, all_issues, all_suggestions)
            iterations += 1
            changes_log.append(f"Iteration {iterations}: Fixed {len(all_issues)} issues")

        # Final confidence assessment
        final_conf = await self._assess_final_confidence(task, output)

        return AgentOutput(
            content=output,
            iterations=iterations,
            reflections=all_reflections,
            final_confidence=final_conf,
            changes_log=changes_log
        )

    async def _generate(self, task: str) -> str:
        """Generate initial output."""
        response = self.client.messages.create(
            model=self.model,
            max_tokens=4096,
            messages=[{
                "role": "user",
                "content": f"Complete this task thoroughly:\n\n{task}"
            }]
        )
        return response.content[0].text

    async def _reflect(
        self,
        task: str,
        output: str,
        requirements: list[str],
        reflection_types: list[ReflectionType]
    ) -> list[ReflectionResult]:
        """Run specified reflection types."""

        results = []

        for rtype in reflection_types:
            if rtype == ReflectionType.VERIFICATION:
                result = await self._verify(task, output, requirements)
            elif rtype == ReflectionType.REASONING:
                result = await self._check_reasoning(task, output)
            elif rtype == ReflectionType.CONSISTENCY:
                result = await self._check_consistency(output)
            elif rtype == ReflectionType.CONFIDENCE:
                result = await self._assess_confidence(task, output)
            else:
                continue

            results.append(result)

        return results

    async def _verify(
        self,
        task: str,
        output: str,
        requirements: list[str]
    ) -> ReflectionResult:
        """Verify output against requirements."""

        prompt = f"""Verify this output against requirements.

Task: {task}
Requirements: {requirements or ['Complete the task fully']}

Output:
{output}

Check each requirement. Return JSON:
{{
  "all_met": true/false,
  "score": 0.0-1.0,
  "issues": ["unmet requirements"],
  "suggestions": ["how to fix"]
}}"""

        response = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}]
        )

        result = json.loads(response.content[0].text)

        return ReflectionResult(
            reflection_type=ReflectionType.VERIFICATION,
            passed=result["all_met"],
            score=result["score"],
            issues=result["issues"],
            suggestions=result["suggestions"]
        )

    async def _check_reasoning(self, task: str, output: str) -> ReflectionResult:
        """Check logical reasoning in output."""

        prompt = f"""Check the reasoning in this output.

Task: {task}
Output: {output}

Look for:
- Logical fallacies
- Unsupported claims
- Contradictions
- Missing steps

Return JSON:
{{
  "logic_valid": true/false,
  "score": 0.0-1.0,
  "issues": ["specific logic problems"],
  "suggestions": ["how to fix"]
}}"""

        response = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}]
        )

        result = json.loads(response.content[0].text)

        return ReflectionResult(
            reflection_type=ReflectionType.REASONING,
            passed=result["logic_valid"],
            score=result["score"],
            issues=result["issues"],
            suggestions=result["suggestions"]
        )

    async def _check_consistency(self, output: str) -> ReflectionResult:
        """Check internal consistency."""

        prompt = f"""Check this output for internal consistency.

Output:
{output}

Look for:
- Contradictory statements
- Inconsistent facts or numbers
- Conflicting recommendations

Return JSON:
{{
  "consistent": true/false,
  "score": 0.0-1.0,
  "issues": ["inconsistencies found"],
  "suggestions": ["how to resolve"]
}}"""

        response = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}]
        )

        result = json.loads(response.content[0].text)

        return ReflectionResult(
            reflection_type=ReflectionType.CONSISTENCY,
            passed=result["consistent"],
            score=result["score"],
            issues=result["issues"],
            suggestions=result["suggestions"]
        )

    async def _assess_confidence(self, task: str, output: str) -> ReflectionResult:
        """Assess confidence in output."""

        prompt = f"""Assess confidence in this output.

Task: {task}
Output: {output}

Consider:
- How certain are the claims?
- What assumptions were made?
- What could make this wrong?

Return JSON:
{{
  "confident": true/false,
  "score": 0.0-1.0,
  "issues": ["uncertainty sources"],
  "suggestions": ["how to increase confidence"]
}}"""

        response = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}]
        )

        result = json.loads(response.content[0].text)

        return ReflectionResult(
            reflection_type=ReflectionType.CONFIDENCE,
            passed=result["confident"],
            score=result["score"],
            issues=result["issues"],
            suggestions=result["suggestions"]
        )

    async def _refine(
        self,
        task: str,
        output: str,
        issues: list[str],
        suggestions: list[str]
    ) -> str:
        """Refine output based on reflection feedback."""

        prompt = f"""Improve this output based on feedback.

Original Task: {task}

Current Output:
{output}

Issues Found:
{chr(10).join(f"- {i}" for i in issues)}

Suggestions:
{chr(10).join(f"- {s}" for s in suggestions)}

Provide an improved version addressing all issues."""

        response = self.client.messages.create(
            model=self.model,
            max_tokens=4096,
            messages=[{"role": "user", "content": prompt}]
        )

        return response.content[0].text

    async def _assess_final_confidence(self, task: str, output: str) -> float:
        """Get final confidence score."""

        result = await self._assess_confidence(task, output)
        return result.score
```

Usage Example
```python
import asyncio

async def main():
    agent = ReflectiveAgent(
        max_iterations=3,
        min_confidence=0.85
    )

    result = await agent.execute(
        task="Explain the trade-offs between SQL and NoSQL databases",
        requirements=[
            "Cover at least 3 major differences",
            "Provide specific use case recommendations",
            "Be balanced and objective"
        ],
        reflection_types=[
            ReflectionType.VERIFICATION,
            ReflectionType.REASONING,
            ReflectionType.CONSISTENCY,
            ReflectionType.CONFIDENCE
        ]
    )

    print("Final Output:")
    print(result.content)
    print(f"\nIterations: {result.iterations}")
    print(f"Final Confidence: {result.final_confidence:.2%}")
    print("\nChanges Log:")
    for change in result.changes_log:
        print(f"  - {change}")

asyncio.run(main())
```

Summary
Self-reflection and correction transform agents from single-pass generators into iterative refiners. We covered:
- Why reflection matters: Catches errors, improves quality, and builds confidence in outputs
- Reflection types: Verification, reasoning review, consistency checking, and confidence assessment
- Critic patterns: Generator-critic loops and multi-critic ensembles for thorough review
- Correction strategies: Targeted fixes, constrained regeneration, decomposition, and backtracking
- Complete implementation: A ReflectiveAgent class integrating multiple reflection types
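The control flow behind all of these patterns reduces to a few lines. A deterministic sketch of the generate-critique-refine loop, with hypothetical stand-in functions in place of LLM calls so it can be run and tested offline:

```python
# Minimal reflect-and-correct loop. The generate/critique/refine callables
# here are deterministic stand-ins; in practice each would call a model.
def reflect_loop(task: str, generate, critique, refine, max_iterations: int = 3):
    """Generate, critique, and refine until approved or budget exhausted."""
    output = generate(task)
    for iteration in range(max_iterations):
        feedback = critique(task, output)
        if feedback["approved"]:
            return output, iteration + 1  # approved on this round
        output = refine(task, output, feedback)
    return output, max_iterations  # budget exhausted, return best effort

# Stand-ins: the critic approves once the draft mentions edge cases.
def generate(task: str) -> str:
    return "draft v1"

def critique(task: str, out: str) -> dict:
    return {"approved": "edge cases" in out, "issues": ["missing edge cases"]}

def refine(task: str, out: str, fb: dict) -> str:
    return out + " (now covers edge cases)"

final, rounds = reflect_loop("write a sort function", generate, critique, refine)
```

The same skeleton underlies `GeneratorCriticLoop` and `ReflectiveAgent` above; they differ only in how the critique is produced and how stopping is decided.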
In the next section, we'll explore Chain-of-Thought prompting—a technique for making agent reasoning explicit and more reliable.