Chapter 2

LLM as the Reasoning Engine

Agent Architecture Fundamentals

Introduction

The large language model is the brain of your agent. It interprets context, makes decisions, generates tool calls, and synthesizes results. Understanding how to effectively use LLMs for agentic tasks is crucial for building capable agents.

The LLM Paradox: LLMs are remarkably capable yet fundamentally limited. They can reason about complex problems but can't execute code. They can plan strategies but can't verify results. Agents bridge this gap by giving LLMs tools to act.

The LLM's Role in Agents

In an agentic system, the LLM serves several key functions:

| Function | Description | Example |
| --- | --- | --- |
| Decision Making | Choose which action to take next | Decide to read a file vs. search the web |
| Parameter Generation | Create inputs for tool calls | Generate the file path to read |
| Synthesis | Combine information into responses | Summarize search results |
| Reflection | Analyze results and adjust strategy | Recognize an error and try a different approach |
| Planning | Break down complex goals | Create a step-by-step plan for a feature |

The Decision Loop

🐍llm_decision_loop.py
import anthropic

client = anthropic.Anthropic()

def llm_decide(context: str, tools: list[dict]) -> dict:
    """Use the LLM to decide the next action."""

    system_prompt = """
You are an AI agent working to accomplish goals.
Analyze the context and decide your next action.

You can:
1. Call a tool to gather information or take action
2. Finish if the goal is complete
3. Ask for clarification if needed

Always explain your reasoning before acting.
"""

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        system=system_prompt,
        messages=[{"role": "user", "content": context}],
        tools=tools,
    )

    # parse_response extracts text and tool calls from the raw response
    # (see call_claude_with_tools below for what that parsing looks like)
    return parse_response(response)
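`llm_decide` makes a single decision; an agent runs it inside a loop, executing the chosen tool and feeding the result back as new context. A minimal sketch with the model and tools stubbed out as plain callables — the `decision` dict shape here (`action`, `tool`, `input`, `result`) is an illustrative assumption, not a provider format:

```python
def run_agent(goal: str, decide, tools: dict, max_steps: int = 10) -> str:
    """Minimal agent loop: decide -> act -> observe, until finished.

    `decide` stands in for llm_decide; `tools` maps names to callables.
    """
    context = f"Goal: {goal}"
    for _ in range(max_steps):
        decision = decide(context)
        if decision["action"] == "finish":
            return decision["result"]
        # Execute the chosen tool and append the observation to the context
        output = tools[decision["tool"]](**decision["input"])
        context += f"\n{decision['tool']} -> {output}"
    raise RuntimeError("Agent exceeded max_steps without finishing")
```

The `max_steps` cap is the important design choice: without it, a confused model can loop forever, burning tokens on every iteration.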

Provider Comparison

Different LLM providers have different strengths for agentic tasks:

Claude (Anthropic)

  • Strengths: Long context (200K), careful reasoning, safety-focused
  • Best for: Complex multi-step tasks, code understanding, nuanced decisions
  • Tool calling: Native with structured outputs
🐍claude_agent.py
import anthropic

client = anthropic.Anthropic()

def call_claude_with_tools(prompt: str, tools: list[dict]) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
        tools=tools,
    )

    result = {"text": "", "tool_calls": []}

    for block in response.content:
        if block.type == "text":
            # Concatenate, since a response can contain multiple text blocks
            result["text"] += block.text
        elif block.type == "tool_use":
            result["tool_calls"].append({
                "id": block.id,
                "name": block.name,
                "input": block.input,
            })

    return result

GPT-4 / o3 (OpenAI)

  • Strengths: Strong multimodal, fast responses, extensive function calling
  • o3 special: Extended thinking for complex reasoning tasks
  • Best for: Rapid iteration, vision tasks, parallel function calls
🐍openai_agent.py
import json

from openai import OpenAI

client = OpenAI()

def call_openai_with_tools(prompt: str, tools: list[dict]) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        tools=[{"type": "function", "function": t} for t in tools],
    )

    message = response.choices[0].message

    result = {"text": message.content or "", "tool_calls": []}

    if message.tool_calls:
        for tc in message.tool_calls:
            result["tool_calls"].append({
                "id": tc.id,
                "name": tc.function.name,
                # OpenAI returns arguments as a JSON string
                "input": json.loads(tc.function.arguments),
            })

    return result

Gemini (Google)

  • Strengths: Massive context (2M tokens), native multimodal, Google integration
  • Best for: Document processing, research tasks, long-context applications
| Provider | Context Window | Tool Calling | Best For |
| --- | --- | --- | --- |
| Claude Opus 4 | 200K | Native | Complex reasoning |
| Claude Sonnet 4 | 200K | Native | Balanced performance |
| GPT-4o | 128K | Native | Multimodal, speed |
| o3 | 200K | Native | Extended reasoning |
| Gemini 1.5 Pro | 2M | Native | Long documents |
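Whatever the provider calls things, the rest of the agent stays simpler if every response is normalized into one shape, like the `{text, tool_calls}` dict used in the examples above. A sketch for the Anthropic and OpenAI cases; the `raw` dict layouts are simplified stand-ins for the actual SDK objects:

```python
import json

def normalize_tool_call(provider: str, raw: dict) -> dict:
    """Map a provider-specific tool call into a common shape."""
    if provider == "anthropic":
        # Anthropic tool_use blocks carry id/name/input directly
        return {"id": raw["id"], "name": raw["name"], "input": raw["input"]}
    if provider == "openai":
        # OpenAI nests name/arguments under "function"; arguments are JSON text
        fn = raw["function"]
        return {
            "id": raw["id"],
            "name": fn["name"],
            "input": json.loads(fn["arguments"]),
        }
    raise ValueError(f"Unknown provider: {provider}")
```

With this in place, switching providers touches only the API-call layer, not the tool-execution loop.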

Agent Prompt Engineering

Agent prompts differ from chatbot prompts. They must guide decision-making, not just generate responses:

System Prompt Structure

🐍agent_system_prompt.py
AGENT_SYSTEM_PROMPT = """
You are an AI agent that accomplishes goals by taking actions.

## Your Capabilities
You have access to these tools:
{tool_descriptions}

## Decision Framework
When deciding your next action, consider:
1. What information do I need?
2. What actions can move me toward the goal?
3. Have I verified my assumptions?
4. Is the goal complete?

## Guidelines
- Always explain your reasoning before acting
- Use tools to verify information, don't guess
- If stuck, try a different approach
- Ask for clarification when requirements are ambiguous
- Finish when the goal is definitively complete

## Error Handling
- If a tool fails, analyze the error and try an alternative
- If stuck after 3 attempts, explain the blocker and ask for help
- Never make destructive changes without explicit confirmation
"""
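The `{tool_descriptions}` placeholder can be filled from the same schemas you pass to the API, so the prompt and the tool list never drift apart. A small sketch (`render_tool_descriptions` is a hypothetical helper, not part of any SDK):

```python
def render_tool_descriptions(tools: list[dict]) -> str:
    """Render tool schemas as a bulleted list for the system prompt."""
    lines = []
    for tool in tools:
        # List the parameter names from the JSON schema's properties
        params = ", ".join(tool["input_schema"]["properties"])
        lines.append(f"- {tool['name']}({params}): {tool['description']}")
    return "\n".join(lines)
```

Usage: `AGENT_SYSTEM_PROMPT.format(tool_descriptions=render_tool_descriptions(tools))`.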

Context Prompt Structure

🐍context_prompt.py
def build_context_prompt(state: AgentState) -> str:
    return f"""
## Goal
{state.goal}

## Progress
Steps completed: {len(state.completed_steps)}/{len(state.plan.steps)}
Current step: {state.current_step.description if state.current_step else "None"}

## Recent Actions
{format_recent_actions(state.recent_actions[-5:])}

## Relevant Context
{format_memories(state.memories)}

## Current Situation
{state.environment_summary}

## Your Task
Based on the above context, decide your next action.
First explain your reasoning, then specify the tool to use.
"""

Prompt Engineering for Agents

Agent prompts should be more structured than conversational prompts. Use clear sections, explicit guidelines, and consistent formatting to help the LLM make better decisions.

Native Tool Calling

Modern LLMs support native tool/function calling, eliminating the need for fragile prompt-based parsing:

🐍tool_schema.py
# Tool schemas for LLM function calling
read_file_tool = {
    "name": "read_file",
    "description": "Read the contents of a file at the given path",
    "input_schema": {
        "type": "object",
        "properties": {
            "file_path": {
                "type": "string",
                "description": "The absolute or relative path to the file",
            },
        },
        "required": ["file_path"],
    },
}

search_web_tool = {
    "name": "search_web",
    "description": "Search the web for information",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query",
            },
            "num_results": {
                "type": "integer",
                "description": "Number of results to return (default: 5)",
                "default": 5,
            },
        },
        "required": ["query"],
    },
}
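Even with native tool calling, models occasionally emit inputs that miss a required field or use the wrong type, so it is worth checking inputs against the schema before executing anything. A minimal hand-rolled check; a library such as `jsonschema` would be more thorough:

```python
def validate_input(schema: dict, tool_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input passes."""
    type_map = {"string": str, "integer": int, "number": (int, float),
                "boolean": bool, "object": dict, "array": list}
    errors = []
    props = schema.get("properties", {})
    for field in schema.get("required", []):
        if field not in tool_input:
            errors.append(f"missing required field: {field}")
    for field, value in tool_input.items():
        if field not in props:
            errors.append(f"unexpected field: {field}")
            continue
        expected = type_map.get(props[field].get("type"))
        # Note: bool is a subclass of int in Python, so True would pass
        # an "integer" check here; a real validator handles that case.
        if expected and not isinstance(value, expected):
            errors.append(f"{field}: expected {props[field]['type']}")
    return errors
```

On failure, the cleanest recovery is usually to return the error list to the model as a tool result and let it retry with corrected arguments.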

Processing Tool Calls

🐍process_tool_calls.py
def process_tool_response(response: dict, tools: ToolRegistry) -> list[dict]:
    """Execute tool calls from an LLM response and collect the results."""
    results = []

    for tool_call in response.get("tool_calls", []):
        tool_name = tool_call["name"]
        tool_input = tool_call["input"]
        tool_id = tool_call["id"]

        # Execute the tool
        tool_result = tools.execute(tool_name, **tool_input)

        results.append({
            "tool_use_id": tool_id,
            "type": "tool_result",
            "content": tool_result.output if tool_result.success else f"Error: {tool_result.error}",
        })

    return results


def continue_with_results(
    messages: list[dict],
    tool_results: list[dict],
    tools: list[dict],
) -> dict:
    """Continue the conversation with tool results."""

    # Tool results go back to the model as a user message
    messages.append({"role": "user", "content": tool_results})

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=messages,
        tools=tools,
    )

    return parse_response(response)

Context Window Management

Agents quickly accumulate context. Managing what fits in the window is critical:

Strategies for Context Management

  • Summarization: Compress old context into summaries
  • Truncation: Remove least relevant older entries
  • Retrieval: Only include relevant memories via RAG
  • Sliding window: Keep only the most recent N interactions
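The sliding-window and truncation strategies need no LLM at all. A simple sketch that drops the oldest messages until an estimate fits the budget, using an assumed ~4-characters-per-token heuristic (a real tokenizer such as tiktoken gives exact counts):

```python
def sliding_window(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the most recent messages that fit within max_tokens."""
    kept: list[dict] = []
    budget = max_tokens
    for msg in reversed(messages):          # walk newest-first
        cost = len(str(msg["content"])) // 4 + 1
        if cost > budget:
            break                           # oldest messages fall off
        kept.append(msg)
        budget -= cost
    return list(reversed(kept))             # restore chronological order
```

This is the cheapest strategy but also the most lossy: anything outside the window is gone, which is why it pairs well with summarization, as below.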
🐍context_management.py
class ContextManager:
    """Manages the context window for an agent."""

    def __init__(
        self,
        max_tokens: int = 100000,
        summary_threshold: int = 50000,
    ):
        self.max_tokens = max_tokens
        self.summary_threshold = summary_threshold

    def estimate_tokens(self, messages: list[dict]) -> int:
        """Rough estimate: ~4 characters per token."""
        return sum(len(str(m["content"])) for m in messages) // 4

    def prepare_context(
        self,
        messages: list[dict],
        memories: list[str],
        current_task: str,
    ) -> list[dict]:
        """Prepare context that fits within token limits."""

        # Estimate the current token count
        current_tokens = self.estimate_tokens(messages)

        # If under the threshold, use all context
        if current_tokens < self.summary_threshold:
            return self._build_full_context(messages, memories, current_task)

        # Otherwise, summarize older messages
        return self._build_summarized_context(
            messages, memories, current_task
        )

    def _build_full_context(
        self,
        messages: list[dict],
        memories: list[str],
        current_task: str,
    ) -> list[dict]:
        """Build context from the full message history."""
        return [
            *messages,
            {"role": "user", "content": f"## Current Task\n{current_task}"},
        ]

    def _build_summarized_context(
        self,
        messages: list[dict],
        memories: list[str],
        current_task: str,
    ) -> list[dict]:
        """Build context with summarized history."""

        # Keep recent messages
        recent = messages[-10:]

        # Summarize older messages
        older = messages[:-10]
        summary = self._summarize_messages(older)

        # Build the new context
        return [
            {"role": "user", "content": f"## Previous Context\n{summary}"},
            *recent,
            {"role": "user", "content": f"## Current Task\n{current_task}"},
        ]

    def _summarize_messages(self, messages: list[dict]) -> str:
        """Create a summary of messages."""
        # Delegate to a cheaper summarizer model (summarizer_llm is
        # assumed to be configured elsewhere)
        prompt = f"Summarize these agent actions concisely:\n{messages}"
        response = summarizer_llm.generate(prompt)
        return response.text

Token Costs Add Up

Long agent sessions can consume millions of tokens. Implement summarization early and monitor your token usage to avoid surprise bills.
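A rough running tally is enough to catch runaway sessions early. A sketch; the per-million-token prices are placeholders to fill in from your provider's current pricing page, not real figures:

```python
class CostTracker:
    """Accumulate token usage and estimate spend per model."""

    def __init__(self, prices: dict[str, tuple[float, float]]):
        # prices: model name -> (input $/1M tokens, output $/1M tokens)
        self.prices = prices
        self.total_cost = 0.0

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Record one API call; returns its estimated cost in dollars."""
        in_price, out_price = self.prices[model]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        self.total_cost += cost
        return cost
```

Most provider responses include exact token counts in a usage field, so `record` can be called with real numbers after every request rather than estimates.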

Summary

The LLM as reasoning engine:

  1. Role: Decision-making, parameter generation, synthesis, reflection
  2. Providers: Claude, GPT-4/o3, and Gemini, each with different strengths
  3. Prompts: Structure for decisions, not just responses
  4. Tool Calling: Native structured outputs for reliable execution
  5. Context: Manage carefully to stay within limits
Next Up: With the reasoning engine understood, let's explore how to build the tools that give your agent hands to act on the world.