Chapter 2

LLM as the Reasoning Engine

Agent Architecture Fundamentals

Introduction

The large language model is the brain of your agent. It interprets context, makes decisions, generates tool calls, and synthesizes results. Understanding how to effectively use LLMs for agentic tasks is crucial for building capable agents.

The LLM Paradox: LLMs are remarkably capable yet fundamentally limited. They can reason about complex problems but can't execute code. They can plan strategies but can't verify results. Agents bridge this gap by giving LLMs tools to act.

The LLM's Role in Agents

In an agentic system, the LLM serves several key functions:

| Function | Description | Example |
| --- | --- | --- |
| Decision Making | Choose which action to take next | Decide to read a file vs. search the web |
| Parameter Generation | Create inputs for tool calls | Generate the file path to read |
| Synthesis | Combine information into responses | Summarize search results |
| Reflection | Analyze results and adjust strategy | Recognize an error and try a different approach |
| Planning | Break down complex goals | Create a step-by-step plan for a feature |

The Decision Loop

🐍llm_decision_loop.py
import anthropic

client = anthropic.Anthropic()

def llm_decide(context: str, tools: list[dict]) -> dict:
    """Use the LLM to decide the next action."""

    system_prompt = """
You are an AI agent working to accomplish goals.
Analyze the context and decide your next action.

You can:
1. Call a tool to gather information or take action
2. Finish if the goal is complete
3. Ask for clarification if needed

Always explain your reasoning before acting.
"""

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        system=system_prompt,
        messages=[{"role": "user", "content": context}],
        tools=tools,
    )

    # parse_response extracts text and tool calls from the raw response
    # (see call_claude_with_tools below for what that parsing looks like)
    return parse_response(response)
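`llm_decide` makes a single decision; an agent runs it inside a loop, executing the chosen tool and feeding the result back as new context. A minimal sketch with the model and tools stubbed out as plain callables — the `decision` dict shape here (`action`, `tool`, `input`, `result`) is an illustrative assumption, not a provider format:

```python
def run_agent(goal: str, decide, tools: dict, max_steps: int = 10) -> str:
    """Minimal agent loop: decide -> act -> observe, until finished.

    `decide` stands in for llm_decide; `tools` maps names to callables.
    """
    context = f"Goal: {goal}"
    for _ in range(max_steps):
        decision = decide(context)
        if decision["action"] == "finish":
            return decision["result"]
        # Execute the chosen tool and append the observation to the context
        output = tools[decision["tool"]](**decision["input"])
        context += f"\n{decision['tool']} -> {output}"
    raise RuntimeError("Agent exceeded max_steps without finishing")
```

The `max_steps` cap is the important design choice: without it, a confused model can loop forever, burning tokens on every iteration.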

Provider Comparison

Different LLM providers have different strengths for agentic tasks:

Claude (Anthropic)

  • Strengths: Long context (200K), careful reasoning, safety-focused
  • Best for: Complex multi-step tasks, code understanding, nuanced decisions
  • Tool calling: Native with structured outputs
🐍claude_agent.py
import anthropic

client = anthropic.Anthropic()

def call_claude_with_tools(prompt: str, tools: list[dict]) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
        tools=tools,
    )

    result = {"text": "", "tool_calls": []}

    for block in response.content:
        if block.type == "text":
            # Concatenate, since a response can contain multiple text blocks
            result["text"] += block.text
        elif block.type == "tool_use":
            result["tool_calls"].append({
                "id": block.id,
                "name": block.name,
                "input": block.input,
            })

    return result

GPT-4 / o3 (OpenAI)

  • Strengths: Strong multimodal, fast responses, extensive function calling
  • o3 special: Extended thinking for complex reasoning tasks
  • Best for: Rapid iteration, vision tasks, parallel function calls
🐍openai_agent.py
import json

from openai import OpenAI

client = OpenAI()

def call_openai_with_tools(prompt: str, tools: list[dict]) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        tools=[{"type": "function", "function": t} for t in tools],
    )

    message = response.choices[0].message

    result = {"text": message.content or "", "tool_calls": []}

    if message.tool_calls:
        for tc in message.tool_calls:
            result["tool_calls"].append({
                "id": tc.id,
                "name": tc.function.name,
                # OpenAI returns arguments as a JSON string
                "input": json.loads(tc.function.arguments),
            })

    return result

Gemini (Google)

  • Strengths: Massive context (2M tokens), native multimodal, Google integration
  • Best for: Document processing, research tasks, long-context applications
| Provider | Context Window | Tool Calling | Best For |
| --- | --- | --- | --- |
| Claude Opus 4 | 200K | Native | Complex reasoning |
| Claude Sonnet 4 | 200K | Native | Balanced performance |
| GPT-4o | 128K | Native | Multimodal, speed |
| o3 | 200K | Native | Extended reasoning |
| Gemini 1.5 Pro | 2M | Native | Long documents |
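Whatever the provider calls things, the rest of the agent stays simpler if every response is normalized into one shape, like the `{text, tool_calls}` dict used in the examples above. A sketch for the Anthropic and OpenAI cases; the `raw` dict layouts are simplified stand-ins for the actual SDK objects:

```python
import json

def normalize_tool_call(provider: str, raw: dict) -> dict:
    """Map a provider-specific tool call into a common shape."""
    if provider == "anthropic":
        # Anthropic tool_use blocks carry id/name/input directly
        return {"id": raw["id"], "name": raw["name"], "input": raw["input"]}
    if provider == "openai":
        # OpenAI nests name/arguments under "function"; arguments are JSON text
        fn = raw["function"]
        return {
            "id": raw["id"],
            "name": fn["name"],
            "input": json.loads(fn["arguments"]),
        }
    raise ValueError(f"Unknown provider: {provider}")
```

With this in place, switching providers touches only the API-call layer, not the tool-execution loop.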

Agent Prompt Engineering

Agent prompts differ from chatbot prompts. They must guide decision-making, not just generate responses:

System Prompt Structure

🐍agent_system_prompt.py
AGENT_SYSTEM_PROMPT = """
You are an AI agent that accomplishes goals by taking actions.

## Your Capabilities
You have access to these tools:
{tool_descriptions}

## Decision Framework
When deciding your next action, consider:
1. What information do I need?
2. What actions can move me toward the goal?
3. Have I verified my assumptions?
4. Is the goal complete?

## Guidelines
- Always explain your reasoning before acting
- Use tools to verify information, don't guess
- If stuck, try a different approach
- Ask for clarification when requirements are ambiguous
- Finish when the goal is definitively complete

## Error Handling
- If a tool fails, analyze the error and try an alternative
- If stuck after 3 attempts, explain the blocker and ask for help
- Never make destructive changes without explicit confirmation
"""
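The `{tool_descriptions}` placeholder can be filled from the same schemas you pass to the API, so the prompt and the tool list never drift apart. A small sketch (`render_tool_descriptions` is a hypothetical helper, not part of any SDK):

```python
def render_tool_descriptions(tools: list[dict]) -> str:
    """Render tool schemas as a bulleted list for the system prompt."""
    lines = []
    for tool in tools:
        # List the parameter names from the JSON schema's properties
        params = ", ".join(tool["input_schema"]["properties"])
        lines.append(f"- {tool['name']}({params}): {tool['description']}")
    return "\n".join(lines)
```

Usage: `AGENT_SYSTEM_PROMPT.format(tool_descriptions=render_tool_descriptions(tools))`.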

Context Prompt Structure

🐍context_prompt.py
def build_context_prompt(state: AgentState) -> str:
    return f"""
## Goal
{state.goal}

## Progress
Steps completed: {len(state.completed_steps)}/{len(state.plan.steps)}
Current step: {state.current_step.description if state.current_step else "None"}

## Recent Actions
{format_recent_actions(state.recent_actions[-5:])}

## Relevant Context
{format_memories(state.memories)}

## Current Situation
{state.environment_summary}

## Your Task
Based on the above context, decide your next action.
First explain your reasoning, then specify the tool to use.
"""

Prompt Engineering for Agents

Agent prompts should be more structured than conversational prompts. Use clear sections, explicit guidelines, and consistent formatting to help the LLM make better decisions.

Native Tool Calling

Modern LLMs support native tool/function calling, eliminating the need for fragile prompt-based parsing:

🐍tool_schema.py
# Tool schemas for LLM function calling
read_file_tool = {
    "name": "read_file",
    "description": "Read the contents of a file at the given path",
    "input_schema": {
        "type": "object",
        "properties": {
            "file_path": {
                "type": "string",
                "description": "The absolute or relative path to the file",
            },
        },
        "required": ["file_path"],
    },
}

search_web_tool = {
    "name": "search_web",
    "description": "Search the web for information",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query",
            },
            "num_results": {
                "type": "integer",
                "description": "Number of results to return (default: 5)",
                "default": 5,
            },
        },
        "required": ["query"],
    },
}
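Even with native tool calling, models occasionally emit inputs that miss a required field or use the wrong type, so it is worth checking inputs against the schema before executing anything. A minimal hand-rolled check; a library such as `jsonschema` would be more thorough:

```python
def validate_input(schema: dict, tool_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input passes."""
    type_map = {"string": str, "integer": int, "number": (int, float),
                "boolean": bool, "object": dict, "array": list}
    errors = []
    props = schema.get("properties", {})
    for field in schema.get("required", []):
        if field not in tool_input:
            errors.append(f"missing required field: {field}")
    for field, value in tool_input.items():
        if field not in props:
            errors.append(f"unexpected field: {field}")
            continue
        expected = type_map.get(props[field].get("type"))
        # Note: bool is a subclass of int in Python, so True would pass
        # an "integer" check here; a real validator handles that case.
        if expected and not isinstance(value, expected):
            errors.append(f"{field}: expected {props[field]['type']}")
    return errors
```

On failure, the cleanest recovery is usually to return the error list to the model as a tool result and let it retry with corrected arguments.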

Processing Tool Calls

🐍process_tool_calls.py
def process_tool_response(response: dict, tools: ToolRegistry) -> list[dict]:
    """Execute tool calls from an LLM response and collect the results."""
    results = []

    for tool_call in response.get("tool_calls", []):
        tool_name = tool_call["name"]
        tool_input = tool_call["input"]
        tool_id = tool_call["id"]

        # Execute the tool
        tool_result = tools.execute(tool_name, **tool_input)

        results.append({
            "tool_use_id": tool_id,
            "type": "tool_result",
            "content": tool_result.output if tool_result.success else f"Error: {tool_result.error}",
        })

    return results


def continue_with_results(
    messages: list[dict],
    tool_results: list[dict],
    tools: list[dict],
) -> dict:
    """Continue the conversation with tool results."""

    # Tool results go back to the model as a user message
    messages.append({"role": "user", "content": tool_results})

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=messages,
        tools=tools,
    )

    return parse_response(response)

Context Window Management

Agents quickly accumulate context. Managing what fits in the window is critical:

Strategies for Context Management

  • Summarization: Compress old context into summaries
  • Truncation: Remove least relevant older entries
  • Retrieval: Only include relevant memories via RAG
  • Sliding window: Keep only the most recent N interactions
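The sliding-window and truncation strategies need no LLM at all. A simple sketch that drops the oldest messages until an estimate fits the budget, using an assumed ~4-characters-per-token heuristic (a real tokenizer such as tiktoken gives exact counts):

```python
def sliding_window(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the most recent messages that fit within max_tokens."""
    kept: list[dict] = []
    budget = max_tokens
    for msg in reversed(messages):          # walk newest-first
        cost = len(str(msg["content"])) // 4 + 1
        if cost > budget:
            break                           # oldest messages fall off
        kept.append(msg)
        budget -= cost
    return list(reversed(kept))             # restore chronological order
```

This is the cheapest strategy but also the most lossy: anything outside the window is gone, which is why it pairs well with summarization, as below.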
🐍context_management.py
class ContextManager:
    """Manages the context window for an agent."""

    def __init__(
        self,
        max_tokens: int = 100000,
        summary_threshold: int = 50000,
    ):
        self.max_tokens = max_tokens
        self.summary_threshold = summary_threshold

    def estimate_tokens(self, messages: list[dict]) -> int:
        """Rough estimate: ~4 characters per token."""
        return sum(len(str(m["content"])) for m in messages) // 4

    def prepare_context(
        self,
        messages: list[dict],
        memories: list[str],
        current_task: str,
    ) -> list[dict]:
        """Prepare context that fits within token limits."""

        # Estimate the current token count
        current_tokens = self.estimate_tokens(messages)

        # If under the threshold, use all context
        if current_tokens < self.summary_threshold:
            return self._build_full_context(messages, memories, current_task)

        # Otherwise, summarize older messages
        return self._build_summarized_context(
            messages, memories, current_task
        )

    def _build_full_context(
        self,
        messages: list[dict],
        memories: list[str],
        current_task: str,
    ) -> list[dict]:
        """Build context from the full message history."""
        return [
            *messages,
            {"role": "user", "content": f"## Current Task\n{current_task}"},
        ]

    def _build_summarized_context(
        self,
        messages: list[dict],
        memories: list[str],
        current_task: str,
    ) -> list[dict]:
        """Build context with summarized history."""

        # Keep recent messages
        recent = messages[-10:]

        # Summarize older messages
        older = messages[:-10]
        summary = self._summarize_messages(older)

        # Build the new context
        return [
            {"role": "user", "content": f"## Previous Context\n{summary}"},
            *recent,
            {"role": "user", "content": f"## Current Task\n{current_task}"},
        ]

    def _summarize_messages(self, messages: list[dict]) -> str:
        """Create a summary of messages."""
        # Delegate to a cheaper summarizer model (summarizer_llm is
        # assumed to be configured elsewhere)
        prompt = f"Summarize these agent actions concisely:\n{messages}"
        response = summarizer_llm.generate(prompt)
        return response.text

Token Costs Add Up

Long agent sessions can consume millions of tokens. Implement summarization early and monitor your token usage to avoid surprise bills.
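A rough running tally is enough to catch runaway sessions early. A sketch; the per-million-token prices are placeholders to fill in from your provider's current pricing page, not real figures:

```python
class CostTracker:
    """Accumulate token usage and estimate spend per model."""

    def __init__(self, prices: dict[str, tuple[float, float]]):
        # prices: model name -> (input $/1M tokens, output $/1M tokens)
        self.prices = prices
        self.total_cost = 0.0

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Record one API call; returns its estimated cost in dollars."""
        in_price, out_price = self.prices[model]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        self.total_cost += cost
        return cost
```

Most provider responses include exact token counts in a usage field, so `record` can be called with real numbers after every request rather than estimates.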

Summary

The LLM as reasoning engine:

  1. Role: Decision-making, parameter generation, synthesis, reflection
  2. Providers: Claude, GPT-4/o3, and Gemini, each with different strengths
  3. Prompts: Structure for decisions, not just responses
  4. Tool Calling: Native structured outputs for reliable execution
  5. Context: Manage carefully to stay within limits
Next Up: With the reasoning engine understood, let's explore how to build the tools that give your agent hands to act on the world.