Introduction
The term "AI agent" gets thrown around a lot, often inconsistently. Some call any LLM with tools an agent. Others reserve the term for fully autonomous systems. In this section, we'll establish a clear, practical definition and explore the key components that transform an LLM into a true agent.
Our Working Definition: An AI agent is a system that uses an LLM as its reasoning engine to perceive its environment, make decisions, and take actions to achieve specific goals, adapting its behavior based on feedback.
Defining AI Agents
The Essential Properties
For an AI system to be considered an agent, it must exhibit these core properties:
| Property | Description | Example |
|---|---|---|
| Goal-Directed | Works toward achieving specific objectives | "Deploy this application to production" |
| Autonomous | Makes decisions without constant human input | Chooses which files to edit, which tools to use |
| Adaptive | Adjusts behavior based on feedback and results | Retries with different approach when tests fail |
| Interactive | Perceives and acts on its environment | Reads files, executes code, makes API calls |
| Persistent | Continues working across multiple steps | Keeps working until the goal is achieved or abandoned |
What an Agent is NOT
To sharpen our definition, let's clarify what doesn't qualify as an agent:
- Not just a chatbot - Agents take actions, not just produce text
- Not just RAG - Retrieval is a tool agents can use, not what makes them agents
- Not just function calling - A single tool call isn't an agent; the loop is essential
- Not just a workflow - Agents make decisions; workflows follow fixed paths
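The function-calling point is worth making concrete. In this toy sketch (no real LLM is involved; `fake_llm` is a deterministic stand-in, and the goal is just a counter reaching 3), a single function call stops after one step, while the loop keeps acting on feedback until the goal is met:

```python
# Contrast: one-shot function calling vs. an agent loop.
# Everything here is a toy stand-in -- no real LLM is involved.

def fake_llm(state: int) -> str:
    """Stand-in for an LLM: asks to increment until the counter reaches 3."""
    return "finish" if state >= 3 else "increment"

def increment(state: int) -> int:
    return state + 1

# Plain function calling: one decision, one call, done.
def one_shot(state: int) -> int:
    if fake_llm(state) == "increment":
        state = increment(state)
    return state  # stops after a single step, goal or not

# Agent loop: decide, act, observe the new state, repeat.
def agent_loop(state: int, max_iterations: int = 10) -> int:
    for _ in range(max_iterations):
        if fake_llm(state) == "finish":
            break
        state = increment(state)
    return state

print(one_shot(0))    # 1 -- a single call rarely reaches the goal
print(agent_loop(0))  # 3 -- the loop runs until the goal is met
```

The only difference between the two functions is the loop, yet only the second one reliably reaches the goal.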
The Litmus Test
A simple litmus test: does the system choose its own next step based on the result of its previous one? If the sequence of steps is fixed in advance, it's a workflow; if it closes the perceive-act-observe loop on its own, it's an agent.
The Core Components
Every agent architecture includes these fundamental components:
1. The Reasoning Engine (LLM)
The LLM is the "brain" of the agent. It:
- Interprets the current state and goal
- Decides which action to take next
- Generates plans for complex tasks
- Reflects on results and adjusts strategy
```python
# The LLM as reasoning engine
def decide_action(context: str, goal: str, available_tools: list) -> Action:
    """Use the LLM to decide the next action."""
    prompt = f"""
    Current context: {context}
    Goal: {goal}
    Available tools: {[t.name for t in available_tools]}

    Based on the current context and goal, what action should I take next?
    Choose a tool and specify the parameters.
    """
    response = llm.generate(prompt)
    return parse_action(response)  # Returns a structured Action object
```

2. Tools (Actions)
Tools are the agent's interface to the world. They allow the agent to:
- Read - Access files, databases, APIs
- Write - Create and modify files
- Execute - Run code, shell commands
- Search - Find information on the web or in codebases
- Communicate - Send messages, make API calls
```python
from typing import Protocol
import subprocess

class Tool(Protocol):
    """Interface that all tools must implement."""
    name: str
    description: str

    def execute(self, **kwargs) -> str:
        """Execute the tool and return results."""
        ...

# Example tools
class ReadFileTool:
    name = "read_file"
    description = "Read the contents of a file at the given path"

    def execute(self, file_path: str) -> str:
        with open(file_path, 'r') as f:
            return f.read()

class RunCommandTool:
    name = "run_command"
    description = "Execute a shell command and return output"

    def execute(self, command: str) -> str:
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        return result.stdout + result.stderr

class WebSearchTool:
    name = "web_search"
    description = "Search the web and return results"

    def execute(self, query: str) -> str:
        # Implementation using a search API
        ...
```

3. Memory
Memory allows agents to maintain context across interactions:
| Memory Type | Duration | Use Case |
|---|---|---|
| Working Memory | Single task | Current conversation, immediate context |
| Short-term Memory | Session | Recent actions, intermediate results |
| Long-term Memory | Persistent | Learned patterns, user preferences, past solutions |
```python
class AgentMemory:
    def __init__(self):
        self.working_memory = []        # Current task context
        self.short_term = []            # Recent actions (last ~100)
        self.long_term = VectorStore()  # Persistent embeddings

    def add(self, entry: str, persist: bool = False):
        self.working_memory.append(entry)
        self.short_term.append(entry)
        self.short_term = self.short_term[-100:]  # Keep only recent entries

        if persist:
            embedding = embed(entry)
            self.long_term.add(embedding, entry)

    def recall(self, query: str, k: int = 5) -> list[str]:
        """Retrieve relevant memories."""
        # Search long-term memory using semantic similarity
        relevant = self.long_term.search(query, k=k)

        # Add recent short-term entries
        recent = self.short_term[-10:]

        return relevant + recent
```

4. Planning System
Complex goals require breaking down into achievable steps:
```python
def create_plan(goal: str, context: str) -> list[Step]:
    """Break down a complex goal into steps."""
    prompt = f"""
    Goal: {goal}
    Context: {context}

    Break this goal into concrete, achievable steps.
    For each step, specify:
    - What needs to be done
    - What tools might be needed
    - What success looks like
    """
    response = llm.generate(prompt)
    return parse_steps(response)

# Example output for goal: "Add user authentication to the app"
# [
#     Step("Research existing auth patterns in codebase", tools=["grep", "read_file"]),
#     Step("Choose authentication strategy", tools=["web_search"]),
#     Step("Implement login endpoint", tools=["edit_file"]),
#     Step("Implement registration endpoint", tools=["edit_file"]),
#     Step("Add session management", tools=["edit_file"]),
#     Step("Write tests", tools=["edit_file", "run_command"]),
#     Step("Run tests and fix issues", tools=["run_command", "edit_file"]),
# ]
```

5. Observation and Feedback
Agents must process the results of their actions:
```python
def process_observation(action: Action, result: str) -> dict:
    """Process the result of an action."""
    observation = {
        "action_taken": action.name,
        "parameters": action.params,
        "result": result,
        "success": determine_success(result),
        "insights": extract_insights(result),
    }

    # Did we encounter an error?
    if "error" in result.lower() or "failed" in result.lower():
        observation["needs_retry"] = True
        observation["suggested_fix"] = analyze_error(result)

    return observation
```

The Agent Loop in Detail
The agent loop is the heartbeat of any agentic system. Here's a complete implementation:
```python
class Agent:
    def __init__(self, tools: list[Tool], llm: LLM):
        self.tools = {t.name: t for t in tools}
        self.llm = llm
        self.memory = AgentMemory()
        self.max_iterations = 50

    def run(self, goal: str) -> AgentResult:
        """Main agent loop."""
        # Initialize state
        state = AgentState(goal=goal)
        self.memory.add(f"Goal: {goal}")

        for iteration in range(self.max_iterations):
            # 1. PERCEIVE: Gather current context
            context = self.gather_context(state)

            # 2. REASON: Decide next action
            action = self.decide_action(context, goal)

            # Check for completion
            if action.type == "finish":
                return AgentResult(
                    success=True,
                    output=action.params.get("result"),
                    iterations=iteration,
                )

            # 3. ACT: Execute the action
            self.memory.add(f"Action: {action.name}({action.params})")

            try:
                result = self.execute_action(action)
                self.memory.add(f"Result: {result[:500]}")  # Truncate long results
            except Exception as e:
                result = f"Error: {e}"
                self.memory.add(f"Error: {e}")

            # 4. OBSERVE: Update state with results
            observation = self.process_observation(action, result)
            state.update(observation)

            # 5. REFLECT: Check if we need to replan
            if observation.get("needs_replan"):
                new_plan = self.create_plan(goal, state.summary())
                state.update_plan(new_plan)

        # Max iterations reached
        return AgentResult(
            success=False,
            output="Max iterations reached without completing goal",
            iterations=self.max_iterations,
        )

    def gather_context(self, state: AgentState) -> str:
        """Compile relevant context for decision making."""
        return f"""
        Goal: {state.goal}
        Progress: {state.progress_summary()}
        Recent actions: {state.recent_actions(5)}
        Current plan: {state.current_plan}
        Relevant memories: {self.memory.recall(state.goal)}
        """

    def decide_action(self, context: str, goal: str) -> Action:
        """Use the LLM to decide the next action."""
        tool_descriptions = [
            f"{name}: {tool.description}"
            for name, tool in self.tools.items()
        ]

        prompt = f"""
        {context}

        Available tools:
        {chr(10).join(tool_descriptions)}

        Special action:
        - finish: Complete the task with a result

        What should I do next to achieve the goal?
        Respond with the tool name and parameters.
        """

        response = self.llm.generate(prompt)
        return parse_action(response)

    def execute_action(self, action: Action) -> str:
        """Execute the chosen action."""
        tool = self.tools.get(action.name)
        if tool is None:
            raise ValueError(f"Unknown tool: {action.name}")
        return tool.execute(**action.params)
```

The Loop is Everything
Every component covered above (memory, planning, observation) only does useful work inside this loop; a single tool call, however sophisticated, isn't an agent.
The Autonomy Spectrum
Agents exist on a spectrum of autonomy:
| Level | Description | Human Involvement | Example |
|---|---|---|---|
| 1: Copilot | Suggests actions, human approves | Every action | GitHub Copilot suggestions |
| 2: Supervised | Executes safe actions, asks for risky ones | Important decisions | Claude Code with permission prompts |
| 3: Autonomous | Executes all actions, reports results | Review after completion | Devin on standard tasks |
| 4: Self-Directed | Sets own goals based on high-level objectives | Periodic check-ins | AutoGPT with a mission |
Most Production Agents are Level 2-3
Full autonomy is rarely the goal in practice: most production systems execute routine actions on their own but keep a human in the loop for risky or irreversible ones.
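To make Level 2 concrete, here's a minimal sketch of supervised execution: safe tools run directly, while risky ones are gated behind an approval callback. The tool classes, the `RISKY_TOOLS` set, and the `approve` hook are all illustrative, not any particular framework's API.

```python
# Level 2 autonomy sketch: gate risky tools behind human approval.
# Tool names and the approval callback are illustrative stand-ins.

class EchoTool:
    name = "echo"
    def execute(self, text: str) -> str:
        return text

class DeployTool:
    name = "deploy"
    def execute(self, target: str) -> str:
        return f"deployed to {target}"

# Tools whose effects are hard to undo require a human sign-off.
RISKY_TOOLS = {"run_command", "delete_file", "deploy"}

def execute_with_supervision(tools: dict, name: str, params: dict, approve) -> str:
    """Run safe tools directly; gate risky ones behind an approval callback."""
    if name in RISKY_TOOLS and not approve(name, params):
        return f"Skipped {name}: not approved"
    return tools[name].execute(**params)

tools = {"echo": EchoTool(), "deploy": DeployTool()}
deny = lambda name, params: False  # stand-in for a user saying "no"

print(execute_with_supervision(tools, "echo", {"text": "hi"}, deny))        # hi
print(execute_with_supervision(tools, "deploy", {"target": "prod"}, deny))  # Skipped deploy: not approved
```

In a real agent, `approve` would prompt the user in the terminal or UI; the point is that the gating logic lives outside the tools themselves, so the same tool set works at any autonomy level.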
Types of Agents
By Application Domain
| Type | Primary Task | Key Tools | Examples |
|---|---|---|---|
| Coding Agent | Write and modify code | File ops, code execution, Git | Claude Code, Cursor, Codex |
| Research Agent | Find and synthesize information | Web search, document reading | Perplexity, custom research bots |
| Data Agent | Analyze and transform data | SQL, Python, visualization | Data analysis assistants |
| Browser Agent | Navigate and interact with websites | Browser control, screenshots | Browserbase agents |
| Assistant Agent | Handle diverse tasks | Multiple tool categories | General-purpose assistants |
By Architecture
- Single Agent - One LLM handling the entire task
- Multi-Agent - Multiple specialized agents collaborating
- Hierarchical - Manager agent coordinating worker agents
- Swarm - Many simple agents working in parallel
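The hierarchical pattern can be sketched in a few lines: a manager routes subtasks to specialized workers. In this toy version the decomposition is hard-coded and the workers just echo their work; a real manager would use an LLM to plan, delegate, and merge results.

```python
# Toy sketch of the hierarchical pattern: a manager decomposes work
# and routes each subtask to a specialized worker agent.

class Worker:
    def __init__(self, specialty: str):
        self.specialty = specialty

    def handle(self, subtask: str) -> str:
        # A real worker would run its own agent loop here.
        return f"[{self.specialty}] done: {subtask}"

class Manager:
    def __init__(self, workers: dict[str, Worker]):
        self.workers = workers

    def run(self, subtasks: list[tuple[str, str]]) -> list[str]:
        # Route each (specialty, subtask) pair to the matching worker.
        return [self.workers[spec].handle(task) for spec, task in subtasks]

manager = Manager({"code": Worker("code"), "test": Worker("test")})
results = manager.run([("code", "implement login"), ("test", "write login tests")])
print(results)
# ['[code] done: implement login', '[test] done: write login tests']
```

The same skeleton covers the other architectures: a single agent is one worker with no manager, and a swarm is many workers run in parallel over the same task list.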
Summary
An AI agent is more than just an LLM with tools. It's a complete system with:
- Reasoning Engine - The LLM that makes decisions
- Tools - Capabilities to interact with the world
- Memory - Context and learning across interactions
- Planning - Ability to break down complex goals
- Feedback Loops - Observation and adaptation
Remember: The agent loop - perceive, reason, act, observe, reflect - is the pattern that transforms an LLM from a text generator into an autonomous problem solver.
In the next section, we'll survey the current landscape of AI agents, from research prototypes to production systems.