Introduction
One of Claude Code's most powerful features is its ability to automatically find relevant context in your codebase. This "agentic search" capability lets it work on large codebases without you specifying exactly which files to look at.
The Magic: You say "fix the authentication bug" and Claude Code finds the auth files, the related tests, the config, and the relevant documentation. How? Agentic search.
The Context Problem
Coding agents face a fundamental challenge: codebases are too large to fit in context.
| Codebase Size | Approximate Tokens | Context Fit? |
|---|---|---|
| Small script (100 lines) | ~500 tokens | Easily fits |
| Small project (10 files) | ~10K tokens | Fits in most models |
| Medium project (100 files) | ~100K tokens | Fits, but fills much of the window |
| Large project (1000 files) | ~1M+ tokens | Doesn't fit |
| Enterprise codebase | ~10M+ tokens | Way too large |
The solution isn't to load everything. It's to intelligently find what's relevant.
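The token counts above come from a rough rule of thumb: for code, one token is roughly four characters. A quick sketch for sizing a codebase under that assumption (the 4-chars-per-token ratio is a heuristic, not an exact tokenizer):

```python
from pathlib import Path

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for source code."""
    return len(text) // 4

def codebase_tokens(root: str, suffixes: tuple[str, ...] = (".py", ".ts")) -> int:
    """Sum estimated tokens across source files under root."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += estimate_tokens(path.read_text(errors="ignore"))
    return total

print(estimate_tokens("x" * 400))  # → 100
```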
What Makes Context Relevant?
- Directly related: Files you'll read or modify
- Dependencies: Code that the target files depend on
- Dependents: Code that depends on target files
- Similar patterns: Code that shows how things are done
- Configuration: Settings that affect behavior
- Tests: Existing tests that should keep passing
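Several of these categories can be approximated with cheap name-based heuristics before doing any deeper analysis. A sketch (the rules below are illustrative assumptions, not Claude Code's actual logic; dependencies and dependents would need real import analysis):

```python
def classify_relevance(path: str, target: str) -> str:
    """Bucket a file into a rough relevance category relative to a target file."""
    name = path.lower()
    if path == target:
        return "directly related"
    if "test" in name or name.endswith((".spec.ts", ".test.ts")):
        return "tests"
    if "config" in name or name.endswith((".env", ".json", ".yaml")):
        return "configuration"
    # Fallback; import analysis would distinguish dependencies/dependents
    return "similar patterns"

print(classify_relevance("tests/auth.spec.ts", "src/auth/login.ts"))  # → tests
```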
How Agentic Search Works
Agentic search is iterative. The agent searches, reads results, decides what else to look for, and repeats:
🐍agentic_search.py
```python
class AgenticSearch:
    """Iteratively search for relevant context."""

    def search(self, goal: str, max_rounds: int = 5) -> list[str]:
        """Find relevant files for a goal."""
        found_files: set[str] = set()
        explored_queries: set[str] = set()

        for round_num in range(max_rounds):
            # Generate search queries based on goal and current context
            queries = self.generate_queries(goal, found_files)

            # Remove already-explored queries
            new_queries = [q for q in queries if q not in explored_queries]
            if not new_queries:
                break

            # Execute searches
            for query in new_queries:
                results = self.execute_search(query)
                found_files.update(results)
                explored_queries.add(query)

            # Check if we have enough context
            if self.sufficient_context(goal, found_files):
                break

        return list(found_files)

    def generate_queries(
        self,
        goal: str,
        current_files: set[str],
    ) -> list[str]:
        """Use LLM to generate relevant search queries."""
        prompt = f"""
Goal: {goal}
Files found so far: {list(current_files)[:10]}

Generate 3-5 search queries to find more relevant files.
Consider:
- File names and patterns
- Function/class names
- Import statements
- Configuration files
- Test files
"""
        response = self.llm.generate(prompt)
        return self.parse_queries(response.text)
```

The Search Loop
📝search_loop.txt
```text
Goal: "Fix the authentication bug"

Round 1:
  Query: "auth" → Found: src/auth/login.ts, src/auth/session.ts
  Query: "authenticate" → Found: src/middleware/auth.ts
  Query: "*.spec.ts auth" → Found: tests/auth.spec.ts

Round 2 (informed by what we found):
  Query: "import from auth" → Found: src/api/routes.ts
  Query: "Session type" → Found: src/types/auth.d.ts
  Query: "AUTH_SECRET" → Found: config/env.ts

Round 3:
  Query: "jwt token" → Found: src/auth/token.ts
  Sufficient context detected. Stopping.

Final context: 8 files, ~2000 tokens
```

Implementation Patterns
Pattern 1: Glob-Based Search
🐍glob_search.py
```python
import fnmatch
from pathlib import Path

class GlobSearch:
    """Fast file pattern matching."""

    def search(self, pattern: str, root: Path = Path(".")) -> list[Path]:
        """Find files matching a glob pattern, newest first."""
        results = []

        for path in root.rglob("*"):
            if path.is_file():
                # Match against the bare file name first, then the full path
                if fnmatch.fnmatch(path.name, pattern):
                    results.append(path)
                elif fnmatch.fnmatch(str(path), pattern):
                    results.append(path)

        return sorted(results, key=lambda p: p.stat().st_mtime, reverse=True)

# Usage
glob = GlobSearch()
auth_files = glob.search("*auth*.ts")
test_files = glob.search("*.spec.ts")
config_files = glob.search("config/*.json")
```

Pattern 2: Grep-Based Search
🐍grep_search.py
```python
import subprocess
from dataclasses import dataclass

@dataclass
class GrepResult:
    file: str
    line_number: int
    content: str
    context_before: list[str]
    context_after: list[str]


class GrepSearch:
    """Search file contents using ripgrep."""

    def search(
        self,
        pattern: str,
        file_types: list[str] | None = None,
        context_lines: int = 2,
    ) -> list[GrepResult]:
        """Search for pattern in files."""
        cmd = ["rg", "--json", "-C", str(context_lines)]

        if file_types:
            for ft in file_types:
                cmd.extend(["-t", ft])

        cmd.append(pattern)

        result = subprocess.run(cmd, capture_output=True, text=True)
        # parse_results (not shown) turns rg's JSON output into GrepResult objects
        return self.parse_results(result.stdout)

# Usage
grep = GrepSearch()
auth_refs = grep.search("authenticateUser", file_types=["ts", "tsx"])
imports = grep.search("from ['\"]@/auth", file_types=["ts"])
```

Pattern 3: Semantic Search
🐍semantic_search.py
```python
class SemanticSearch:
    """Search using embeddings for semantic similarity."""

    def __init__(self, index_path: str):
        self.index = self.load_index(index_path)

    def search(self, query: str, k: int = 10) -> list[dict]:
        """Find semantically similar code chunks."""
        # Embed the query
        query_embedding = self.embedder.embed(query)

        # Search the index
        results = self.index.search(query_embedding, k=k)

        return [
            {
                "file": r.file_path,
                "chunk": r.content,
                "score": r.similarity,
            }
            for r in results
        ]

# Usage
semantic = SemanticSearch("./code_index")
results = semantic.search("user authentication flow")
# Returns code chunks that are semantically related,
# even if they don't contain the exact words
```

Combine Search Types
The best agentic search combines all three: glob for file patterns, grep for exact matches, and semantic for conceptual similarity.
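One simple way to combine them is to merge the three result lists and boost files that multiple search modes agree on. A sketch (the per-mode weights are illustrative assumptions):

```python
def combine_results(
    glob_hits: list[str],
    grep_hits: list[str],
    semantic_hits: list[str],
) -> list[str]:
    """Merge results from all three search modes, ranking by agreement."""
    weights = {"glob": 1.0, "grep": 1.5, "semantic": 1.0}  # illustrative weights
    scores: dict[str, float] = {}
    for mode, hits in [("glob", glob_hits), ("grep", grep_hits), ("semantic", semantic_hits)]:
        for f in hits:
            # Files found by several modes accumulate a higher score
            scores[f] = scores.get(f, 0.0) + weights[mode]
    return sorted(scores, key=scores.get, reverse=True)

# b.ts is found by all three modes, so it ranks first
print(combine_results(["a.ts", "b.ts"], ["b.ts", "c.ts"], ["b.ts"]))
```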
Context Ranking
Not all found files are equally important. Ranking helps prioritize:
🐍context_ranking.py
```python
import time
from pathlib import Path

class ContextRanker:
    """Rank files by relevance to the goal."""

    def rank(self, files: list[str], goal: str) -> list[tuple[str, float]]:
        """Return files sorted by relevance score."""
        scored = []

        for file in files:
            score = self.calculate_score(file, goal)
            scored.append((file, score))

        return sorted(scored, key=lambda x: x[1], reverse=True)

    def calculate_score(self, file: str, goal: str) -> float:
        """Calculate relevance score for a file."""
        score = 0.0

        # Recency: recently modified files are more relevant
        mtime = Path(file).stat().st_mtime
        recency = 1.0 / (1 + (time.time() - mtime) / 86400)  # Decay over days
        score += recency * 0.2

        # Name match: file name matches goal terms
        goal_terms = set(goal.lower().split())
        name_terms = set(Path(file).stem.lower().split("_"))
        name_match = len(goal_terms & name_terms) / len(goal_terms)
        score += name_match * 0.3

        # Content relevance: terms appear in content
        content = Path(file).read_text()
        content_match = sum(1 for term in goal_terms if term in content.lower())
        score += (content_match / len(goal_terms)) * 0.3

        # Type bonus: certain file types are more important
        if file.endswith(".test.ts") or file.endswith(".spec.ts"):
            score += 0.1  # Tests are important
        if "config" in file.lower():
            score += 0.1  # Config files matter

        return min(score, 1.0)  # Cap at 1.0
```

Context Budget
Use ranking to stay within token limits:
🐍context_budget.py
```python
from pathlib import Path

def select_within_budget(
    files: list[tuple[str, float]],
    max_tokens: int = 50000,
) -> list[str]:
    """Select highest-ranked files that fit in the token budget.

    estimate_tokens and summarize_file are helpers assumed elsewhere.
    """
    selected = []
    current_tokens = 0

    for file, score in files:
        file_tokens = estimate_tokens(Path(file).read_text())

        if current_tokens + file_tokens <= max_tokens:
            selected.append(file)
            current_tokens += file_tokens
        else:
            # Too large to include whole: fall back to a short summary
            # (note: this mixes file paths and summary strings in `selected`)
            if current_tokens + 200 <= max_tokens:
                selected.append(summarize_file(file))
                current_tokens += 200

    return selected
```

Practical Tips
1. Start Broad, Then Focus
🐍broad_to_focused.py
```python
# Round 1: Broad search
files = search("auth")  # Many results

# Round 2: Focus on a specific area
files = search("login validation", in_files=files)  # Narrower

# Round 3: Find tests for these files
test_files = find_tests_for(files)
```

2. Follow Imports
🐍follow_imports.py
```python
from pathlib import Path

def find_related_by_imports(file: str) -> set[str]:
    """Find files related through imports."""
    related = set()

    # Find files this file imports
    imports = extract_imports(file)
    for imp in imports:
        related.add(resolve_import(imp))

    # Find files that import this file
    importers = grep(f"from ['\"].*{Path(file).stem}")
    related.update(importers)

    return related
```

3. Don't Forget Configuration
🐍find_config.py
```python
def find_related_config(files: list[str]) -> list[str]:
    """Find configuration files that might affect these files."""
    config_files = []

    # Common config file patterns
    patterns = [
        "*.config.js",
        "*.config.ts",
        ".env*",
        "tsconfig*.json",
        "package.json",
    ]

    for pattern in patterns:
        config_files.extend(glob(pattern))

    return config_files
```

Avoid Context Overload
More context isn't always better. Too much irrelevant context can confuse the LLM. Aim for quality over quantity.
Summary
Agentic search is essential for working with real codebases:
- Iterative: Search → Read → Generate new queries → Repeat
- Multi-modal: Combine glob, grep, and semantic search
- Ranked: Prioritize files by relevance score
- Budgeted: Stay within token limits
- Complete: Include tests, config, and related files
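The bullets above compose into one small pipeline: score candidates against the goal, rank them, then select greedily within a budget. A minimal end-to-end sketch (gather_context and its inline scoring are illustrative stand-ins for the fuller classes earlier; tokens are again estimated at ~4 characters each):

```python
def gather_context(goal: str, candidates: dict[str, str], max_tokens: int = 2000) -> list[str]:
    """Score candidate files against the goal, then greedily select the
    best until the token budget runs out.
    candidates maps file path -> file content (a stand-in for search results)."""
    terms = set(goal.lower().split())

    def score(path: str, text: str) -> float:
        # Fraction of goal terms that appear in the path or content
        hits = sum(1 for t in terms if t in text.lower() or t in path.lower())
        return hits / len(terms)

    ranked = sorted(candidates, key=lambda p: score(p, candidates[p]), reverse=True)
    selected, used = [], 0
    for path in ranked:
        cost = len(candidates[path]) // 4  # ~4 chars per token
        if used + cost <= max_tokens:
            selected.append(path)
            used += cost
    return selected

files = {
    "src/auth/login.ts": "export function login(user) { /* auth logic */ }",
    "README.md": "project overview",
}
print(gather_context("fix auth login bug", files))  # login.ts ranks first
```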
Next: With context gathered, let's explore the tool system that lets Claude Code act on your codebase.