Introduction
One of Claude Code's most powerful features is its ability to automatically find relevant context in your codebase. This "agentic search" capability lets it work on large codebases without you specifying exactly which files to look at.
The Magic: You say "fix the authentication bug" and Claude Code finds the auth files, the related tests, the config, and the relevant documentation. How? Agentic search.
The Context Problem
Coding agents face a fundamental challenge: codebases are too large to fit in context.
| Codebase Size | Approximate Tokens | Context Fit? |
|---|---|---|
| Small script (100 lines) | ~500 tokens | Easily fits |
| Small project (10 files) | ~10K tokens | Fits in most models |
| Medium project (100 files) | ~100K tokens | Fits, but fills much of the window |
| Large project (1000 files) | ~1M+ tokens | Doesn't fit |
| Enterprise codebase | ~10M+ tokens | Way too large |
The solution isn't to load everything. It's to intelligently find what's relevant.
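The token counts above come from a rough rule of thumb: for code, one token is roughly four characters. A quick sketch for sizing a codebase under that assumption (the 4-chars-per-token ratio is a heuristic, not an exact tokenizer):

```python
from pathlib import Path

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for source code."""
    return len(text) // 4

def codebase_tokens(root: str, suffixes: tuple[str, ...] = (".py", ".ts")) -> int:
    """Sum estimated tokens across source files under root."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += estimate_tokens(path.read_text(errors="ignore"))
    return total

print(estimate_tokens("x" * 400))  # → 100
```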
What Makes Context Relevant?
- Directly related: Files you'll read or modify
- Dependencies: Code that the target files depend on
- Dependents: Code that depends on target files
- Similar patterns: Code that shows how things are done
- Configuration: Settings that affect behavior
- Tests: Existing tests that should keep passing
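Several of these categories can be approximated with cheap name-based heuristics before doing any deeper analysis. A sketch (the rules below are illustrative assumptions, not Claude Code's actual logic; dependencies and dependents would need real import analysis):

```python
def classify_relevance(path: str, target: str) -> str:
    """Bucket a file into a rough relevance category relative to a target file."""
    name = path.lower()
    if path == target:
        return "directly related"
    if "test" in name or name.endswith((".spec.ts", ".test.ts")):
        return "tests"
    if "config" in name or name.endswith((".env", ".json", ".yaml")):
        return "configuration"
    # Fallback; import analysis would distinguish dependencies/dependents
    return "similar patterns"

print(classify_relevance("tests/auth.spec.ts", "src/auth/login.ts"))  # → tests
```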
How Agentic Search Works
Agentic search is iterative. The agent searches, reads results, decides what else to look for, and repeats:
🐍agentic_search.py
```python
class AgenticSearch:
    """Iteratively search for relevant context."""

    def search(self, goal: str, max_rounds: int = 5) -> list[str]:
        """Find relevant files for a goal."""
        found_files: set[str] = set()
        explored_queries: set[str] = set()

        for round_num in range(max_rounds):
            # Generate search queries based on goal and current context
            queries = self.generate_queries(goal, found_files)

            # Remove already-explored queries
            new_queries = [q for q in queries if q not in explored_queries]
            if not new_queries:
                break

            # Execute searches
            for query in new_queries:
                results = self.execute_search(query)
                found_files.update(results)
                explored_queries.add(query)

            # Check if we have enough context
            if self.sufficient_context(goal, found_files):
                break

        return list(found_files)

    def generate_queries(
        self,
        goal: str,
        current_files: set[str],
    ) -> list[str]:
        """Use LLM to generate relevant search queries."""
        prompt = f"""
Goal: {goal}
Files found so far: {list(current_files)[:10]}

Generate 3-5 search queries to find more relevant files.
Consider:
- File names and patterns
- Function/class names
- Import statements
- Configuration files
- Test files
"""
        response = self.llm.generate(prompt)
        return self.parse_queries(response.text)
```

The Search Loop
📝search_loop.txt
```text
Goal: "Fix the authentication bug"

Round 1:
  Query: "auth" → Found: src/auth/login.ts, src/auth/session.ts
  Query: "authenticate" → Found: src/middleware/auth.ts
  Query: "*.spec.ts auth" → Found: tests/auth.spec.ts

Round 2 (informed by what we found):
  Query: "import from auth" → Found: src/api/routes.ts
  Query: "Session type" → Found: src/types/auth.d.ts
  Query: "AUTH_SECRET" → Found: config/env.ts

Round 3:
  Query: "jwt token" → Found: src/auth/token.ts
  Sufficient context detected. Stopping.

Final context: 8 files, ~2000 tokens
```

Implementation Patterns
Pattern 1: Glob-Based Search
🐍glob_search.py
```python
import fnmatch
from pathlib import Path

class GlobSearch:
    """Fast file pattern matching."""

    def search(self, pattern: str, root: Path = Path(".")) -> list[Path]:
        """Find files matching a glob pattern, newest first."""
        results = []

        for path in root.rglob("*"):
            if path.is_file():
                # Match against the bare file name first, then the full path
                if fnmatch.fnmatch(path.name, pattern):
                    results.append(path)
                elif fnmatch.fnmatch(str(path), pattern):
                    results.append(path)

        return sorted(results, key=lambda p: p.stat().st_mtime, reverse=True)

# Usage
glob = GlobSearch()
auth_files = glob.search("*auth*.ts")
test_files = glob.search("*.spec.ts")
config_files = glob.search("config/*.json")
```

Pattern 2: Grep-Based Search
🐍grep_search.py
```python
import subprocess
from dataclasses import dataclass

@dataclass
class GrepResult:
    file: str
    line_number: int
    content: str
    context_before: list[str]
    context_after: list[str]


class GrepSearch:
    """Search file contents using ripgrep."""

    def search(
        self,
        pattern: str,
        file_types: list[str] | None = None,
        context_lines: int = 2,
    ) -> list[GrepResult]:
        """Search for pattern in files."""
        cmd = ["rg", "--json", "-C", str(context_lines)]

        if file_types:
            for ft in file_types:
                cmd.extend(["-t", ft])

        cmd.append(pattern)

        result = subprocess.run(cmd, capture_output=True, text=True)
        # parse_results (not shown) turns rg's JSON output into GrepResult objects
        return self.parse_results(result.stdout)

# Usage
grep = GrepSearch()
auth_refs = grep.search("authenticateUser", file_types=["ts", "tsx"])
imports = grep.search("from ['\"]@/auth", file_types=["ts"])
```

Pattern 3: Semantic Search
🐍semantic_search.py
```python
class SemanticSearch:
    """Search using embeddings for semantic similarity."""

    def __init__(self, index_path: str):
        self.index = self.load_index(index_path)

    def search(self, query: str, k: int = 10) -> list[dict]:
        """Find semantically similar code chunks."""
        # Embed the query
        query_embedding = self.embedder.embed(query)

        # Search the index
        results = self.index.search(query_embedding, k=k)

        return [
            {
                "file": r.file_path,
                "chunk": r.content,
                "score": r.similarity,
            }
            for r in results
        ]

# Usage
semantic = SemanticSearch("./code_index")
results = semantic.search("user authentication flow")
# Returns code chunks that are semantically related,
# even if they don't contain the exact words
```

Combine Search Types
The best agentic search combines all three: glob for file patterns, grep for exact matches, and semantic for conceptual similarity.
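One simple way to combine them is to merge the three result lists and boost files that multiple search modes agree on. A sketch (the per-mode weights are illustrative assumptions):

```python
def combine_results(
    glob_hits: list[str],
    grep_hits: list[str],
    semantic_hits: list[str],
) -> list[str]:
    """Merge results from all three search modes, ranking by agreement."""
    weights = {"glob": 1.0, "grep": 1.5, "semantic": 1.0}  # illustrative weights
    scores: dict[str, float] = {}
    for mode, hits in [("glob", glob_hits), ("grep", grep_hits), ("semantic", semantic_hits)]:
        for f in hits:
            # Files found by several modes accumulate a higher score
            scores[f] = scores.get(f, 0.0) + weights[mode]
    return sorted(scores, key=scores.get, reverse=True)

# b.ts is found by all three modes, so it ranks first
print(combine_results(["a.ts", "b.ts"], ["b.ts", "c.ts"], ["b.ts"]))
```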
Context Ranking
Not all found files are equally important. Ranking helps prioritize:
🐍context_ranking.py
```python
import time
from pathlib import Path

class ContextRanker:
    """Rank files by relevance to the goal."""

    def rank(self, files: list[str], goal: str) -> list[tuple[str, float]]:
        """Return files sorted by relevance score."""
        scored = []

        for file in files:
            score = self.calculate_score(file, goal)
            scored.append((file, score))

        return sorted(scored, key=lambda x: x[1], reverse=True)

    def calculate_score(self, file: str, goal: str) -> float:
        """Calculate relevance score for a file."""
        score = 0.0

        # Recency: recently modified files are more relevant
        mtime = Path(file).stat().st_mtime
        recency = 1.0 / (1 + (time.time() - mtime) / 86400)  # Decay over days
        score += recency * 0.2

        # Name match: file name matches goal terms
        goal_terms = set(goal.lower().split())
        name_terms = set(Path(file).stem.lower().split("_"))
        name_match = len(goal_terms & name_terms) / len(goal_terms)
        score += name_match * 0.3

        # Content relevance: terms appear in content
        content = Path(file).read_text()
        content_match = sum(1 for term in goal_terms if term in content.lower())
        score += (content_match / len(goal_terms)) * 0.3

        # Type bonus: certain file types are more important
        if file.endswith(".test.ts") or file.endswith(".spec.ts"):
            score += 0.1  # Tests are important
        if "config" in file.lower():
            score += 0.1  # Config files matter

        return min(score, 1.0)  # Cap at 1.0
```

Context Budget
Use ranking to stay within token limits:
🐍context_budget.py
```python
from pathlib import Path

def select_within_budget(
    files: list[tuple[str, float]],
    max_tokens: int = 50000,
) -> list[str]:
    """Select highest-ranked files that fit in the token budget.

    estimate_tokens and summarize_file are helpers assumed elsewhere.
    """
    selected = []
    current_tokens = 0

    for file, score in files:
        file_tokens = estimate_tokens(Path(file).read_text())

        if current_tokens + file_tokens <= max_tokens:
            selected.append(file)
            current_tokens += file_tokens
        else:
            # Too large to include whole: fall back to a short summary
            # (note: this mixes file paths and summary strings in `selected`)
            if current_tokens + 200 <= max_tokens:
                selected.append(summarize_file(file))
                current_tokens += 200

    return selected
```

Practical Tips
1. Start Broad, Then Focus
🐍broad_to_focused.py
```python
# Round 1: Broad search
files = search("auth")  # Many results

# Round 2: Focus on a specific area
files = search("login validation", in_files=files)  # Narrower

# Round 3: Find tests for these files
test_files = find_tests_for(files)
```

2. Follow Imports
🐍follow_imports.py
```python
from pathlib import Path

def find_related_by_imports(file: str) -> set[str]:
    """Find files related through imports."""
    related = set()

    # Find files this file imports
    imports = extract_imports(file)
    for imp in imports:
        related.add(resolve_import(imp))

    # Find files that import this file
    importers = grep(f"from ['\"].*{Path(file).stem}")
    related.update(importers)

    return related
```

3. Don't Forget Configuration
🐍find_config.py
```python
def find_related_config(files: list[str]) -> list[str]:
    """Find configuration files that might affect these files."""
    config_files = []

    # Common config file patterns
    patterns = [
        "*.config.js",
        "*.config.ts",
        ".env*",
        "tsconfig*.json",
        "package.json",
    ]

    for pattern in patterns:
        config_files.extend(glob(pattern))

    return config_files
```

Avoid Context Overload
More context isn't always better. Too much irrelevant context can confuse the LLM. Aim for quality over quantity.
Summary
Agentic search is essential for working with real codebases:
- Iterative: Search → Read → Generate new queries → Repeat
- Multi-modal: Combine glob, grep, and semantic search
- Ranked: Prioritize files by relevance score
- Budgeted: Stay within token limits
- Complete: Include tests, config, and related files
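The bullets above compose into one small pipeline: score candidates against the goal, rank them, then select greedily within a budget. A minimal end-to-end sketch (gather_context and its inline scoring are illustrative stand-ins for the fuller classes earlier; tokens are again estimated at ~4 characters each):

```python
def gather_context(goal: str, candidates: dict[str, str], max_tokens: int = 2000) -> list[str]:
    """Score candidate files against the goal, then greedily select the
    best until the token budget runs out.
    candidates maps file path -> file content (a stand-in for search results)."""
    terms = set(goal.lower().split())

    def score(path: str, text: str) -> float:
        # Fraction of goal terms that appear in the path or content
        hits = sum(1 for t in terms if t in text.lower() or t in path.lower())
        return hits / len(terms)

    ranked = sorted(candidates, key=lambda p: score(p, candidates[p]), reverse=True)
    selected, used = [], 0
    for path in ranked:
        cost = len(candidates[path]) // 4  # ~4 chars per token
        if used + cost <= max_tokens:
            selected.append(path)
            used += cost
    return selected

files = {
    "src/auth/login.ts": "export function login(user) { /* auth logic */ }",
    "README.md": "project overview",
}
print(gather_context("fix auth login bug", files))  # login.ts ranks first
```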
Next: With context gathered, let's explore the tool system that lets Claude Code act on your codebase.