Introduction
Vector databases excel at finding semantically similar content, but they struggle with structured relationships. "Who reports to Alice?" or "What projects depend on Service X?" require traversing explicit relationships that vector similarity doesn't capture. Knowledge graphs store information as entities and their relationships, enabling precise structured queries alongside semantic search.
Graphs vs Vectors: Vector search finds "content like this"; graph queries find "entities connected like that". Many applications need both.
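The contrast is easy to make concrete with a toy graph. A minimal sketch using NetworkX (entity names and the `reports_to` relation are illustrative, not part of any real schema):

```python
import networkx as nx

# A tiny org graph: each edge carries an explicit relation label.
g = nx.DiGraph()
g.add_edge("Bob", "Alice", relation="reports_to")
g.add_edge("Carol", "Alice", relation="reports_to")
g.add_edge("Project X", "Service X", relation="depends_on")

def who_reports_to(graph, manager):
    """Answer 'Who reports to <manager>?' by following explicit edges."""
    return sorted(
        source
        for source, target, data in graph.in_edges(manager, data=True)
        if data["relation"] == "reports_to"
    )

print(who_reports_to(g, "Alice"))  # ['Bob', 'Carol']
```

No embedding of "reports to" could guarantee this answer; the edge labels make it exact.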
Knowledge Graph Fundamentals
Knowledge graphs represent information as nodes (entities) and edges (relationships):
graph_structure.txt

```text
KNOWLEDGE GRAPH STRUCTURE

┌────────────────────────────────────────────────┐
│ NODES (Entities)                               │
│                                                │
│  [Alice]──────works_at──────▶[Acme Corp]       │
│     │                            │             │
│  manages                     located_in        │
│     │                            │             │
│     ▼                            ▼             │
│   [Bob]                      [New York]        │
│     │                                          │
│  works_on                                      │
│     │                                          │
│     ▼                                          │
│ [Project X]──depends_on───▶[Service Y]         │
│                                                │
└────────────────────────────────────────────────┘

TRIPLE FORMAT: (subject, predicate, object)
- (Alice, works_at, Acme Corp)
- (Alice, manages, Bob)
- (Bob, works_on, Project X)
- (Project X, depends_on, Service Y)
- (Acme Corp, located_in, New York)

QUERIES THIS ENABLES:
- "Who manages Bob?" → Alice
- "What projects depend on Service Y?" → Project X
- "Who works at companies in New York?" → Alice, Bob
```

Core Concepts
| Concept | Description | Example |
|---|---|---|
| Node/Entity | A thing or concept | Person, Company, Project |
| Edge/Relationship | Connection between nodes | works_at, manages, depends_on |
| Property | Attribute of a node or edge | name, date, importance |
| Triple | (subject, predicate, object) | (Alice, manages, Bob) |
| Path | Sequence of connected edges | Alice β manages β Bob β works_on β Project X |
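These concepts map directly onto code. A minimal sketch of the triple format from the table above, with one-hop lookups in both directions (entity names follow the running example):

```python
# Triples as (subject, predicate, object) tuples.
triples = [
    ("Alice", "works_at", "Acme Corp"),
    ("Alice", "manages", "Bob"),
    ("Bob", "works_on", "Project X"),
    ("Project X", "depends_on", "Service Y"),
    ("Acme Corp", "located_in", "New York"),
]

def objects_of(subject, predicate):
    """All objects reachable from subject via predicate (forward hop)."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def subjects_of(predicate, obj):
    """All subjects pointing at obj via predicate (reverse hop)."""
    return [s for s, p, o in triples if p == predicate and o == obj]

print(objects_of("Alice", "manages"))          # ['Bob']
print(subjects_of("depends_on", "Service Y"))  # ['Project X']
```

A path is just a chain of such hops: following `manages` then `works_on` from Alice reaches Project X.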
Why Graphs for Agent Memory
graph_advantages.py

```python
# What graphs enable that vectors can't easily do:

# 1. Multi-hop reasoning
query = "Who manages people working on Project X?"
# Graph: follow works_on edges back, then manages edges
# Vector: would need to know exact phrasing

# 2. Relationship-based retrieval
query = "Get all dependencies of dependencies of Service A"
# Graph: traverse depends_on edges recursively
# Vector: semantic similarity doesn't capture transitivity

# 3. Constrained search
query = "Find senior engineers who worked on AI projects in 2023"
# Graph: filter by role, project type, and date
# Vector: hard to combine multiple exact constraints

# 4. Context-aware answers
question = "Should we deprecate Service Y?"
# Graph: find all dependents, their importance, who owns them
# Provides structured context for an informed answer
```

Building Knowledge Graphs
Constructing a knowledge graph from conversations and documents:
Entity and Relation Extraction
graph_extraction.py

```python
from dataclasses import dataclass
import json

@dataclass
class Node:
    id: str
    type: str
    name: str
    properties: dict

@dataclass
class Edge:
    source_id: str
    target_id: str
    relation: str
    properties: dict

class KnowledgeGraphBuilder:
    """Extract and build knowledge graphs from text."""

    def __init__(self, llm):
        self.llm = llm
        self.nodes: dict[str, Node] = {}
        self.edges: list[Edge] = []

    async def extract_from_text(self, text: str) -> tuple[list[Node], list[Edge]]:
        """Extract entities and relationships from text."""

        prompt = f"""Extract entities and relationships from this text.

TEXT:
{text}

Return JSON with this structure:
{{
  "entities": [
    {{"id": "unique_id", "type": "person|org|project|concept|location",
      "name": "display name", "properties": {{}}}}
  ],
  "relationships": [
    {{"source": "entity_id", "target": "entity_id",
      "relation": "relationship_type", "properties": {{}}}}
  ]
}}

Common relationship types:
- works_at, works_on, manages, reports_to
- depends_on, uses, created_by, owns
- located_in, part_of, related_to

Only extract clear, factual relationships."""

        response = await self.llm.generate(prompt)
        data = json.loads(response)

        nodes = [
            Node(
                id=e["id"],
                type=e["type"],
                name=e["name"],
                properties=e.get("properties", {})
            )
            for e in data["entities"]
        ]

        edges = [
            Edge(
                source_id=r["source"],
                target_id=r["target"],
                relation=r["relation"],
                properties=r.get("properties", {})
            )
            for r in data["relationships"]
        ]

        return nodes, edges

    def add_to_graph(self, nodes: list[Node], edges: list[Edge]) -> None:
        """Add extracted elements to the graph."""
        for node in nodes:
            # Merge if the node already exists
            if node.id in self.nodes:
                existing = self.nodes[node.id]
                existing.properties.update(node.properties)
            else:
                self.nodes[node.id] = node

        self.edges.extend(edges)

    async def process_conversation(
        self,
        messages: list[dict]
    ) -> None:
        """Extract knowledge from a conversation."""
        # Combine messages into text
        text = "\n".join(
            f"{m['role']}: {m['content']}"
            for m in messages
        )

        nodes, edges = await self.extract_from_text(text)
        self.add_to_graph(nodes, edges)
```

Graph Storage with Neo4j
neo4j_storage.py

```python
from neo4j import GraphDatabase

class Neo4jKnowledgeGraph:
    """Store knowledge graph in Neo4j."""

    def __init__(self, uri: str, user: str, password: str):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def add_node(self, node: Node) -> None:
        # Labels and relationship types can't be query parameters in Cypher,
        # so they are interpolated here; validate them against an allowlist
        # in production to avoid injection.
        with self.driver.session() as session:
            session.run(
                f"""
                MERGE (n:{node.type} {{id: $id}})
                SET n.name = $name
                SET n += $properties
                """,
                id=node.id,
                name=node.name,
                properties=node.properties
            )

    def add_edge(self, edge: Edge) -> None:
        with self.driver.session() as session:
            session.run(
                f"""
                MATCH (a {{id: $source_id}})
                MATCH (b {{id: $target_id}})
                MERGE (a)-[r:{edge.relation}]->(b)
                SET r += $properties
                """,
                source_id=edge.source_id,
                target_id=edge.target_id,
                properties=edge.properties
            )

    def query(self, cypher: str, params: dict | None = None) -> list[dict]:
        with self.driver.session() as session:
            result = session.run(cypher, params or {})
            return [record.data() for record in result]

    def find_related(
        self,
        entity_id: str,
        relation: str | None = None,
        max_hops: int = 2
    ) -> list[dict]:
        """Find entities related to the given entity."""
        # In a variable-length pattern, r binds to a *list* of relationships,
        # so type() must be applied per element.
        if relation:
            cypher = f"""
            MATCH (n {{id: $id}})-[r:{relation}*1..{max_hops}]-(related)
            RETURN DISTINCT related, [rel IN r | type(rel)] AS relations
            """
        else:
            cypher = f"""
            MATCH (n {{id: $id}})-[r*1..{max_hops}]-(related)
            RETURN DISTINCT related, [rel IN r | type(rel)] AS relations
            """

        return self.query(cypher, {"id": entity_id})
```

In-Memory Graph with NetworkX
networkx_graph.py

```python
import networkx as nx

class InMemoryKnowledgeGraph:
    """Simple in-memory graph for smaller use cases."""

    def __init__(self):
        self.graph = nx.MultiDiGraph()

    def add_node(self, node: Node) -> None:
        self.graph.add_node(
            node.id,
            type=node.type,
            name=node.name,
            **node.properties
        )

    def add_edge(self, edge: Edge) -> None:
        self.graph.add_edge(
            edge.source_id,
            edge.target_id,
            relation=edge.relation,
            **edge.properties
        )

    def get_neighbors(
        self,
        node_id: str,
        relation: str | None = None
    ) -> list[str]:
        """Get nodes connected to the given node, in either direction."""
        neighbors = []

        for _, target, data in self.graph.out_edges(node_id, data=True):
            if relation is None or data.get("relation") == relation:
                neighbors.append(target)

        for source, _, data in self.graph.in_edges(node_id, data=True):
            if relation is None or data.get("relation") == relation:
                neighbors.append(source)

        return list(set(neighbors))

    def find_path(
        self,
        source_id: str,
        target_id: str
    ) -> list[str]:
        """Find the shortest path between two nodes."""
        try:
            return nx.shortest_path(
                self.graph,
                source_id,
                target_id
            )
        except (nx.NetworkXNoPath, nx.NodeNotFound):
            return []

    def subgraph_around(
        self,
        node_id: str,
        hops: int = 2
    ) -> "InMemoryKnowledgeGraph":
        """Get the subgraph within N hops of a node."""
        nodes = {node_id}
        frontier = {node_id}

        for _ in range(hops):
            new_frontier = set()
            for n in frontier:
                new_frontier.update(self.get_neighbors(n))
            nodes.update(new_frontier)
            frontier = new_frontier

        subgraph = InMemoryKnowledgeGraph()
        subgraph.graph = self.graph.subgraph(nodes).copy()
        return subgraph

    def to_text(self) -> str:
        """Convert the graph to a text description."""
        lines = ["Entities:"]

        for node_id, data in self.graph.nodes(data=True):
            lines.append(f"- {data.get('name', node_id)} ({data.get('type', 'entity')})")

        lines.append("\nRelationships:")

        for source, target, data in self.graph.edges(data=True):
            source_name = self.graph.nodes[source].get('name', source)
            target_name = self.graph.nodes[target].get('name', target)
            relation = data.get('relation', 'related_to')
            lines.append(f"- {source_name} --{relation}--> {target_name}")

        return "\n".join(lines)
```

Querying Knowledge Graphs
Converting natural language questions to graph queries:
Natural Language to Graph Query
nl_to_graph.py

```python
import json

class GraphQueryGenerator:
    """Convert natural language to graph queries."""

    def __init__(self, llm, graph_schema: dict):
        self.llm = llm
        self.schema = graph_schema

    async def generate_query(self, question: str) -> str:
        """Generate a Cypher query from natural language."""

        prompt = f"""Convert this question to a Cypher query for Neo4j.

GRAPH SCHEMA:
Node types: {self.schema['node_types']}
Relationship types: {self.schema['relationship_types']}

QUESTION: {question}

Return only the Cypher query, no explanation.

Examples:
Q: "Who manages Alice?"
A: MATCH (manager)-[:manages]->(p:Person {{name: 'Alice'}}) RETURN manager.name

Q: "What projects depend on Service X?"
A: MATCH (p:Project)-[:depends_on*]->(s:Service {{name: 'Service X'}}) RETURN p.name
"""

        query = await self.llm.generate(prompt)
        return query.strip()

    async def query_with_explanation(
        self,
        question: str,
        graph
    ) -> dict:
        """Answer a question with a graph query and explanation."""

        # Generate the query
        cypher = await self.generate_query(question)

        # Execute it (LLM-generated Cypher can be malformed, so guard it)
        try:
            results = graph.query(cypher)
        except Exception as e:
            return {
                "error": str(e),
                "query": cypher,
                "answer": None
            }

        # Generate a natural language answer
        answer = await self._generate_answer(question, results)

        return {
            "query": cypher,
            "results": results,
            "answer": answer
        }

    async def _generate_answer(
        self,
        question: str,
        results: list[dict]
    ) -> str:
        prompt = f"""Answer this question based on the query results.

QUESTION: {question}

QUERY RESULTS:
{json.dumps(results, indent=2)}

Provide a clear, natural language answer."""

        return await self.llm.generate(prompt)
```

Hybrid Vector + Graph Queries
hybrid_query.py

```python
class HybridGraphVectorSearch:
    """Combine vector search with graph traversal."""

    def __init__(self, vector_store, knowledge_graph, embedder):
        self.vectors = vector_store
        self.graph = knowledge_graph
        self.embedder = embedder

    async def search(
        self,
        query: str,
        expand_graph: bool = True,
        max_hops: int = 2
    ) -> list[dict]:
        """Search vectors, then expand with graph relationships."""

        # Step 1: Vector search for initial results
        query_embedding = await self.embedder.embed(query)
        vector_results = await self.vectors.search(
            vector=query_embedding,
            limit=10
        )

        if not expand_graph:
            return vector_results

        # Step 2: Extract entity IDs from the results
        # (_extract_entities is an LLM- or NER-based helper, omitted here)
        entities = []
        for result in vector_results:
            extracted = await self._extract_entities(result["content"])
            entities.extend(extracted)

        # Step 3: Expand through the graph
        expanded_context = []
        seen_ids = set()

        for entity_id in entities:
            if entity_id in seen_ids:
                continue

            # Get related nodes from the graph
            related = self.graph.find_related(
                entity_id=entity_id,
                max_hops=max_hops
            )

            for rel in related:
                if rel["id"] not in seen_ids:
                    expanded_context.append(rel)
                    seen_ids.add(rel["id"])

        # Step 4: Combine and rank results
        return self._combine_results(vector_results, expanded_context)

    def _combine_results(
        self,
        vector_results: list[dict],
        graph_results: list[dict]
    ) -> list[dict]:
        """Combine vector and graph results."""
        combined = []

        # Add vector results with a source tag
        for r in vector_results:
            r["source"] = "vector"
            combined.append(r)

        # Add graph results
        for r in graph_results:
            r["source"] = "graph"
            combined.append(r)

        # Could add more sophisticated ranking here
        return combined
```

Graph-Enhanced RAG
Using knowledge graphs to improve RAG retrieval and generation:
graph_rag.py

```python
import json

class GraphRAG:
    """RAG enhanced with knowledge graph context."""

    def __init__(
        self,
        vector_store,
        knowledge_graph,
        llm,
        embedder
    ):
        self.vectors = vector_store
        self.graph = knowledge_graph
        self.llm = llm
        self.embedder = embedder

    async def query(self, question: str) -> str:
        # Step 1: Extract entities from the question
        entities = await self._extract_question_entities(question)

        # Step 2: Get graph context for those entities
        graph_context = await self._get_graph_context(entities)

        # Step 3: Vector search for relevant chunks
        chunks = await self._vector_search(question)

        # Step 4: Build a prompt with both contexts
        prompt = self._build_prompt(question, chunks, graph_context)

        # Step 5: Generate the answer
        return await self.llm.generate(prompt)

    async def _extract_question_entities(
        self,
        question: str
    ) -> list[str]:
        prompt = f"""Extract entity names from this question.
Return as a JSON array of strings.

Question: {question}"""

        response = await self.llm.generate(prompt)
        return json.loads(response)

    async def _get_graph_context(
        self,
        entities: list[str]
    ) -> str:
        """Get relevant graph information for the entities."""
        context_parts = []

        for entity_name in entities:
            # Find the entity in the graph (find_by_name is a name-indexed
            # lookup the graph store is expected to provide)
            node = self.graph.find_by_name(entity_name)
            if not node:
                continue

            # Get the subgraph around the entity
            subgraph = self.graph.subgraph_around(
                node_id=node.id,
                hops=2
            )

            context_parts.append(f"About {entity_name}:")
            context_parts.append(subgraph.to_text())

        return "\n\n".join(context_parts)

    async def _vector_search(
        self,
        question: str,
        limit: int = 5
    ) -> list[dict]:
        embedding = await self.embedder.embed(question)
        return await self.vectors.search(
            vector=embedding,
            limit=limit
        )

    def _build_prompt(
        self,
        question: str,
        chunks: list[dict],
        graph_context: str
    ) -> str:
        chunks_text = "\n\n".join(c["content"] for c in chunks)

        return f"""Answer the question using both the document excerpts
and the knowledge graph context.

KNOWLEDGE GRAPH CONTEXT:
{graph_context}

DOCUMENT EXCERPTS:
{chunks_text}

QUESTION: {question}

Provide a comprehensive answer that integrates information
from both sources. Cite when using specific facts."""
```

When to Use Graphs
Use knowledge graphs when your data has clear entity relationships that users query about. If your queries are purely semantic similarity ("documents like this"), vectors alone may suffice.
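One pragmatic consequence is a routing step in front of retrieval: send relational questions to the graph and open-ended ones to the vector index. A deliberately crude sketch (the cue list is an assumption; a production router would more likely use an LLM or a trained classifier):

```python
# Phrases that signal an explicit relationship query (illustrative list).
RELATIONAL_CUES = {
    "manages", "reports to", "depends on", "works at",
    "works on", "owns", "part of", "located in",
}

def route_query(question: str) -> str:
    """Route to 'graph' when the question names an explicit relation,
    otherwise fall back to 'vector' similarity search."""
    q = question.lower()
    if any(cue in q for cue in RELATIONAL_CUES):
        return "graph"
    return "vector"

print(route_query("Who reports to Alice?"))       # graph
print(route_query("Find docs about onboarding"))  # vector
```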
Summary
Key concepts for knowledge graphs in agent memory:
- Structured relationships: Graphs capture explicit connections between entities
- Multi-hop reasoning: Traverse relationships to answer complex queries
- Extraction from text: Use LLMs to extract entities and relationships
- Query generation: Convert natural language to graph queries (Cypher)
- Hybrid approach: Combine vector similarity with graph traversal
- Graph-enhanced RAG: Use graph context to improve retrieval and generation
Next: We'll put everything together, building a complete memory system for agents.