Introduction
Vector databases excel at finding semantically similar content, but they struggle with structured relationships. "Who reports to Alice?" or "What projects depend on Service X?" require traversing explicit relationships that vector similarity doesn't capture. Knowledge graphs store information as entities and their relationships, enabling precise structured queries alongside semantic search.
Graphs vs Vectors: Vector search finds "content like this"; graph queries find "entities connected like that". Many applications need both.
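The contrast is easy to make concrete with a toy graph. A minimal sketch using NetworkX (entity names and the `reports_to` relation are illustrative, not part of any real schema):

```python
import networkx as nx

# A tiny org graph: each edge carries an explicit relation label.
g = nx.DiGraph()
g.add_edge("Bob", "Alice", relation="reports_to")
g.add_edge("Carol", "Alice", relation="reports_to")
g.add_edge("Project X", "Service X", relation="depends_on")

def who_reports_to(graph, manager):
    """Answer 'Who reports to <manager>?' by following explicit edges."""
    return sorted(
        source
        for source, target, data in graph.in_edges(manager, data=True)
        if data["relation"] == "reports_to"
    )

print(who_reports_to(g, "Alice"))  # ['Bob', 'Carol']
```

No embedding of "reports to" could guarantee this answer; the edge labels make it exact.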
Knowledge Graph Fundamentals
Knowledge graphs represent information as nodes (entities) and edges (relationships):
graph_structure.txt

```text
KNOWLEDGE GRAPH STRUCTURE

┌────────────────────────────────────────────────┐
│ NODES (Entities)                               │
│                                                │
│  [Alice]──────works_at──────▶[Acme Corp]       │
│     │                            │             │
│  manages                     located_in        │
│     │                            │             │
│     ▼                            ▼             │
│   [Bob]                      [New York]        │
│     │                                          │
│  works_on                                      │
│     │                                          │
│     ▼                                          │
│ [Project X]──depends_on───▶[Service Y]         │
│                                                │
└────────────────────────────────────────────────┘

TRIPLE FORMAT: (subject, predicate, object)
- (Alice, works_at, Acme Corp)
- (Alice, manages, Bob)
- (Bob, works_on, Project X)
- (Project X, depends_on, Service Y)
- (Acme Corp, located_in, New York)

QUERIES THIS ENABLES:
- "Who manages Bob?" → Alice
- "What projects depend on Service Y?" → Project X
- "Who works at companies in New York?" → Alice, Bob
```

Core Concepts
| Concept | Description | Example |
|---|---|---|
| Node/Entity | A thing or concept | Person, Company, Project |
| Edge/Relationship | Connection between nodes | works_at, manages, depends_on |
| Property | Attribute of a node or edge | name, date, importance |
| Triple | (subject, predicate, object) | (Alice, manages, Bob) |
| Path | Sequence of connected edges | Alice β manages β Bob β works_on β Project X |
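These concepts map directly onto code. A minimal sketch of the triple format from the table above, with one-hop lookups in both directions (entity names follow the running example):

```python
# Triples as (subject, predicate, object) tuples.
triples = [
    ("Alice", "works_at", "Acme Corp"),
    ("Alice", "manages", "Bob"),
    ("Bob", "works_on", "Project X"),
    ("Project X", "depends_on", "Service Y"),
    ("Acme Corp", "located_in", "New York"),
]

def objects_of(subject, predicate):
    """All objects reachable from subject via predicate (forward hop)."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def subjects_of(predicate, obj):
    """All subjects pointing at obj via predicate (reverse hop)."""
    return [s for s, p, o in triples if p == predicate and o == obj]

print(objects_of("Alice", "manages"))          # ['Bob']
print(subjects_of("depends_on", "Service Y"))  # ['Project X']
```

A path is just a chain of such hops: following `manages` then `works_on` from Alice reaches Project X.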
Why Graphs for Agent Memory
graph_advantages.py

```python
# What graphs enable that vectors can't easily do:

# 1. Multi-hop reasoning
query = "Who manages people working on Project X?"
# Graph: follow works_on edges back, then manages edges
# Vector: would need to know exact phrasing

# 2. Relationship-based retrieval
query = "Get all dependencies of dependencies of Service A"
# Graph: traverse depends_on edges recursively
# Vector: semantic similarity doesn't capture transitivity

# 3. Constrained search
query = "Find senior engineers who worked on AI projects in 2023"
# Graph: filter by role, project type, and date
# Vector: hard to combine multiple exact constraints

# 4. Context-aware answers
question = "Should we deprecate Service Y?"
# Graph: find all dependents, their importance, who owns them
# Provides structured context for an informed answer
```

Building Knowledge Graphs
Constructing a knowledge graph from conversations and documents:
Entity and Relation Extraction
graph_extraction.py

```python
from dataclasses import dataclass
import json

@dataclass
class Node:
    id: str
    type: str
    name: str
    properties: dict

@dataclass
class Edge:
    source_id: str
    target_id: str
    relation: str
    properties: dict

class KnowledgeGraphBuilder:
    """Extract and build knowledge graphs from text."""

    def __init__(self, llm):
        self.llm = llm
        self.nodes: dict[str, Node] = {}
        self.edges: list[Edge] = []

    async def extract_from_text(self, text: str) -> tuple[list[Node], list[Edge]]:
        """Extract entities and relationships from text."""

        prompt = f"""Extract entities and relationships from this text.

TEXT:
{text}

Return JSON with this structure:
{{
  "entities": [
    {{"id": "unique_id", "type": "person|org|project|concept|location",
      "name": "display name", "properties": {{}}}}
  ],
  "relationships": [
    {{"source": "entity_id", "target": "entity_id",
      "relation": "relationship_type", "properties": {{}}}}
  ]
}}

Common relationship types:
- works_at, works_on, manages, reports_to
- depends_on, uses, created_by, owns
- located_in, part_of, related_to

Only extract clear, factual relationships."""

        response = await self.llm.generate(prompt)
        data = json.loads(response)

        nodes = [
            Node(
                id=e["id"],
                type=e["type"],
                name=e["name"],
                properties=e.get("properties", {})
            )
            for e in data["entities"]
        ]

        edges = [
            Edge(
                source_id=r["source"],
                target_id=r["target"],
                relation=r["relation"],
                properties=r.get("properties", {})
            )
            for r in data["relationships"]
        ]

        return nodes, edges

    def add_to_graph(self, nodes: list[Node], edges: list[Edge]) -> None:
        """Add extracted elements to the graph."""
        for node in nodes:
            # Merge if the node already exists
            if node.id in self.nodes:
                existing = self.nodes[node.id]
                existing.properties.update(node.properties)
            else:
                self.nodes[node.id] = node

        self.edges.extend(edges)

    async def process_conversation(
        self,
        messages: list[dict]
    ) -> None:
        """Extract knowledge from a conversation."""
        # Combine messages into text
        text = "\n".join(
            f"{m['role']}: {m['content']}"
            for m in messages
        )

        nodes, edges = await self.extract_from_text(text)
        self.add_to_graph(nodes, edges)
```

Graph Storage with Neo4j
neo4j_storage.py

```python
from neo4j import GraphDatabase

class Neo4jKnowledgeGraph:
    """Store knowledge graph in Neo4j."""

    def __init__(self, uri: str, user: str, password: str):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def add_node(self, node: Node) -> None:
        # Labels and relationship types can't be query parameters in Cypher,
        # so they are interpolated here; validate them against an allowlist
        # in production to avoid injection.
        with self.driver.session() as session:
            session.run(
                f"""
                MERGE (n:{node.type} {{id: $id}})
                SET n.name = $name
                SET n += $properties
                """,
                id=node.id,
                name=node.name,
                properties=node.properties
            )

    def add_edge(self, edge: Edge) -> None:
        with self.driver.session() as session:
            session.run(
                f"""
                MATCH (a {{id: $source_id}})
                MATCH (b {{id: $target_id}})
                MERGE (a)-[r:{edge.relation}]->(b)
                SET r += $properties
                """,
                source_id=edge.source_id,
                target_id=edge.target_id,
                properties=edge.properties
            )

    def query(self, cypher: str, params: dict | None = None) -> list[dict]:
        with self.driver.session() as session:
            result = session.run(cypher, params or {})
            return [record.data() for record in result]

    def find_related(
        self,
        entity_id: str,
        relation: str | None = None,
        max_hops: int = 2
    ) -> list[dict]:
        """Find entities related to the given entity."""
        # In a variable-length pattern, r binds to a *list* of relationships,
        # so type() must be applied per element.
        if relation:
            cypher = f"""
            MATCH (n {{id: $id}})-[r:{relation}*1..{max_hops}]-(related)
            RETURN DISTINCT related, [rel IN r | type(rel)] AS relations
            """
        else:
            cypher = f"""
            MATCH (n {{id: $id}})-[r*1..{max_hops}]-(related)
            RETURN DISTINCT related, [rel IN r | type(rel)] AS relations
            """

        return self.query(cypher, {"id": entity_id})
```

In-Memory Graph with NetworkX
networkx_graph.py

```python
import networkx as nx

class InMemoryKnowledgeGraph:
    """Simple in-memory graph for smaller use cases."""

    def __init__(self):
        self.graph = nx.MultiDiGraph()

    def add_node(self, node: Node) -> None:
        self.graph.add_node(
            node.id,
            type=node.type,
            name=node.name,
            **node.properties
        )

    def add_edge(self, edge: Edge) -> None:
        self.graph.add_edge(
            edge.source_id,
            edge.target_id,
            relation=edge.relation,
            **edge.properties
        )

    def get_neighbors(
        self,
        node_id: str,
        relation: str | None = None
    ) -> list[str]:
        """Get nodes connected to the given node, in either direction."""
        neighbors = []

        for _, target, data in self.graph.out_edges(node_id, data=True):
            if relation is None or data.get("relation") == relation:
                neighbors.append(target)

        for source, _, data in self.graph.in_edges(node_id, data=True):
            if relation is None or data.get("relation") == relation:
                neighbors.append(source)

        return list(set(neighbors))

    def find_path(
        self,
        source_id: str,
        target_id: str
    ) -> list[str]:
        """Find the shortest path between two nodes."""
        try:
            return nx.shortest_path(
                self.graph,
                source_id,
                target_id
            )
        except (nx.NetworkXNoPath, nx.NodeNotFound):
            return []

    def subgraph_around(
        self,
        node_id: str,
        hops: int = 2
    ) -> "InMemoryKnowledgeGraph":
        """Get the subgraph within N hops of a node."""
        nodes = {node_id}
        frontier = {node_id}

        for _ in range(hops):
            new_frontier = set()
            for n in frontier:
                new_frontier.update(self.get_neighbors(n))
            nodes.update(new_frontier)
            frontier = new_frontier

        subgraph = InMemoryKnowledgeGraph()
        subgraph.graph = self.graph.subgraph(nodes).copy()
        return subgraph

    def to_text(self) -> str:
        """Convert the graph to a text description."""
        lines = ["Entities:"]

        for node_id, data in self.graph.nodes(data=True):
            lines.append(f"- {data.get('name', node_id)} ({data.get('type', 'entity')})")

        lines.append("\nRelationships:")

        for source, target, data in self.graph.edges(data=True):
            source_name = self.graph.nodes[source].get('name', source)
            target_name = self.graph.nodes[target].get('name', target)
            relation = data.get('relation', 'related_to')
            lines.append(f"- {source_name} --{relation}--> {target_name}")

        return "\n".join(lines)
```

Querying Knowledge Graphs
Converting natural language questions to graph queries:
Natural Language to Graph Query
nl_to_graph.py

```python
import json

class GraphQueryGenerator:
    """Convert natural language to graph queries."""

    def __init__(self, llm, graph_schema: dict):
        self.llm = llm
        self.schema = graph_schema

    async def generate_query(self, question: str) -> str:
        """Generate a Cypher query from natural language."""

        prompt = f"""Convert this question to a Cypher query for Neo4j.

GRAPH SCHEMA:
Node types: {self.schema['node_types']}
Relationship types: {self.schema['relationship_types']}

QUESTION: {question}

Return only the Cypher query, no explanation.

Examples:
Q: "Who manages Alice?"
A: MATCH (manager)-[:manages]->(p:Person {{name: 'Alice'}}) RETURN manager.name

Q: "What projects depend on Service X?"
A: MATCH (p:Project)-[:depends_on*]->(s:Service {{name: 'Service X'}}) RETURN p.name
"""

        query = await self.llm.generate(prompt)
        return query.strip()

    async def query_with_explanation(
        self,
        question: str,
        graph
    ) -> dict:
        """Answer a question with a graph query and explanation."""

        # Generate the query
        cypher = await self.generate_query(question)

        # Execute it (LLM-generated Cypher can be malformed, so guard it)
        try:
            results = graph.query(cypher)
        except Exception as e:
            return {
                "error": str(e),
                "query": cypher,
                "answer": None
            }

        # Generate a natural language answer
        answer = await self._generate_answer(question, results)

        return {
            "query": cypher,
            "results": results,
            "answer": answer
        }

    async def _generate_answer(
        self,
        question: str,
        results: list[dict]
    ) -> str:
        prompt = f"""Answer this question based on the query results.

QUESTION: {question}

QUERY RESULTS:
{json.dumps(results, indent=2)}

Provide a clear, natural language answer."""

        return await self.llm.generate(prompt)
```

Hybrid Vector + Graph Queries
hybrid_query.py

```python
class HybridGraphVectorSearch:
    """Combine vector search with graph traversal."""

    def __init__(self, vector_store, knowledge_graph, embedder):
        self.vectors = vector_store
        self.graph = knowledge_graph
        self.embedder = embedder

    async def search(
        self,
        query: str,
        expand_graph: bool = True,
        max_hops: int = 2
    ) -> list[dict]:
        """Search vectors, then expand with graph relationships."""

        # Step 1: Vector search for initial results
        query_embedding = await self.embedder.embed(query)
        vector_results = await self.vectors.search(
            vector=query_embedding,
            limit=10
        )

        if not expand_graph:
            return vector_results

        # Step 2: Extract entity IDs from the results
        # (_extract_entities is an LLM- or NER-based helper, omitted here)
        entities = []
        for result in vector_results:
            extracted = await self._extract_entities(result["content"])
            entities.extend(extracted)

        # Step 3: Expand through the graph
        expanded_context = []
        seen_ids = set()

        for entity_id in entities:
            if entity_id in seen_ids:
                continue

            # Get related nodes from the graph
            related = self.graph.find_related(
                entity_id=entity_id,
                max_hops=max_hops
            )

            for rel in related:
                if rel["id"] not in seen_ids:
                    expanded_context.append(rel)
                    seen_ids.add(rel["id"])

        # Step 4: Combine and rank results
        return self._combine_results(vector_results, expanded_context)

    def _combine_results(
        self,
        vector_results: list[dict],
        graph_results: list[dict]
    ) -> list[dict]:
        """Combine vector and graph results."""
        combined = []

        # Add vector results with a source tag
        for r in vector_results:
            r["source"] = "vector"
            combined.append(r)

        # Add graph results
        for r in graph_results:
            r["source"] = "graph"
            combined.append(r)

        # Could add more sophisticated ranking here
        return combined
```

Graph-Enhanced RAG
Using knowledge graphs to improve RAG retrieval and generation:
graph_rag.py

```python
import json

class GraphRAG:
    """RAG enhanced with knowledge graph context."""

    def __init__(
        self,
        vector_store,
        knowledge_graph,
        llm,
        embedder
    ):
        self.vectors = vector_store
        self.graph = knowledge_graph
        self.llm = llm
        self.embedder = embedder

    async def query(self, question: str) -> str:
        # Step 1: Extract entities from the question
        entities = await self._extract_question_entities(question)

        # Step 2: Get graph context for those entities
        graph_context = await self._get_graph_context(entities)

        # Step 3: Vector search for relevant chunks
        chunks = await self._vector_search(question)

        # Step 4: Build a prompt with both contexts
        prompt = self._build_prompt(question, chunks, graph_context)

        # Step 5: Generate the answer
        return await self.llm.generate(prompt)

    async def _extract_question_entities(
        self,
        question: str
    ) -> list[str]:
        prompt = f"""Extract entity names from this question.
Return as a JSON array of strings.

Question: {question}"""

        response = await self.llm.generate(prompt)
        return json.loads(response)

    async def _get_graph_context(
        self,
        entities: list[str]
    ) -> str:
        """Get relevant graph information for the entities."""
        context_parts = []

        for entity_name in entities:
            # Find the entity in the graph (find_by_name is a name-indexed
            # lookup the graph store is expected to provide)
            node = self.graph.find_by_name(entity_name)
            if not node:
                continue

            # Get the subgraph around the entity
            subgraph = self.graph.subgraph_around(
                node_id=node.id,
                hops=2
            )

            context_parts.append(f"About {entity_name}:")
            context_parts.append(subgraph.to_text())

        return "\n\n".join(context_parts)

    async def _vector_search(
        self,
        question: str,
        limit: int = 5
    ) -> list[dict]:
        embedding = await self.embedder.embed(question)
        return await self.vectors.search(
            vector=embedding,
            limit=limit
        )

    def _build_prompt(
        self,
        question: str,
        chunks: list[dict],
        graph_context: str
    ) -> str:
        chunks_text = "\n\n".join(c["content"] for c in chunks)

        return f"""Answer the question using both the document excerpts
and the knowledge graph context.

KNOWLEDGE GRAPH CONTEXT:
{graph_context}

DOCUMENT EXCERPTS:
{chunks_text}

QUESTION: {question}

Provide a comprehensive answer that integrates information
from both sources. Cite when using specific facts."""
```

When to Use Graphs
Use knowledge graphs when your data has clear entity relationships that users query about. If your queries are purely semantic similarity ("documents like this"), vectors alone may suffice.
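One pragmatic consequence is a routing step in front of retrieval: send relational questions to the graph and open-ended ones to the vector index. A deliberately crude sketch (the cue list is an assumption; a production router would more likely use an LLM or a trained classifier):

```python
# Phrases that signal an explicit relationship query (illustrative list).
RELATIONAL_CUES = {
    "manages", "reports to", "depends on", "works at",
    "works on", "owns", "part of", "located in",
}

def route_query(question: str) -> str:
    """Route to 'graph' when the question names an explicit relation,
    otherwise fall back to 'vector' similarity search."""
    q = question.lower()
    if any(cue in q for cue in RELATIONAL_CUES):
        return "graph"
    return "vector"

print(route_query("Who reports to Alice?"))       # graph
print(route_query("Find docs about onboarding"))  # vector
```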
Summary
Key concepts for knowledge graphs in agent memory:
- Structured relationships: Graphs capture explicit connections between entities
- Multi-hop reasoning: Traverse relationships to answer complex queries
- Extraction from text: Use LLMs to extract entities and relationships
- Query generation: Convert natural language to graph queries (Cypher)
- Hybrid approach: Combine vector similarity with graph traversal
- Graph-enhanced RAG: Use graph context to improve retrieval and generation
Next: We'll put everything together, building a complete memory system for agents.