Introduction
Vector databases are the backbone of modern AI memory systems. They enable semantic search—finding memories based on meaning rather than exact keywords. When you ask an agent "What did we discuss about the API?", it can find relevant memories even if they never used the word "API" but talked about "endpoints" or "REST interfaces".
Why Vectors: Traditional databases search by exact match. Vector databases search by similarity in meaning. This is what makes intelligent memory retrieval possible.
Embeddings Explained
Before diving into vector databases, we need to understand embeddings—the numeric representations that make semantic search work.
What Are Embeddings?
📝embeddings_explained.txt
```
EMBEDDING VISUALIZATION

Text: "The cat sat on the mat"
              ↓
       Embedding Model
              ↓
Vector: [0.12, -0.34, 0.56, 0.78, -0.23, ...]
        (typically 384-3072 dimensions)

KEY INSIGHT:
Similar meanings → Similar vectors

"The cat sat on the mat"      → [0.12, -0.34, 0.56, ...]
"A feline rested on the rug"  → [0.11, -0.32, 0.54, ...]  ← Very similar!
"Quantum physics is complex"  → [0.89, 0.12, -0.67, ...]  ← Very different

SIMILARITY MEASURE (Cosine Similarity):
cat/mat vs feline/rug: 0.95  (very similar)
cat/mat vs quantum:    0.12  (not similar)
```
Generating Embeddings
🐍embeddings.py
```python
import openai
import voyageai
from sentence_transformers import SentenceTransformer

# Option 1: OpenAI embeddings
def embed_with_openai(texts: list[str]) -> list[list[float]]:
    """Generate embeddings using OpenAI."""
    client = openai.OpenAI()

    response = client.embeddings.create(
        model="text-embedding-3-small",  # or "text-embedding-3-large"
        input=texts
    )

    return [item.embedding for item in response.data]


# Option 2: Voyage AI (the embedding provider Anthropic recommends;
# Anthropic itself does not offer an embeddings endpoint)
def embed_with_voyage(texts: list[str]) -> list[list[float]]:
    """Generate embeddings using Voyage AI."""
    client = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

    result = client.embed(texts, model="voyage-2")
    return result.embeddings


# Option 3: Local embeddings with sentence-transformers
def embed_locally(texts: list[str]) -> list[list[float]]:
    """Generate embeddings locally (no API calls)."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # Fast, 384 dims
    # Or: "all-mpnet-base-v2" for better quality, 768 dims

    embeddings = model.encode(texts)
    return embeddings.tolist()


# Embedding model comparison
EMBEDDING_MODELS = {
    "text-embedding-3-small": {
        "provider": "OpenAI",
        "dimensions": 1536,
        "cost": "$0.02/1M tokens",
        "quality": "Good"
    },
    "text-embedding-3-large": {
        "provider": "OpenAI",
        "dimensions": 3072,
        "cost": "$0.13/1M tokens",
        "quality": "Excellent"
    },
    "voyage-2": {
        "provider": "Voyage AI",
        "dimensions": 1024,
        "cost": "$0.10/1M tokens",
        "quality": "Excellent"
    },
    "all-MiniLM-L6-v2": {
        "provider": "Local",
        "dimensions": 384,
        "cost": "Free",
        "quality": "Good for general use"
    },
    "all-mpnet-base-v2": {
        "provider": "Local",
        "dimensions": 768,
        "cost": "Free",
        "quality": "Better quality"
    }
}
```
| Model | Provider | Dimensions | Best For |
|---|---|---|---|
| text-embedding-3-small | OpenAI | 1536 | General purpose, cost-effective |
| text-embedding-3-large | OpenAI | 3072 | Highest quality, complex retrieval |
| voyage-2 | Voyage AI | 1024 | Code and technical content |
| all-MiniLM-L6-v2 | Local | 384 | Fast, privacy-sensitive |
| all-mpnet-base-v2 | Local | 768 | Balance of speed and quality |
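Whichever model produces the vectors, the "similarity measure" shown in the visualization is plain cosine similarity, which you can compute with the standard library alone. A minimal sketch using toy 3-dimensional vectors (real embeddings have hundreds of dimensions, so the scores won't match the figures above exactly):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" borrowed from the visualization's first three components
cat_mat = [0.12, -0.34, 0.56]
feline_rug = [0.11, -0.32, 0.54]
quantum = [0.89, 0.12, -0.67]

print(cosine_similarity(cat_mat, feline_rug))  # close to 1.0 — very similar
print(cosine_similarity(cat_mat, quantum))     # low (here negative) — unrelated
```

In practice you rarely write this yourself: vector databases compute it internally when you configure a collection or index with the cosine metric.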
Vector Database Options
Several vector databases are available, each with different tradeoffs:
Comparison Overview
| Database | Type | Best For | Key Features |
|---|---|---|---|
| Pinecone | Cloud | Production, scale | Managed, fast, metadata filtering |
| Weaviate | Self-hosted/Cloud | Complex queries | GraphQL, hybrid search |
| Chroma | Embedded/Server | Development, simplicity | Easy setup, Python-native |
| Qdrant | Self-hosted/Cloud | Performance | Rust-based, filtering |
| pgvector | PostgreSQL extension | Existing Postgres users | Familiar SQL, ACID |
| FAISS | Library | Maximum speed | Facebook, in-memory |
Chroma (Simplest to Start)
🐍chroma_example.py
```python
import chromadb
from chromadb.utils import embedding_functions

# Initialize Chroma
client = chromadb.Client()  # In-memory
# Or: chromadb.PersistentClient(path="/path/to/db") for persistence

# Set up embedding function
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-api-key",
    model_name="text-embedding-3-small"
)

# Create a collection
collection = client.create_collection(
    name="agent_memories",
    embedding_function=openai_ef,
    metadata={"hnsw:space": "cosine"}  # Use cosine similarity
)

# Add memories
collection.add(
    ids=["mem1", "mem2", "mem3"],
    documents=[
        "User prefers concise responses",
        "User is working on a Python project",
        "User asked about API rate limiting"
    ],
    metadatas=[
        {"type": "preference", "user_id": "alice"},
        {"type": "context", "user_id": "alice"},
        {"type": "question", "user_id": "alice"}
    ]
)

# Query memories
results = collection.query(
    query_texts=["How should I format responses?"],
    n_results=3,
    where={"user_id": "alice"}  # Metadata filtering
)

print(results["documents"])
# [['User prefers concise responses', ...]]
```
Pinecone (Production Scale)
🐍pinecone_example.py
```python
from pinecone import Pinecone, ServerlessSpec
import openai

# Initialize Pinecone
pc = Pinecone(api_key="your-api-key")

# Create index (one-time)
pc.create_index(
    name="agent-memories",
    dimension=1536,  # Match your embedding model
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1"
    )
)

# Connect to index
index = pc.Index("agent-memories")

# Generate embeddings
def get_embedding(text: str) -> list[float]:
    client = openai.OpenAI()
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Upsert memories
memories = [
    {"id": "mem1", "text": "User prefers dark mode", "type": "preference"},
    {"id": "mem2", "text": "Working on ML project", "type": "context"},
]

vectors = []
for mem in memories:
    embedding = get_embedding(mem["text"])
    vectors.append({
        "id": mem["id"],
        "values": embedding,
        "metadata": {
            "text": mem["text"],
            "type": mem["type"],
            "user_id": "alice"
        }
    })

index.upsert(vectors=vectors)

# Query
query_embedding = get_embedding("What are the user's preferences?")

results = index.query(
    vector=query_embedding,
    top_k=5,
    include_metadata=True,
    filter={"user_id": {"$eq": "alice"}}
)

for match in results["matches"]:
    print(f"Score: {match['score']:.3f} - {match['metadata']['text']}")
```
pgvector (PostgreSQL)
🐍pgvector_example.py
```python
import asyncpg
from pgvector.asyncpg import register_vector  # teaches asyncpg the vector type

# SQL to set up pgvector
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS memories (
    id SERIAL PRIMARY KEY,
    user_id TEXT NOT NULL,
    content TEXT NOT NULL,
    memory_type TEXT NOT NULL,
    embedding vector(1536),
    created_at TIMESTAMP DEFAULT NOW(),
    importance FLOAT DEFAULT 0.5
);

CREATE INDEX ON memories USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
"""

class PgVectorMemory:
    def __init__(self, connection_string: str):
        self.conn_string = connection_string
        self.pool = None

    async def connect(self):
        # init=register_vector lets every connection pass Python lists
        # directly as vector(1536) values
        self.pool = await asyncpg.create_pool(
            self.conn_string, init=register_vector
        )

    async def store(
        self,
        user_id: str,
        content: str,
        memory_type: str,
        embedding: list[float],
        importance: float = 0.5
    ) -> int:
        async with self.pool.acquire() as conn:
            result = await conn.fetchrow(
                """
                INSERT INTO memories (user_id, content, memory_type, embedding, importance)
                VALUES ($1, $2, $3, $4, $5)
                RETURNING id
                """,
                user_id, content, memory_type, embedding, importance
            )
            return result["id"]

    async def search(
        self,
        user_id: str,
        query_embedding: list[float],
        limit: int = 10,
        memory_type: str | None = None
    ) -> list[dict]:
        async with self.pool.acquire() as conn:
            # Build query with optional type filter
            if memory_type:
                rows = await conn.fetch(
                    """
                    SELECT id, content, memory_type, importance,
                           1 - (embedding <=> $1) as similarity
                    FROM memories
                    WHERE user_id = $2 AND memory_type = $3
                    ORDER BY embedding <=> $1
                    LIMIT $4
                    """,
                    query_embedding, user_id, memory_type, limit
                )
            else:
                rows = await conn.fetch(
                    """
                    SELECT id, content, memory_type, importance,
                           1 - (embedding <=> $1) as similarity
                    FROM memories
                    WHERE user_id = $2
                    ORDER BY embedding <=> $1
                    LIMIT $3
                    """,
                    query_embedding, user_id, limit
                )

        return [dict(row) for row in rows]
```
Implementation Examples
Here's a complete memory store implementation using vector databases:
🐍vector_memory_store.py
```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
import hashlib

@dataclass
class Memory:
    id: str
    user_id: str
    content: str
    memory_type: str
    embedding: list[float]
    importance: float
    created_at: datetime
    metadata: dict

class VectorMemoryStore(ABC):
    """Abstract base class for vector memory stores."""

    @abstractmethod
    async def store(self, memory: Memory) -> str:
        pass

    @abstractmethod
    async def search(
        self,
        user_id: str,
        query_embedding: list[float],
        limit: int = 10,
        filters: Optional[dict] = None
    ) -> list[Memory]:
        pass

    @abstractmethod
    async def delete(self, memory_id: str) -> bool:
        pass

    @abstractmethod
    async def delete_user_memories(self, user_id: str) -> int:
        pass


class ChromaMemoryStore(VectorMemoryStore):
    """Chroma-based implementation."""

    def __init__(self, collection_name: str = "memories"):
        import chromadb
        self.client = chromadb.PersistentClient(path="./chroma_db")
        self.collection = self.client.get_or_create_collection(
            name=collection_name,
            metadata={"hnsw:space": "cosine"}
        )

    async def store(self, memory: Memory) -> str:
        self.collection.add(
            ids=[memory.id],
            embeddings=[memory.embedding],
            documents=[memory.content],
            metadatas=[{
                "user_id": memory.user_id,
                "memory_type": memory.memory_type,
                "importance": memory.importance,
                "created_at": memory.created_at.isoformat(),
                **memory.metadata
            }]
        )
        return memory.id

    async def search(
        self,
        user_id: str,
        query_embedding: list[float],
        limit: int = 10,
        filters: Optional[dict] = None
    ) -> list[Memory]:
        # Chroma requires $and to combine multiple where conditions
        conditions = [{"user_id": user_id}]
        if filters:
            conditions.extend({k: v} for k, v in filters.items())
        where_clause = conditions[0] if len(conditions) == 1 else {"$and": conditions}

        results = self.collection.query(
            query_embeddings=[query_embedding],
            n_results=limit,
            where=where_clause,
            include=["documents", "metadatas", "embeddings", "distances"]
        )

        memories = []
        for i in range(len(results["ids"][0])):
            meta = results["metadatas"][0][i]
            memories.append(Memory(
                id=results["ids"][0][i],
                user_id=meta["user_id"],
                content=results["documents"][0][i],
                memory_type=meta["memory_type"],
                # Explicit None check: Chroma may return numpy arrays,
                # whose truthiness is ambiguous
                embedding=(list(results["embeddings"][0][i])
                           if results["embeddings"] is not None else []),
                importance=meta["importance"],
                created_at=datetime.fromisoformat(meta["created_at"]),
                metadata={k: v for k, v in meta.items()
                          if k not in ["user_id", "memory_type", "importance", "created_at"]}
            ))

        return memories

    async def delete(self, memory_id: str) -> bool:
        try:
            self.collection.delete(ids=[memory_id])
            return True
        except Exception:
            return False

    async def delete_user_memories(self, user_id: str) -> int:
        # Get all memory IDs for user
        results = self.collection.get(
            where={"user_id": user_id},
            include=[]
        )
        if results["ids"]:
            self.collection.delete(ids=results["ids"])
            return len(results["ids"])
        return 0


class MemoryManager:
    """High-level memory management with embedding generation."""

    def __init__(
        self,
        store: VectorMemoryStore,
        embedding_model: str = "text-embedding-3-small"
    ):
        self.store = store
        self.embedding_model = embedding_model

    async def embed(self, text: str) -> list[float]:
        import openai
        client = openai.AsyncOpenAI()  # async client, so the await is real
        response = await client.embeddings.create(
            model=self.embedding_model,
            input=text
        )
        return response.data[0].embedding

    def _generate_id(self, content: str, user_id: str) -> str:
        hash_input = f"{user_id}:{content}:{datetime.now().isoformat()}"
        return hashlib.sha256(hash_input.encode()).hexdigest()[:16]

    async def remember(
        self,
        user_id: str,
        content: str,
        memory_type: str = "general",
        importance: float = 0.5,
        metadata: Optional[dict] = None
    ) -> str:
        embedding = await self.embed(content)

        memory = Memory(
            id=self._generate_id(content, user_id),
            user_id=user_id,
            content=content,
            memory_type=memory_type,
            embedding=embedding,
            importance=importance,
            created_at=datetime.now(),
            metadata=metadata or {}
        )

        return await self.store.store(memory)

    async def recall(
        self,
        user_id: str,
        query: str,
        limit: int = 10,
        memory_types: Optional[list[str]] = None,
        min_importance: float = 0.0
    ) -> list[Memory]:
        query_embedding = await self.embed(query)

        filters = {}
        if memory_types:
            filters["memory_type"] = {"$in": memory_types}
        if min_importance > 0:
            filters["importance"] = {"$gte": min_importance}

        return await self.store.search(
            user_id=user_id,
            query_embedding=query_embedding,
            limit=limit,
            filters=filters if filters else None
        )

    async def forget(self, memory_id: str) -> bool:
        return await self.store.delete(memory_id)

    async def forget_user(self, user_id: str) -> int:
        return await self.store.delete_user_memories(user_id)
```
Optimization Techniques
Optimizing vector search for production:
Batch Embedding
🐍batch_embedding.py
```python
import openai

async def batch_embed(
    texts: list[str],
    batch_size: int = 100
) -> list[list[float]]:
    """Embed texts in batches for efficiency."""
    client = openai.AsyncOpenAI()

    all_embeddings = []

    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        response = await client.embeddings.create(
            model="text-embedding-3-small",
            input=batch
        )
        all_embeddings.extend([item.embedding for item in response.data])

    return all_embeddings
```
Caching Embeddings
🐍embedding_cache.py
```python
import hashlib

class EmbeddingCache:
    """Cache embeddings to avoid recomputation."""

    def __init__(self, max_size: int = 10000):
        # Dicts preserve insertion order, so this doubles as the LRU queue
        self.cache = {}
        self.max_size = max_size

    def _hash_text(self, text: str) -> str:
        return hashlib.sha256(text.encode()).hexdigest()

    async def get_embedding(
        self,
        text: str,
        embed_fn
    ) -> list[float]:
        key = self._hash_text(text)

        if key in self.cache:
            # Move the hit to the end so recently used entries evict last
            self.cache[key] = self.cache.pop(key)
            return self.cache[key]

        embedding = await embed_fn(text)

        # Evict the least recently used entry if the cache is full
        if len(self.cache) >= self.max_size:
            oldest_key = next(iter(self.cache))
            del self.cache[oldest_key]

        self.cache[key] = embedding
        return embedding
```
Hybrid Search
🐍hybrid_search.py
```python
class HybridSearch:
    """Combine vector search with keyword search.

    Assumes the subclass or backing store provides `vector_search`
    and `keyword_search` methods that return ranked Memory lists.
    """

    async def search(
        self,
        query: str,
        user_id: str,
        limit: int = 10
    ) -> list[Memory]:
        # Vector (semantic) search
        vector_results = await self.vector_search(
            query=query,
            user_id=user_id,
            limit=limit * 2
        )

        # Keyword (BM25) search
        keyword_results = await self.keyword_search(
            query=query,
            user_id=user_id,
            limit=limit * 2
        )

        # Reciprocal Rank Fusion (RRF): each list contributes
        # 1 / (k + rank) per memory, rewarding items ranked well by both
        scores = {}
        k = 60  # RRF constant

        for rank, mem in enumerate(vector_results):
            scores[mem.id] = scores.get(mem.id, 0) + 1 / (k + rank + 1)

        for rank, mem in enumerate(keyword_results):
            scores[mem.id] = scores.get(mem.id, 0) + 1 / (k + rank + 1)

        # Get memories by ID and sort by combined score
        all_memories = {m.id: m for m in vector_results + keyword_results}
        sorted_ids = sorted(scores, key=scores.get, reverse=True)

        return [all_memories[mem_id] for mem_id in sorted_ids[:limit]]
```
Index Tuning
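In pgvector, the two IVFFlat knobs are `lists` (fixed when the index is built) and `probes` (set per session at query time). A hedged SQL sketch, with illustrative values rather than recommendations:

```sql
-- lists: how many clusters the index partitions vectors into.
-- More lists means each query scans fewer vectors, but the index
-- takes longer to build and inserts slow down.
CREATE INDEX ON memories USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 200);

-- probes: how many clusters each query actually scans.
-- Raising probes improves recall at the cost of query speed.
SET ivfflat.probes = 10;
```

The pgvector documentation suggests roughly `lists = rows / 1000` as a starting point for tables up to about a million rows; benchmark against your own data before settling on values.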
For large collections, tune your index parameters. In Pinecone, adjust pods and replicas. In pgvector, adjust `lists` for IVFFlat: more lists makes searches faster but index builds and inserts slower.
Summary
Key concepts for vector databases in agent memory:
- Embeddings: Convert text to vectors that capture semantic meaning
- Similarity search: Find memories by meaning, not exact keywords
- Database choice: Chroma for simplicity, Pinecone for scale, pgvector for existing Postgres
- Metadata filtering: Combine semantic search with exact filters (user_id, type)
- Optimization: Batch embeddings, cache results, consider hybrid search
Next: We'll explore RAG (Retrieval-Augmented Generation)—the pattern that connects vector search to LLM context.