Chapter 9: Memory Systems for Agents

Vector Databases for Memory

Introduction

Vector databases are the backbone of modern AI memory systems. They enable semantic search: finding memories by meaning rather than by exact keywords. When you ask an agent "What did we discuss about the API?", it can surface relevant memories even if the conversation never used the word "API" and instead talked about "endpoints" or "REST interfaces".

Why Vectors: Traditional databases search by exact match. Vector databases search by similarity in meaning. This is what makes intelligent memory retrieval possible.

Embeddings Explained

Before diving into vector databases, we need to understand embeddings—the numeric representations that make semantic search work.

What Are Embeddings?

📝embeddings_explained.txt
EMBEDDING VISUALIZATION

Text: "The cat sat on the mat"
            ↓
     Embedding Model
            ↓
Vector: [0.12, -0.34, 0.56, 0.78, -0.23, ...]
        (typically 384-3072 dimensions)

KEY INSIGHT:
Similar meanings → Similar vectors

"The cat sat on the mat"     → [0.12, -0.34, 0.56, ...]
"A feline rested on the rug" → [0.11, -0.32, 0.54, ...]  ← Very similar!
"Quantum physics is complex" → [0.89, 0.12, -0.67, ...]  ← Very different

SIMILARITY MEASURE (Cosine Similarity):
cat/mat vs feline/rug: 0.95 (very similar)
cat/mat vs quantum:    0.12 (not similar)
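The cosine similarity numbers above can be computed directly. Here is a minimal sketch using toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions; the values are illustrative):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors echoing the visualization above
cat_mat = [0.12, -0.34, 0.56]
feline_rug = [0.11, -0.32, 0.54]
quantum = [0.89, 0.12, -0.67]

print(cosine_similarity(cat_mat, feline_rug))  # close to 1.0: near-identical direction
print(cosine_similarity(cat_mat, quantum))     # much lower: dissimilar meaning
```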

Generating Embeddings

🐍embeddings.py
import openai
import voyageai  # Voyage AI: the embeddings provider Anthropic recommends
from sentence_transformers import SentenceTransformer

# Option 1: OpenAI Embeddings
def embed_with_openai(texts: list[str]) -> list[list[float]]:
    """Generate embeddings using OpenAI."""
    client = openai.OpenAI()

    response = client.embeddings.create(
        model="text-embedding-3-small",  # or "text-embedding-3-large"
        input=texts
    )

    return [item.embedding for item in response.data]


# Option 2: Voyage AI embeddings
# (Anthropic does not offer its own embeddings endpoint; it recommends
# Voyage AI for Claude-based applications)
def embed_with_voyage(texts: list[str]) -> list[list[float]]:
    """Generate embeddings using Voyage AI."""
    client = voyageai.Client()

    result = client.embed(texts, model="voyage-2")
    return result.embeddings


# Option 3: Local embeddings with sentence-transformers
def embed_locally(texts: list[str]) -> list[list[float]]:
    """Generate embeddings locally (no API calls)."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # Fast, 384 dims
    # Or: "all-mpnet-base-v2" for better quality, 768 dims

    embeddings = model.encode(texts)
    return embeddings.tolist()


# Embedding model comparison
EMBEDDING_MODELS = {
    "text-embedding-3-small": {
        "provider": "OpenAI",
        "dimensions": 1536,
        "cost": "$0.02/1M tokens",
        "quality": "Good"
    },
    "text-embedding-3-large": {
        "provider": "OpenAI",
        "dimensions": 3072,
        "cost": "$0.13/1M tokens",
        "quality": "Excellent"
    },
    "voyage-2": {
        "provider": "Voyage AI",
        "dimensions": 1024,
        "cost": "$0.10/1M tokens",
        "quality": "Excellent"
    },
    "all-MiniLM-L6-v2": {
        "provider": "Local",
        "dimensions": 384,
        "cost": "Free",
        "quality": "Good for general use"
    },
    "all-mpnet-base-v2": {
        "provider": "Local",
        "dimensions": 768,
        "cost": "Free",
        "quality": "Better quality"
    }
}
| Model | Provider | Dimensions | Best For |
|-------|----------|------------|----------|
| text-embedding-3-small | OpenAI | 1536 | General purpose, cost-effective |
| text-embedding-3-large | OpenAI | 3072 | Highest quality, complex retrieval |
| voyage-2 | Voyage AI | 1024 | Code and technical content |
| all-MiniLM-L6-v2 | Local | 384 | Fast, privacy-sensitive |
| all-mpnet-base-v2 | Local | 768 | Balance of speed and quality |
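Dimensionality also drives storage and memory cost. A rough sizing sketch, assuming 4-byte float32 components (index structures and metadata add overhead on top of this):

```python
def embedding_storage_bytes(num_vectors: int, dimensions: int) -> int:
    """Raw vector storage only: 4 bytes per float32 component."""
    return num_vectors * dimensions * 4

# One million memories at the dimensionalities from the table above
for dims in (384, 1536, 3072):
    gb = embedding_storage_bytes(1_000_000, dims) / 1e9
    print(f"{dims:>4} dims: {gb:.1f} GB")
```

Doubling the dimensions doubles the footprint, which is one reason the smaller local models remain attractive for large collections.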

Vector Database Options

Several vector databases are available, each with different tradeoffs:

Comparison Overview

| Database | Type | Best For | Key Features |
|----------|------|----------|--------------|
| Pinecone | Cloud | Production, scale | Managed, fast, metadata filtering |
| Weaviate | Self-hosted/Cloud | Complex queries | GraphQL, hybrid search |
| Chroma | Embedded/Server | Development, simplicity | Easy setup, Python-native |
| Qdrant | Self-hosted/Cloud | Performance | Rust-based, filtering |
| pgvector | PostgreSQL extension | Existing Postgres users | Familiar SQL, ACID |
| FAISS | Library | Maximum speed | Facebook library, in-memory |
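Under the hood, every option in the table answers the same question: which stored vectors are closest to the query? A brute-force numpy version shows the baseline these databases accelerate with approximate indexes (the data here is random and purely illustrative):

```python
import numpy as np

def top_k_cosine(query: np.ndarray, vectors: np.ndarray, k: int = 3) -> list[int]:
    """Exact cosine top-k by scanning every vector: what a vector DB
    computes, minus the index that makes it fast at scale."""
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ q  # cosine similarity against every stored vector
    return np.argsort(-sims)[:k].tolist()

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 64)).astype(np.float32)
# A query almost identical to stored vector 42
query = vectors[42] + rng.normal(scale=0.01, size=64)
print(top_k_cosine(query, vectors))  # index 42 ranks first
```

This O(n) scan is fine for thousands of memories; the databases below exist for when it is not.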

Chroma (Simplest to Start)

🐍chroma_example.py
import chromadb
from chromadb.utils import embedding_functions

# Initialize Chroma
client = chromadb.Client()  # In-memory
# Or: chromadb.PersistentClient(path="/path/to/db") for persistence

# Set up embedding function
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-api-key",
    model_name="text-embedding-3-small"
)

# Create a collection
collection = client.create_collection(
    name="agent_memories",
    embedding_function=openai_ef,
    metadata={"hnsw:space": "cosine"}  # Use cosine similarity
)

# Add memories
collection.add(
    ids=["mem1", "mem2", "mem3"],
    documents=[
        "User prefers concise responses",
        "User is working on a Python project",
        "User asked about API rate limiting"
    ],
    metadatas=[
        {"type": "preference", "user_id": "alice"},
        {"type": "context", "user_id": "alice"},
        {"type": "question", "user_id": "alice"}
    ]
)

# Query memories
results = collection.query(
    query_texts=["How should I format responses?"],
    n_results=3,
    where={"user_id": "alice"}  # Metadata filtering
)

print(results["documents"])
# [['User prefers concise responses', ...]]

Pinecone (Production Scale)

🐍pinecone_example.py
from pinecone import Pinecone, ServerlessSpec
import openai

# Initialize Pinecone
pc = Pinecone(api_key="your-api-key")

# Create index (one-time)
pc.create_index(
    name="agent-memories",
    dimension=1536,  # Match your embedding model
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1"
    )
)

# Connect to index
index = pc.Index("agent-memories")

# Generate embeddings
def get_embedding(text: str) -> list[float]:
    client = openai.OpenAI()
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Upsert memories
memories = [
    {"id": "mem1", "text": "User prefers dark mode", "type": "preference"},
    {"id": "mem2", "text": "Working on ML project", "type": "context"},
]

vectors = []
for mem in memories:
    embedding = get_embedding(mem["text"])
    vectors.append({
        "id": mem["id"],
        "values": embedding,
        "metadata": {
            "text": mem["text"],
            "type": mem["type"],
            "user_id": "alice"
        }
    })

index.upsert(vectors=vectors)

# Query
query_embedding = get_embedding("What are the user's preferences?")

results = index.query(
    vector=query_embedding,
    top_k=5,
    include_metadata=True,
    filter={"user_id": {"$eq": "alice"}}
)

for match in results["matches"]:
    print(f"Score: {match['score']:.3f} - {match['metadata']['text']}")

pgvector (PostgreSQL)

🐍pgvector_example.py
import asyncpg
from pgvector.asyncpg import register_vector  # pip install pgvector
from typing import Optional

# SQL to set up pgvector
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS memories (
    id SERIAL PRIMARY KEY,
    user_id TEXT NOT NULL,
    content TEXT NOT NULL,
    memory_type TEXT NOT NULL,
    embedding vector(1536),
    created_at TIMESTAMP DEFAULT NOW(),
    importance FLOAT DEFAULT 0.5
);

CREATE INDEX ON memories USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
"""

class PgVectorMemory:
    def __init__(self, connection_string: str):
        self.conn_string = connection_string
        self.pool = None

    async def connect(self):
        # register_vector teaches asyncpg to encode/decode the vector type
        self.pool = await asyncpg.create_pool(
            self.conn_string, init=register_vector
        )

    async def store(
        self,
        user_id: str,
        content: str,
        memory_type: str,
        embedding: list[float],
        importance: float = 0.5
    ) -> int:
        async with self.pool.acquire() as conn:
            result = await conn.fetchrow(
                """
                INSERT INTO memories (user_id, content, memory_type, embedding, importance)
                VALUES ($1, $2, $3, $4, $5)
                RETURNING id
                """,
                user_id, content, memory_type, embedding, importance
            )
            return result["id"]

    async def search(
        self,
        user_id: str,
        query_embedding: list[float],
        limit: int = 10,
        memory_type: Optional[str] = None
    ) -> list[dict]:
        async with self.pool.acquire() as conn:
            # Build query with optional type filter
            if memory_type:
                rows = await conn.fetch(
                    """
                    SELECT id, content, memory_type, importance,
                           1 - (embedding <=> $1) as similarity
                    FROM memories
                    WHERE user_id = $2 AND memory_type = $3
                    ORDER BY embedding <=> $1
                    LIMIT $4
                    """,
                    query_embedding, user_id, memory_type, limit
                )
            else:
                rows = await conn.fetch(
                    """
                    SELECT id, content, memory_type, importance,
                           1 - (embedding <=> $1) as similarity
                    FROM memories
                    WHERE user_id = $2
                    ORDER BY embedding <=> $1
                    LIMIT $3
                    """,
                    query_embedding, user_id, limit
                )

            return [dict(row) for row in rows]

Implementation Examples

Here's a complete memory store implementation using vector databases:

🐍vector_memory_store.py
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
import hashlib

@dataclass
class Memory:
    id: str
    user_id: str
    content: str
    memory_type: str
    embedding: list[float]
    importance: float
    created_at: datetime
    metadata: dict

class VectorMemoryStore(ABC):
    """Abstract base class for vector memory stores."""

    @abstractmethod
    async def store(self, memory: Memory) -> str:
        pass

    @abstractmethod
    async def search(
        self,
        user_id: str,
        query_embedding: list[float],
        limit: int = 10,
        filters: Optional[dict] = None
    ) -> list[Memory]:
        pass

    @abstractmethod
    async def delete(self, memory_id: str) -> bool:
        pass

    @abstractmethod
    async def delete_user_memories(self, user_id: str) -> int:
        pass


class ChromaMemoryStore(VectorMemoryStore):
    """Chroma-based implementation."""

    def __init__(self, collection_name: str = "memories"):
        import chromadb
        self.client = chromadb.PersistentClient(path="./chroma_db")
        self.collection = self.client.get_or_create_collection(
            name=collection_name,
            metadata={"hnsw:space": "cosine"}
        )

    async def store(self, memory: Memory) -> str:
        self.collection.add(
            ids=[memory.id],
            embeddings=[memory.embedding],
            documents=[memory.content],
            metadatas=[{
                "user_id": memory.user_id,
                "memory_type": memory.memory_type,
                "importance": memory.importance,
                "created_at": memory.created_at.isoformat(),
                **memory.metadata
            }]
        )
        return memory.id

    async def search(
        self,
        user_id: str,
        query_embedding: list[float],
        limit: int = 10,
        filters: Optional[dict] = None
    ) -> list[Memory]:
        # Recent Chroma versions require $and to combine multiple conditions
        if filters:
            where_clause = {"$and": [{"user_id": user_id}]
                            + [{k: v} for k, v in filters.items()]}
        else:
            where_clause = {"user_id": user_id}

        results = self.collection.query(
            query_embeddings=[query_embedding],
            n_results=limit,
            where=where_clause,
            include=["documents", "metadatas", "embeddings", "distances"]
        )

        memories = []
        for i in range(len(results["ids"][0])):
            meta = results["metadatas"][0][i]
            memories.append(Memory(
                id=results["ids"][0][i],
                user_id=meta["user_id"],
                content=results["documents"][0][i],
                memory_type=meta["memory_type"],
                embedding=list(results["embeddings"][0][i])
                          if results["embeddings"] is not None else [],
                importance=meta["importance"],
                created_at=datetime.fromisoformat(meta["created_at"]),
                metadata={k: v for k, v in meta.items()
                          if k not in ["user_id", "memory_type", "importance", "created_at"]}
            ))

        return memories

    async def delete(self, memory_id: str) -> bool:
        try:
            self.collection.delete(ids=[memory_id])
            return True
        except Exception:
            return False

    async def delete_user_memories(self, user_id: str) -> int:
        # Get all memory IDs for user (include=[] returns only IDs)
        results = self.collection.get(
            where={"user_id": user_id},
            include=[]
        )
        if results["ids"]:
            self.collection.delete(ids=results["ids"])
            return len(results["ids"])
        return 0


class MemoryManager:
    """High-level memory management with embedding generation."""

    def __init__(
        self,
        store: VectorMemoryStore,
        embedding_model: str = "text-embedding-3-small"
    ):
        self.store = store
        self.embedding_model = embedding_model

    async def embed(self, text: str) -> list[float]:
        import openai
        client = openai.AsyncOpenAI()  # async client, so the await below is real
        response = await client.embeddings.create(
            model=self.embedding_model,
            input=text
        )
        return response.data[0].embedding

    def _generate_id(self, content: str, user_id: str) -> str:
        hash_input = f"{user_id}:{content}:{datetime.now().isoformat()}"
        return hashlib.sha256(hash_input.encode()).hexdigest()[:16]

    async def remember(
        self,
        user_id: str,
        content: str,
        memory_type: str = "general",
        importance: float = 0.5,
        metadata: Optional[dict] = None
    ) -> str:
        embedding = await self.embed(content)

        memory = Memory(
            id=self._generate_id(content, user_id),
            user_id=user_id,
            content=content,
            memory_type=memory_type,
            embedding=embedding,
            importance=importance,
            created_at=datetime.now(),
            metadata=metadata or {}
        )

        return await self.store.store(memory)

    async def recall(
        self,
        user_id: str,
        query: str,
        limit: int = 10,
        memory_types: Optional[list[str]] = None,
        min_importance: float = 0.0
    ) -> list[Memory]:
        query_embedding = await self.embed(query)

        filters = {}
        if memory_types:
            filters["memory_type"] = {"$in": memory_types}
        if min_importance > 0:
            filters["importance"] = {"$gte": min_importance}

        return await self.store.search(
            user_id=user_id,
            query_embedding=query_embedding,
            limit=limit,
            filters=filters if filters else None
        )

    async def forget(self, memory_id: str) -> bool:
        return await self.store.delete(memory_id)

    async def forget_user(self, user_id: str) -> int:
        return await self.store.delete_user_memories(user_id)

Optimization Techniques

Optimizing vector search for production:

Batch Embedding

🐍batch_embedding.py
async def batch_embed(
    texts: list[str],
    batch_size: int = 100
) -> list[list[float]]:
    """Embed texts in batches for efficiency (one API call per batch)."""
    import openai
    client = openai.AsyncOpenAI()

    all_embeddings = []

    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        response = await client.embeddings.create(
            model="text-embedding-3-small",
            input=batch
        )
        all_embeddings.extend([item.embedding for item in response.data])

    return all_embeddings

Caching Embeddings

🐍embedding_cache.py
import hashlib

class EmbeddingCache:
    """Cache embeddings to avoid recomputing them for repeated text."""

    def __init__(self, max_size: int = 10000):
        self.cache: dict[str, list[float]] = {}
        self.max_size = max_size

    def _hash_text(self, text: str) -> str:
        return hashlib.sha256(text.encode()).hexdigest()

    async def get_embedding(
        self,
        text: str,
        embed_fn
    ) -> list[float]:
        key = self._hash_text(text)

        if key in self.cache:
            # Re-insert on hit so dict order tracks recency (true LRU)
            self.cache[key] = self.cache.pop(key)
            return self.cache[key]

        embedding = await embed_fn(text)

        # Evict the least recently used entry if the cache is full
        if len(self.cache) >= self.max_size:
            oldest_key = next(iter(self.cache))
            del self.cache[oldest_key]

        self.cache[key] = embedding
        return embedding
Hybrid Search

Pure semantic search can miss exact tokens such as IDs, error codes, or proper nouns. Hybrid search runs a keyword search alongside the vector search and fuses the two rankings:

🐍hybrid_search.py
class HybridSearch:
    """Combine vector search with keyword search.

    Assumes vector_search and keyword_search are implemented against
    your chosen store and a BM25 index, respectively.
    """

    async def search(
        self,
        query: str,
        user_id: str,
        limit: int = 10
    ) -> list[Memory]:
        # Vector (semantic) search
        vector_results = await self.vector_search(
            query=query,
            user_id=user_id,
            limit=limit * 2
        )

        # Keyword (BM25) search
        keyword_results = await self.keyword_search(
            query=query,
            user_id=user_id,
            limit=limit * 2
        )

        # Reciprocal Rank Fusion (RRF)
        scores = {}
        k = 60  # RRF constant

        for rank, mem in enumerate(vector_results):
            scores[mem.id] = scores.get(mem.id, 0) + 1 / (k + rank + 1)

        for rank, mem in enumerate(keyword_results):
            scores[mem.id] = scores.get(mem.id, 0) + 1 / (k + rank + 1)

        # Get memories by ID and sort by combined score
        all_memories = {m.id: m for m in vector_results + keyword_results}
        sorted_ids = sorted(scores.keys(), key=lambda x: scores[x], reverse=True)

        return [all_memories[mem_id] for mem_id in sorted_ids[:limit]]
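The RRF step, isolated on toy ranked lists of hypothetical memory IDs, shows why items found by both searches rise to the top:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score each ID by summing 1/(k + rank + 1)
    over every ranked list it appears in, then sort by total score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_ranked = ["m1", "m2", "m3"]   # hypothetical semantic ranking
keyword_ranked = ["m2", "m4"]        # hypothetical BM25 ranking
print(rrf_fuse([vector_ranked, keyword_ranked]))
# ['m2', 'm1', 'm4', 'm3'] — m2 wins because both searches found it
```

Because each contribution is at most 1/(k+1), RRF needs no score normalization between the two search systems, which is why it is a popular fusion default.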

Index Tuning

For large collections, tune your index parameters. In Pinecone, adjust pod size and replica count (serverless indexes scale automatically). For pgvector's IVFFlat index, the lists parameter controls how many partitions the vectors are clustered into: more lists means each query scans fewer vectors, so search is faster, but recall suffers unless you also raise ivfflat.probes to scan more partitions per query.
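As a concrete sketch of those pgvector knobs (the index name and numbers here are illustrative, not recommendations; pgvector's documentation suggests lists ≈ rows/1000 up to about a million rows):

```python
# Illustrative pgvector tuning, to be run via any Postgres client.
TUNING_SQL = """
-- Rebuild the IVFFlat index with more partitions for a larger table
DROP INDEX IF EXISTS memories_embedding_idx;
CREATE INDEX memories_embedding_idx ON memories
USING ivfflat (embedding vector_cosine_ops) WITH (lists = 1000);

-- Per session: probe more partitions to trade speed back for recall
SET ivfflat.probes = 32;
"""
```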

Summary

Key concepts for vector databases in agent memory:

  1. Embeddings: Convert text to vectors that capture semantic meaning
  2. Similarity search: Find memories by meaning, not exact keywords
  3. Database choice: Chroma for simplicity, Pinecone for scale, pgvector for existing Postgres
  4. Metadata filtering: Combine semantic search with exact filters (user_id, type)
  5. Optimization: Batch embeddings, cache results, consider hybrid search
Next: We'll explore RAG (Retrieval-Augmented Generation)—the pattern that connects vector search to LLM context.