Introduction
Without memory, every interaction is a fresh start. The agent forgets your name seconds after you tell it. It can't learn from mistakes. It asks the same clarifying questions repeatedly. Memory transforms an agent from a sophisticated autocomplete into something that can actually build relationships, learn preferences, and accumulate knowledge.
The Core Problem: LLMs are stateless by design. Each API call is independent—the model has no inherent memory of previous conversations. Building effective agents requires adding memory systems on top.
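To make the statelessness concrete, here is a minimal sketch that uses a toy function in place of a real model. `toy_model` is purely hypothetical: it can only look at the messages passed in a single call, which is the one property of an LLM API that matters here.

```python
def toy_model(messages: list[dict]) -> str:
    """Toy stand-in for an LLM: it can only use what is in `messages`."""
    for message in messages:
        text = message["content"]
        if message["role"] == "user" and "I'm " in text:
            name = text.split("I'm ", 1)[1].strip(" .!")
            return f"Your name is {name}"
    return "I don't know your name"


# Call 1: the introduction is part of the request, so the "model" can use it.
with_history = [
    {"role": "user", "content": "Hi, I'm Alice"},
    {"role": "assistant", "content": "Hello Alice!"},
    {"role": "user", "content": "What's my name?"},
]
print(toy_model(with_history))

# Call 2: a fresh request without the earlier turns. Nothing carries over.
without_history = [{"role": "user", "content": "What's my name?"}]
print(toy_model(without_history))
```

The second call fails not because anything was erased, but because nothing was ever stored between calls.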
The Stateless Problem
Understanding why memory is hard requires understanding how LLMs actually work:
How LLM Context Works
```text
LLM CONTEXT WINDOW

┌─────────────────────────────────────────────────────────┐
│                     CONTEXT WINDOW                      │
│             (e.g., 200K tokens for Claude)              │
│                                                         │
│  ┌────────────────────────────────────────────────────┐ │
│  │ System Prompt                                      │ │
│  │ "You are a helpful assistant..."                   │ │
│  └────────────────────────────────────────────────────┘ │
│  ┌────────────────────────────────────────────────────┐ │
│  │ Message 1: User says "Hi, I'm Alice"               │ │
│  └────────────────────────────────────────────────────┘ │
│  ┌────────────────────────────────────────────────────┐ │
│  │ Message 2: Assistant says "Hello Alice!"           │ │
│  └────────────────────────────────────────────────────┘ │
│  ┌────────────────────────────────────────────────────┐ │
│  │ Message 3: User says "What's my name?"             │ │
│  └────────────────────────────────────────────────────┘ │
│  ┌────────────────────────────────────────────────────┐ │
│  │ → Model generates: "Your name is Alice"            │ │
│  └────────────────────────────────────────────────────┘ │
│                                                         │
└─────────────────────────────────────────────────────────┘

The model "remembers" Alice's name ONLY because it's
still in the context window. There's no persistent memory.
```
The Context Limit Problem
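A context window is a fixed budget, so a history that grows without bound eventually has to be cut. The sketch below illustrates the simplest response, dropping the oldest messages, and what it costs. The four-characters-per-token estimate is a rough rule of thumb, and `estimate_tokens`, `trim_to_budget`, and the 60-token budget are hypothetical names and values chosen for illustration, not part of any library.

```python
def estimate_tokens(text: str) -> int:
    """Crude heuristic: roughly 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def trim_to_budget(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the most recent messages whose estimated size fits the budget."""
    kept: list[dict] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [{"role": "user", "content": "Hi, I'm Alice"}]
history += [
    {"role": "user", "content": f"Filler message number {i:03d}"}
    for i in range(50)
]

trimmed = trim_to_budget(history, max_tokens=60)
print(len(history), "->", len(trimmed))
print(any("Alice" in m["content"] for m in trimmed))
```

The trimmed history fits the budget, but the message that introduced Alice is among the first to go.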
```python
# What happens when context fills up?

conversation_history = []

for i in range(1000):
    user_message = get_user_input()
    conversation_history.append({"role": "user", "content": user_message})

    # Eventually, this exceeds the context window
    response = llm.generate(
        system=system_prompt,
        messages=conversation_history  # 💥 Too long!
    )

    conversation_history.append({"role": "assistant", "content": response})

# Options when context fills:
# 1. Truncate old messages (lose information)
# 2. Summarize old messages (lose details)
# 3. Use external memory (what we'll learn)
```
The Session Boundary Problem
```text
SESSION 1 (Monday):
User: "I prefer dark mode and concise responses"
Agent: "Got it! I'll use dark mode and keep responses brief."

SESSION 2 (Tuesday):
User: "Show me the dashboard"
Agent: [Shows light mode, verbose explanation]
User: "I told you yesterday I prefer dark mode!"
Agent: "I'm sorry, I don't have any memory of previous sessions."

─────────────────────────────────────────────────────────

The problem: Each session starts fresh.
- User preferences forgotten
- Prior context lost
- Previous work not remembered
- Relationships don't develop
```
| Problem | Impact | User Experience |
|---|---|---|
| Context overflow | Old messages truncated | Agent forgets earlier conversation |
| Session boundaries | No cross-session memory | Must repeat preferences each time |
| No learning | Same mistakes repeated | Agent never improves from feedback |
| No personalization | Generic responses | Feels like talking to a stranger |
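The session-boundary row has the most direct fix: write preferences somewhere that outlives the process. The following is a minimal sketch under simple assumptions (a single JSON file, no concurrency, no encryption). `PreferenceStore` and its methods are hypothetical names, not an existing API.

```python
import json
import tempfile
from pathlib import Path


class PreferenceStore:
    """Hypothetical file-backed store: one JSON file maps user IDs to preferences."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _load(self) -> dict:
        if self.path.exists():
            return json.loads(self.path.read_text(encoding="utf-8"))
        return {}

    def save(self, user_id: str, **prefs: str) -> None:
        data = self._load()
        data.setdefault(user_id, {}).update(prefs)
        self.path.write_text(json.dumps(data, indent=2), encoding="utf-8")

    def get(self, user_id: str) -> dict:
        return self._load().get(user_id, {})


store_path = Path(tempfile.mkdtemp()) / "preferences.json"

# Session 1 (Monday): the user states preferences and the agent writes them down.
monday = PreferenceStore(store_path)
monday.save("alice", theme="dark", style="concise")

# Session 2 (Tuesday): a new object with no in-process state, but the same file.
tuesday = PreferenceStore(store_path)
print(tuesday.get("alice"))
```

A real deployment would more likely use a database, but the principle is the same: state lives outside the model and is loaded back into the prompt at the start of each session.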
What Memory Enables
Proper memory systems unlock entirely new capabilities:
Personalization
```python
# With memory: Personalized interactions
# `memory` and `llm` are assumed to be configured elsewhere.

async def handle_request(user_id: str, request: str):
    # Retrieve user's preferences and history
    user_profile = await memory.get_user_profile(user_id)
    recent_context = await memory.get_recent_context(user_id)

    # Customize system prompt based on preferences.
    # style is "concise" or "detailed"; expertise runs from "beginner" to "expert".
    # (These notes live outside the f-string so they don't leak into the prompt.)
    system = f"""You are helping {user_profile.name}.

Preferences:
- Communication style: {user_profile.style}
- Technical level: {user_profile.expertise}
- Timezone: {user_profile.timezone}
- Previous projects: {user_profile.projects}

Recent context:
{recent_context}
"""

    response = await llm.generate(system=system, messages=[...])

    # Update memory with new interaction
    await memory.store_interaction(user_id, request, response)

    return response
```
Learning from Feedback
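The agent in the listing that follows depends on a memory backend with `get_corrections` and `store_correction` methods, which it leaves undefined. As a rough idea of the smallest thing that could stand behind them, here is a hypothetical in-memory version. It matches on shared words rather than embeddings, so it omits the `embedding` argument and would miss paraphrases that a vector search would catch.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Correction:
    topic: str
    original_response: str
    correction: str


class InMemoryCorrections:
    """Hypothetical stand-in: matches on shared words instead of embeddings."""

    def __init__(self) -> None:
        self._items: list[Correction] = []

    async def store_correction(
        self, topic: str, original_response: str, correction: str
    ) -> None:
        self._items.append(Correction(topic, original_response, correction))

    async def get_corrections(self, similar_to: str) -> list[Correction]:
        words = set(similar_to.lower().split())
        return [c for c in self._items if words & set(c.topic.lower().split())]


async def demo() -> list[Correction]:
    memory = InMemoryCorrections()
    await memory.store_correction(
        topic="date formats",
        original_response="Use MM/DD/YYYY.",
        correction="This team uses ISO 8601 (YYYY-MM-DD).",
    )
    return await memory.get_corrections(
        similar_to="Which date formats should reports use?"
    )


matches = asyncio.run(demo())
print([c.correction for c in matches])
```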
```python
# With memory: Agent learns from corrections

class LearningAgent:
    async def handle_with_learning(self, request: str):
        # Check for similar past mistakes
        past_corrections = await self.memory.get_corrections(
            similar_to=request
        )

        if past_corrections:
            # Include learned lessons in context
            lessons = "\n".join([
                f"- When asked about {c.topic}, remember: {c.correction}"
                for c in past_corrections
            ])
            context = f"Lessons from past feedback:\n{lessons}"
        else:
            context = ""

        response = await self.generate(request, extra_context=context)
        return response

    async def receive_feedback(self, request: str, response: str, feedback: str):
        # Store correction for future reference
        await self.memory.store_correction(
            topic=self.extract_topic(request),
            original_response=response,
            correction=feedback,
            embedding=await self.embed(request)
        )
```
Long-Running Tasks
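Resuming work across sessions requires a project state that can be written out and read back. The listing that follows reads fields such as `completed_tasks` and `pending_tasks` from that state. One possible shape, offered only as a sketch, is a dataclass that round-trips through JSON. The `ProjectState` class and its `finish_current` helper are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ProjectState:
    """Hypothetical shape for the state the project agent loads and saves."""

    name: str
    completed_tasks: list[str] = field(default_factory=list)
    current_task: Optional[str] = None
    pending_tasks: list[str] = field(default_factory=list)
    recent_changes: list[str] = field(default_factory=list)

    def finish_current(self, change_note: str) -> None:
        """Mark the current task done and advance to the next pending one."""
        if self.current_task is not None:
            self.completed_tasks.append(self.current_task)
            self.recent_changes.append(change_note)
        self.current_task = self.pending_tasks.pop(0) if self.pending_tasks else None

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "ProjectState":
        return cls(**json.loads(raw))


state = ProjectState(
    name="Docs site", current_task="outline", pending_tasks=["draft", "review"]
)
state.finish_current("Outline approved")

# Serialize at the end of one session, restore at the start of the next.
restored = ProjectState.from_json(state.to_json())
print(restored.current_task, restored.completed_tasks)
```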
```python
# With memory: Continue complex tasks across sessions

class ProjectAgent:
    async def work_on_project(self, user_id: str, instruction: str):
        # Load project state
        project = await self.memory.get_project_state(user_id)

        if project:
            context = f"""
Continuing project: {project.name}

Current status:
- Completed: {project.completed_tasks}
- In progress: {project.current_task}
- Remaining: {project.pending_tasks}

Recent changes:
{project.recent_changes}
"""
        else:
            context = "Starting new project"

        # Work on the project
        result = await self.execute_task(instruction, context)

        # Save updated state
        await self.memory.update_project_state(user_id, result)

        return result
```
Knowledge Accumulation
```python
# With memory: Build domain knowledge over time

class KnowledgeAgent:
    async def answer_with_knowledge(self, question: str):
        # Search accumulated knowledge
        relevant_knowledge = await self.memory.search_knowledge(
            query=question,
            limit=10
        )

        if relevant_knowledge:
            knowledge_context = "\n".join([
                f"- {k.fact} (learned from: {k.source})"
                for k in relevant_knowledge
            ])
        else:
            knowledge_context = "No relevant prior knowledge."

        response = await self.generate(
            question,
            extra_context=f"Relevant knowledge:\n{knowledge_context}"
        )

        # Extract and store any new facts learned
        new_facts = await self.extract_facts(question, response)
        for fact in new_facts:
            await self.memory.store_knowledge(fact)

        return response
```
Types of Agent Memory
Agent memory systems typically include several distinct types:
| Memory Type | Duration | Purpose | Example |
|---|---|---|---|
| Working Memory | Current task | Immediate context | Current conversation messages |
| Short-term Memory | Single session | Session context | Topics discussed today |
| Long-term Memory | Persistent | Accumulated knowledge | User preferences, learned facts |
| Episodic Memory | Persistent | Specific experiences | Past conversations, events |
| Semantic Memory | Persistent | General knowledge | Facts, concepts, relationships |
| Procedural Memory | Persistent | How to do things | Learned workflows, patterns |
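In code, these categories often end up as a tag on each stored record. The sketch below is one hypothetical way to model them. It treats episodic, semantic, and procedural memory as the persistent subtypes of long-term memory, which is one reading of the table rather than a fixed standard. `MemoryType`, `MemoryRecord`, and `LONG_TERM` are illustrative names.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryType(Enum):
    WORKING = "working"
    SHORT_TERM = "short_term"
    EPISODIC = "episodic"
    SEMANTIC = "semantic"
    PROCEDURAL = "procedural"


# Long-term memory is modeled here as the union of its three persistent subtypes.
LONG_TERM = {MemoryType.EPISODIC, MemoryType.SEMANTIC, MemoryType.PROCEDURAL}


@dataclass
class MemoryRecord:
    content: str
    kind: MemoryType

    @property
    def is_persistent(self) -> bool:
        """True if the record should survive the end of a session."""
        return self.kind in LONG_TERM


records = [
    MemoryRecord("Current conversation messages", MemoryType.WORKING),
    MemoryRecord("Topics discussed today", MemoryType.SHORT_TERM),
    MemoryRecord("User asked about deployment last week", MemoryType.EPISODIC),
    MemoryRecord("User prefers concise responses", MemoryType.SEMANTIC),
    MemoryRecord("Steps for the weekly report workflow", MemoryType.PROCEDURAL),
]

survives_session = [r.content for r in records if r.is_persistent]
print(survives_session)
```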
```text
AGENT MEMORY HIERARCHY

┌─────────────────────────────────────────────────────────┐
│                     WORKING MEMORY                      │
│                (Current context window)                 │
│  • System prompt                                        │
│  • Current conversation                                 │
│  • Active tool results                                  │
│                    ▲                                    │
│                    │ retrieved as needed                │
├────────────────────┼────────────────────────────────────┤
│                    SHORT-TERM MEMORY                    │
│                 (Session-level storage)                 │
│  • Recent interactions                                  │
│  • Current task state                                   │
│  • Temporary context                                    │
│                    ▲                                    │
│                    │ promoted if important              │
├────────────────────┼────────────────────────────────────┤
│                    LONG-TERM MEMORY                     │
│                  (Persistent storage)                   │
│    ┌─────────────┐  ┌─────────────┐  ┌─────────────┐    │
│    │  Episodic   │  │  Semantic   │  │ Procedural  │    │
│    │  (events)   │  │   (facts)   │  │  (skills)   │    │
│    └─────────────┘  └─────────────┘  └─────────────┘    │
└─────────────────────────────────────────────────────────┘
```
Memory Challenges
Building effective memory systems involves solving several hard problems:
1. What to Remember
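Deciding what to keep usually comes down to assigning each interaction a score. The listing that follows calls an `assess_importance` helper without defining it. A production system might ask a model to rate importance, but a crude surface-cue heuristic is enough to show the idea. The cue phrases and weights below are invented for illustration and would need tuning.

```python
import re

# Hypothetical cue phrases; a real system would tune or learn these.
PREFERENCE_CUES = ("i prefer", "i like", "i always", "i never", "please don't")
CORRECTION_CUES = ("that's wrong", "that is wrong", "actually,", "i told you")
GREETINGS = {"hi", "hello", "hey", "thanks", "thank you", "bye"}


def assess_importance(text: str) -> float:
    """Return a rough importance score in [0, 1] based on surface cues."""
    lowered = text.lower().strip()
    if lowered.strip("!. ") in GREETINGS:
        return 0.0  # trivial interaction, nothing to remember
    score = 0.2  # baseline for any non-trivial message
    if any(cue in lowered for cue in PREFERENCE_CUES):
        score += 0.6
    if any(cue in lowered for cue in CORRECTION_CUES):
        score += 0.6
    if re.search(r"\d", lowered):  # digits often signal concrete facts
        score += 0.1
    return min(score, 1.0)


for text in ["Hello!", "What's the weather?", "I prefer dark mode and concise responses"]:
    print(f"{assess_importance(text):.1f}  {text}")
```

With the 0.7 threshold used in the listing, only the stated preference would be stored.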
```python
# Challenge: Not everything is worth remembering

class MemoryFilter:
    async def should_remember(self, interaction: Interaction) -> bool:
        # Skip trivial interactions
        if interaction.is_greeting or interaction.is_small_talk:
            return False

        # Remember explicit user preferences
        if self.contains_preference(interaction):
            return True

        # Remember corrections and feedback
        if interaction.contains_correction:
            return True

        # Remember important facts
        importance = await self.assess_importance(interaction)
        if importance > 0.7:
            return True

        # Remember novel information
        novelty = await self.assess_novelty(interaction)
        if novelty > 0.8:
            return True

        return False
```
2. How to Retrieve
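Two pieces of the retrieval pipeline in the listing that follows are easy to make concrete: scoring by vector similarity, and merging several result lists into one ranking. The sketch below uses cosine similarity on plain Python lists and a merge that keeps each memory's best score. Both functions are simplified assumptions. The listing's own `merge_and_rank` is not specified, and a real system would use a vector database rather than a linear scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def merge_and_rank(*result_lists: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Merge (memory_id, score) lists, keep each ID's best score, sort descending."""
    best: dict[str, float] = {}
    for results in result_lists:
        for memory_id, score in results:
            best[memory_id] = max(score, best.get(memory_id, float("-inf")))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Toy two-dimensional "embeddings" for three stored memories.
query = [1.0, 0.0]
vectors = {"m1": [1.0, 0.0], "m2": [0.0, 1.0], "m3": [1.0, 1.0]}
semantic = [(mid, cosine_similarity(query, vec)) for mid, vec in vectors.items()]

# Scores from a second strategy, e.g. recency. "m2" appears in both lists.
recent = [("m2", 0.5), ("m4", 0.4)]

ranked = merge_and_rank(semantic, recent)
print([memory_id for memory_id, _ in ranked])
```

Taking the maximum score per memory is only one merge policy. Weighted sums or reciprocal-rank fusion are common alternatives when scores from different strategies are not on the same scale.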
```python
# Challenge: Finding relevant memories efficiently

class MemoryRetriever:
    async def retrieve_relevant(
        self,
        query: str,
        context: str,
        limit: int = 10
    ) -> list[Memory]:
        # Multiple retrieval strategies

        # Semantic search (meaning-based)
        semantic_results = await self.vector_search(
            query_embedding=await self.embed(query),
            limit=limit
        )

        # Recency-weighted (recent is often relevant)
        recent_results = await self.get_recent(
            hours=24,
            limit=limit // 2
        )

        # Entity-based (mentioned people, places, things)
        entities = self.extract_entities(query)
        entity_results = await self.search_by_entities(
            entities=entities,
            limit=limit // 2
        )

        # Combine and deduplicate
        all_results = self.merge_and_rank(
            semantic_results,
            recent_results,
            entity_results
        )

        return all_results[:limit]
```
3. When to Forget
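The listing that follows prunes with a hard age cut-off combined with an importance threshold. A smoother alternative, swapped in here purely for illustration, is exponential decay: a memory's importance is halved for every fixed period since it was last used, and it becomes a pruning candidate once the decayed score drops below a threshold. The 30-day half-life and the function names are assumptions.

```python
from datetime import datetime, timedelta


def retention_score(
    importance: float,
    last_accessed: datetime,
    now: datetime,
    half_life_days: float = 30.0,
) -> float:
    """Importance discounted by age: the score halves every `half_life_days`."""
    age_days = (now - last_accessed).total_seconds() / 86400
    return importance * 0.5 ** (age_days / half_life_days)


def select_for_pruning(
    memories: list[dict], now: datetime, threshold: float = 0.3
) -> list[str]:
    """Return IDs of memories whose decayed score fell below the threshold."""
    return [
        m["id"]
        for m in memories
        if retention_score(m["importance"], m["last_accessed"], now) < threshold
    ]


now = datetime(2025, 1, 1)
memories = [
    {"id": "pref-dark-mode", "importance": 0.9, "last_accessed": now - timedelta(days=10)},
    {"id": "old-small-talk", "importance": 0.4, "last_accessed": now - timedelta(days=90)},
    {"id": "old-but-vital", "importance": 1.0, "last_accessed": now - timedelta(days=30)},
]
print(select_for_pruning(memories, now))
```

Because the score depends on last access rather than creation time, a memory that keeps being retrieved stays alive.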
```python
# Challenge: Memory can't grow forever

class MemoryManager:
    async def manage_memory_size(self):
        # Strategy 1: Time-based decay
        await self.decay_old_memories(
            older_than_days=90,
            importance_threshold=0.3
        )

        # Strategy 2: Consolidation (merge similar memories)
        similar_groups = await self.find_similar_memories()
        for group in similar_groups:
            consolidated = await self.consolidate(group)
            await self.replace_group_with(group, consolidated)

        # Strategy 3: Importance-based pruning
        if await self.get_memory_count() > self.max_memories:
            least_important = await self.get_least_important(
                count=self.max_memories // 10
            )
            await self.archive_or_delete(least_important)

        # Strategy 4: User-specific limits
        for user_id in await self.get_users():
            user_count = await self.get_user_memory_count(user_id)
            if user_count > self.per_user_limit:
                await self.prune_user_memories(user_id)
```
4. Privacy and Security
```python
# Challenge: Memories can contain sensitive information

class SecureMemory:
    async def store_securely(self, memory: Memory, user_id: str):
        # Classify sensitivity
        sensitivity = await self.classify_sensitivity(memory.content)

        if sensitivity == "high":
            # Encrypt sensitive memories.
            # user_id only identifies whose key to use; the key material itself
            # should come from a key store, never from the ID.
            encrypted = await self.encrypt(memory.content, user_key=user_id)
            memory.content = encrypted
            memory.encrypted = True

        # Apply retention policies
        if sensitivity == "high":
            memory.retention_days = 30
        elif sensitivity == "medium":
            memory.retention_days = 90
        else:
            memory.retention_days = 365

        # Store with user isolation
        await self.store(
            memory=memory,
            partition=user_id,  # User data isolation
            access_control=self.get_acl(user_id)
        )

    async def handle_deletion_request(self, user_id: str):
        # GDPR/CCPA compliance
        await self.delete_all_user_memories(user_id)
        await self.log_deletion(user_id)
        return {"deleted": True, "user_id": user_id}
```
Memory systems that store user data must comply with privacy regulations such as GDPR and CCPA. Implement data deletion, export, and retention policies from the start.
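Retention and export are the two policies the listing above does not show in action. As a hedged sketch, the code below enforces the same retention periods (30, 90, and 365 days) with a periodic purge and serializes one user's records for an export request. `StoredMemory`, `purge_expired`, and `export_user_data` are hypothetical names, and real compliance involves far more than this (backups, logs, and derived data, for example).

```python
import json
from dataclasses import asdict, dataclass
from datetime import date, timedelta

# Retention periods in days, matching the values in the listing above.
RETENTION_DAYS = {"high": 30, "medium": 90, "low": 365}


@dataclass
class StoredMemory:
    user_id: str
    content: str
    sensitivity: str  # "high", "medium", or "low"
    created: date

    def is_expired(self, today: date) -> bool:
        limit = timedelta(days=RETENTION_DAYS[self.sensitivity])
        return today > self.created + limit


def purge_expired(memories: list[StoredMemory], today: date) -> list[StoredMemory]:
    """Return only the memories still inside their retention window."""
    return [m for m in memories if not m.is_expired(today)]


def export_user_data(memories: list[StoredMemory], user_id: str) -> str:
    """Serialize everything held about one user as JSON."""
    rows = [asdict(m) for m in memories if m.user_id == user_id]
    return json.dumps(rows, default=str, indent=2)  # dates become ISO strings


today = date(2025, 6, 1)
memories = [
    StoredMemory("alice", "Home address", "high", date(2025, 4, 1)),
    StoredMemory("alice", "Prefers dark mode", "low", date(2025, 1, 1)),
]

kept = purge_expired(memories, today)
print(export_user_data(kept, "alice"))
```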
Summary
Why memory matters for agents:
- Stateless problem: LLMs have no inherent memory—we must build it
- Context limits: Even large context windows eventually fill up
- Session boundaries: Without memory, every session starts fresh
- Enables personalization: Remember preferences, adapt to users
- Enables learning: Improve from feedback, accumulate knowledge
- Multiple memory types: Working, short-term, and long-term, with long-term memory split into episodic, semantic, and procedural
Next: Let's explore the distinction between short-term and long-term memory, and how to implement each.