Introduction
Without memory, every interaction is a fresh start. The agent forgets your name seconds after you tell it. It can't learn from mistakes. It asks the same clarifying questions repeatedly. Memory transforms an agent from a sophisticated autocomplete into something that can actually build relationships, learn preferences, and accumulate knowledge.
The Core Problem: LLMs are stateless by design. Each API call is independent: the model has no inherent memory of previous conversations. Building effective agents requires adding memory systems on top.
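To make the statelessness concrete, here is a minimal sketch that uses a stand-in function instead of a real API. `fake_llm` is hypothetical: it can only "know" what appears in the messages passed to that one call, which mirrors how chat APIs treat each request.

```python
import re


def fake_llm(messages: list[dict]) -> str:
    """Hypothetical stand-in for a chat model: it sees only `messages`."""
    question = messages[-1]["content"]
    if "my name" in question.lower():
        # Search the supplied messages (and nothing else) for an introduction.
        for message in messages:
            match = re.search(r"I'm (\w+)", message["content"])
            if match and message["role"] == "user":
                return f"Your name is {match.group(1)}."
        return "I don't know your name."
    return "Hello!"


# Two independent calls: the second one carries no history.
fake_llm([{"role": "user", "content": "Hi, I'm Alice"}])
forgetful = fake_llm([{"role": "user", "content": "What's my name?"}])

# The caller has to resend earlier turns for the model to "remember".
history = [
    {"role": "user", "content": "Hi, I'm Alice"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "What's my name?"},
]
remembered = fake_llm(history)

print(forgetful)
print(remembered)
```

Any memory the agent appears to have comes from what the calling code chooses to send along with each request.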
The Stateless Problem
Understanding why memory is hard requires understanding how LLMs actually work:
How LLM Context Works
context_window.txt

    LLM CONTEXT WINDOW

    ┌────────────────────────────────────────────────────────┐
    │                     CONTEXT WINDOW                     │
    │             (e.g., 200K tokens for Claude)             │
    │                                                        │
    │ ┌────────────────────────────────────────────────────┐ │
    │ │ System Prompt                                      │ │
    │ │ "You are a helpful assistant..."                   │ │
    │ └────────────────────────────────────────────────────┘ │
    │ ┌────────────────────────────────────────────────────┐ │
    │ │ Message 1: User says "Hi, I'm Alice"               │ │
    │ └────────────────────────────────────────────────────┘ │
    │ ┌────────────────────────────────────────────────────┐ │
    │ │ Message 2: Assistant says "Hello Alice!"           │ │
    │ └────────────────────────────────────────────────────┘ │
    │ ┌────────────────────────────────────────────────────┐ │
    │ │ Message 3: User says "What's my name?"             │ │
    │ └────────────────────────────────────────────────────┘ │
    │ ┌────────────────────────────────────────────────────┐ │
    │ │ → Model generates: "Your name is Alice"            │ │
    │ └────────────────────────────────────────────────────┘ │
    │                                                        │
    └────────────────────────────────────────────────────────┘

    The model "remembers" Alice's name ONLY because it's
    still in the context window. There's no persistent memory.

The Context Limit Problem
context_overflow.py

    # What happens when context fills up?
    # `llm`, `system_prompt`, and `get_user_input` are placeholders for your own client and I/O.

    conversation_history = []

    for i in range(1000):
        user_message = get_user_input()
        conversation_history.append({"role": "user", "content": user_message})

        # Eventually, this exceeds the context window
        response = llm.generate(
            system=system_prompt,
            messages=conversation_history,  # Too long!
        )

        conversation_history.append({"role": "assistant", "content": response})

    # Options when context fills:
    # 1. Truncate old messages (lose information)
    # 2. Summarize old messages (lose details)
    # 3. Use external memory (what we'll learn)

The Session Boundary Problem
session_problem.txt

    SESSION 1 (Monday):
    User: "I prefer dark mode and concise responses"
    Agent: "Got it! I'll use dark mode and keep responses brief."

    SESSION 2 (Tuesday):
    User: "Show me the dashboard"
    Agent: [Shows light mode, verbose explanation]
    User: "I told you yesterday I prefer dark mode!"
    Agent: "I'm sorry, I don't have any memory of previous sessions."

    ────────────────────────────────────────────────────────

    The problem: Each session starts fresh.
    - User preferences forgotten
    - Prior context lost
    - Previous work not remembered
    - Relationships don't develop

| Problem | Impact | User Experience |
|---|---|---|
| Context overflow | Old messages truncated | Agent forgets earlier conversation |
| Session boundaries | No cross-session memory | Must repeat preferences each time |
| No learning | Same mistakes repeated | Agent never improves from feedback |
| No personalization | Generic responses | Feels like talking to a stranger |
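The context-overflow row in the table is commonly handled by trimming history to a token budget before each call. The following is a minimal sketch of that truncation option, not a recommended final design. `estimate_tokens` and `trim_history` are hypothetical helpers, and the four-characters-per-token figure is only a rough assumption; a real system would use the model's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough assumption: about 4 characters per token."""
    return max(1, len(text) // 4)


def trim_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the most recent messages that fit within max_tokens."""
    kept: list[dict] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if used + cost > max_tokens:
            break  # this message and everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    {"role": "user", "content": "a" * 40},       # about 10 tokens
    {"role": "assistant", "content": "b" * 40},  # about 10 tokens
    {"role": "user", "content": "c" * 40},       # about 10 tokens
]
trimmed = trim_history(history, max_tokens=20)
print([m["content"][0] for m in trimmed])
```

With a budget of 20 estimated tokens, only the two newest messages survive, which is exactly the information loss the table describes.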
What Memory Enables
Proper memory systems unlock entirely new capabilities:
Personalization
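Personalization only works if preferences outlive a single session. Here is a minimal sketch of that persistence, assuming one JSON file per user on local disk. `UserProfile` and `ProfileStore` are hypothetical names for illustration, and a production system would more likely use a database.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class UserProfile:
    name: str
    style: str = "detailed"      # "concise" or "detailed"
    expertise: str = "beginner"  # "beginner" to "expert"


class ProfileStore:
    """Hypothetical file-backed store; assumes user_id is a safe filename."""

    def __init__(self, directory: Path) -> None:
        self.directory = directory
        self.directory.mkdir(parents=True, exist_ok=True)

    def save(self, user_id: str, profile: UserProfile) -> None:
        path = self.directory / f"{user_id}.json"
        path.write_text(json.dumps(asdict(profile)))

    def load(self, user_id: str) -> Optional[UserProfile]:
        path = self.directory / f"{user_id}.json"
        if not path.exists():
            return None
        return UserProfile(**json.loads(path.read_text()))


storage_dir = Path(tempfile.mkdtemp())

# Session 1: the user states a preference and it is written to disk.
ProfileStore(storage_dir).save("alice", UserProfile(name="Alice", style="concise"))

# Session 2: a brand-new store object reads the preference back.
profile = ProfileStore(storage_dir).load("alice")
print(profile)
```

The second `ProfileStore` shares nothing with the first except the directory, which stands in for a new process on a later day.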
personalization.py

    # With memory: Personalized interactions
    # `memory` and `llm` are placeholder clients, not a specific library.

    async def handle_request(user_id: str, request: str):
        # Retrieve user's preferences and history
        user_profile = await memory.get_user_profile(user_id)
        recent_context = await memory.get_recent_context(user_id)

        # Customize system prompt based on preferences.
        # style is "concise" or "detailed"; expertise runs from "beginner" to "expert".
        # (These notes live here so they don't leak into the prompt text.)
        system = f"""You are helping {user_profile.name}.

    Preferences:
    - Communication style: {user_profile.style}
    - Technical level: {user_profile.expertise}
    - Timezone: {user_profile.timezone}
    - Previous projects: {user_profile.projects}

    Recent context:
    {recent_context}
    """

        response = await llm.generate(system=system, messages=[...])

        # Update memory with new interaction
        await memory.store_interaction(user_id, request, response)

        return response

Learning from Feedback
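Learning from feedback depends on storing corrections and finding them again when a similar request arrives. The following is a minimal in-memory sketch of such a store. `CorrectionStore` and `token_overlap` are hypothetical, and word overlap is used here only as a simple stand-in for embedding similarity.

```python
from dataclasses import dataclass


@dataclass
class Correction:
    topic: str
    correction: str
    request: str  # the request that triggered the correction


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity of lowercase word sets (stand-in for embeddings)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


class CorrectionStore:
    def __init__(self) -> None:
        self._items: list[Correction] = []

    def store_correction(self, item: Correction) -> None:
        self._items.append(item)

    def get_corrections(self, similar_to: str, threshold: float = 0.3) -> list[Correction]:
        """Return corrections whose original request resembles `similar_to`."""
        return [
            item for item in self._items
            if token_overlap(item.request, similar_to) >= threshold
        ]


store = CorrectionStore()
store.store_correction(Correction(
    topic="date formats",
    correction="Use ISO 8601 (YYYY-MM-DD), not MM/DD/YYYY.",
    request="format the report date",
))
matches = store.get_corrections(similar_to="format the invoice date")
unrelated = store.get_corrections(similar_to="summarize this email")
print(len(matches), len(unrelated))
```

The threshold of 0.3 is an arbitrary illustration; with real embeddings the cutoff would need tuning against actual data.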
learning.py

    # With memory: Agent learns from corrections

    class LearningAgent:
        async def handle_with_learning(self, request: str):
            # Check for similar past mistakes
            past_corrections = await self.memory.get_corrections(
                similar_to=request
            )

            if past_corrections:
                # Include learned lessons in context
                lessons = "\n".join([
                    f"- When asked about {c.topic}, remember: {c.correction}"
                    for c in past_corrections
                ])
                context = f"Lessons from past feedback:\n{lessons}"
            else:
                context = ""

            response = await self.generate(request, extra_context=context)
            return response

        async def receive_feedback(self, request: str, response: str, feedback: str):
            # Store correction for future reference
            await self.memory.store_correction(
                topic=self.extract_topic(request),
                original_response=response,
                correction=feedback,
                embedding=await self.embed(request)
            )

Long-Running Tasks
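Continuing work across sessions requires a project state that can be serialized when one session ends and restored when the next begins. Here is a minimal sketch of such a state object, assuming JSON as the storage format. `ProjectState` and its methods are hypothetical and only mirror the kind of fields a project-tracking agent would read.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ProjectState:
    name: str
    completed_tasks: list[str] = field(default_factory=list)
    current_task: Optional[str] = None
    pending_tasks: list[str] = field(default_factory=list)

    def finish_current_task(self) -> None:
        """Mark the current task done and pull the next one from the queue."""
        if self.current_task is not None:
            self.completed_tasks.append(self.current_task)
        self.current_task = self.pending_tasks.pop(0) if self.pending_tasks else None

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "ProjectState":
        return cls(**json.loads(raw))


# Session 1: make progress, then serialize before the session ends.
state = ProjectState(
    name="Website redesign",
    current_task="Draft wireframes",
    pending_tasks=["Build prototype", "User testing"],
)
state.finish_current_task()
saved = state.to_json()  # in practice, written to a database or file

# Session 2: restore and carry on where the work stopped.
restored = ProjectState.from_json(saved)
print(restored.current_task)
```

Keeping the state as plain data makes it easy to render into a prompt and to store in whatever backend the agent already uses.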
long_tasks.py

    # With memory: Continue complex tasks across sessions

    class ProjectAgent:
        async def work_on_project(self, user_id: str, instruction: str):
            # Load project state
            project = await self.memory.get_project_state(user_id)

            if project:
                context = f"""
    Continuing project: {project.name}

    Current status:
    - Completed: {project.completed_tasks}
    - In progress: {project.current_task}
    - Remaining: {project.pending_tasks}

    Recent changes:
    {project.recent_changes}
    """
            else:
                context = "Starting new project"

            # Work on the project
            result = await self.execute_task(instruction, context)

            # Save updated state
            await self.memory.update_project_state(user_id, result)

            return result

Knowledge Accumulation
knowledge.py

    # With memory: Build domain knowledge over time

    class KnowledgeAgent:
        async def answer_with_knowledge(self, question: str):
            # Search accumulated knowledge
            relevant_knowledge = await self.memory.search_knowledge(
                query=question,
                limit=10
            )

            if relevant_knowledge:
                knowledge_context = "\n".join([
                    f"- {k.fact} (learned from: {k.source})"
                    for k in relevant_knowledge
                ])
            else:
                knowledge_context = "No relevant prior knowledge."

            response = await self.generate(
                question,
                extra_context=f"Relevant knowledge:\n{knowledge_context}"
            )

            # Extract and store any new facts learned
            new_facts = await self.extract_facts(question, response)
            for fact in new_facts:
                await self.memory.store_knowledge(fact)

            return response

Types of Agent Memory
Agent memory systems typically include several distinct types:
| Memory Type | Duration | Purpose | Example |
|---|---|---|---|
| Working Memory | Current task | Immediate context | Current conversation messages |
| Short-term Memory | Single session | Session context | Topics discussed today |
| Long-term Memory | Persistent | Accumulated knowledge | User preferences, learned facts |
| Episodic Memory | Persistent | Specific experiences | Past conversations, events |
| Semantic Memory | Persistent | General knowledge | Facts, concepts, relationships |
| Procedural Memory | Persistent | How to do things | Learned workflows, patterns |
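The rows of the table can be modeled as a small data structure. The following is a minimal sketch, assuming long-term memory is the union of the episodic, semantic, and procedural kinds. `MemoryRecord`, `promote`, and the 0.7 importance threshold are illustrative assumptions rather than a fixed design.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryType(Enum):
    WORKING = "working"
    SHORT_TERM = "short_term"
    EPISODIC = "episodic"
    SEMANTIC = "semantic"
    PROCEDURAL = "procedural"


# The three persistent kinds together make up long-term memory.
LONG_TERM = {MemoryType.EPISODIC, MemoryType.SEMANTIC, MemoryType.PROCEDURAL}


@dataclass
class MemoryRecord:
    content: str
    memory_type: MemoryType
    importance: float = 0.0  # 0.0 to 1.0, however the system scores it


def promote(
    records: list[MemoryRecord], target: MemoryType, threshold: float = 0.7
) -> list[MemoryRecord]:
    """Move important short-term records into a long-term type."""
    if target not in LONG_TERM:
        raise ValueError("target must be a long-term memory type")
    promoted = []
    for record in records:
        if record.memory_type is MemoryType.SHORT_TERM and record.importance >= threshold:
            record.memory_type = target
            promoted.append(record)
    return promoted


session = [
    MemoryRecord("User prefers dark mode", MemoryType.SHORT_TERM, importance=0.9),
    MemoryRecord("User said hello", MemoryType.SHORT_TERM, importance=0.1),
]
kept = promote(session, target=MemoryType.SEMANTIC)
print([r.content for r in kept])
```

Only the preference crosses the threshold, so the greeting stays in short-term memory and would be discarded when the session ends.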
memory_types.txt

    AGENT MEMORY HIERARCHY

    ┌────────────────────────────────────────────────────────┐
    │                     WORKING MEMORY                     │
    │                (Current context window)                │
    │  • System prompt                                       │
    │  • Current conversation                                │
    │  • Active tool results                                 │
    │                    ▲                                   │
    │                    │ retrieved as needed               │
    ├────────────────────┼───────────────────────────────────┤
    │                   SHORT-TERM MEMORY                    │
    │                (Session-level storage)                 │
    │  • Recent interactions                                 │
    │  • Current task state                                  │
    │  • Temporary context                                   │
    │                    ▲                                   │
    │                    │ promoted if important             │
    ├────────────────────┼───────────────────────────────────┤
    │                    LONG-TERM MEMORY                    │
    │                  (Persistent storage)                  │
    │ ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
    │ │  Episodic    │  │  Semantic    │  │  Procedural  │   │
    │ │  (events)    │  │  (facts)     │  │  (skills)    │   │
    │ └──────────────┘  └──────────────┘  └──────────────┘   │
    └────────────────────────────────────────────────────────┘

Memory Challenges
Building effective memory systems involves solving several hard problems:
1. What to Remember
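One signal a memory filter can use is novelty: how different a new piece of information is from what is already stored. Here is a minimal sketch, assuming plain string similarity from Python's `difflib` as a stand-in for embedding distance. This `assess_novelty` is a hypothetical synchronous helper for illustration only.

```python
from difflib import SequenceMatcher


def assess_novelty(candidate: str, existing: list[str]) -> float:
    """Return 1.0 for entirely new text and 0.0 for an exact repeat."""
    if not existing:
        return 1.0
    closest = max(
        SequenceMatcher(None, candidate.lower(), memory.lower()).ratio()
        for memory in existing
    )
    return 1.0 - closest


stored = ["User prefers dark mode"]
repeat_score = assess_novelty("user prefers dark mode", stored)
new_score = assess_novelty("Deploys happen every Friday at 5pm", stored)
print(repeat_score, new_score > repeat_score)
```

Character-level similarity is crude, since paraphrases of the same fact will look novel, which is why real systems tend to compare embeddings instead.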
what_to_remember.py

    # Challenge: Not everything is worth remembering

    class MemoryFilter:
        async def should_remember(self, interaction: Interaction) -> bool:
            # Skip trivial interactions
            if interaction.is_greeting or interaction.is_small_talk:
                return False

            # Remember explicit user preferences
            if self.contains_preference(interaction):
                return True

            # Remember corrections and feedback
            if interaction.contains_correction:
                return True

            # Remember important facts
            importance = await self.assess_importance(interaction)
            if importance > 0.7:
                return True

            # Remember novel information
            novelty = await self.assess_novelty(interaction)
            if novelty > 0.8:
                return True

            return False

2. How to Retrieve
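Retrieval usually draws on several ranked lists at once (semantic matches, recent items, entity matches), and those lists have to be merged into one ordering. One well-known way to do that is reciprocal rank fusion, sketched below. It is offered as one possible merging step, not as the method this article prescribes. The memory IDs are made up, and `k=60` is a commonly used default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of memory IDs; items ranked high in many lists win."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by ID so output is deterministic.
    return sorted(scores, key=lambda memory_id: (-scores[memory_id], memory_id))


semantic = ["m3", "m1", "m7"]  # closest in meaning first
recent = ["m7", "m3"]          # newest first
by_entity = ["m1", "m3"]       # strongest entity match first

merged = reciprocal_rank_fusion([semantic, recent, by_entity])
print(merged)
```

Because scores are accumulated per ID, duplicates across the input lists collapse automatically, and `m3` comes out on top because it appears in all three lists.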
how_to_retrieve.py

    # Challenge: Finding relevant memories efficiently

    class MemoryRetriever:
        async def retrieve_relevant(
            self,
            query: str,
            context: str,
            limit: int = 10
        ) -> list[Memory]:
            # Multiple retrieval strategies

            # Semantic search (meaning-based)
            semantic_results = await self.vector_search(
                query_embedding=await self.embed(query),
                limit=limit
            )

            # Recency-weighted (recent is often relevant)
            recent_results = await self.get_recent(
                hours=24,
                limit=limit // 2
            )

            # Entity-based (mentioned people, places, things)
            entities = self.extract_entities(query)
            entity_results = await self.search_by_entities(
                entities=entities,
                limit=limit // 2
            )

            # Combine and deduplicate
            all_results = self.merge_and_rank(
                semantic_results,
                recent_results,
                entity_results
            )

            return all_results[:limit]

3. When to Forget
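Forgetting is often framed as a score that fades over time, so that old memories survive only if they were important to begin with. Here is a minimal sketch of exponential decay combined with pruning. The 30-day half-life, the `StoredMemory` fields, and the function names are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class StoredMemory:
    content: str
    importance: float  # 0.0 to 1.0 at the time of storage
    age_days: float


def retention_score(memory: StoredMemory, half_life_days: float = 30.0) -> float:
    """Importance halves every `half_life_days` (exponential decay)."""
    return memory.importance * math.pow(0.5, memory.age_days / half_life_days)


def prune(memories: list[StoredMemory], max_count: int) -> list[StoredMemory]:
    """Keep the `max_count` memories with the highest decayed score."""
    ranked = sorted(memories, key=retention_score, reverse=True)
    return ranked[:max_count]


memories = [
    StoredMemory("Prefers dark mode", importance=0.9, age_days=60),
    StoredMemory("Asked about weather", importance=0.2, age_days=1),
    StoredMemory("Corrected date format", importance=0.8, age_days=30),
]
kept = prune(memories, max_count=2)
print([m.content for m in kept])
```

The day-old weather question is dropped even though it is the newest item, because its starting importance was too low to outrank the older but weightier memories.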
when_to_forget.py

    # Challenge: Memory can't grow forever

    class MemoryManager:
        async def manage_memory_size(self):
            # Strategy 1: Time-based decay
            await self.decay_old_memories(
                older_than_days=90,
                importance_threshold=0.3
            )

            # Strategy 2: Consolidation (merge similar memories)
            similar_groups = await self.find_similar_memories()
            for group in similar_groups:
                consolidated = await self.consolidate(group)
                await self.replace_group_with(group, consolidated)

            # Strategy 3: Importance-based pruning
            if await self.get_memory_count() > self.max_memories:
                least_important = await self.get_least_important(
                    count=self.max_memories // 10
                )
                await self.archive_or_delete(least_important)

            # Strategy 4: User-specific limits
            for user_id in await self.get_users():
                user_count = await self.get_user_memory_count(user_id)
                if user_count > self.per_user_limit:
                    await self.prune_user_memories(user_id)

4. Privacy and Security
privacy.py

    # Challenge: Memories can contain sensitive information

    class SecureMemory:
        async def store_securely(self, memory: Memory, user_id: str):
            # Classify sensitivity
            sensitivity = await self.classify_sensitivity(memory.content)

            if sensitivity == "high":
                # Encrypt sensitive memories.
                # user_key is assumed to select a per-user secret key;
                # the raw user ID should not itself be used as key material.
                encrypted = await self.encrypt(memory.content, user_key=user_id)
                memory.content = encrypted
                memory.encrypted = True

            # Apply retention policies
            if sensitivity == "high":
                memory.retention_days = 30
            elif sensitivity == "medium":
                memory.retention_days = 90
            else:
                memory.retention_days = 365

            # Store with user isolation
            await self.store(
                memory=memory,
                partition=user_id,  # User data isolation
                access_control=self.get_acl(user_id)
            )

        async def handle_deletion_request(self, user_id: str):
            # GDPR/CCPA compliance
            await self.delete_all_user_memories(user_id)
            await self.log_deletion(user_id)
            return {"deleted": True, "user_id": user_id}

Memory systems that store user data must comply with privacy regulations (GDPR, CCPA). Implement data deletion, export, and retention policies from the start.
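The three policies named above (deletion, export, retention) can be illustrated with a small in-memory store. This is a minimal sketch for understanding the mechanics, not a compliance implementation. `RetentionStore`, `Record`, and their methods are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Record:
    user_id: str
    content: str
    created_at: datetime
    retention_days: int


class RetentionStore:
    """Hypothetical in-memory store showing expiry, export, and deletion."""

    def __init__(self) -> None:
        self._records: list[Record] = []

    def add(self, record: Record) -> None:
        self._records.append(record)

    def purge_expired(self, now: datetime) -> int:
        """Drop records past their retention window; return how many were removed."""
        before = len(self._records)
        self._records = [
            r for r in self._records
            if now - r.created_at <= timedelta(days=r.retention_days)
        ]
        return before - len(self._records)

    def export_user(self, user_id: str) -> list[dict]:
        """Return everything held about one user in a portable form."""
        return [
            {"content": r.content, "created_at": r.created_at.isoformat()}
            for r in self._records if r.user_id == user_id
        ]

    def delete_user(self, user_id: str) -> int:
        """Remove all of a user's records; return how many were removed."""
        before = len(self._records)
        self._records = [r for r in self._records if r.user_id != user_id]
        return before - len(self._records)


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
store = RetentionStore()
store.add(Record("alice", "health note", now - timedelta(days=45), retention_days=30))
store.add(Record("alice", "likes dark mode", now - timedelta(days=45), retention_days=365))
store.add(Record("bob", "timezone is UTC", now - timedelta(days=1), retention_days=90))

expired = store.purge_expired(now)
exported = store.export_user("alice")
deleted = store.delete_user("alice")
print(expired, len(exported), deleted)
```

Passing `now` in explicitly keeps the expiry logic testable; a scheduled job would call `purge_expired` with the current UTC time.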
Summary
Why memory matters for agents:
- Stateless problem: LLMs have no inherent memory, so we must build it
- Context limits: Even large context windows eventually fill up
- Session boundaries: Without memory, every session starts fresh
- Enables personalization: Remember preferences, adapt to users
- Enables learning: Improve from feedback, accumulate knowledge
- Multiple memory types: Working, short-term, and long-term (episodic, semantic, procedural)
Next: Let's explore the distinction between short-term and long-term memory, and how to implement each.