Agent Memory Systems: The Complete Guide (2026)
LLMs forget everything between sessions. The 3 memory types, the write-maintain-retrieve lifecycle, and MemGPT vs Mem0 vs Claude Code compared.
You spent 30 minutes teaching your agent the project: TypeScript, Vitest, Supabase, deploys on Vercel. It nailed the work. You closed the terminal. Tomorrow it knows nothing.
This is not a bug - it's the architecture. An LLM's only "memory" is the context window: the text you send with each request. New session, empty window, total amnesia. The in-conversation memory you experience is just prior messages being re-sent every turn.
For one-shot questions, fine. For an AI that works with you across weeks - remembering your preferences, your stack, last Tuesday's debugging conclusion - you need a memory system bolted on from outside. Here's the full map: the three memory types, where memories physically live, the lifecycle that keeps them useful, and how MemGPT, Mem0, and Claude Code each solved it differently.
Three Kinds of Memory (Borrowed From Your Brain)

Cognitive science gave agent design its most useful taxonomy. Three types, three jobs:
| Type | Answers | Agent examples |
|---|---|---|
| Episodic | What happened? | Past conversations, tool-call traces, "approach A failed last time, B worked" |
| Semantic | What's true? | "User prefers terse answers", "project uses Supabase", "deploys go to us-east-1" |
| Procedural | How do I do it? | Learned workflows: the password-reset sequence, the standard debug routine |
A mature agent needs all three. Episodic-only over-indexes on anecdotes. Semantic-only never learns from experience. Procedural-only breaks on anything novel. Most builders start semantic (it's the easiest win - facts and preferences), add episodic when cross-session learning starts mattering, and find procedural memory emerging in things like Skills - which are essentially procedural memory you wrote by hand.
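One way to keep the three types distinct in code is to tag every record with its type, so each question ("what happened?", "what's true?", "how do I do it?") reads from its own slice. A minimal sketch: the `MemoryKind` and `MemoryRecord` names, the fields, and the procedural example text are illustrative assumptions, not any library's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class MemoryKind(Enum):
    EPISODIC = "episodic"      # what happened?
    SEMANTIC = "semantic"      # what's true?
    PROCEDURAL = "procedural"  # how do I do it?


@dataclass
class MemoryRecord:
    kind: MemoryKind
    text: str
    # Timestamp kept for later maintenance (decay, supersession).
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def by_kind(records: list[MemoryRecord], kind: MemoryKind) -> list[str]:
    """Return the text of every record of one kind, in insertion order."""
    return [r.text for r in records if r.kind is kind]


records = [
    MemoryRecord(MemoryKind.EPISODIC, "Approach A failed last time, B worked"),
    MemoryRecord(MemoryKind.SEMANTIC, "Project uses Supabase"),
    MemoryRecord(MemoryKind.PROCEDURAL, "Debug routine: reproduce, bisect, add a test"),
    MemoryRecord(MemoryKind.SEMANTIC, "User prefers terse answers"),
]

print(by_kind(records, MemoryKind.SEMANTIC))
# ['Project uses Supabase', 'User prefers terse answers']
```

Starting semantic-only means starting with one enum value in use; the other two cost nothing to reserve.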
Where Memory Lives: Pick Your Layer
A 107-page 2025 survey (Hu et al., Memory in the Age of AI Agents) sorts storage into three forms. The practical conclusion is short:
Token-level - the layer you can actually use. Memory stored as readable text - facts, summaries, profiles - in files or databases, injected into the prompt when needed. Every production system you've heard of lives here: Mem0, Letta/MemGPT, Zep, Claude Code's CLAUDE.md. The reason is brutal and simple: commercial APIs accept text and nothing else. You can't reach into Claude's attention layers. Your entire design space is what goes in the prompt.
Within token-level there's a complexity ladder: flat (entries + vector search - Mem0's original design), graph (entities and relations for multi-hop queries - Zep's temporal knowledge graph), hierarchical (raw entries → cluster summaries → global abstractions - HippoRAG). Benchmark reality check: flat retrieval ties or beats fancier structures on standard tests. Start flat; graduate only when you observe actual multi-hop retrieval failures.
Implicit (KV-cache tricks, learned memory tokens) - requires model internals; commercial APIs don't expose them. Skip unless you self-host.
Parameter-level (fine-tuning knowledge in) - durable, but it can't be updated incrementally, and each edit risks catastrophic forgetting of neighboring knowledge. Useful for batch domain adaptation only; not a memory system.
The Lifecycle: Write, Maintain, Retrieve
Writing - quality ceiling gets set here
Garbage written = garbage retrieved, regardless of how clever retrieval is. Two workhorse methods:
- Extractive: an LLM pulls discrete facts from conversation - "user prefers dark theme", "API limit 100/min". Precise, occasionally misses context. Mem0's core pipeline.
- Summary-based: compress conversations into running or chunked summaries. Holds context; risks semantic drift over repeated updates.
Use extraction for facts, summaries for conversational context. The bar for writing anything: does this constrain future reasoning? Preferences, decisions, recurring patterns - yes. Everything else - high threshold. A memory system that faithfully saves everything is a garbage heap with an API.
Maintaining - the step everyone skips
- Merge: "prefers concise answers" and "likes brief replies" are one memory, not two
- Update: "we migrated Postgres → Supabase" must supersede the old fact - Zep soft-deletes with timestamps so history survives but stale facts stop surfacing
- Forget: decay by age, prune by access, judge by importance - with one landmine:
WARNING
Frequency ≠ importance. The disaster-recovery runbook gets read once a year. Deleting it for low traffic is how you find out why it existed.
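The update and forget rules above can be sketched in a few lines. This is a toy under stated assumptions, not Zep's implementation: the `Memory` fields (`key`, `pinned`, `invalid_at`), the 180-day idle window, and the sample entries are all hypothetical. The point is the shape: superseded facts get a timestamp instead of a delete, and pinned entries are exempt from traffic-based pruning.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Memory:
    key: str                               # what the fact is about, e.g. "database"
    text: str
    created_at: datetime
    last_read_at: datetime
    pinned: bool = False                   # important regardless of traffic
    invalid_at: Optional[datetime] = None  # soft-delete marker; None means live


def supersede(store: list[Memory], new: Memory) -> None:
    """Soft-delete live memories with the same key, then add the new one."""
    for m in store:
        if m.key == new.key and m.invalid_at is None:
            m.invalid_at = new.created_at
    store.append(new)


def live(store: list[Memory]) -> list[Memory]:
    """Memories that should still surface in retrieval."""
    return [m for m in store if m.invalid_at is None]


def forgettable(store: list[Memory], now: datetime,
                max_idle: timedelta = timedelta(days=180)) -> list[Memory]:
    """Live, unpinned memories that nobody has read within max_idle."""
    return [m for m in live(store)
            if not m.pinned and now - m.last_read_at > max_idle]


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
year_ago = now - timedelta(days=365)
store = [
    Memory("database", "Project uses Postgres", year_ago, year_ago),
    Memory("dr-runbook", "Disaster-recovery runbook (read once a year)",
           year_ago, year_ago, pinned=True),
    Memory("lunch", "Team ordered pizza on launch day", year_ago, year_ago),
]
supersede(store, Memory("database", "Project migrated to Supabase", now, now))

print([m.key for m in forgettable(store, now)])  # ['lunch']
```

The old Postgres fact is still in `store` for history, but `live()` no longer returns it, and the runbook survives a year of silence because it is pinned.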
Retrieving - where good memories go to waste
- Query rewriting fixes the dirty secret of semantic search: questions and stored facts have different shapes. "How's auth handled?" sits far from "JWT validation in auth.ts middleware, 24h expiry" in embedding space. HyDE is the counterintuitive fix: have the model hallucinate an answer first, embed that, search with it. The fake answer is wrong but answer-shaped - and shape is what embedding distance measures.
- Hybrid search - BM25 keyword matching catches exact terms (`auth.ts`); embeddings catch paraphrases ("login system" → "authentication middleware"). Either alone leaves recall on the table.
- Filter hard. Three highly relevant memories injected beat ten half-relevant ones. Over-injection is just context pollution with extra steps.
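Hybrid search needs some way to combine a keyword ranking with an embedding ranking. One common choice is reciprocal rank fusion (RRF), which the text above does not name; it is swapped in here as an assumption because it needs only rank positions, not comparable scores. The sketch takes two already-ranked ID lists (from a hypothetical BM25 index and a hypothetical vector index), fuses them, and applies the "filter hard" rule by keeping the top three. The memory IDs are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per item."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def top_memories(keyword_hits: list[str], vector_hits: list[str], limit: int = 3) -> list[str]:
    """Fuse both rankings, then inject only the best few."""
    fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
    return [memory_id for memory_id, _ in fused[:limit]]


# Hypothetical results for the query "How's auth handled?"
keyword_hits = ["auth-middleware", "auth-tests", "deploy-notes"]
vector_hits = ["login-flow", "auth-middleware", "session-expiry", "auth-tests"]

print(top_memories(keyword_hits, vector_hits))
# ['auth-middleware', 'auth-tests', 'login-flow']
```

Items that both searchers agree on rise to the top; an item only one searcher found can still make the cut if it ranked first there.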
Three Production Architectures
MemGPT / Letta - agent as memory manager. The 2023 OS metaphor: context window = RAM, external store = disk, and the agent itself pages data between them via tools (core_memory_append, archival search). Maximum flexibility; memory quality rides entirely on the model's judgment.
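To make the RAM/disk metaphor concrete, here is a toy version of the split. It is not Letta's API: only the `core_memory_append` name comes from the text above, while `evict_to_archive`, `archival_search`, the character budget, and the substring search are stand-ins chosen for illustration. What it preserves is the key design choice: the store never moves anything on its own, so the agent has to decide what gets paged out.

```python
class TieredMemory:
    """Toy RAM/disk split. The agent, not this class, decides what moves where."""

    def __init__(self, core_limit_chars: int = 80) -> None:
        self.core_limit_chars = core_limit_chars
        self.core: list[str] = []     # rendered into every prompt ("RAM")
        self.archive: list[str] = []  # searched on demand ("disk")

    def core_memory_append(self, text: str) -> str:
        """Tool: add to core memory, or report that the budget is exhausted."""
        used = sum(len(t) for t in self.core)
        if used + len(text) > self.core_limit_chars:
            return "error: core memory full; evict an entry to the archive first"
        self.core.append(text)
        return "ok"

    def evict_to_archive(self, index: int) -> str:
        """Tool (hypothetical): page one core entry out to the archive."""
        self.archive.append(self.core.pop(index))
        return "ok"

    def archival_search(self, query: str) -> list[str]:
        """Tool: naive substring match standing in for real archival search."""
        q = query.lower()
        return [t for t in self.archive if q in t.lower()]


memory = TieredMemory(core_limit_chars=40)
print(memory.core_memory_append("User prefers terse answers"))  # ok
print(memory.core_memory_append("Project uses Supabase"))       # error: core memory full; ...
memory.evict_to_archive(0)                                      # the agent's call
print(memory.core_memory_append("Project uses Supabase"))       # ok
print(memory.archival_search("terse"))                          # ['User prefers terse answers']
```

Whether evicting the preference was the right call is exactly the judgment the model is trusted with in this architecture.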
Mem0 - automated pipeline. Every exchange flows through extract-then-reconcile: LLM pulls facts, vector similarity decides new/update/merge. Later added a graph layer for relational queries. On the LOCOMO benchmark, Mem0's own published evaluation reports roughly 90% fewer tokens and far lower latency than full-context stuffing while staying competitive on accuracy. Precise extraction + precise retrieval can stand in for brute force at a fraction of the cost.
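The reconcile half of that pipeline can be sketched without any infrastructure. This is not Mem0's code: word-overlap (Jaccard) similarity stands in for embedding similarity, and the `merge_at` and `update_at` thresholds are arbitrary assumptions. The extraction step is assumed to have already produced the incoming facts.

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]; a crude stand-in for embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def reconcile(store: list[str], fact: str,
              merge_at: float = 0.9, update_at: float = 0.5) -> str:
    """Compare an extracted fact with its closest stored memory and act on the match."""
    best_index, best_score = -1, 0.0
    for i, existing in enumerate(store):
        score = jaccard(existing, fact)
        if score > best_score:
            best_index, best_score = i, score
    if best_score >= merge_at:
        return "merge"            # already stored; keep a single copy
    if best_score >= update_at:
        store[best_index] = fact  # newer value supersedes the old entry
        return "update"
    store.append(fact)
    return "new"


store = ["user prefers dark theme", "api limit 100/min"]
incoming = ["user prefers dark theme", "api limit 200/min", "project deploys on vercel"]

print([reconcile(store, fact) for fact in incoming])
# ['merge', 'update', 'new']
print(store)
# ['user prefers dark theme', 'api limit 200/min', 'project deploys on vercel']
```

Word overlap cannot see that "prefers concise answers" and "likes brief replies" are the same memory, which is the reason a real pipeline compares embeddings instead.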
Claude Code - files, that's it. CLAUDE.md at the project root (auto-loaded, stable conventions), ~/.claude/CLAUDE.md for cross-project preferences, plus topic-split memory files behind an index. Radical properties: fully human-readable, git-versionable, zero infrastructure, agent edits its own memory with file tools. Weakness: no semantic search - retrieval is file names and index discipline. At large memory volumes it strains.
| | MemGPT/Letta | Mem0 | Claude Code |
|---|---|---|---|
| Managed by | The agent | Pipeline | Agent + you |
| Storage | Tiered (core/recall/archival) | Vectors + graph | Markdown files |
| Transparency | Medium | Low | Total |
| Infrastructure | Framework | Vector DB | None |
| Fits | Complex memory reasoning | Products needing auto-memory | Dev tools, local agents |
The Minimum Viable Memory System
Three steps, one afternoon, no vector database:
- A `save_memory` tool. The agent calls it when something deserves persistence. Writes to a file or a Supabase table - either works at this scale.
- Index injection at session start. Put memory titles + one-line summaries in the system prompt; the agent requests full entries when relevant. This is Claude Code's index-then-read pattern, and it's load-bearing: inject summaries, not bodies.
- Scheduled hygiene. Merge duplicates, expire stale facts. A monthly manual pass is genuinely enough at the start.
This takes you from "amnesia every morning" to "remembers what matters." Add embeddings, HyDE, graphs when you hit a real retrieval failure - not before.
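The first two steps fit in one small file-backed module. A minimal sketch, assuming a single JSON file as the store: only `save_memory` is named in the steps above, while `build_index`, `read_memory`, the JSON layout, and the prompt wording are hypothetical choices. Hygiene stays manual here, which the third step says is enough at the start.

```python
import json
import tempfile
from pathlib import Path


def _load(path: Path) -> dict:
    """Read all entries, or start empty if the file does not exist yet."""
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def save_memory(path: Path, title: str, summary: str, body: str) -> None:
    """Tool the agent calls: insert or overwrite one entry keyed by title."""
    entries = _load(path)
    entries[title] = {"summary": summary, "body": body}
    path.write_text(json.dumps(entries, indent=2), encoding="utf-8")


def build_index(path: Path) -> str:
    """Titles + one-line summaries for the system prompt; bodies stay on disk."""
    lines = [f"- {title}: {entry['summary']}"
             for title, entry in sorted(_load(path).items())]
    return "Known memories (call read_memory(title) for details):\n" + "\n".join(lines)


def read_memory(path: Path, title: str) -> str:
    """Tool the agent calls to pull one full entry into context."""
    return _load(path)[title]["body"]


with tempfile.TemporaryDirectory() as tmp:
    memory_file = Path(tmp) / "memory.json"
    save_memory(memory_file, "stack", "TypeScript, Vitest, Supabase, Vercel",
                "The project is TypeScript, tested with Vitest, backed by Supabase, "
                "and deployed on Vercel.")
    print(build_index(memory_file))
    # Known memories (call read_memory(title) for details):
    # - stack: TypeScript, Vitest, Supabase, Vercel
```

Saving under an existing title overwrites the entry, which gives a crude form of the "update" step for free; swapping the JSON file for a Supabase table changes only `_load` and the write in `save_memory`.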
The One-Sentence Version
Memory ≠ context: context is RAM (visible now, gone at session end), memory is disk (persistent, useless until retrieved into context) - and the bridge between them, retrieval, is where memory systems are actually won. Build the smallest one that stops the amnesia, then earn each layer of complexity with a real failure. Done right, your agent gets better every week you use it. Skipped, every morning is a first date.