#ai-agents#memory#llm#tutorial#vector-database

Agent Memory Systems: The Complete Guide (2026)

LLMs forget everything between sessions. The 3 memory types, the write-maintain-retrieve lifecycle, and MemGPT vs Mem0 vs Claude Code compared.

ShirleyJune 11, 20266 min read

Course outline · Build AI Agents (1.6)

You spent 30 minutes teaching your agent the project: TypeScript, Vitest, Supabase, deploys on Vercel. It nailed the work. You closed the terminal. Tomorrow it knows nothing.

This is not a bug - it's the architecture. An LLM's only "memory" is the context window: the text you send with each request. New session, empty window, total amnesia. The in-conversation memory you experience is just prior messages being re-sent every turn.

For one-shot questions, fine. For an AI that works with you across weeks - remembering your preferences, your stack, last Tuesday's debugging conclusion - you need a memory system bolted on from outside. Here's the full map: the three memory types, where memories physically live, the lifecycle that keeps them useful, and how MemGPT, Mem0, and Claude Code each solved it differently.

Prefer the video version? Watch the full breakdown of whether agent memory is actually solved:

Three Kinds of Memory (Borrowed From Your Brain)

Cognitive science gave agent design its most useful taxonomy. Three types, three jobs:

Type	Answers	Agent examples
Episodic	What happened?	Past conversations, tool-call traces, "approach A failed last time, B worked"
Semantic	What's true?	"User prefers terse answers", "project uses Supabase", "deploys are us-east-1"
Procedural	How do I do it?	Learned workflows: the password-reset sequence, the standard debug routine

A mature agent needs all three. Episodic-only over-indexes on anecdotes. Semantic-only never learns from experience. Procedural-only breaks on anything novel. Most builders start semantic (it's the easiest win - facts and preferences), add episodic when cross-session learning starts mattering, and find procedural memory emerging in things like Skills - which are essentially procedural memory you wrote by hand.

Where Memory Lives: Pick Your Layer

A 107-page 2025 survey (Hu et al., Memory in the Age of AI Agents) sorts storage into three forms. The practical conclusion is short:

Token-level - the layer you can actually use. Memory stored as readable text - facts, summaries, profiles - in files or databases, injected into the prompt when needed. Every production system you've heard of lives here: Mem0, Letta/MemGPT, Zep, Claude Code's CLAUDE.md. The reason is brutal and simple: commercial APIs accept text and nothing else. You can't reach into Claude's attention layers. Your entire design space is what goes in the prompt.

Within token-level there's a complexity ladder: flat (entries + vector search - Mem0's original design), graph (entities and relations for multi-hop queries - Zep's temporal knowledge graph), hierarchical (raw entries → cluster summaries → global abstractions - HippoRAG). Benchmark reality check: flat retrieval ties or beats fancier structures on standard tests. Start flat; graduate only when you observe actual multi-hop retrieval failures.

Implicit (KV-cache tricks, learned memory tokens) - requires model internals; commercial APIs don't expose them. Skip unless you self-host.

Parameter-level (fine-tuning knowledge in) - durable but can't update incrementally, and edits risk catastrophic forgetting of neighbors. Batch domain adaptation only; not a memory system.

The Lifecycle: Write, Maintain, Retrieve

Free AI Builder Newsletter

Weekly guides on AI tools & builder strategies.

Writing - quality ceiling gets set here

Garbage written = garbage retrieved, regardless of how clever retrieval is. Two workhorse methods:

Extractive: an LLM pulls discrete facts from conversation - "user prefers dark theme", "API limit 100/min". Precise, occasionally misses context. Mem0's core pipeline.
Summarative: compress conversations into running or chunked summaries. Holds context; risks semantic drift over repeated updates.

Use extraction for facts, summaries for conversational context. The bar for writing anything: does this constrain future reasoning? Preferences, decisions, recurring patterns - yes. Everything else - high threshold. A memory system that faithfully saves everything is a garbage heap with an API.

Maintaining - the step everyone skips

Merge: "prefers concise answers" and "likes brief replies" are one memory, not two
Update: "we migrated Postgres → Supabase" must supersede the old fact - Zep soft-deletes with timestamps so history survives but stale facts stop surfacing
Forget: decay by age, prune by access, judge by importance - with one landmine:

WARNING

Frequency ≠ importance. The disaster-recovery runbook gets read once a year. Deleting it for low traffic is how you find out why it existed.

Retrieving - where good memories go to waste

Query rewriting fixes the dirty secret of semantic search: questions and stored facts have different shapes. "How's auth handled?" sits far from "JWT validation in auth.ts middleware, 24h expiry" in embedding space. HyDE is the counterintuitive fix: have the model hallucinate an answer first, embed that, search with it. The fake answer is wrong but answer-shaped - and shape is what embedding distance measures.
Hybrid search - BM25 keyword matching catches exact terms (auth.ts); embeddings catch paraphrases ("login system" → "authentication middleware"). Either alone leaves recall on the table.
Filter hard. Three highly relevant memories injected beat ten half-relevant ones. Over-injection is just context pollution with extra steps.

Three Production Architectures

MemGPT / Letta - agent as memory manager. The 2023 OS metaphor: context window = RAM, external store = disk, and the agent itself pages data between them via tools (core_memory_append, archival search). Maximum flexibility; memory quality rides entirely on the model's judgment.

Mem0 - automated pipeline. Every exchange flows through extract-then-reconcile: LLM pulls facts, vector similarity decides new/update/merge. Later added a graph layer for relational queries. On the LOCOMO benchmark, Mem0 beat full-context stuffing - better accuracy, fewer tokens. Precise extraction + precise retrieval > brute force, officially measured.

Claude Code - files, that's it. CLAUDE.md at the project root (auto-loaded, stable conventions), ~/.claude/CLAUDE.md for cross-project preferences, plus topic-split memory files behind an index. Radical properties: fully human-readable, git-versionable, zero infrastructure, agent edits its own memory with file tools. Weakness: no semantic search - retrieval is file names and index discipline. At large memory volumes it strains.

	MemGPT/Letta	Mem0	Claude Code
Managed by	The agent	Pipeline	Agent + you
Storage	Tiered (core/recall/archival)	Vectors + graph	Markdown files
Transparency	Medium	Low	Total
Infrastructure	Framework	Vector DB	None
Fits	Complex memory reasoning	Products needing auto-memory	Dev tools, local agents

The Minimum Viable Memory System

Three steps, one afternoon, no vector database:

A save_memory tool. The agent calls it when something deserves persistence. Writes to a file or a Supabase table - either works at this scale.
Index injection at session start. Put memory titles + one-line summaries in the system prompt; the agent requests full entries when relevant. This is Claude Code's index-then-read pattern, and it's load-bearing: inject summaries, not bodies.
Scheduled hygiene. Merge duplicates, expire stale facts. A monthly manual pass is genuinely enough at the start.

This takes you from "amnesia every morning" to "remembers what matters." Add embeddings, HyDE, graphs when you hit a real retrieval failure - not before.

The One-Sentence Version

Memory ≠ context: context is RAM (visible now, gone at session end), memory is disk (persistent, useless until retrieved into context) - and the bridge between them, retrieval, is where memory systems are actually won. Build the smallest one that stops the amnesia, then earn each layer of complexity with a real failure. Done right, your agent gets better every week you use it. Skipped, every morning is a first date.

Open source · free

AI Builder Club Skills

The memory setup in this article, plus Codebase Memory MCP, ships in our open-source Claude Code plugin. Run /setup-codebase-harness and it wires the map into every session.

View on GitHub →

Sources & Verification

This guide is written from hands-on testing, then cross-checked against primary sources - official documentation and first-party announcements. Field results and opinions are labeled as such. See our editorial standards.

MemGPT: Towards LLMs as Operating Systems (arXiv 2310.08560) - The paper behind tiered/virtual-context agent memory
mem0ai/mem0 (GitHub) - Open-source memory layer for agents, referenced in the Mem0 comparison

Join AI Builder Club

✓65+ lessons, 22+ workshops

✓350+ plug-and-play prompts & skills

✓Weekly live builder workshop

✓Premium tools (e.g. 10xCoder, AI tutor)

✓AI Builder Pack ($5,000+ in exclusive AI credits & perks)

1k+

Join 1,000+ builders already inside

Start shipping →30-day money-back · Cancel anytime

$37/mo

Live workshop

Get the free newsletter

Weekly deep-dives on AI tools, automation workflows, and builder strategies. Join 5,000+ readers.

No spam. Unsubscribe anytime.

Continue Learning

Mastering AI Agents

The builder's deep dive into agent loops, tools, context engineering & memory — from using AI to building it.

AI Agent 101

Build autonomous research agents with tool use, API access, web scraping, and deep search.

Build Production LLM Apps

Build production-ready LLM apps with real-time APIs, multimodal models, and agentic workflows.

← Back to Blog