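Language models are stateless at inference time. Each request starts fresh: nothing carries over from earlier calls unless it is put back into the prompt, and the model's weights don't update between requests. Agentic memory is the engineered layer that fakes persistence on top of this.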
The patterns
- Conversation buffer. Keep the last N turns in the context window. Cheap, limited by context size.
- Summary buffer. As the conversation grows, summarise older turns and keep the summary. Compresses, loses detail.
- Vector memory. Embed key facts (user preferences, decisions made, names) into a vector store. Retrieve relevant ones for each new query.
- Structured memory. Maintain a key-value store ("user prefers dark mode," "project deadline is 2026-06-01"). The agent reads and writes explicitly.
Products such as ChatGPT's "memory" feature, Claude's Projects, and Cursor's codebase context appear to use combinations of these.
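The first two patterns can be sketched together: keep the last N turns verbatim and fold anything older into a running summary. This is a minimal sketch, not how any particular product does it. The class name `SummaryBufferMemory` and the `naive_summarize` helper are assumptions made for illustration, and the helper only concatenates text where a real agent would call a model to summarise.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Turn = Tuple[str, str]  # (role, text)


def naive_summarize(previous_summary: str, evicted: List[Turn]) -> str:
    """Hypothetical placeholder: a real agent would ask a model to summarise."""
    notes = "; ".join(f"{role}: {text}" for role, text in evicted)
    return f"{previous_summary} | {notes}" if previous_summary else notes


class SummaryBufferMemory:
    """Keeps the last `max_turns` turns verbatim; older turns go into a summary."""

    def __init__(
        self,
        max_turns: int,
        summarize: Callable[[str, List[Turn]], str] = naive_summarize,
    ) -> None:
        self.max_turns = max_turns
        self.summarize = summarize
        self.turns: Deque[Turn] = deque()
        self.summary = ""

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))
        # Evict the oldest turns once the buffer exceeds its budget.
        evicted: List[Turn] = []
        while len(self.turns) > self.max_turns:
            evicted.append(self.turns.popleft())
        if evicted:
            self.summary = self.summarize(self.summary, evicted)

    def context(self) -> str:
        """Text to prepend to the next request: summary first, then recent turns."""
        lines = [f"{role}: {text}" for role, text in self.turns]
        if self.summary:
            lines.insert(0, f"[summary of earlier turns] {self.summary}")
        return "\n".join(lines)


memory = SummaryBufferMemory(max_turns=2)
memory.add("user", "Call me Sam.")
memory.add("assistant", "Hi Sam.")
memory.add("user", "I prefer dark mode.")
print(memory.context())
```

With `max_turns=2`, the third turn pushes the first one out of the verbatim buffer and into the summary. The trade-off from the list shows up directly: the context stays bounded, but whatever the summariser drops is gone.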
Where it breaks
- Stale memory. The agent stores "the user is happy with X", the user later changes their mind, and the next session reads the stale fact and acts on it.
- Retrieval misses. The user's important detail doesn't get surfaced because the query embedding didn't match.
- Compounding errors. Bad early memory entries pollute everything later.
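A retrieval miss is easy to reproduce in miniature. The sketch below uses word counts as a stand-in for an embedding model, which is a deliberate simplification: real embeddings match on meaning rather than shared words and miss less often, but the failure has the same shape whenever the query lands far from the stored fact. The function names and the 0.2 threshold are assumptions for illustration.

```python
import math
import re
from collections import Counter
from typing import List, Optional


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, facts: List[str], threshold: float = 0.2) -> Optional[str]:
    """Return the best-matching fact, or None if nothing clears the threshold."""
    if not facts:
        return None
    q = toy_embed(query)
    best_score, best_fact = max((cosine(q, toy_embed(f)), f) for f in facts)
    return best_fact if best_score >= threshold else None


facts = ["user prefers dark mode", "project deadline is 2026-06-01"]

# Shares words with a stored fact, so it is retrieved.
hit = retrieve("when is the project deadline", facts)

# Asks about the same preference in different words, so nothing is surfaced.
miss = retrieve("which colour scheme do I like", facts)

print(hit)
print(miss)
```

The second query is about the stored preference, yet the agent gets nothing back and answers as if it had never been told.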
What helps
Explicit "forget this" flows. Versioned memory with TTLs (time-to-live limits), so facts expire instead of lingering. Letting the user see and edit what's stored. Treating memory as a hypothesis to verify, not a fact to assume.
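A minimal sketch of how these mitigations might fit into a structured store, assuming an in-memory dictionary and hypothetical names (`StructuredMemory`, `Entry`, `ttl_seconds`). Timestamps are passed in explicitly so the behaviour is easy to follow. A real system would persist entries and decide per key how long a fact can be trusted.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    value: str
    version: int
    written_at: float
    ttl_seconds: float


class StructuredMemory:
    """Key-value memory with version history, expiry, and explicit forgetting."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Entry]] = {}

    def write(self, key: str, value: str, ttl_seconds: float,
              now: Optional[float] = None) -> Entry:
        now = time.time() if now is None else now
        versions = self._history.setdefault(key, [])
        entry = Entry(value, len(versions) + 1, now, ttl_seconds)
        versions.append(entry)  # older versions are kept, not overwritten
        return entry

    def read(self, key: str, now: Optional[float] = None) -> Optional[Entry]:
        now = time.time() if now is None else now
        versions = self._history.get(key)
        if not versions:
            return None
        latest = versions[-1]
        if now - latest.written_at > latest.ttl_seconds:
            return None  # expired: treat as unknown and re-verify with the user
        return latest

    def forget(self, key: str) -> bool:
        """Explicit 'forget this': drops the key and its whole history."""
        return self._history.pop(key, None) is not None

    def view(self) -> Dict[str, str]:
        """What a 'see what's stored' screen could show: latest value per key."""
        return {key: versions[-1].value for key, versions in self._history.items()}


memory = StructuredMemory()
memory.write("ui_theme", "dark", ttl_seconds=3600, now=0)
memory.write("ui_theme", "light", ttl_seconds=3600, now=100)  # user changed their mind

fresh = memory.read("ui_theme", now=200)         # latest version, still valid
expired = memory.read("ui_theme", now=100 + 3601)  # past its TTL
snapshot = memory.view()
forgotten = memory.forget("ui_theme")
```

Returning `None` for an expired entry is the "hypothesis, not fact" stance in code: the agent has to ask again rather than act on something it can no longer vouch for.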
What to read next
Agents are the systems memory exists to power. RAG is the retrieval mechanism most memory systems are built on. Context windows are the cheaper, simpler alternative for short-horizon work.