When an Agent Should Have Memory and When It Shouldn't
When a founder first watches an AI agent forget who they are between sessions, the reaction is almost always the same. They ask whether the agent can remember. The vendors say yes. The frameworks say yes. The demos look magical. And so memory gets switched on, usually without a single conversation about what it should remember, who should see it, how long it should live, and what it is going to cost. Six weeks later one of three things has happened. The agent is bleeding tokens because every call drags a growing pile of history into context. A customer has seen something from another customer's session. Or the team has started treating the memory store as the place where real business facts live, and now nobody can tell whether a number came from the database or from something the agent thought it heard last Tuesday.
This article is about none of the heroic parts of memory. It is about the decisions that determine whether memory helps your product or slowly breaks it. If you are a non technical founder choosing between a bot that forgets everything and an agent that remembers too much, the frame below is the one to use.
What founders actually mean when they say memory
When a vendor says an agent has memory, they almost never mean one thing. They usually mean some combination of four. There is the short term working memory that holds the current conversation, which most people would just call the chat history. There is session state, which is the scratchpad the agent uses to track progress through a multi step task. There is long term memory, which is the persistent store that survives across conversations and usually lives in a vector database or a key value store. And there is semantic memory about the world, which is really retrieval over your documents and is better described as RAG than memory.
The confusion matters because each of these has different privacy, cost, and correctness properties. A support agent with chat history is not controversial. A support agent with a persistent long term memory that writes notes about every customer it has ever spoken to is a very different product, governed by very different rules. When the rest of this article says memory, it means the persistent kind, the one that outlives a single conversation and gets read back into later prompts.
The three scopes every memory design has to choose between
Every persistent memory write has to answer one question: who is this memory for? There are three honest answers and you must pick one per entry.
The first is memory per user. The entry is keyed by a stable user identifier, almost always a user_id from your own system, and is only ever read back when that same user_id is acting. This is the scope for preferences, goals the user has told the agent about, tone they prefer, and names of their past projects. Done correctly this is the scope most personal assistant style features need. Done incorrectly, by keying on an email address or a cookie or a display name that can be reused, this is how two people end up sharing one memory.
The second is memory per session. The entry lives for the length of a single conversation or task and is discarded after. This is the right scope for a half filled form the user is walking through, for the agent's internal reasoning about the current request, and for a rolling summary of a long chat. It is cheap, it cannot leak across users, and it does not accumulate. Most support chats and one off workflows should use nothing more than this.
The third is shared memory. One pool that every user's invocation can read from and in some designs write to. Founders reach for this when they hear the phrase team knowledge or company memory. It is the scope that causes almost every privacy incident in this space. The moment a shared memory store can be written to by any user session, the agent will at some point write a fact from user A that later surfaces in a response to user B. Shared memory is only safe when it is read only from the user side and writes are controlled by your team, which is to say, when it is not really memory but a curated knowledge base.
The test is simple. If a memory entry being shown to the wrong person would be a story you would rather not be in, that entry must be scoped to a user_id. If it is genuinely public to your tenant, it can be shared, but you should probably call it a knowledge base and manage it as content.
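The three scopes can be sketched as a small in-memory store. This is a minimal illustration under stated assumptions, not any vendor's API: the names MemoryStore, Scope, and by_staff are hypothetical, and the sample entries are made up. It shows user entries keyed by user_id, session entries that are discarded when the session ends, and a shared pool that user sessions can read but not write.

```python
from dataclasses import dataclass, field
from enum import Enum


class Scope(Enum):
    USER = "user"
    SESSION = "session"
    SHARED = "shared"


@dataclass
class MemoryStore:
    # Hypothetical in-memory store; keys are (scope, owner_id) pairs.
    _entries: dict = field(default_factory=dict)

    def write(self, scope: Scope, owner_id: str, text: str, *, by_staff: bool = False) -> None:
        # Shared entries are curated content: only staff tooling may write them.
        if scope is Scope.SHARED and not by_staff:
            raise PermissionError("shared memory is read-only from user sessions")
        # User and session entries must always carry an owner.
        if scope is not Scope.SHARED and not owner_id:
            raise ValueError("user and session entries need an owner id")
        key = (scope, "" if scope is Scope.SHARED else owner_id)
        self._entries.setdefault(key, []).append(text)

    def read(self, user_id: str, session_id: str) -> list[str]:
        # A caller sees shared entries, its own user entries, and its own session entries.
        keys = [(Scope.SHARED, ""), (Scope.USER, user_id), (Scope.SESSION, session_id)]
        return [text for key in keys for text in self._entries.get(key, [])]

    def end_session(self, session_id: str) -> None:
        # Session memory dies with the session.
        self._entries.pop((Scope.SESSION, session_id), None)
```

A second user reading from the same store sees only the shared entries, which is the property the test above is asking for.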
When memory earns its keep
Memory is worth the trouble when the next useful answer genuinely depends on something the user said in a previous session and that thing is not already stored in your database. A coaching agent that remembers the client's stated goal from two weeks ago so it can tie today's advice back to it. A writing assistant that learns a user prefers short sentences and no headings. A personal planner that knows which projects the user is juggling without being reminded every morning. In each case the remembered item is small, stable, specific to that user, and not the kind of thing that belongs in a form field.
Memory also earns its keep inside a single long task, as session scoped state. An agent walking through a ten step onboarding flow needs a scratchpad. An agent researching a report across many tool calls needs a place to jot interim findings. This kind of memory is almost always safe because it dies when the task ends.
The common thread is that the remembered fact is a hint for better behaviour, not a record anyone is going to rely on. The moment the business starts relying on it, you are in different territory.
When memory quietly breaks things
Memory goes wrong in ways that are hard to see until they are expensive. The first failure is scope leakage. Someone on the engineering side forgets to pass a user_id on a write, or passes the wrong one, and a single memory entry now belongs to a user who never made it. In a multi tenant SaaS this can escalate from awkward to regulatory in a single support call. Every memory system on the market, whether that is the Claude memory tool, OpenAI's threads, LangGraph's store, Mem0, Zep, or a homegrown Postgres table, relies on the application layer to pass the right scope on every read and every write. If your team cannot guarantee that invariant, memory is not safe to turn on.
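One way to make that invariant hard to break is to bind the scope once, when the request is authenticated, and hand the agent a handle that has no parameter for it. The sketch below is an assumption about how a team might structure this, with hypothetical names (ScopedMemory, remember, recall) and a plain dict standing in for the real store.

```python
class ScopedMemory:
    """Memory handle bound to one user_id at construction (hypothetical design)."""

    def __init__(self, backend: dict, user_id: str):
        # Fail loudly at creation instead of silently writing an unscoped entry later.
        if not isinstance(user_id, str) or not user_id.strip():
            raise ValueError("refusing to create a memory handle without a user_id")
        self._backend = backend
        self._user_id = user_id

    def remember(self, text: str) -> None:
        # No user_id argument here, so a caller cannot pass the wrong one.
        self._backend.setdefault(self._user_id, []).append(text)

    def recall(self) -> list[str]:
        # Reads go through the same bound scope as writes.
        return list(self._backend.get(self._user_id, []))
```

The agent code only ever receives a ScopedMemory, never the raw backend, so forgetting the scope becomes a construction error rather than a data leak.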
The second failure is hallucinated memory. The agent invents a fact about the user, writes it to memory as if it were true, and then reads it back later with full confidence. This is worse than a model hallucination in a single turn because it persists. If you let the agent write freely to long term memory, you need a separate process that reviews and prunes what it wrote. Agents that can write whatever they want to memory will, given enough turns, write things that are wrong.
The third failure is compliance drag. In regulated workflows, any persistent store of user statements becomes a record subject to retention rules, deletion requests, and audit. A startup that added memory in a weekend suddenly has a data subject access obligation it had not budgeted for. If your product touches health, finance, or minors, the default answer on long term memory is no, not yes, unless someone has signed off on the retention and deletion paths.
The cost you do not see until the invoice arrives
Memory lives in context. That is the only way the model can use it. Whatever your memory store does under the hood, at inference time the relevant entries are pulled and pasted into the prompt, and you pay input token rates for every single one, on every single call.
Do the arithmetic once and it changes how you design. Suppose you decide an agent should have access to roughly four thousand tokens of user memory. The user has a conversation that takes fifty model calls. That is two hundred thousand tokens of memory alone, ignoring the actual conversation. Multiply by a thousand users a day, each having one such conversation, and you are paying for two hundred million tokens of memory recall daily before anyone has asked a useful question. On current frontier model pricing that is real money, and the memory contribution grows with every week the user uses the product.
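The same arithmetic as a few lines of Python, so the inputs can be swapped for your own. The token counts come from the paragraph above. The price passed to daily_cost in the usage note is a placeholder, not a quoted rate for any model.

```python
memory_tokens_per_call = 4_000
calls_per_conversation = 50
conversations_per_day = 1_000  # assumes one conversation per user per day

# Memory tokens re-sent over one conversation, then over one day.
per_conversation = memory_tokens_per_call * calls_per_conversation
per_day = per_conversation * conversations_per_day


def daily_cost(tokens: int, usd_per_million_input_tokens: float) -> float:
    """Input-token cost of memory recall for one day, at a price you supply."""
    return tokens / 1_000_000 * usd_per_million_input_tokens
```

Here per_conversation is 200,000 and per_day is 200,000,000. At a placeholder price of 3 dollars per million input tokens, daily_cost(per_day, 3.0) comes to 600 dollars a day for memory recall alone.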
There are three ways to soften this. Retrieve aggressively so that only the three or four most relevant memory entries enter any given prompt instead of the whole pile. Summarise old memory into shorter rolling summaries rather than keeping every raw statement. And structure your prompt so that stable content, including any shared knowledge, sits in the cached prefix while user specific memory sits after it, which at least lets prompt caching help on the non user portion. None of these make memory free. They make it affordable.
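The first and third ideas can be sketched together. The scoring below uses plain word overlap as a toy stand-in for whatever similarity search a real memory store provides, and the function names and sample entries are illustrative assumptions. The point is the shape: only a handful of entries are selected, and they are placed after the stable prefix so that the prefix stays identical across users.

```python
def top_k_memories(query: str, entries: list[str], k: int = 3) -> list[str]:
    """Return up to k entries that share the most words with the query."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(entry.lower().split())), entry) for entry in entries]
    # Drop entries with no overlap at all; they would only add cost.
    relevant = [(score, entry) for score, entry in scored if score > 0]
    # sort() is stable, so ties keep their original order.
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in relevant[:k]]


def build_prompt(stable_prefix: str, memories: list[str], user_message: str) -> str:
    """Stable content first, user-specific memory after it, then the message."""
    memory_block = "\n".join(f"- {m}" for m in memories)
    return f"{stable_prefix}\n\nUser memory:\n{memory_block}\n\nUser: {user_message}"
```

With four stored entries and a question about marathon training, only the one entry that mentions the marathon reaches the prompt, and the instructions at the top are byte-for-byte the same for every user.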
The trap founders fall into is discovering this at scale rather than at design time. The first hundred users cost nothing. The ten thousandth user is where the invoice starts to matter, and by then the memory layer is load bearing and hard to pull back.
Forgetting is a feature, not a bug
Any memory system without a forgetting strategy becomes a landfill. Useful facts get buried under stale ones, retrieval quality degrades because the relevant entry is competing with dozens of outdated ones, privacy exposure grows linearly with time, and costs climb with every new entry. The memory system that works long term is the one that forgets on purpose.
Three mechanisms together tend to do the job. A time to live on every entry, so that anything not refreshed within a defined window simply expires. Usage based decay, where entries that are never retrieved get down weighted and eventually dropped. And periodic summarisation, where a batch job rewrites a user's long tail of raw facts into a much shorter synthesis and discards the originals. The exact numbers depend on the product, but the instinct should be that most memory entries should not survive a year, and many should not survive a week. If your memory store is append only and nothing ever leaves, you do not have a memory system. You have an archive that is quietly becoming a liability.
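The first two mechanisms fit in one pruning pass. This is a sketch with hypothetical names and thresholds. Summarisation is left out because it needs a model call and a product-specific prompt.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    text: str
    last_refreshed: float  # seconds since epoch
    retrievals: int = 0    # how many times this entry has been read back


def prune(entries: list[Entry], now: float, ttl_seconds: float,
          grace_seconds: float, min_retrievals: int = 1) -> list[Entry]:
    """Keep entries that are within their TTL and, once past the grace period, actually used."""
    kept = []
    for entry in entries:
        age = now - entry.last_refreshed
        if age > ttl_seconds:
            continue  # time to live expired
        if age > grace_seconds and entry.retrievals < min_retrievals:
            continue  # old enough to judge, and never retrieved
        kept.append(entry)
    return kept
```

Run on a schedule with, say, a thirty day TTL and a seven day grace period, this drops anything stale and anything that was written but never read, which is most of what turns a store into a landfill.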
Memory is not a database
This is the rule founders most often get wrong and the one that causes the most downstream pain. Memory is a hint layer, not a source of truth. Business records belong in your database.
If the agent needs to know the user's current subscription tier, that is a query against your billing system, not a memory lookup. If the agent needs the order number, the open ticket, the account balance, the feature flag, the contract end date, those are database reads. Memory is the wrong home for any of them, because memory is mutable by the agent, eventually consistent at best, and has no integrity guarantees. The agent might remember yesterday's tier after a downgrade. It might remember an order that was cancelled. It might remember a number it misread from a tool response.
The right shape is that the system of record answers authoritative questions and the memory layer answers flavour questions. What is the balance comes from Postgres. What does this user prefer to be called comes from memory. When this boundary is clear, the rest of the design falls into place. When it is blurred, the agent starts quoting numbers nobody can trace and the support team loses trust in the product.
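In code, that boundary can be as plain as a lookup that refuses to consult memory for authoritative fields. The field names and the dict-based database and memory below are illustrative assumptions. In a real system the first branch would be a query against the system of record.

```python
# Fields the business relies on; these are never answered from memory.
AUTHORITATIVE_FIELDS = {
    "subscription_tier",
    "account_balance",
    "order_number",
    "contract_end_date",
}


def lookup(name: str, user_id: str, database: dict, memory: dict):
    """Route authoritative questions to the database and flavour questions to memory."""
    if name in AUTHORITATIVE_FIELDS:
        # No fallback to memory: a missing record should fail, not be guessed.
        return database[user_id][name]
    return memory.get(user_id, {}).get(name)
```

If memory still holds yesterday's tier after a downgrade, the agent never sees it, because the question is answered by the database or not at all.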
A simple test before you turn memory on
Before you enable persistent memory on any agent, walk through three questions for each category of thing you want it to remember.
Does the next answer actually depend on something the user said in an earlier session, and not on something that already lives in your database? If the answer is no, you do not need long term memory. Session memory is enough.
If this entry were shown to the wrong user, is that a story you would be comfortable with? If the answer is no, the entry must be scoped to a stable user_id, written through a code path that cannot forget to pass the scope, and read only through the same path.
Is this entry ever going to be used as the basis for a business decision, a quoted number, or anything the user will rely on? If the answer is yes, it does not belong in memory at all. It belongs in your database, with the schema, constraints, and audit that decision deserves.
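The three questions can be written down as a small decision function that a team runs for each category of thing it wants the agent to remember. This is one possible reading of the test, with hypothetical parameter names and labels. The last branch follows the earlier point that anything safe to show to everyone is better managed as a curated knowledge base.

```python
def where_does_it_belong(depends_on_earlier_session: bool,
                         already_in_database: bool,
                         harmful_if_leaked: bool,
                         relied_on_for_decisions: bool) -> str:
    """Map the three questions to a storage decision for one category of entry."""
    # Question 3: anything the business or user relies on is a record, not a hint.
    if relied_on_for_decisions or already_in_database:
        return "database"
    # Question 1: nothing carried across sessions means no long term memory.
    if not depends_on_earlier_session:
        return "session memory"
    # Question 2: anything embarrassing in the wrong hands is scoped to one user.
    if harmful_if_leaked:
        return "long term memory scoped to user_id"
    return "curated knowledge base"
```

A user's preferred tone lands in user-scoped memory, a half filled form lands in session memory, and a subscription tier lands in the database regardless of the other answers.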
Agents that pass all three questions before memory is switched on tend to stay boring in the good way. The memory helps where it should, the database holds what matters, costs grow predictably, and nobody ever opens a ticket that starts with the phrase the bot told me about another customer.
If you want help drawing the line between what your agent should remember, what your database should own, and how to scope and expire the rest without wrecking your token budget, WitsCode designs agent memory architecture for founders who want the behaviour without the blast radius.