
The Cursor Context Rot Problem and How We Solve It on Client Projects


By WitsCode · 10 min read

Somewhere around turn forty of a productive Cursor session, something quiet goes wrong. The agent starts citing rules you never set, editing files whose paths do not match the real repository, and confidently undoing a refactor it made eight turns ago. The diffs still compile. The tests often still pass. But the model is no longer reasoning about your code so much as reasoning about a compressed, lossy memory of your conversation, and the further you push the chat the more of that memory is noise rather than signal. This is context rot, and if you have been vibe coding with Cursor for more than a week you have almost certainly felt it, even if you have not had a name for it.

Context rot is not a Cursor bug and it is not something a bigger model will fix. It is a measurable property of every frontier language model currently shipping, documented by Chroma's July 2025 research across eighteen models including Claude Opus 4, the Sonnet line, GPT-4.1, and Gemini 2.5. Every one of them got worse at reasoning as the input grew, even when task complexity was held constant. Cognition's internal data on coding agents shows the same curve: agent success rates drop measurably after about thirty-five minutes of continuous operation, and doubling session duration quadruples the failure rate. The rot compounds. Because the symptoms look like the model being confused rather than the context being poisoned, most vibe coders treat it as a prompting problem and spend another hour arguing with a session that was already lost.
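To put the "doubling quadruples" figure in concrete terms, here is a minimal sketch that assumes the relationship holds as a simple power law (failure rate proportional to duration squared) across the whole range. That assumption is ours, made for illustration; the function name and the 25-minute baseline are hypothetical, not part of the cited data.

```python
def relative_failure_rate(minutes: float, baseline_minutes: float) -> float:
    """Failure rate relative to a baseline session length.

    Assumption: "doubling duration quadruples failures" is read as a power
    law, so the rate grows with the square of the duration.
    """
    if minutes <= 0 or baseline_minutes <= 0:
        raise ValueError("durations must be positive")
    return (minutes / baseline_minutes) ** 2


if __name__ == "__main__":
    # Compare a few session lengths against a 25-minute baseline.
    for minutes in (25, 50, 100):
        print(minutes, relative_failure_rate(minutes, baseline_minutes=25))
```

Under that reading, a session four times longer than the baseline carries roughly sixteen times the failure rate, which is why shortening sessions pays off faster than intuition suggests.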

On client projects we treat long-session rot as a workflow problem with a workflow answer. This article walks through how the problem presents, why Cursor's context window behaves the way it does in 2026, and the three disciplines we install on every engagement: the fresh-shell rule, the project-guide file as single source of truth, and the summarise-and-restart pattern.

What Cursor actually fits into 200K tokens, and why the advertised number is misleading

Cursor's marketing pages quote a 200K token context window when the underlying model is Claude Sonnet 4.6, and technically that number is correct. Practically, it describes the ceiling of what the model can accept, not the budget you actually get to use for your code and your conversation. Inside a normal Agent mode session, a significant slice of that window is consumed before your first message is even processed. Cursor's system prompt, the tool schemas for every file operation and search operation the agent can call, the workspace indexing metadata, and the rolling chat history each take their cut. What is left for your turn-by-turn reasoning is closer to forty to sixty thousand usable tokens in standard mode, depending on the size of the workspace and the number of tools enabled. Agent mode against GPT-4.1 sits around eighty thousand. Regular non-agent chat truncates aggressively to something more like ten or fifteen thousand tokens even on a 200K model, which is why quick-ask conversations rarely hit rot but long agentic sessions always do.
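The budget arithmetic can be sketched as a subtraction. The component names below mirror the overheads listed above, but every number in the usage example is a placeholder chosen for illustration, not a measured Cursor figure, and `ContextOverhead` and `usable_tokens` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextOverhead:
    """Tokens consumed before your turn is processed (illustrative values only)."""
    system_prompt: int
    tool_schemas: int
    workspace_metadata: int
    chat_history: int

    def total(self) -> int:
        return (self.system_prompt + self.tool_schemas
                + self.workspace_metadata + self.chat_history)


def usable_tokens(advertised_window: int, overhead: ContextOverhead) -> int:
    """Tokens left for turn-by-turn reasoning once the overhead is paid."""
    return max(advertised_window - overhead.total(), 0)


if __name__ == "__main__":
    # Placeholder sizes; measure your own session rather than trusting these.
    overhead = ContextOverhead(
        system_prompt=10_000,
        tool_schemas=15_000,
        workspace_metadata=5_000,
        chat_history=110_000,
    )
    print(usable_tokens(200_000, overhead))
```

The only component that grows during a session is the chat history, which is why the usable budget shrinks with every turn even though the advertised window never changes.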

Max mode raises the tool call cap from twenty-five to two hundred and removes Cursor's internal truncation, giving access to the full underlying window. It costs more and, critically, does not solve rot. It only lets rot accumulate longer before the session physically cannot continue. A 200K window filled with contradictory chat history is a worse reasoning environment than a 60K window filled with clean context.

The number to pay attention to is not the advertised window but the effective signal ratio inside it. Every turn where the agent explores a file, runs a search, writes code you reject, or argues about an approach adds tokens whose usefulness decays rapidly. Three turns later those tokens are still in the prefix, still being attended to, still influencing the model's probability distribution. They do not disappear until the chat ends.
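One rough way to reason about the signal ratio is to tag each turn by whether its tokens still matter and divide. This is a back-of-envelope sketch, not a metric Cursor exposes; the `Turn` type and the relevance flag are hypothetical and rely on your own judgement of each turn.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    tokens: int           # tokens this turn added to the prefix
    still_relevant: bool  # your own call: does this turn still matter?


def signal_ratio(turns: list[Turn]) -> float:
    """Fraction of accumulated tokens that are still relevant to the task."""
    total = sum(turn.tokens for turn in turns)
    if total == 0:
        return 1.0  # an empty chat has no noise yet
    useful = sum(turn.tokens for turn in turns if turn.still_relevant)
    return useful / total


if __name__ == "__main__":
    session = [
        Turn(tokens=1_000, still_relevant=True),   # accepted change
        Turn(tokens=3_000, still_relevant=False),  # rejected approach, still in the prefix
    ]
    print(signal_ratio(session))
```

A rejected approach keeps counting against the ratio for the rest of the chat, because nothing removes it from the prefix.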

The specific rot signals a Cursor user sees, and how to catch them early

Theoretical degradation curves are useful for understanding why rot happens. They are useless for spotting it in the moment, so this is the symptom list we train clients to watch for.

The first and most diagnostic signal is the phrase "as I mentioned earlier" applied to something the agent did not, in fact, mention earlier. The model has compressed its rolling history, lost the actual content of an earlier turn, and is now confabulating a reference. If you scroll back and cannot find the claim being cited, the session is rotting and no amount of restating your requirements will repair it. The agent has stopped reading the chat and started hallucinating it.

The second signal is file-path hallucination. The agent proposes to edit src/components/Button.tsx when the path in your repository is src/ui/button.tsx. Usually this happens because an earlier turn discussed a naming refactor that you rejected, but the rejection got lossy-compressed while the proposed names survived. The model is now editing an imaginary project that is a blend of your real one and the one it almost talked you into. Checkpoint-committing after every accepted change makes this merely annoying rather than catastrophic, because when you realise it has happened you can reset hard to the last good commit and start a fresh chat without losing work.
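A cheap guard against path hallucination is to check the paths the agent says it will edit against the real working tree before accepting the change. The sketch below assumes you can collect those paths as strings; `missing_paths` is a hypothetical helper, and it is only meaningful for edits to existing files, since a file the agent legitimately intends to create will also be reported.

```python
from pathlib import Path


def missing_paths(repo_root: str, proposed: list[str]) -> list[str]:
    """Return the proposed paths that do not exist as files in the repository.

    Note: on case-insensitive filesystems a path that differs only in letter
    case will still count as existing.
    """
    root = Path(repo_root)
    return [path for path in proposed if not (root / path).is_file()]


if __name__ == "__main__":
    # Paths from the example above; run from the repository root.
    suspects = missing_paths(".", ["src/ui/button.tsx", "src/components/Button.tsx"])
    for path in suspects:
        print(f"agent referenced a file that does not exist: {path}")
```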

The third signal is contradictory edits inside a single session. The agent refactors a function one way at turn twenty, then refactors it the opposite way at turn forty-five, each time speaking as if the previous version was obviously wrong. Both turns are confident. Neither turn remembers the other. This is pure rot and it is the strongest possible reason to end the chat.

The fourth signal is rule drift. Your .cursorrules file, the pinned instruction at the top of the chat, and the CLAUDE.md you loaded at the start all stop being respected somewhere around turn forty to sixty. The agent starts writing class components in a server-component codebase, or introducing default exports in a project that banned them, or reaching for lodash in a repository that does not depend on lodash. The rules are still physically in the context window. The model is no longer attending to them because too many more recent tokens are competing for its attention.
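Rule drift of this kind can often be caught mechanically by scanning agent-written code for the patterns your rules forbid. The sketch below covers the three examples above with deliberately simple regular expressions; they are hypothetical heuristics that will miss some forms (for instance `require("lodash")`), and a real project would more likely lean on its linter.

```python
import re

# Hypothetical rules mirroring the examples above; adapt to your own guide file.
FORBIDDEN_PATTERNS = {
    "default export": re.compile(r"^\s*export\s+default\b", re.MULTILINE),
    "lodash import": re.compile(r"""from\s+['"]lodash(?:/[^'"]*)?['"]"""),
    "class component": re.compile(
        r"\bclass\s+\w+\s+extends\s+(?:React\.)?(?:Pure)?Component\b"
    ),
}


def rule_violations(source: str) -> list[str]:
    """Names of the forbidden patterns found in a block of agent-written code."""
    return [name for name, pattern in FORBIDDEN_PATTERNS.items()
            if pattern.search(source)]


if __name__ == "__main__":
    snippet = 'import _ from "lodash";\nexport default function Page() {}\n'
    print(rule_violations(snippet))
```

A violation appearing late in a session that respected the same rule early on is the drift signal; the same violation in the first few turns more likely means the rule is missing from the guide file.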

The fifth signal is stale scaffolding, where the agent re-creates a helper deleted three turns ago because the deletion got summarised out of the rolling history while the creation stayed. It shows up as phantom imports, duplicate utilities, and dead code that reappears after removal.

Any one of these is a warning. Two in the same chat means the session is over. The mistake we see vibe coders make is to argue harder, write longer instructions, or try to "remind" the agent what the rules are. Reminders do not evict rot from the prefix. They add more tokens to it.
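The rule of thumb above is simple enough to write down as a decision function. This is a sketch of the heuristic only; the enum members are our own labels for the five signals, and spotting each signal remains a human judgement.

```python
from enum import Enum


class RotSignal(Enum):
    PHANTOM_REFERENCE = "cites something it never said"
    PATH_HALLUCINATION = "edits a file path that does not exist"
    CONTRADICTORY_EDIT = "reverses its own earlier change"
    RULE_DRIFT = "ignores a rule it followed earlier"
    STALE_SCAFFOLDING = "re-creates code that was deleted"


def session_verdict(observed: set[RotSignal]) -> str:
    """One distinct signal is a warning; two or more end the session."""
    if len(observed) >= 2:
        return "restart"
    if len(observed) == 1:
        return "warning"
    return "continue"


if __name__ == "__main__":
    seen = {RotSignal.RULE_DRIFT, RotSignal.PATH_HALLUCINATION}
    print(session_verdict(seen))
```

Using a set means the same signal seen twice still counts once, which matches the "two in the same chat" wording only if you read it as two different signals; count occurrences instead if you prefer the stricter reading.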

The fresh-shell rule: one major task per chat, and restart on signal

The fresh-shell rule is the simplest discipline to install and the one that produces the biggest quality improvement on client projects. It has two parts.

The first part is a hard commitment to one major task per chat. When the task you are working on changes, open a new chat. "I finished the feature and now I want to fix a bug" is a task change. "I finished the bug fix and now I want to refactor" is a task change. "I finished the refactor and now I want to add tests" is a task change. Each of those transitions, done inside the same session, poisons the subsequent work with accumulated context from a task that no longer matters. Open a new chat, drop in the project-guide file, state the new task, and go. The tokens you think you are saving by continuing are tokens you are paying for with rot.

The second part is restart on signal. When you see any of the rot symptoms above, the correct response is not to continue the chat. It is to end it, summarise what was learned, and restart. This feels wasteful the first few times you do it, because the chat still contains good work and you are about to throw away the conversational scaffold that produced it. The reframe that makes this rule sustainable is that you are not throwing away the scaffold, you are throwing away the rot. The good work is already committed to your repository. The decisions are already in your summarise-and-restart note. What remains in the chat is just the confused middle-of-context, and deleting it is a feature.

Clients who internalise this rule report, within about a week, that their average session length drops from ninety minutes to about twenty-five and output quality rises. The thirty-five-minute Cognition threshold is not a coincidence. It is roughly where rot becomes detectable in most workflows, and twenty-five-minute chats stay comfortably inside the clean regime.

The project-guide file as single source of truth, not scattered memory

The deepest rot mitigation is architectural rather than behavioural. Scattered rules across chat history, sticky notes in the .cursorrules file, pinned messages, and personal memory all rot because they live inside the session. A project-guide file loaded on every turn does not rot, because it is reinjected cleanly whenever the chat is reset. The industry has converged on a few naming conventions for this file: CLAUDE.md from the Anthropic-aligned world, AGENTS.md as an emerging cross-tool standard, .cursorrules as Cursor's native form, and PROJECT.md as a neutral option. Pick one, make it authoritative, and delete the others. A codebase with three competing guide files is a codebase where the agent picks whichever one it wants on any given turn, which is effectively no guide at all.
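Checking for competing guide files is easy to automate. The sketch below looks only at the repository root and only for the four names mentioned above; the function names are hypothetical, and nested or tool-specific rule directories are out of scope.

```python
from pathlib import Path

# File names taken from the conventions listed above.
GUIDE_FILE_NAMES = ("CLAUDE.md", "AGENTS.md", ".cursorrules", "PROJECT.md")


def guide_files_present(repo_root: str) -> list[str]:
    """Guide files found at the repository root, in a stable order."""
    root = Path(repo_root)
    return [name for name in GUIDE_FILE_NAMES if (root / name).is_file()]


def has_single_guide(repo_root: str) -> bool:
    """True only when exactly one guide file exists at the root."""
    return len(guide_files_present(repo_root)) == 1


if __name__ == "__main__":
    found = guide_files_present(".")
    if len(found) != 1:
        print(f"expected exactly one guide file, found: {found or 'none'}")
```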

What belongs in this file is tighter than most teams realise. The contents should be the stack and versions (Next 15, Tailwind 4, shadcn, Drizzle, whatever), the directory conventions, the commands to run tests and build and lint, the coding rules that are easy to break by default (server components are default, no default exports, accessibility requirements, forbidden dependencies), and the named forbidden patterns. That is roughly it. What does not belong is feature history, past decisions, prose philosophy, onboarding narrative, or a tutorial. Those belong in pull request descriptions, architectural decision records, or a wiki. A guide file that grows past about two hundred lines starts rotting its own context window, because it competes for attention with the code the agent is supposed to be reasoning about.
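The scope limits above can be turned into a small lint for the guide file. This sketch assumes the guide is written in Markdown with `#` headings, treats two hundred lines as the budget, and flags headings containing a few keywords drawn from the "does not belong" list; the keyword list and function name are hypothetical and will produce false positives.

```python
MAX_GUIDE_LINES = 200  # the rough ceiling suggested above, not a hard limit

# Heading keywords that hint at content better kept in PRs, ADRs, or a wiki.
OUT_OF_SCOPE_KEYWORDS = ("history", "decision", "philosophy", "onboarding", "tutorial")


def guide_problems(text: str, max_lines: int = MAX_GUIDE_LINES) -> list[str]:
    """Return human-readable problems found in a Markdown guide file."""
    problems = []
    lines = text.splitlines()
    if len(lines) > max_lines:
        problems.append(
            f"guide is {len(lines)} lines, over the {max_lines}-line budget"
        )
    for line in lines:
        if not line.startswith("#"):
            continue  # only headings are inspected
        heading = line.lstrip("#").strip().lower()
        if any(keyword in heading for keyword in OUT_OF_SCOPE_KEYWORDS):
            problems.append(f"heading {line.strip()!r} looks out of scope")
    return problems


if __name__ == "__main__":
    sample = "# Stack\nNext 15, Tailwind 4\n## Feature history\n"
    for problem in guide_problems(sample):
        print(problem)
```

Run as a pull request check, a lint like this keeps the file short by making growth a visible decision rather than a silent drift.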

On client projects the guide file is treated as executable code rather than documentation. It is reviewed in pull requests, kept short, updated when a rule changes rather than appended to, and sits at the repository root so any agent (Cursor, Claude Code, Copilot) can find it. When a vibe coder complains the agent is ignoring a rule, the first question is not "did you tell the agent" but "is the rule in the guide file". Rules stated in chat rot. Rules stated in the file do not.

The summarise-and-restart pattern that preserves momentum across chat boundaries

The fresh-shell rule only works if ending a chat is cheap. If every restart feels like losing your place in a book, you will avoid restarting even when the session is clearly rotten. The summarise-and-restart pattern makes restarts trivial.

Before closing a productive chat, the last message you send is a request for a handoff note. The prompt is something like: produce a concise summary of what changed in this session, what decisions were made and why, what is still in progress, and what the next step is. Format it so it can be pasted directly into a new chat as the opening message. The agent is good at this because summarisation is a task language models are well-calibrated for and because the information the summary needs to capture is the freshest part of the context window, which is the part least affected by rot.

You paste that summary into a new chat, immediately after the guide file reference, and the new session starts with a clean window containing exactly the information it needs and none of the conversational debris that accumulated in the old one. The new chat is faster per turn because the prefix is shorter, more accurate because the signal-to-noise ratio is higher, and cheaper because the model processes fewer tokens. The handoff note itself rarely exceeds three or four hundred words, and everything that mattered from the old chat is either in that note or in your git history.

One refinement on larger engagements is a running session-log file in the repository. Every handoff note gets appended with a timestamp, giving the team a durable record of what each session produced. This is different from the guide file: the guide contains rules, the session log contains history. The agent does not read the log on every turn. Humans read it during standups and retros.
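Appending to the session log can be a few lines of script. This is a minimal sketch: the `SESSION_LOG.md` file name, the Markdown heading format, and the choice of UTC timestamps are all assumptions, not a convention the workflow requires.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def append_handoff(log_path: str, note: str,
                   now: Optional[datetime] = None) -> None:
    """Append a handoff note to the session log under a UTC timestamp heading."""
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = moment.strftime("%Y-%m-%d %H:%M UTC")
    entry = f"## {stamp}\n\n{note.strip()}\n\n"
    # Append mode keeps earlier entries intact and creates the file if needed.
    with Path(log_path).open("a", encoding="utf-8") as log:
        log.write(entry)


if __name__ == "__main__":
    append_handoff(
        "SESSION_LOG.md",
        "Replace this text with the handoff note from the chat you just closed.",
    )
```

Because the log is append-only and the agent never reads it, it can grow without competing for attention in the context window.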

What the stack looks like when rot mitigation is installed correctly

A project with all three disciplines in place feels different from one without, most visibly on a Friday afternoon when someone is tired. The guide file sits at the repository root, under version control, reviewed, short, and authoritative. Every Cursor chat opens with the same pattern: a one-line reference to the guide file, a statement of the single task, and, if continuing, the handoff note from the previous session. Sessions run twenty to forty minutes. Rot symptoms trigger immediate restarts. "That session is rotten, summarise and restart" becomes a normal sentence rather than a ceremony.
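The opening pattern described above can be assembled by a small helper so every chat starts the same way. The wording of each part is a hypothetical template, the function names are ours, and the four-hundred-word ceiling is taken from the upper end of the handoff-note length mentioned earlier.

```python
from typing import Optional

MAX_HANDOFF_WORDS = 400  # upper end of the "three or four hundred words" guideline


def build_opening_message(guide_path: str, task: str,
                          handoff_note: Optional[str] = None) -> str:
    """Assemble the first message of a fresh chat.

    Order: guide file reference, the single task, then the previous
    session's handoff note when continuing earlier work.
    """
    parts = [f"Follow the rules in {guide_path}.", f"Task: {task.strip()}"]
    if handoff_note and handoff_note.strip():
        parts.append("Handoff from previous session:\n" + handoff_note.strip())
    return "\n\n".join(parts)


def handoff_is_concise(note: str, max_words: int = MAX_HANDOFF_WORDS) -> bool:
    """True when the note fits the word budget (whitespace-separated words)."""
    return len(note.split()) <= max_words


if __name__ == "__main__":
    print(build_opening_message("AGENTS.md", "add tests for the cart reducer"))
```

Keeping the task to a single line in the template is a deliberate constraint: if it will not fit, it is probably more than one task and deserves more than one chat.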

From the outside this looks like a team that ships faster and argues with its tools less. From the inside it is a team that has stopped paying the rot tax. Same agent, same model, same window. The only thing that changed is discipline around what goes into it and when the slate gets wiped.

If you are running a Shopify storefront, a SaaS frontend, or an internal tool on Cursor and your team is hitting a wall around turn forty of every session, the fix is almost never a different model. It is workflow. WitsCode installs this stack as a standard part of our AI-workflow setup engagement: we write the guide file, encode the fresh-shell rule into the team's playbook, and audit the rot patterns in your current Cursor usage so that the next session your team runs starts clean and ends before it rots.
