The Technical Debt AI Tools Create (And What to Do About It)

AI tools generate code that works today but is hard to modify tomorrow. The five categories of AI-specific tech debt we find most, and the refactor patterns that pay them down without a full rewrite.

By WitsCode, 11 min read

Every vibe-coded app we audit looks healthy for the first six weeks. The founder ships features faster than any agency could. Then something shifts. A change that used to take one prompt now takes four. A bug fixed in one place reappears in another. The AI starts producing code that no longer matches the rest of the codebase. The founder blames the model, or the framework, or themselves. The real answer is quieter and more structural. AI tools are exceptional at writing code and poor at reading it, and a tool that is poor at reading what already exists cannot add to it in a disciplined way. The result is a specific flavour of technical debt that behaves differently from the classic kind. This article maps the five categories of AI-specific debt we find most often, explains why each one grows, and describes the refactor patterns that pay it down without a full rewrite.

Why AI technical debt behaves differently from the classic kind

Traditional technical debt is written by humans under deadline pressure. A developer knows the right shape of a solution, decides they do not have time to build it, and leaves a comment saying so. The debt is legible because the author understood the system. AI technical debt is different. It is written by a system that has no memory of the existing codebase beyond the files currently in its context window, no view of how a new change interacts with the rest of the app, and no incentive to delete or consolidate anything. Each prompt is a green-field build within a tiny window. Across hundreds of prompts, the codebase accumulates the same pattern repeated in slightly different shapes, state fragmented across components that each believe they are the first, and type signatures that promise everything and verify nothing. The debt is illegible precisely because no single author wrote it. No one chose to duplicate the pricing card. It just happened eight times.

This is why the classic advice around paying down technical debt does not work for vibe-coded apps. You cannot refactor your way out using only more of the tool that created the problem, at least not without a deliberate strategy. The tool will keep producing the same flavour of debt. What follows are the five categories we see the most, each one diagnosed and then paired with a pay-down pattern that works with AI rather than against it.

The copy-paste-same-component-eight-times pattern

Open the components folder of almost any Lovable or Bolt project that has been live for three months and you will find the following shape. There is a PricingCard. There is a PricingCardV2. There is a HomePricingCard that differs from PricingCard by a border radius. There is a LandingPricingCard that differs from HomePricingCard only by the copy inside it. There is a PricingCardMobile that is a near-duplicate of PricingCard with one class adjusted. There is a PricingCardNew that was going to replace all the others and did not. Each of these was produced by a prompt that sounded reasonable in isolation. The founder asked for a pricing section on the landing page. The tool, unaware of the existing pricing card or uncertain whether touching it would break another route, produced a new one. Six prompts later there are six pricing cards, each receiving slightly different fixes, and a price change requires remembering to edit all six files.

The pay-down pattern for this category is a consolidation sweep with a purpose-built prompt. Asking the AI to "clean up the pricing components" tends to produce a single rewrite that loses information from the other five. The effective move is to ask the AI to list every component in the repository that renders a price, then produce a single table mapping each visual variant to a prop on one consolidated component, then replace the call sites one route at a time with a commit between each. In our experience Claude Code, given tool access to the repo and a test that renders each route, can execute this sweep in under an hour on a typical Lovable project. Aider, which commits each edit to git automatically, is useful here because each commit is small and reversible, and consolidation is the stage at which regressions hide most easily.
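The variant-to-prop table from the sweep can be written down as data before any call site is touched. The following is a minimal sketch, not a prescribed design: the prop names (`radius`, `layout`, `copy`) and their values are hypothetical, chosen only to mirror the differences described above, and PricingCardV2 and PricingCardNew are left out because their differences would have to be read from the real files.

```typescript
// Hypothetical props for one consolidated PricingCard; names are illustrative only.
type PricingCardProps = {
  radius: "sm" | "lg";
  layout: "desktop" | "mobile";
  copy: "default" | "landing";
};

const defaults: PricingCardProps = { radius: "sm", layout: "desktop", copy: "default" };

// One row per legacy component: only the props that differ from the defaults.
const legacyVariants: Record<string, Partial<PricingCardProps>> = {
  PricingCard: {},
  HomePricingCard: { radius: "lg" },
  LandingPricingCard: { radius: "lg", copy: "landing" },
  PricingCardMobile: { layout: "mobile" },
};

// Resolve the props a call site should pass once it is migrated.
function propsForLegacy(name: string): PricingCardProps {
  if (!Object.prototype.hasOwnProperty.call(legacyVariants, name)) {
    // Unmapped components are surfaced rather than guessed, so no variant is silently lost.
    throw new Error(`No mapping for legacy component: ${name}`);
  }
  return { ...defaults, ...legacyVariants[name] };
}
```

Keeping the table as data means the mapping can be reviewed by a human in one place, and a component that was never mapped fails loudly instead of falling back to the default look.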

The state-scattered-across-components anti-pattern

The second category is harder to see because it does not produce duplicate files. It produces duplicate truths. A cart total lives as a useState inside the header because the header was the first place it needed to appear. Another useState lives inside the cart drawer because a later prompt added the drawer and did not know about the header. A third lives inside the checkout page. None of them agree after a quantity change on the product page, because the product page updates only the one it can see. The founder notices that the cart badge shows three items and the drawer shows two, files a bug, and the AI fixes the drawer by adding a useEffect that reads from localStorage, which introduces a fourth source of truth.

This pattern emerges because each AI prompt works on a narrow slice of the interface and reaches for the simplest state mechanism that solves the immediate view. Local state is always simplest in the moment. No prompt is ever large enough to say "and also retire the three other places this value is tracked." The pay-down pattern is to introduce a single store, usually Zustand or Jotai for React or the built-in stores in Svelte and Solid apps, and to perform a guided migration one value at a time. The guided part matters. An AI asked to "move state into a store" tends to produce a store shaped like the old scattered state, importing the same confusion into a new file. What works is naming the invariants first, which is a short document listing every value the app needs to agree on across views, and then asking the AI to build the store to those invariants before rewriting any call site. Claude Code with a short refactor brief and permission to run the build between steps handles this well. The visible proof that the pay-down worked is that a quantity change on the product page updates the header, the drawer, and the checkout total at the same time with no reload.
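To make the "single source of truth" idea concrete, here is a minimal hand-rolled store in plain TypeScript. It is a stand-in for what Zustand or Jotai would provide, not their API, and the names `createCartStore`, `setQuantity`, and `itemCount` are assumptions made for this sketch. The invariant it encodes is the one from the example above: every view derives its cart count from the same quantities.

```typescript
// Invariant: every view reads cart quantities from this one store.
type CartState = { quantities: Record<string, number> };
type Listener = (state: CartState) => void;

function createCartStore() {
  let state: CartState = { quantities: {} };
  let listeners: Listener[] = [];

  return {
    getState: (): CartState => state,
    // Views subscribe instead of keeping their own useState copy.
    subscribe(listener: Listener): () => void {
      listeners.push(listener);
      return () => {
        listeners = listeners.filter((l) => l !== listener);
      };
    },
    // The only write path: replace state immutably, then notify every subscriber.
    setQuantity(productId: string, quantity: number): void {
      state = { quantities: { ...state.quantities, [productId]: quantity } };
      listeners.forEach((listener) => listener(state));
    },
  };
}

// Derived value: computed from the store on demand, never stored a second time.
function itemCount(state: CartState): number {
  return Object.keys(state.quantities).reduce(
    (sum, id) => sum + state.quantities[id],
    0
  );
}
```

The header badge, the drawer, and the checkout total would each call `itemCount` on the same state, so they cannot disagree after a quantity change.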

The no-boundary-types problem

TypeScript is installed in almost every vibe-coded project now because the starter templates include it by default. This is misleading. Installed TypeScript is not the same as enforced TypeScript. What we find is a project where a tsconfig.json is present, the IDE is green, and every function boundary is typed as any, either explicitly or through a cast. A fetch wrapper returns any. A form handler receives any. A Supabase query is typed to its schema and then spread into an object typed as any two lines later. The effect is that autocomplete dies at the first boundary, refactoring tools cannot follow a rename, and the safety net TypeScript was installed to provide is absent from exactly the places it would catch real bugs.

This debt grows because AI tools optimise for the code compiling rather than for the types being useful. When a function is hard to type, the tool reaches for any, because any always compiles. The pay-down pattern is a staged tightening of the boundaries, starting from the edges of the app inward. The effective sequence is to turn on noImplicitAny and strictNullChecks, run the typechecker, take the first twenty errors, and fix them one file at a time with the AI doing the mechanical work and the human confirming the chosen type is the correct contract rather than a weaker one. Because noImplicitAny flags only inferred any, explicit any annotations and casts need a lint rule such as typescript-eslint's no-explicit-any to surface them. A good rule for the human reviewer is that if the AI proposes a union type larger than three members at a boundary, the type is probably wrong and the shape of the function should change. After the first twenty errors, the next forty take half the time, because the early fixes propagate through inference. A typical Lovable project reaches a clean strict build in one or two focused sessions.
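One common way to tighten a boundary is to accept `unknown` and verify the shape before anything downstream sees it. The sketch below assumes a hypothetical cart endpoint whose items have `productId` and `quantity` fields; those names are illustrative, and a schema library could replace the hand-written guard.

```typescript
// Hypothetical response shape for a cart endpoint; field names are assumptions.
type CartItem = { productId: string; quantity: number };

// Type guard: narrows unknown to CartItem only after checking each field at runtime.
function isCartItem(value: unknown): value is CartItem {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["productId"] === "string" && typeof record["quantity"] === "number";
}

// Boundary function: takes unknown in, returns a verified type or throws.
function parseCartItems(payload: unknown): CartItem[] {
  if (!Array.isArray(payload)) {
    throw new Error("Expected an array of cart items");
  }
  const items: CartItem[] = [];
  for (let i = 0; i < payload.length; i++) {
    const entry: unknown = payload[i];
    if (!isCartItem(entry)) {
      throw new Error(`Invalid cart item at index ${i}`);
    }
    items.push(entry);
  }
  return items;
}
```

A fetch wrapper that ends in `parseCartItems(await response.json())` gives every caller a real `CartItem[]`, so autocomplete and rename refactors keep working past the boundary.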

The untested-code debt that makes every other debt permanent

The fourth category is the one that turns all the others from inconvenient to structural. Most vibe-coded apps have no tests. Not a few. None. The founder never wrote any, and the AI tool, when asked to add a feature, produced the feature and not the test around it. This becomes the compounding cost behind every other debt category, because without tests there is no safe way to refactor the duplicate pricing cards, no safe way to migrate scattered state into a store, no safe way to tighten the types. Every pay-down move becomes a wager. The founder starts declining to refactor not because they do not see the debt but because they cannot see what would break.

The pay-down pattern for this one is the most important in the article because it unlocks the others. The move is to write tests first, and to use the AI to write them, but to write them in a very specific order that matches how the app earns revenue. The order is checkout, then authentication, then any workflow that writes to the database, then the routes that marketing campaigns send traffic to. An end-to-end test written with Playwright, driven by the AI against the running local app, takes roughly ten minutes per flow and covers the behaviour that actually matters to the business. The AI is good at this because end-to-end tests describe user behaviour, which is the kind of task AI models write well. Once the revenue flows are under test, the founder has permission to refactor. Running Claude Code in a tests-first loop, where it runs the test, makes the change, runs the test again, and rolls back on failure, is the single most effective pay-down approach we use, because it turns the AI from the author of the debt into the mechanic paying it down under supervision.
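The loop of running the test, making the change, running the test again, and rolling back on failure can be sketched as a small function. This is an illustration of the control flow only: `RefactorStep`, `runStep`, and the `testsPass` callback are hypothetical names, and a real setup would shell out to the test runner and to git rather than call in-memory functions.

```typescript
// Hypothetical hooks: a real setup would invoke the test runner and git here.
type RefactorStep = {
  name: string;
  apply: () => void; // make one small change
  rollback: () => void; // undo that change
};

type StepResult = "applied" | "rolled-back" | "skipped-baseline-red";

function runStep(step: RefactorStep, testsPass: () => boolean): StepResult {
  // 1. Never refactor on a red baseline; a later failure could not be attributed to the change.
  if (!testsPass()) return "skipped-baseline-red";
  // 2. Make one small change.
  step.apply();
  // 3. Re-run the same tests and keep the change only if they still pass.
  if (testsPass()) return "applied";
  step.rollback();
  return "rolled-back";
}
```

The step has to stay small enough that a failing run points at exactly one change, which is what makes the rollback safe.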

The dead-code archipelago

The final category is the quiet one. Every vibe-coded project accumulates islands of abandoned code. An old route that was replaced by a newer route and never deleted. A feature flag that was turned off six weeks ago and whose surrounding code was never removed. An import of a library that was swapped for a different library and left dangling. A utility file that was referenced once by a component that no longer exists. None of these cause visible bugs. All of them slow the app down in subtle ways. They enlarge the bundle, confuse the AI when it reads the codebase for context, and make search results noisier for the founder. AI tools almost never delete on their own, because deletion feels dangerous when the tool cannot be sure what depends on the thing being deleted.

The pay-down pattern is mechanical and, once the tests are in place, low-risk. A tool like knip will produce a list of unused exports, unused files, and unused dependencies in a TypeScript project, and ts-prune covers the narrower case of unused exports only. The AI's job is to work through the list one item at a time, deleting each, running the test suite, and committing on green. The human's job is to review the commits and veto any deletion that the tool was uncertain about. A typical three-month-old Lovable project contains between ten and thirty percent dead code. Removing it tends to shave noticeable seconds off the build and, more importantly, shrinks the surface area the AI has to reason about on the next prompt, which raises the quality of every future change.
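The idea behind finding dead files can be illustrated with a reachability walk over an import graph: start from the entry points and anything never visited is a candidate for deletion. This is a conceptual sketch only and makes no claim about how knip or ts-prune work internally; the `ImportGraph` shape and the module names in the usage note are made up for illustration.

```typescript
// Module name -> modules it imports. A hypothetical, hand-written graph.
type ImportGraph = Record<string, string[]>;

function findUnreachable(graph: ImportGraph, entryPoints: string[]): string[] {
  // Object.create(null) avoids accidental hits on inherited keys like "constructor".
  const reached: Record<string, boolean> = Object.create(null);
  const stack = entryPoints.slice();
  while (stack.length > 0) {
    const current = stack.pop() as string;
    if (reached[current]) continue;
    reached[current] = true;
    const hasEntry = Object.prototype.hasOwnProperty.call(graph, current);
    const imports = hasEntry ? graph[current] : [];
    for (let i = 0; i < imports.length; i++) {
      stack.push(imports[i]);
    }
  }
  // Anything never visited from an entry point is a deletion candidate.
  return Object.keys(graph)
    .filter((name) => !reached[name])
    .sort();
}
```

Given a graph in which `routes/old` imports `utils/legacy` and nothing imports `routes/old`, both files are reported, even though `utils/legacy` still has an importer. That is why whole islands, not just single orphan files, show up on the list.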

The pay-down-with-AI strategy in one sentence

If you take one thing from this article, take this. The tool that produced the debt is also the most efficient tool to pay it down, but only if it is running in a mode that forces it to write tests first, refactor second, and verify against those tests third. Everything else is either avoidance or rewrite. The small-step Claude Code loop of test, change, run, commit is the shape of pay-down that actually works, because it converts the AI from an author that accumulates debt into an agent that discharges it under a contract the human wrote. This is also why a pay-down cannot be done in one giant prompt. The prompt has to be small enough that the test can verify the change, which means the debt has to be itemised first before it can be paid down at all.

What a WitsCode AI tech-debt audit looks like

When a founder comes to us with an app whose change velocity has started to drop, we run a fixed-scope audit. We clone the repo, run the tooling that produces the debt ledger across all five categories, and hand back a document that lists every duplicate component by file and prop, every scattered state value by view, every any-typed boundary by call site, every critical flow without a test, and every dead code island. Each line on the ledger has a cost estimate in hours and a pay-down order, because some debts block others and should be discharged first. The founder then either works through the ledger themselves using the prompts we supply, or hands the ledger to us and we clear it over one or two focused weeks with tests written first for every change. The ledger is the part that matters. Without it, the debt is invisible, and invisible debt is the kind that compounds.
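A ledger in which some debts block others is a small dependency-ordering problem. The sketch below is one possible way to derive a pay-down order, not a description of the audit tooling: the `LedgerItem` fields are hypothetical, and picking the cheapest unblocked item first is an assumption made for the example.

```typescript
// Hypothetical ledger line: an id, a cost estimate, and the ids that must be cleared first.
type LedgerItem = { id: string; hours: number; blockedBy: string[] };

function payDownOrder(items: LedgerItem[]): LedgerItem[] {
  const done: Record<string, boolean> = Object.create(null);
  const ordered: LedgerItem[] = [];
  let remaining = items.slice();
  while (remaining.length > 0) {
    // Ready = every blocker already discharged. Assumption: cheapest ready item goes first.
    const ready = remaining
      .filter((item) => item.blockedBy.every((id) => done[id] === true))
      .sort((a, b) => a.hours - b.hours);
    if (ready.length === 0) {
      throw new Error("Unresolvable blocker in ledger (cycle or unknown id)");
    }
    const next = ready[0];
    done[next.id] = true;
    ordered.push(next);
    remaining = remaining.filter((item) => item !== next);
  }
  return ordered;
}
```

With tests listed as the blocker for every refactor, the tests line always comes out first, which matches the argument that untested code makes every other debt permanent.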

There is a threshold at which pay-down stops making sense. If the app has no tests, more than fifteen duplicate variants of the same core component, state scattered across more than twenty views, and a typecheck that produces more than a thousand errors with strict mode on, the pay-down path is still cheaper than a rewrite but only barely. Above that threshold, the calculus changes, and a targeted rewrite of the domain model, reusing the UI components, is often the better move. Most vibe-coded apps we see are well below that threshold at month three and approaching it at month nine. The best time to pay down is at month three, when the debt is real enough to be worth naming and small enough to clear in a week. The worst time is the day before a launch, which is, unhelpfully, also the day most founders first realise the debt exists.

The work of turning an AI-generated app into a codebase a team can keep shipping from is not glamorous. It is the work of itemising the debt, writing tests that make refactoring safe, and running the small-step loop until each category on the ledger reaches zero. Done once, the app is ready for a team. Done never, the app is ready for a rewrite. The middle path, where the founder alternates feature work with weekly pay-down sessions, is the one we see produce apps that keep growing past the eighteen-month mark.
