The AI-Generated Code Audit Checklist (13 Things We Check)

The thirteen checks we run on every AI-generated codebase before it ships, and why the defaults Lovable, Bolt and v0 produce fail most of them.

By WitsCode · February 12, 2026 · 11 min read


There is a particular kind of relief that comes with finishing an app in Lovable or Bolt. The preview works, the demo video looks clean, Stripe is wired up, and you can almost taste the launch tweet. Then somebody sensible in your life asks whether the code is safe, and you realise you have no idea what would happen if a moderately curious stranger pointed a terminal at it. That feeling is the correct feeling. It is the reason you are reading this.

This article is the checklist we run on every AI-generated codebase that crosses our desk, ordered the way we actually run it. Thirteen items, four domains, and for each one the specific shape of the bug the current generation of models tends to ship. Most audit articles on the internet tell you to use a linter and call it a day. That is not an audit. An audit is a targeted search for the patterns these tools produce by default, and the majority of those patterns sit in the blind spots that linters never look at. Read it end to end before your next release, or book the fixed-scope version of it with us and skip the reading.

Authentication Is Where Most AI Code Quietly Fails

The single most common class of bug in AI-generated apps is authentication that looks correct and is not. We start every audit here because it is where the largest holes hide, and because if this layer is broken, nothing downstream of it matters.

The first check is whether authentication is enforced on the server or only on the client. Open the admin panel, or the billing page, or whichever route is gated in the UI, and search the codebase for how that gate is implemented. If the only check is a React conditional of the form if (user.role === 'admin') deciding whether to render the page, the gate is cosmetic. The API route behind that page is the real door, and until that route verifies the session cookie and re-checks the role on every request, any logged-in user can call it directly with curl and read whatever it returns. This is the most common serious vulnerability we find, and the models produce it at a rate approaching one in two projects. The fix is to add server-side middleware that reads the session, loads the user, and rejects the request before the handler runs, with a 401 when there is no valid session and a 403 when the user lacks the role. Do not remove the client check. Both layers exist for a reason, but only the server layer is security.
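A minimal sketch of that server-side gate, written framework-agnostic so it runs on its own. The cookie name (session), the in-memory session map, and the role names are hypothetical placeholders for whatever your auth library provides. It separates 401 (no valid session) from 403 (valid session, wrong role), and in a real app it would sit in middleware or at the top of each route handler.

```typescript
// Hypothetical types: swap in whatever your auth library returns.
type Role = "user" | "admin";
interface User { id: string; role: Role; }
interface GuardResult { status: 200 | 401 | 403; user?: User; }

// Pulls one named cookie out of a raw Cookie request header.
function readCookie(header: string | undefined, name: string): string | undefined {
  if (!header) return undefined;
  for (const part of header.split(";")) {
    const [key, ...rest] = part.trim().split("=");
    if (key === name) return rest.join("=");
  }
  return undefined;
}

// Runs before the route handler on every request, regardless of what the UI shows.
function requireRole(
  cookieHeader: string | undefined,
  sessions: Map<string, User>, // stand-in for a session table or auth provider lookup
  required: Role,
): GuardResult {
  const sessionId = readCookie(cookieHeader, "session");
  const user = sessionId ? sessions.get(sessionId) : undefined;
  if (!user) return { status: 401 };                  // no valid session
  if (user.role !== required) return { status: 403 }; // logged in, wrong role
  return { status: 200, user };
}
```

Because the check reads only the request, calling the route with curl gets exactly the same answer as clicking through the UI, which is the point.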

The second check is session handling. Tokens should live in HTTP-only cookies with the Secure flag and SameSite set to Lax or Strict, and they should expire. If your project is storing a JWT in localStorage, a single cross-site scripting bug anywhere in the app becomes a full account takeover, because any script that runs on your domain can read that token and replay it. This pattern is endemic to AI-generated code because the models treat localStorage as the path of least resistance and never revisit the decision. The remediation is to move the token into a cookie, set the flags, and verify that your API reads it from the cookie rather than an Authorization header.
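The cookie attributes from this check, assembled by hand so every flag is visible. The cookie name and the seven-day default lifetime are assumptions; most frameworks ship a cookie helper that takes the same options.

```typescript
// Assumed options; pick a lifetime that matches your server-side session expiry.
interface SessionCookieOptions { maxAgeSeconds: number; sameSite: "Lax" | "Strict"; }

// Builds a Set-Cookie header value for a session token.
function sessionCookie(
  token: string,
  options: SessionCookieOptions = { maxAgeSeconds: 60 * 60 * 24 * 7, sameSite: "Lax" },
): string {
  return [
    `session=${encodeURIComponent(token)}`,
    "Path=/",
    "HttpOnly",                         // invisible to document.cookie, so XSS cannot read it
    "Secure",                           // only ever sent over HTTPS
    `SameSite=${options.sameSite}`,     // not attached to most cross-site requests
    `Max-Age=${options.maxAgeSeconds}`, // the browser discards it after this many seconds
  ].join("; ");
}
```

Max-Age only controls the browser side; the session record on the server needs its own expiry so a copied token stops working too.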

The third check is the password reset and email verification flow. We have lost count of the number of AI-generated resets where the token never expires, where the same token can be used twice, or where the reset endpoint returns the new password in the JSON response for debug purposes and nobody removed the debug. Reset tokens must be single-use, time-limited to roughly fifteen minutes, invalidated after successful reset, and the endpoint must never echo secrets back to the client. Verifying this takes ten minutes and prevents a class of attack that requires no skill to execute.
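A sketch of a reset-token store that enforces all three rules, using Node's built-in crypto module. The in-memory map and the ResetTokens class are illustrative; in production the records belong in a database table. Storing only a hash of each token is a design choice on our part, so a leaked table does not hand out working reset links.

```typescript
import { createHash, randomBytes } from "node:crypto";

const RESET_TTL_MS = 15 * 60 * 1000; // roughly fifteen minutes

interface ResetRecord { userId: string; expiresAt: number; }

// Illustrative in-memory store; a real app keeps these rows in the database.
class ResetTokens {
  private readonly records = new Map<string, ResetRecord>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // Only the SHA-256 hash of a token is stored, never the token itself.
  private static hash(token: string): string {
    return createHash("sha256").update(token).digest("hex");
  }

  // Returns the raw token for the reset email. Never echo it in an API response.
  issue(userId: string): string {
    const token = randomBytes(32).toString("base64url");
    this.records.set(ResetTokens.hash(token), { userId, expiresAt: this.now() + RESET_TTL_MS });
    return token;
  }

  // Single-use: the record is deleted on first lookup, whether or not it has expired.
  // Returns the user ID for a valid token, or null.
  consume(token: string): string | null {
    const key = ResetTokens.hash(token);
    const record = this.records.get(key);
    this.records.delete(key);
    if (!record || this.now() >= record.expiresAt) return null;
    return record.userId;
  }
}
```

The injectable clock exists so the expiry can be tested without waiting fifteen minutes.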

Data Access Has to Enforce Itself

Once auth is solid, the next layer is whether your database will defend itself if the application layer fails. This is where Supabase and Postgres RLS policies live, and it is another area where the defaults the models produce look like security and are not.

The fourth check is the completeness of your row-level security policies. Enabling RLS on a table is step one. The actual work is writing a policy for each operation the table supports: a table that your users can read, insert into, update, and delete from needs four policies, one per operation. The common AI default is a single FOR SELECT policy of the form USING (auth.uid() = user_id). Postgres denies any operation that no policy permits, so on its own that table fails closed, but the write policies added afterwards are where the holes appear. An INSERT policy written as WITH CHECK (true), or one that only checks that the caller is logged in, lets any user insert rows claiming to belong to another user. An UPDATE policy that checks ownership in USING but pairs it with a permissive WITH CHECK lets a user rewrite the user_id column on a row they own and reassign ownership. Open the Supabase dashboard, list every table, confirm RLS is enabled on each, and verify each one has policies for every operation it supports. Do not accept the policy list the model generated at face value. Read each policy and check the USING and WITH CHECK clauses separately, because USING decides which existing rows a user can touch and WITH CHECK decides what a new or updated row is allowed to contain.
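One way to make the policy review repeatable is to export the rows of Postgres's pg_policies view (for example with select * from pg_policies where schemaname = 'public') and run them through a small checker. The sketch below is a heuristic under stated assumptions: it expects ownership to be expressed through auth.uid(), the expected operations per table are something you supply, and it treats an omitted WITH CHECK as reusing USING, which is how Postgres handles ALL and UPDATE policies. It does not check whether RLS is enabled at all; that flag lives in pg_class.relrowsecurity.

```typescript
// Subset of the columns in Postgres's pg_policies view.
interface PolicyRow {
  tablename: string;
  policyname: string;
  cmd: "ALL" | "SELECT" | "INSERT" | "UPDATE" | "DELETE";
  qual: string | null;       // USING expression, as text
  with_check: string | null; // WITH CHECK expression, as text
}

type Command = "SELECT" | "INSERT" | "UPDATE" | "DELETE";
interface Finding { table: string; problem: string; }

// Hypothetical helper: compares existing policies against the operations
// each table is meant to support. Heuristic, not a proof of safety.
function auditPolicies(rows: PolicyRow[], expected: Record<string, Command[]>): Finding[] {
  const findings: Finding[] = [];
  for (const [table, commands] of Object.entries(expected)) {
    const policies = rows.filter((r) => r.tablename === table);

    // Every supported operation needs at least one policy covering it.
    for (const cmd of commands) {
      if (!policies.some((p) => p.cmd === cmd || p.cmd === "ALL")) {
        findings.push({ table, problem: `no policy covers ${cmd}` });
      }
    }

    // Write policies must tie the new row to the caller.
    for (const p of policies) {
      const writes = p.cmd === "INSERT" || p.cmd === "UPDATE" || p.cmd === "ALL";
      // When WITH CHECK is omitted, Postgres falls back to USING for UPDATE/ALL.
      const check = (p.with_check ?? p.qual ?? "").trim().toLowerCase();
      if (writes && (check === "true" || !check.includes("auth.uid()"))) {
        findings.push({
          table,
          problem: `${p.policyname} (${p.cmd}): WITH CHECK does not tie rows to auth.uid()`,
        });
      }
    }
  }
  return findings;
}
```

A clean result from this script means the obvious mistakes are absent, not that the policies are right; reading each one remains the actual check.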

The fifth check is whether every query is parameterised. Any string concatenation that produces SQL is an injection path, and it does not matter whether the string came from a user, a URL, an environment variable, or a field the user can edit three hops upstream. Search the codebase for $queryRawUnsafe, for template literals inside a pg.query call, and for any function that builds a WHERE clause from a variable. Replace every one with the parameterised form. Models default to string concatenation whenever the ORM feels in their way, and they do it most often in search endpoints, admin-only filters, and reporting queries. Those are exactly the endpoints a determined attacker will probe.
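A sketch of the parameterised form for a typical search endpoint. It returns a { text, values } object, which is the shape node-postgres accepts in client.query(); the table and column names are hypothetical. Values travel as $1 and $2 placeholders, and the one piece that cannot be a placeholder, the sort column, is picked from a fixed allow-list rather than interpolated from the request.

```typescript
// The query shape node-postgres accepts; table and column names are hypothetical.
interface Query { text: string; values: unknown[]; }

// Identifiers cannot be bound as parameters, so they come from an allow-list.
// A Map avoids inherited keys such as "toString" sneaking through.
const SORT_COLUMNS = new Map<string, string>([
  ["created", "created_at"],
  ["name", "name"],
]);

function searchProjects(term: string, ownerId: string, sort: string): Query {
  const column = SORT_COLUMNS.get(sort) ?? "created_at";
  return {
    // Only the allow-listed column is interpolated; user values stay in `values`.
    text: `SELECT id, name FROM projects WHERE owner_id = $1 AND name ILIKE $2 ORDER BY ${column} DESC`,
    values: [ownerId, `%${term}%`],
  };
}
```

A search term containing quotes or SQL keywords ends up as literal text inside the ILIKE pattern, because the database never parses it as SQL.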

Inputs, Outputs, and Forms

The third domain covers the boundaries where user-controlled data enters and leaves your system. Three checks here, and they are the ones traditional SAST tools partially catch, which means they are the ones you are most likely to skip because you assume something else is watching. Something else is not watching. These are AI-generated codebases.

The sixth check is input validation at the API boundary, enforced by a schema library. Every route handler that accepts a request body should run that body through a Zod or Yup schema before touching it. The schema defines the shape, the types, the length limits, and the allowed values, and it rejects anything that does not match. The model's usual shortcut is to trust the shape of the request because the frontend submitted it, which is a reasonable assumption right up until the first person writes their own frontend. Frontend validation is user experience. Backend validation is the only validation that counts. Add schemas to every route, and treat the route without a schema as a route not yet finished.
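To keep the example dependency-free, this sketch hand-rolls what a Zod or Yup schema would declare: shape, types, length limits, and allowed values. The route, field names, and limits are hypothetical, and the return value mirrors the { success, data } / { success, error } shape of Zod's safeParse. Rejecting unknown keys corresponds to a Zod .strict() object and stops a client from smuggling in fields such as ownerId.

```typescript
// Hypothetical request body for a "create project" route.
interface CreateProject { name: string; visibility: "private" | "public"; }
type Parsed<T> = { success: true; data: T } | { success: false; error: string };

const VISIBILITIES: readonly string[] = ["private", "public"];

function parseCreateProject(body: unknown): Parsed<CreateProject> {
  if (typeof body !== "object" || body === null) {
    return { success: false, error: "body must be a JSON object" };
  }
  const { name, visibility, ...extra } = body as Record<string, unknown>;

  // Type and length limits: a non-empty name of at most 100 characters.
  if (typeof name !== "string" || name.trim().length === 0 || name.length > 100) {
    return { success: false, error: "name must be a 1-100 character string" };
  }
  // Allowed values: an explicit enum, not "any string".
  if (typeof visibility !== "string" || !VISIBILITIES.includes(visibility)) {
    return { success: false, error: "visibility must be 'private' or 'public'" };
  }
  // Unknown keys (ownerId, role, ...) are rejected rather than passed to the database.
  const unexpected = Object.keys(extra);
  if (unexpected.length > 0) {
    return { success: false, error: `unexpected fields: ${unexpected.join(", ")}` };
  }
  return {
    success: true,
    data: { name: name.trim(), visibility: visibility as CreateProject["visibility"] },
  };
}
```

In a real route the handler returns a 400 on failure and only ever touches result.data afterwards, never the raw body.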

The seventh check is output escaping, which means hunting down every path where user-supplied content reaches the browser without being passed through a safe renderer. The obvious offender is dangerouslySetInnerHTML with any value derived from user input, which is a direct XSS. Less obvious are markdown renderers used without a sanitizer, raw innerHTML assignments inside useEffect hooks, and link or image tags where the href or src attribute is built from a user string and can be set to javascript: or data: URIs. Pick a markdown library that sanitises by default, wrap user-generated HTML in DOMPurify, and never pass a raw user string into an attribute that can execute code.
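For the attribute case, a small allow-list check is enough. This sketch parses the value with the WHATWG URL parser, which follows the same rules as the browser, so mixed-case schemes and embedded tabs cannot slip past it the way they slip past a startsWith('javascript:') test. The allowed schemes and the placeholder base URL are assumptions, and user-generated HTML still needs DOMPurify.

```typescript
// Schemes that cannot execute script. Adjust the list to what your app links to.
const SAFE_PROTOCOLS = new Set(["http:", "https:", "mailto:"]);

// Returns the URL unchanged if it is safe for href/src, otherwise null.
function safeHref(input: string): string | null {
  try {
    // The base only resolves relative links such as "/settings"; it is never emitted.
    const { protocol } = new URL(input, "https://example.invalid");
    return SAFE_PROTOCOLS.has(protocol) ? input : null;
  } catch {
    return null; // unparseable input is rejected, not passed through
  }
}
```

Render the link only when the result is non-null; falling back to plain text is safer than falling back to "#".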

The eighth check is CSRF protection on state-changing routes. Next.js apps with cookie-based sessions are exposed to cross-site request forgery unless something is defending the POST, PUT, PATCH, and DELETE endpoints. A SameSite=Lax cookie blocks the classic cross-site form post, but it treats every subdomain of your domain as the same site and still sends the cookie on top-level GET navigations, so it is a mitigation rather than a defence. The standard defences are a CSRF token bound to the session, or a strict origin check that rejects any request whose Origin or Referer header does not match your own domain. Models routinely skip this because the SPA feels like an API to them, so they assume bearer tokens are in use and CSRF does not apply. If the session is in a cookie, CSRF applies. Implement one of the two defences and confirm it runs on every mutating route.
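A sketch of the second defence, the strict origin check, as a pure function you could call from middleware. ALLOWED_ORIGINS is a placeholder for your own domains. Safe methods pass through on the assumption that they never change state; a mutating request passes only if its Origin, or failing that its Referer, matches the list, and a request carrying neither header fails closed.

```typescript
// Placeholder: list your production (and, if needed, preview) origins exactly.
const ALLOWED_ORIGINS = new Set(["https://app.example.com"]);
const SAFE_METHODS = new Set(["GET", "HEAD", "OPTIONS"]);

function passesOriginCheck(method: string, origin: string | null, referer: string | null): boolean {
  // Assumes GET/HEAD/OPTIONS handlers never mutate state.
  if (SAFE_METHODS.has(method.toUpperCase())) return true;
  if (origin) return ALLOWED_ORIGINS.has(origin);
  // Some requests omit Origin; fall back to the origin part of Referer.
  if (referer) {
    try {
      return ALLOWED_ORIGINS.has(new URL(referer).origin);
    } catch {
      return false;
    }
  }
  return false; // neither header present: reject
}
```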

Infrastructure Is the Unglamorous Half

The final domain is the operational layer, and it is where the audit spends about a third of its time, because these checks take longer to verify and the failures are harder to notice from a working app. Five items here, and they are the difference between a project that survives its first week of real traffic and one that does not.

The ninth check is secrets management. Grep the repository for anything that looks like a key, including sk_live, service_role, AIza, ghp_, and the literal strings api_key, secret, and password. Check the git history, not just the current tree, because a key committed and then removed is still a key anyone can see. Verify that .env is in .gitignore and that no committed file has ever contained production credentials. Then look at the Next.js config and audit every environment variable prefixed NEXT_PUBLIC_. That prefix ships the value to the browser bundle, and models occasionally use it for a value that should never have left the server, most commonly the Supabase service role key. Any service role key in the client bundle is a full database compromise waiting to happen.
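The NEXT_PUBLIC_ audit can be partly scripted. This heuristic sketch flags public variables whose names suggest a server-only secret, or whose values start with a known secret-key prefix. The patterns are assumptions and far from exhaustive; a Supabase service role key, for instance, is caught here by its name rather than its value.

```typescript
// Heuristic patterns; extend them with the providers your app actually uses.
const SECRET_NAME = /SERVICE_ROLE|SECRET|PRIVATE|PASSWORD/i;
const SECRET_VALUE = /^(sk_live_|sk_test_|ghp_)/;

// Returns the names of NEXT_PUBLIC_ variables that look like server-only secrets.
function exposedSecrets(env: Record<string, string | undefined>): string[] {
  return Object.entries(env)
    .filter(([name]) => name.startsWith("NEXT_PUBLIC_"))
    .filter(([name, value]) => SECRET_NAME.test(name) || SECRET_VALUE.test(value ?? ""))
    .map(([name]) => name)
    .sort();
}
```

Run it against the environment your production build actually uses, for example by passing process.env from a build-time script; a clean result does not replace the git-history grep.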

The tenth check is rate limiting, and specifically whether it is keyed on the right identifier. AI-generated rate limiters almost always key on IP address, which is the wrong default for two reasons. One, a motivated attacker rotates IPs trivially using any residential proxy service, so the limit does nothing to stop them. Two, a thousand legitimate users behind a corporate NAT or a mobile carrier share one IP and hit the limit as a group, so the limit does stop them. The correct pattern is to key authenticated routes on the user ID. Unauthenticated routes that create accounts or trigger password resets have no user ID to key on, so they fall back to IP, and that limit should be treated as a crude secondary defence rather than real protection. Review the rate limit config in your middleware, check the key function, and confirm each route uses the right one.
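A sketch of the keying rule paired with a fixed-window counter. It is in-memory for illustration only; a deployment with more than one instance needs a shared store such as Redis. The limits, window length, and key formats are assumptions.

```typescript
// Hypothetical request summary produced by your auth middleware.
interface ClientInfo { userId?: string; ip: string; }

// Authenticated traffic is keyed on the user; anonymous traffic falls back to IP.
function rateLimitKey(req: ClientInfo): string {
  return req.userId ? `user:${req.userId}` : `ip:${req.ip}`;
}

class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Records the request and returns true if it is within the limit.
  allow(key: string): boolean {
    const t = this.now();
    const w = this.windows.get(key);
    if (!w || t - w.start >= this.windowMs) {
      this.windows.set(key, { start: t, count: 1 }); // start a fresh window
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count += 1;
    return true;
  }
}
```

Two users behind the same NAT get separate budgets here, which is exactly the behaviour an IP-keyed limiter cannot provide.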

The eleventh check is CORS, which is either fine or catastrophic with nothing in between. Find the CORS configuration, which usually lives in middleware or a header helper. Confirm that Access-Control-Allow-Origin is set to an explicit list of your own domains, never to *. If credentials are allowed, the specification requires an explicit origin, and browsers reject a wildcard combined with credentials. If the model hand-rolled CORS, it usually got the production origin wrong or added a permissive dev rule and forgot to remove it before launch. Read the config in production, not in the preview, then send a curl request with a disallowed Origin header and confirm the response carries no Access-Control-Allow-Origin header for it. CORS is enforced by the browser, so the server will still answer curl; what matters is that it does not grant the disallowed origin.
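A sketch of an allow-list CORS helper that echoes the request's origin only when it is on the list. The origins are placeholders for your own domains. Vary: Origin is included because the response now depends on the request's Origin header, and shared caches need to know that.

```typescript
// Explicit allow-list; these origins are placeholders.
const CORS_ALLOWED = new Set(["https://app.example.com", "https://www.example.com"]);

// A disallowed or missing Origin gets no Access-Control-Allow-Origin header,
// so the browser refuses to expose the response to that page.
function corsHeaders(origin: string | null): Record<string, string> {
  const headers: Record<string, string> = { Vary: "Origin" };
  if (origin && CORS_ALLOWED.has(origin)) {
    headers["Access-Control-Allow-Origin"] = origin; // the exact origin, never "*"
    headers["Access-Control-Allow-Credentials"] = "true";
  }
  return headers;
}
```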

The twelfth check is error handling with PII scrubbing in whatever observability tool you picked. Depending on the SDK and its settings, Sentry can capture request bodies, query strings, and a breadcrumb trail of user interactions, and its built-in scrubbing works from a list of common field names. A password reset form with an unusually named field can therefore log your users' new passwords to your Sentry project in plain text. Configure the beforeSend hook to scrub email, tokens, password fields, and any custom fields you know contain sensitive data. If you do not have error logging at all, add it before launch, because debugging a production incident without logs is an argument for picking a different career. If you do have it, confirm the scrub is active in production, not just in the config file.
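A sketch of a recursive scrubber you could call from Sentry's beforeSend hook, which receives each event before it is sent and can return a modified copy. The key pattern is an assumption to extend with your own field names. It redacts by key only, so secrets embedded inside strings, such as a token in a query string, need separate handling.

```typescript
// Assumed pattern; add your app's own sensitive field names.
const SENSITIVE_KEY = /pass|token|secret|email|authorization|cookie|api[-_]?key/i;

// Returns a deep copy with the values of sensitive-looking keys replaced.
function scrub(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(scrub);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEY.test(key) ? "[Filtered]" : scrub(inner);
    }
    return out;
  }
  return value; // strings, numbers, booleans, null pass through unchanged
}
```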

The thirteenth check is dependency hygiene. Run npm audit and read the high and critical findings, not just the count. Turn on Dependabot or Renovate in the repository settings. AI models ship with the versions they saw during training, which can be eighteen months stale by the time you launch, and stale dependencies accumulate CVEs the way an unkempt garden accumulates weeds. The twenty minutes spent bumping versions and retesting is the cheapest security work you will ever do, and it eliminates entire categories of vulnerability without you needing to understand what they were.
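Reading the high and critical findings is easier with a small filter over the output of npm audit --json. This sketch assumes the report format of npm 7 and later, where vulnerabilities maps each package name to an entry with a severity and a fixAvailable field; older npm versions print a different shape.

```typescript
// Assumed subset of the `npm audit --json` report (npm 7+).
interface AuditEntry {
  name: string;
  severity: "info" | "low" | "moderate" | "high" | "critical";
  fixAvailable: unknown; // boolean or an object describing the fix
}
interface AuditReport { vulnerabilities: Record<string, AuditEntry>; }

// Lists only the findings worth reading first.
function seriousFindings(report: AuditReport): string[] {
  return Object.values(report.vulnerabilities)
    .filter((v) => v.severity === "high" || v.severity === "critical")
    .map((v) => `${v.severity.toUpperCase()} ${v.name}${v.fixAvailable ? " (fix available)" : ""}`)
    .sort();
}
```

Feed it the parsed JSON from npm audit --json, then read the advisory behind each line rather than trusting the label.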

One note on running these checks yourself before we move on. The order matters. Authentication first, data next, inputs third, infrastructure last. We run it in that order because a failure in an earlier layer makes later layers moot. There is no point tightening rate limits on an endpoint that anyone can already call because the auth check is client-side, and there is no point hardening CSRF on a form that writes to a table without RLS policies. Each layer is load-bearing for the next one. If you only have time for four items this week, do the first four on this list and come back for the rest.

A second note on tooling. None of these checks is fully automated by any linter or SAST product we have used. Semgrep catches some of the string-concat SQL cases. Snyk catches the stale dependencies. Sentry surfaces the missing error handling. Between them you get maybe four of the thirteen items. The other nine require a human reading the code with the specific AI-generated failure patterns in mind, which is why the audit exists as a human engagement rather than a product. Tooling is useful as a first pass, not as the whole pass.

How We Run This as a Service

We built the fixed-scope WitsCode audit because every one of these thirteen items has the same shape. It is a specific check, it takes a known amount of time, and the model-generated version of the code fails it in one of a small number of predictable ways. The audit is three hours of billable time, flat priced, and the deliverable is a ranked list of findings from critical to informational, along with a pull request that patches the top three by default. You get the PR whether you hire us for the rest or not. Most clients take the PR and fix the long tail themselves from the list, which is the point. We would rather you understand your codebase than depend on us to understand it for you.

If you have read this far and recognised two or more of the defaults we described, your codebase is in the modal population, and the modal outcome is that one of these items is already exploitable in production. The fix is a single afternoon of focused work. Not fixing it is the kind of decision you remember for the wrong reasons. Book the audit, or start at the top of this article with a terminal open and work your way down. Either path is better than shipping and hoping nobody notices.
