The Three-Hour AI-Generated Code Audit We Do Before Production

What a fixed-scope AI code audit actually looks like: the five categories we check, what gets patched in the PR, what we flag for later, and what you pay for.

By WitsCode | March 4, 2026 | 10 min read

Vibe Coders

Photo by Daniil Komov on Unsplash

Most audit pitches you read on the internet are written by firms that want to sell you a twenty-thousand-dollar engagement with a scoping call and a statement of work and a kickoff deck. That is the right product for a Series B fintech with a compliance obligation. It is not the right product for a founder who built something useful in Lovable or Bolt last week and wants to know, before Monday, whether it is safe to turn on in front of paying customers. This article is about the other product: a fixed-scope, fixed-price, three-hour audit aimed at exactly that founder. It exists because nobody was filling the gap between a DIY checklist and a five-figure engagement for vibe coders.

We want to be transparent about what the audit actually contains, what lands in your repo at the end of it, what we deliberately do not do, and why three hours is the specific number we chose. Most of the value in buying an audit is knowing in advance what you are buying, and the absence of that clarity is why founders often end up paying for expensive hourly work that drifts.

Why Three Hours Is the Right Box

Three hours is long enough to be useful and short enough to be a commodity. Two hours is enough time to read the code and write a list, which is a review, not an audit. A full day becomes a rewrite, which is a different product we also offer but which requires a different kind of conversation. Three hours sits in the narrow band where an experienced engineer can run a targeted sweep across five categories that, in our experience, cover nearly all of the production-blocking bugs the current generation of AI coding tools ships, open a pull request with the critical items already patched, and write up everything that was not patched in a format the founder can act on.

The second reason three hours works is that it maps to a single focused block on a calendar. The engineer opens your repo, runs the sweep, writes the patches, records the walkthrough, and closes the laptop. There is no drift, no scope creep, no second meeting to decide whether to include something. The box is the box.

The third reason is pricing honesty. Three hours of senior engineering time has a floor cost, and pricing the audit below that floor would require cutting corners that defeat the purpose. Pricing above it would invite scope creep. Fixing the box at three hours fixes the price, and fixing the price removes the adversarial hourly dynamic that founders have learned to distrust. You know what you are paying and we know what we are delivering before either of us starts.

The Five Categories We Check

The audit runs against five categories, in this order, because each one depends on the layer beneath it and finding a problem early often moots later findings. We do not deviate from the order because the order is the discipline that keeps a three-hour sweep honest.

The first category is authentication. We verify that every route the UI gates is also gated on the server, because AI-generated apps almost universally ship client-only checks that a logged-in user can bypass with a single curl command against the API. We check how sessions are stored, because a JWT in localStorage turns any XSS bug into a full account takeover, and in the projects we audit the models default to localStorage roughly four times out of five. We walk the password reset flow to confirm tokens expire, are single-use, and are not echoed in response bodies. We confirm email verification is enforced before any paid action. If role-based access exists, we verify the role check runs on the server on every protected request, not just once at login.
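A minimal sketch of the server-side check we look for, written in TypeScript and kept framework-agnostic on purpose. Session, SessionStore, requireRole, and the two role names are hypothetical; in a real app the store would be your database or auth provider, and the session ID would arrive in an HttpOnly cookie, which page scripts cannot read. The point it illustrates is that the role is re-read from the server's own record on every request, never taken from anything the client sends.

```typescript
// Hypothetical types: adapt to your framework's request object and session store.
type Role = "member" | "admin";

interface Session {
  userId: string;
  role: Role; // stored server-side and set by your own code, never by the client
  expiresAt: number; // epoch milliseconds
}

interface SessionStore {
  get(sessionId: string): Session | undefined;
}

type GateResult = { ok: true; session: Session } | { ok: false; status: 401 | 403 };

// Call at the top of every protected handler, on every request.
function requireRole(
  store: SessionStore,
  sessionId: string | undefined,
  required: Role,
  now: number = Date.now(),
): GateResult {
  if (!sessionId) return { ok: false, status: 401 };

  const session = store.get(sessionId);
  if (!session || session.expiresAt <= now) return { ok: false, status: 401 };

  // The role is re-read from the server's record each time, so a demotion or a
  // revoked session takes effect immediately; client-sent roles are ignored.
  if (required === "admin" && session.role !== "admin") return { ok: false, status: 403 };

  return { ok: true, session };
}
```

In a route handler you would call requireRole(store, sessionIdFromCookie, "admin") before any other work and return the status as-is when ok is false.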

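The second category is data. We open the database schema and read every row-level security policy, because enabling RLS without writing correct per-operation policies is the most common source of false confidence in the Supabase ecosystem. A table with read and write traffic needs policies for SELECT, INSERT, UPDATE and DELETE, each with an ownership clause: USING for the existing rows a statement can read or change, WITH CHECK for the new rows it can write. The models reliably ship only the SELECT policy and call the table protected. Because Postgres denies any write that no policy allows, the app's writes then fail, and the quick fix that tends to follow is a blanket USING (true) policy or a detour through the service role key, either of which quietly removes the ownership check at the database. We check that user_id ownership is enforced on writes with a WITH CHECK clause rather than trusting the client to send the correct id. We look for any raw SQL path, any string-concatenated query, and any use of Prisma's unsafe raw-query methods such as $queryRawUnsafe, and we check foreign keys and cascade behaviour on the tables that hold user-owned data.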
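One way to make the policy read-through repeatable is to pull the policies from Postgres's pg_policies view (for example, select * from pg_policies where schemaname = 'public') and run a few mechanical checks before a human reads the SQL. The sketch below assumes rows shaped like a subset of that view's columns; auditTablePolicies and its messages are our own illustration. It relies on the fact that, once RLS is enabled, Postgres denies an operation for roles such as Supabase's anon and authenticated when no permissive policy covers it, so a missing policy surfaces as a feature that fails, while a literal true expression surfaces as a table open to every role the policy names.

```typescript
// Subset of the columns in Postgres's pg_policies view.
interface PolicyRow {
  tablename: string;
  policyname: string;
  permissive: "PERMISSIVE" | "RESTRICTIVE";
  cmd: "SELECT" | "INSERT" | "UPDATE" | "DELETE" | "ALL";
  qual: string | null; // USING expression, rendered as SQL text
  with_check: string | null; // WITH CHECK expression, rendered as SQL text
}

const OPERATIONS = ["SELECT", "INSERT", "UPDATE", "DELETE"] as const;

// Flags worth a human read for one table. These are prompts, not verdicts:
// a deliberately read-only table will legitimately lack write policies.
function auditTablePolicies(table: string, rows: PolicyRow[]): string[] {
  const findings: string[] = [];
  const policies = rows.filter((r) => r.tablename === table);

  // Only permissive policies grant access; restrictive ones can only narrow it.
  const permissive = policies.filter((p) => p.permissive === "PERMISSIVE");
  for (const op of OPERATIONS) {
    if (!permissive.some((p) => p.cmd === op || p.cmd === "ALL")) {
      findings.push(`${table}: no permissive ${op} policy, so ${op} is denied`);
    }
  }

  for (const p of policies) {
    const exprs = [p.qual, p.with_check].map((e) => e?.trim().toLowerCase());
    if (exprs.includes("true")) {
      findings.push(`${table}.${p.policyname}: expression is literally true`);
    }
    // An INSERT policy only constrains new rows through WITH CHECK.
    if (p.cmd === "INSERT" && p.with_check === null) {
      findings.push(`${table}.${p.policyname}: INSERT policy has no WITH CHECK`);
    }
  }
  return findings;
}
```

An empty result does not mean a table is safe; it only means the mechanical checks found nothing for a person to look at first.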

The third category is secrets. We walk the git history for committed .env files, scan the client bundle for any variable that should not be public, and verify that NEXT_PUBLIC_ or an equivalent framework prefix (VITE_ in Vite, for example) has not been attached to a secret, since any variable carrying that prefix is inlined into the JavaScript shipped to the browser. We check that webhook signing secrets are present and verified, that Supabase service role keys are server-only, and we cross-reference any key we find against the provider dashboard to see whether it has been rotated since the repo went public. If your repo was ever public on GitHub, even briefly, treat any key that was in it at that moment as compromised and rotate it, regardless of what the code does now.
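The bundle scan is mostly pattern matching over the built JavaScript. A rough sketch, assuming you feed it the text of each file in the build output directory: sk_live_, sk_test_, rk_live_ and whsec_ are Stripe's secret-side key and webhook-secret prefixes, the service-role check relies on Supabase's JWT-format keys carrying a role claim, and the env-name heuristic is our own. Expect false positives, such as a validation string that merely mentions a prefix; every hit is something to read, not an automatic Critical.

```typescript
import { Buffer } from "node:buffer";

// Secret-side Stripe prefixes. Publishable keys (pk_...) are meant to be public.
const SECRET_PREFIXES = ["sk_live_", "sk_test_", "rk_live_", "whsec_"];

// Three base64url segments; the header and payload both encode JSON starting with {"
const JWT_PATTERN = /eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g;

// Reads the role claim from a JWT-shaped string WITHOUT verifying it;
// we only need the claim to recognise a leaked service_role key.
function jwtRole(token: string): string | undefined {
  const [, payload] = token.split(".");
  if (!payload) return undefined;
  try {
    const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
    return typeof claims.role === "string" ? claims.role : undefined;
  } catch {
    return undefined;
  }
}

function scanBundleText(text: string): string[] {
  const findings: string[] = [];
  for (const prefix of SECRET_PREFIXES) {
    if (text.includes(prefix)) findings.push(`contains ${prefix} key material`);
  }
  for (const candidate of text.match(JWT_PATTERN) ?? []) {
    if (jwtRole(candidate) === "service_role") findings.push("contains a service_role JWT");
  }
  return findings;
}

// Env var names that look secret but carry the browser-exposed prefix.
function suspiciousPublicEnvNames(names: string[]): string[] {
  return names.filter(
    (name) => name.startsWith("NEXT_PUBLIC_") && /SECRET|SERVICE_ROLE|PRIVATE|WEBHOOK/.test(name),
  );
}
```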

The fourth category is third-party integrations. Every external service the app talks to is a seam where trust has to be explicit. We verify Stripe webhooks check the signature header against the endpoint secret before doing anything with the payload, because an unverified webhook is a free path for anyone on the internet to mark any order as paid. We check that OAuth redirect URIs are allowlisted and not built from user-controlled input. We look at any API integration that receives a callback or a response that drives business logic, and confirm the response is verified rather than trusted by origin. Many AI-generated Stripe integrations in particular skip signature verification entirely, most likely because the tutorials the models learned from were outdated or simplified for brevity.
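In production the right call is the official SDK's stripe.webhooks.constructEvent(rawBody, signatureHeader, endpointSecret), which throws when the signature does not match. The sketch below reimplements that check with Node's crypto module only to show what it does, following Stripe's documented scheme: the Stripe-Signature header carries a timestamp t and one or more v1 HMAC-SHA256 signatures of the string "t.rawBody", keyed with the endpoint secret. The five-minute tolerance matches the SDK's default. A detail that easily goes wrong is the body: verification needs the raw bytes exactly as received, before any JSON parsing.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true only if at least one v1 signature in the header matches and the
// timestamp is within the tolerance window (which limits replay of old events).
function verifyStripeSignature(
  rawBody: string,
  signatureHeader: string,
  endpointSecret: string,
  toleranceSeconds = 300,
  nowSeconds = Math.floor(Date.now() / 1000),
): boolean {
  let timestampText: string | undefined;
  const signatures: string[] = [];
  for (const part of signatureHeader.split(",")) {
    const [key, value] = part.trim().split("=", 2);
    if (key === "t") timestampText = value;
    else if (key === "v1" && value) signatures.push(value);
  }
  if (timestampText === undefined || signatures.length === 0) return false;
  const timestamp = Number(timestampText);
  if (!Number.isFinite(timestamp)) return false;
  if (Math.abs(nowSeconds - timestamp) > toleranceSeconds) return false;

  // Sign the timestamp exactly as it appeared in the header, joined to the raw body.
  const expected = createHmac("sha256", endpointSecret)
    .update(`${timestampText}.${rawBody}`, "utf8")
    .digest();
  // Constant-time comparison; timingSafeEqual throws on length mismatch, so check first.
  return signatures.some((sig) => {
    const given = Buffer.from(sig, "hex");
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
}
```

Only after this returns true should the handler parse the payload and touch any order state.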

The fifth category is observability. This one is not strictly security, but it is production readiness, and it belongs in the same sweep because an app that breaks silently in production is a business risk equivalent to an app that is quietly breached. We check that a basic error reporter like Sentry is wired in, that it scrubs PII in the beforeSend hook rather than logging email addresses and tokens, that request IDs propagate across service boundaries so you can actually trace a customer complaint, and that some form of uptime check exists. We do not set up a full observability stack in three hours; we confirm the minimum is present and flag what is not.
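A sketch of the kind of scrubbing we expect to find in beforeSend. The ScrubbableEvent type is a trimmed, assumed subset of a Sentry event (the SDK's own event type has many more fields), and the regexes and header list are illustrative rather than exhaustive. In an app this function would be called from the beforeSend(event) option passed to Sentry.init, which returns the event to send it or null to drop it; expect to adapt the types to your SDK version.

```typescript
// Trimmed, assumed subset of a Sentry event; the SDK's real type has many more fields.
interface ScrubbableEvent {
  message?: string;
  user?: { id?: string; email?: string; ip_address?: string };
  request?: { headers?: Record<string, string>; cookies?: Record<string, string> };
}

// Illustrative patterns, not an exhaustive PII detector.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const BEARER_PATTERN = /Bearer\s+[A-Za-z0-9._~+\/-]+=*/gi;
const SENSITIVE_HEADERS = new Set(["authorization", "cookie", "x-api-key"]);

function redactText(text: string): string {
  return text.replace(EMAIL_PATTERN, "[email]").replace(BEARER_PATTERN, "Bearer [token]");
}

// Mutates and returns the event, matching how beforeSend hands it back to the SDK.
function scrubEvent(event: ScrubbableEvent): ScrubbableEvent {
  if (event.message) event.message = redactText(event.message);

  if (event.user) {
    // Keep the opaque user id for correlation; drop direct identifiers.
    delete event.user.email;
    delete event.user.ip_address;
  }

  if (event.request) {
    delete event.request.cookies;
    const headers = event.request.headers ?? {};
    for (const name of Object.keys(headers)) {
      if (SENSITIVE_HEADERS.has(name.toLowerCase())) headers[name] = "[redacted]";
    }
  }
  return event;
}
```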

What Gets Fixed Immediately

The audit is not a report with a list of things for you to do. The audit is a pull request with the critical fixes already applied, reviewed in a walkthrough, and ready to merge. The scope of what gets patched during the session is deliberate.

Anything rated Critical gets fixed inline. That includes authentication bypasses, secrets present in the client bundle, missing RLS on tables holding user-owned data, unsigned webhook endpoints, cross-site scripting paths on rendered user content, and any SQL injection surface. These are the findings that can be actively exploited the moment the app is public, and patching them is non-negotiable because the audit would not be honest otherwise.

High-severity findings get patched when the fix is contained, meaning under roughly thirty lines of diff and not requiring a schema migration or a product decision. A missing CSRF check on a state-changing route is a High that usually patches in one file. A missing password reset token expiry is a High that patches in two. If the fix is local and reversible, it ships with the PR.
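For a sense of why a missing reset-token expiry counts as a contained fix, here is a minimal sketch of a token that expires, is single-use, and is stored only as a hash. The ResetTokenStore class, its in-memory Map, and the thirty-minute lifetime are assumptions for illustration. Against a real database, the consume step should be a single conditional update (matching the hash, an unused flag, and an unexpired timestamp) so that two concurrent requests cannot both redeem the same token.

```typescript
import { createHash, randomBytes } from "node:crypto";

interface ResetRecord {
  userId: string;
  expiresAt: number; // epoch milliseconds
  used: boolean;
}

const RESET_TTL_MS = 30 * 60 * 1000; // assumed 30-minute lifetime

// Only a SHA-256 hash is stored, so a leaked table does not leak live tokens.
function hashToken(token: string): string {
  return createHash("sha256").update(token).digest("hex");
}

class ResetTokenStore {
  private records = new Map<string, ResetRecord>();

  // The returned raw token goes into the emailed link only; never echo it
  // in an HTTP response body.
  issue(userId: string, now: number = Date.now()): string {
    const token = randomBytes(32).toString("base64url");
    this.records.set(hashToken(token), { userId, expiresAt: now + RESET_TTL_MS, used: false });
    return token;
  }

  // Returns the user id for a valid token and marks it used so it cannot be replayed.
  consume(token: string, now: number = Date.now()): string | undefined {
    const record = this.records.get(hashToken(token));
    if (!record || record.used || now >= record.expiresAt) return undefined;
    record.used = true;
    return record.userId;
  }
}
```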

Everything else gets written up. That includes Mediums, Lows, Info, any High that requires a schema migration or a product call, and anything we flagged but did not have time to verify exhaustively. The line we draw is that the PR must be safely mergeable into your main branch on the same day, which means every patch in it has to be conservative, scoped, and easy to revert. We do not touch your product logic. We do not refactor for taste. We fix the layer under the product and hand it back.
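The patch-or-report rules above are mechanical enough to write down as a small function. A sketch with hypothetical field names: the thirty-line threshold mirrors the rough figure in the text and is a judgment aid, not a hard cut-off, and in practice inputs such as diff size or whether a migration is needed are estimates the engineer makes during the sweep.

```typescript
type Severity = "Critical" | "High" | "Medium" | "Low" | "Info";

// Hypothetical shape for a finding as it comes out of the sweep.
interface Finding {
  title: string;
  severity: Severity;
  estimatedDiffLines: number;
  needsSchemaMigration: boolean;
  needsProductDecision: boolean;
  verified: boolean; // false if flagged but not exhaustively confirmed in the session
}

type Disposition = "patch-in-pr" | "write-up";

// Rough guide from the text ("under roughly thirty lines"), not a hard limit.
const HIGH_PATCH_MAX_DIFF_LINES = 30;

function triage(finding: Finding): Disposition {
  // Criticals are exploitable the moment the app is public, so they are patched regardless of size.
  if (finding.severity === "Critical") return "patch-in-pr";

  // Highs are patched only when the fix is small, local, verified, and needs no
  // migration or product call; everything else is written up in the report.
  const containedHigh =
    finding.severity === "High" &&
    finding.verified &&
    finding.estimatedDiffLines < HIGH_PATCH_MAX_DIFF_LINES &&
    !finding.needsSchemaMigration &&
    !finding.needsProductDecision;

  return containedHigh ? "patch-in-pr" : "write-up";
}
```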

What Gets Deferred and Why

Being specific about what we do not do inside the three hours is as important as being specific about what we do. A thorough penetration test is outside the scope and always will be, because it is a different discipline that requires a different engagement shape and pricing model. A load test, likewise, is separate and depends on production traffic assumptions we do not have access to during an audit. Compliance certification against SOC 2 or HIPAA or PCI is a months-long process, not a three-hour product, and any firm that claims to ship it in an afternoon is either lying or sloppy.

Inside the code itself, we defer anything that requires your product judgment. If we find that deleting a user leaves orphaned records, we flag it as a Medium in the report, because the fix depends on whether you want hard delete, soft delete, or deletion with anonymisation, and that is a decision for you, not for us. Dependency upgrades that touch a major version, or that require running the test suite to confirm safety, are flagged rather than executed. Rate limiting strategy, log retention windows, and alerting thresholds all require input from you that we will not assume. The report explains each deferral in a line or two so that you can revisit them on your own schedule.

The Deliverable: What Lands in Your Repo

At the end of the three hours, four artefacts exist. The first is a branch on your repo named with the audit date, and a pull request open against your main branch. The commits inside the PR are grouped by category, each commit is atomic, and every commit message explains what the change does and which finding it addresses. You can review the PR commit by commit, ask questions on individual lines, and merge or revert at your own pace.

The second is a markdown file called AUDIT.md, committed at the repository root on that branch. The report is organised by category. Every finding has a severity rating, a short description of the problem, a specific reference to the file and line where the issue lives, and either the fix we applied or the fix we recommend. That makes the findings searchable, linkable from your own issue tracker, and easy to hand to a future engineer if the audit surfaces work you want done later.
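To make the report format concrete, here is a sketch of one way to model a finding and render the markdown for AUDIT.md from a list of them. The type names, the five category labels, and the heading layout are illustrative assumptions; what carries over from the description is the content of each entry: category, severity, description, file and line, and the applied or recommended fix.

```typescript
type Severity = "Critical" | "High" | "Medium" | "Low" | "Info";
type Category = "Authentication" | "Data" | "Secrets" | "Integrations" | "Observability";

interface AuditFinding {
  category: Category;
  severity: Severity;
  title: string;
  description: string;
  file: string;
  line: number;
  applied: boolean; // true if the fix is in the PR, false if only recommended
  fix: string;
}

// Same order as the sweep; most severe findings first within each category.
const CATEGORY_ORDER: Category[] = ["Authentication", "Data", "Secrets", "Integrations", "Observability"];
const SEVERITY_RANK: Record<Severity, number> = { Critical: 0, High: 1, Medium: 2, Low: 3, Info: 4 };

function renderFinding(f: AuditFinding): string {
  return [
    `### [${f.severity}] ${f.title}`,
    f.description,
    `Location: \`${f.file}:${f.line}\``,
    `${f.applied ? "Fix applied" : "Recommended fix"}: ${f.fix}`,
  ].join("\n");
}

function renderReport(findings: AuditFinding[]): string {
  const sections: string[] = ["# Audit report"];
  for (const category of CATEGORY_ORDER) {
    const inCategory = findings
      .filter((f) => f.category === category)
      .sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
    if (inCategory.length > 0) sections.push(`## ${category}`, ...inCategory.map(renderFinding));
  }
  return sections.join("\n\n");
}
```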

The third is a Loom recording, usually between twelve and fifteen minutes, walking through the PR commit by commit and the report category by category. The video exists because reading a security PR cold is harder than watching someone explain it, and the walkthrough compresses the handoff. You watch it once, at your own speed, and the questions that remain go into the follow-up call.

The fourth is a scheduled follow-up call, free, held within a week of delivery, where you can ask anything about the report or the PR. The call exists because security findings have a half-life of understanding, and having a human available to answer questions a few days later reduces the chance that you merge something you did not actually understand.

What You Pay For and What You Do Not

The final thing worth being explicit about is the shape of the value exchange. You are paying for a specific, senior engineer's attention for three focused hours, the patched pull request, the written report, the walkthrough video, and the follow-up call. That is the whole deliverable and that is the whole invoice. There is no kickoff fee, no per-finding charge, no retainer attached to the audit, and no upsell baked into the delivery.

You are not paying for a penetration test, a compliance audit, ongoing monitoring, or remediation of the items we deferred. Those are separate engagements at separate prices, and we will quote them honestly if you want them, but they are never bundled into the three-hour product because bundling is where trust breaks down. The audit is the audit. What you do with the report is up to you, and if you want us to do the deferred work afterwards, you can book that as a second engagement with a clean scope and a clean price.

This clarity is the product. The category list, the fixed scope, the patched PR, the severity-rated report, the honest deferral list, and the flat price are what distinguish a three-hour audit from a consultant drifting through your codebase on an hourly clock. You should know exactly what you are buying before you buy it, and after reading this, you do.

If you want the audit run on your repo, you can book a slot at the WitsCode 3-hour audit page.
