
Claude Code for Maintenance: The Weekly Operations Workflow

Every Monday: dependency update, security patch, test run, log review, cost review. The Claude Code automation we run to keep client projects from rotting between feature work.

By WitsCode | 10 min read

Most vibe-coded projects die the same way. They do not die from a failed launch. They die six months after launch, when a Supabase row-level-security policy silently drifts, when three minor npm updates pile up into a breaking change, when a Sentry error count climbs a little every week and nobody notices because the founder is busy shipping the next feature. The project rots between feature work. By the time anyone investigates, the stack is two majors behind, the test suite has six unexplained skips, and the Vercel bill has tripled because a loop regression is pounding a Supabase edge function.

At WitsCode we run last-mile developer work across more than 250 client sites. The only reason we can keep that many projects healthy at once is that every Monday morning, a Claude Code routine runs against each repo. Five tasks, in the same order, with the same prompts, producing the same artefacts. It is boring on purpose. This article walks through exactly what that routine looks like, the Claude Code prompts we use for each task, and the one discipline that keeps the automation from becoming a new source of chaos.

Why Monday Matters for Vibe-Coded Projects

Vibe-coded projects have a different decay curve than traditional software. A Rails app from 2018 degrades slowly because its dependency surface is narrow and its deployment pipeline is conservative. A 2025 Next.js project scaffolded with a Claude Code one-shot has a much wider surface. It probably pulls in shadcn/ui, Drizzle or Prisma, a Supabase client, a payments SDK, a Resend or Postmark client, a Sentry SDK, a feature-flag provider, and twenty transitive packages under each of those. Every one of those can issue a security advisory or a breaking minor release in a given week.

Monday morning is the right moment to inspect the damage because the previous week of user traffic has already generated the log patterns and cost signals you need. If you run maintenance on Friday afternoon, you are reacting to data that is about to be invalidated by the weekend. Running on Monday gives you one complete week of production behaviour to read against, and it puts any remediation work in front of a full workweek rather than a tired Friday deploy window.

The point of the routine is not to surface interesting findings. It is to confirm, most weeks, that nothing interesting happened. When something interesting does happen, you want a workflow that catches it in the first twenty-four hours of the week, not the last.

The Five-Task Monday Routine

The routine has five tasks in a fixed sequence. Dependency update first, because security patches often ride on dependency bumps and you want the tree current before you audit it. Security sweep second, to catch anything the dependency manager missed. Test run third, because both of the previous steps can invalidate tests and you want that signal before you touch logs. Log review fourth, because by now you know the code is green and you are looking purely at runtime signal. Cost review last, because cost anomalies are almost always explained by something you already found in the earlier steps.

Each task is a Claude Code invocation in headless mode, usually claude -p "..." with an --allowedTools list scoped to the minimum needed. Each task writes its output into a branch named maint/YYYY-MM-DD and opens a pull request. The PR is the artefact. We never, ever auto-merge these PRs. That discipline is the single thing that separates useful maintenance automation from an agent that quietly breaks production on a Tuesday.
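
A minimal sketch of what such a Monday driver could look like, assuming the five prompts live as text files under a hypothetical prompts/ directory with a hypothetical @DATE@ placeholder for the report date, and that your Claude Code version accepts a comma-separated --allowedTools value. The per-task tool lists are illustrative guesses, not a recommended configuration; check claude --help for the exact permission-rule syntax. Setting DRY_RUN=1 prints the plan instead of creating the branch or calling Claude Code.

```shell
#!/usr/bin/env bash
# Hypothetical Monday driver: five prompts, fixed order, one dated branch.
# prompts/<task>.txt files and the @DATE@ placeholder are assumptions.
set -euo pipefail

MAINT_TASKS=(deps security tests logs cost)   # fixed order from the routine

maint_branch() { printf 'maint/%s\n' "${1:-$(date +%F)}"; }

# Illustrative per-task tool scopes (assumed syntax; verify with claude --help).
task_tools() {
  case "$1" in
    deps)      echo 'Read,Write,Bash(gh pr:*),Bash(pnpm:*)' ;;
    security)  echo 'Read,Grep,Glob,Write,Bash(pnpm audit:*)' ;;
    tests)     echo 'Read,Write,Bash(pnpm test:*)' ;;
    logs|cost) echo 'Read,Write,Bash(curl:*)' ;;
    *)         return 1 ;;
  esac
}

run_monday() {
  local day="${1:-$(date +%F)}" branch task prompt
  branch="$(maint_branch "$day")"
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf 'branch: %s\n' "$branch"
  else
    git switch -c "$branch"
  fi
  for task in "${MAINT_TASKS[@]}"; do
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      printf '%s: claude -p <prompts/%s.txt> --allowedTools %s\n' \
        "$task" "$task" "$(task_tools "$task")"
      continue
    fi
    prompt="$(cat "prompts/$task.txt")"
    prompt="${prompt//@DATE@/$day}"   # fill in the report date
    claude -p "$prompt" --allowedTools "$(task_tools "$task")"
  done
}
```

Running DRY_RUN=1 run_monday 2026-04-20 prints the branch name followed by the five planned invocations. The tasks run sequentially on purpose, so each one can read the reports the earlier tasks have already written.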

The Five Tasks in Detail

Task One: Dependency Update with Triage

The first task is not running updates. It is triaging what Renovate or Dependabot has already opened. By Monday morning, Renovate has typically produced between five and forty PRs per repo, each bumping one package. A human cannot read forty PRs. Claude Code can.

The prompt we use is roughly this. Invoke it from the repo root with read-only git access, GitHub permission to comment on, label, and merge pull requests (the prompt needs all three), and the ability to run the package manager.

claude -p "Review all open pull requests labelled 'dependencies' in this repo.
For each one, classify as: safe-minor (lockfile/patch/minor, no changelog red flags),
review-needed (minor with notable changelog), or major-breaking (major version
or visible breaking change).
For safe-minor PRs, approve and merge if CI is green.
For review-needed, post a three-line summary of the changelog on the PR and add
the label 'human-review'.
For major-breaking, do not touch. Add the label 'human-major' and summarise the
migration steps in the PR body.
Produce a final report at maintenance/deps-$(date +%F).md listing what was merged,
what was flagged for review, and what was deferred."

The output is a single markdown file committed to the maintenance branch. The branch becomes a PR titled something like maint: deps triage 2026-04-20. Somebody on the team reads the report, checks that the auto-merged PRs actually landed cleanly, and works through the review-needed list.

The trap to avoid is letting Claude Code run the package manager's update command unscoped. If it runs npm update or pnpm up --latest, it generates a single megadiff that no human can meaningfully review. Constrain it to the Renovate or Dependabot PRs so each change stays isolated and revertible.
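
The mechanical half of that triage, deciding whether a bump is major, minor, or patch, does not need a model at all. A minimal sketch, assuming plain MAJOR.MINOR.PATCH version strings: it strips a leading v and any pre-release or build suffix, and treats a minor bump below 1.0.0 as major, because npm's caret ranges (^0.y.z) do not accept it. Reading the changelog for red flags is still the part left to Claude Code or a human.

```shell
#!/usr/bin/env bash
# Mechanical pre-filter for Renovate/Dependabot bumps: major, minor or patch?
# Changelog red flags still need Claude Code or a human to read.
# 0.x minor bumps count as major, matching npm caret (^0.y.z) ranges.
bump_kind() {
  local old="${1#v}" new="${2#v}"
  local o_major o_minor n_major n_minor
  old="${old%%[-+]*}"   # drop pre-release/build suffixes like -canary.1
  new="${new%%[-+]*}"
  IFS=. read -r o_major o_minor _ <<<"$old"
  IFS=. read -r n_major n_minor _ <<<"$new"
  if (( n_major != o_major )); then
    echo major
  elif (( o_major == 0 && n_minor != o_minor )); then
    echo major
  elif (( n_minor != o_minor )); then
    echo minor
  else
    echo patch          # patch bump, or a lockfile-only change
  fi
}

# Map the bump onto the triage buckets used in the prompt.
triage_bucket() {
  case "$(bump_kind "$1" "$2")" in
    major) echo major-breaking ;;
    *)     echo needs-changelog-check ;;   # safe-minor or review-needed
  esac
}
```

For example, bump_kind v14.1.0 14.2.0-canary.1 reports minor, while bump_kind 0.2.3 0.3.0 reports major.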

Task Two: Security Patch Sweep

The second task catches what dependency triage misses. That means zero-day advisories that have not yet produced a Renovate PR, indirect dependencies that Renovate grouped into a parent PR, and repo-local security hygiene that no bot watches.

Our prompt here is focused on auditing rather than fixing. We want Claude Code to read the output of npm audit or pnpm audit, cross-reference against the GitHub Security Advisories API, and look at code patterns that are common sources of vulnerabilities in the vibe coder stack. That usually means exposed Supabase service-role keys, missing row-level-security policies on new tables, API routes without rate limiting, and client-side exposure of server secrets.

claude -p "Run pnpm audit --json and capture output. Cross-reference any high or
critical advisories against the affected files in this repo. For each confirmed
hit, determine whether it is reachable from production code paths.
Then scan for these repo-local security smells: Supabase service role keys used
outside /api or /server, new tables in supabase/migrations since the last
maintenance run that lack a row-level-security policy, API route handlers
without rate limiting, and any NEXT_PUBLIC_ env vars that look like secrets.
Output a prioritised list at maintenance/security-$(date +%F).md. Do not apply
fixes. Flag critical items in the PR title."
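
For the NEXT_PUBLIC_ check specifically, a cheap grep can run before Claude Code does. The patterns here are assumptions to tune per project: variable names containing SECRET, SERVICE_ROLE, PRIVATE, or PASSWORD, and values that start with a Stripe live secret key prefix (sk_live_) or an Anthropic key prefix (sk-ant-). NEXT_PUBLIC_SUPABASE_ANON_KEY is meant to be public and is deliberately not matched.

```shell
#!/usr/bin/env bash
# Heuristic scan for NEXT_PUBLIC_ variables that look like secrets.
# Name and value patterns are assumptions; tune them per project.
# NEXT_PUBLIC_SUPABASE_ANON_KEY is public by design and is not flagged.
scan_public_env() {
  local name_pat value_pat file
  name_pat='^[[:space:]]*(export[[:space:]]+)?NEXT_PUBLIC_[A-Z0-9_]*(SECRET|SERVICE_ROLE|PRIVATE|PASSWORD)[A-Z0-9_]*='
  value_pat="^[[:space:]]*(export[[:space:]]+)?NEXT_PUBLIC_[A-Z0-9_]*=[\"']?(sk_live_|sk-ant-)"
  for file in "$@"; do
    [[ -f "$file" ]] || continue        # skip env files that do not exist
    # -H prints the file name, -n the line number; no match is not an error
    grep -HnE -e "$name_pat" -e "$value_pat" "$file" || true
  done
}
```

Usage: scan_public_env .env .env.local .env.production. Variables configured only in the Vercel dashboard never appear in the repo, so this covers committed or local env files, not the deployed environment.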

The reason we do not let Claude Code apply security fixes automatically is that security fixes frequently involve changing auth flows, rotating keys, or modifying RLS policies, and every one of those can take production down in a way that is hard to detect from a PR diff. A human has to own those changes. The automation exists to make sure no human ever has to discover them by accident.
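
The "new tables without row-level security" smell can be pre-filtered the same way. A rough sketch, assuming one SQL statement per line in supabase/migrations/*.sql and simple table names: it lists tables that are created but never get ALTER TABLE ... ENABLE ROW LEVEL SECURITY. It does not check whether any policy exists, and it ignores the "since the last maintenance run" window, so treat its output as input for the prompt rather than a verdict.

```shell
#!/usr/bin/env bash
# Heuristic: list tables created in migrations that never have RLS enabled.
# Assumes one statement per line and unqualified or public.-qualified names.
# Checks ENABLE ROW LEVEL SECURITY only, not whether any policy exists.
tables_without_rls() {
  local dir="${1:-supabase/migrations}"
  cat "$dir"/*.sql 2>/dev/null | awk '
    {
      line = tolower($0)
      sub(/if not exists /, "", line)   # avoid reading "if" as a table name
    }
    match(line, /create table [a-z0-9_."]+/) {
      n = split(substr(line, RSTART, RLENGTH), parts, " ")
      name = parts[n]; gsub(/"/, "", name); sub(/^public\./, "", name)
      created[name] = 1
    }
    match(line, /alter table (only )?[a-z0-9_."]+ enable row level security/) {
      n = split(substr(line, RSTART, RLENGTH), parts, " ")
      name = parts[n - 4]; gsub(/"/, "", name); sub(/^public\./, "", name)
      rls[name] = 1
    }
    END { for (t in created) if (!(t in rls)) print t }
  ' | sort
}
```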

Task Three: Test Run and Flaky Test Detection

The third task runs the full regression suite three times and diffs the results. Most maintenance failures come from tests that pass four times out of five and have been silently ignored for months. A single run cannot tell a flaky test from a stable one. Running the suite three times and comparing tells you which tests are flaky, which are newly broken, and which have started taking noticeably longer.
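
The arithmetic behind this is worth spelling out. Assuming runs are independent and a flaky test passes with probability p on each run, n runs expose it only when they disagree, which happens with probability 1 - p^n - (1 - p)^n. A single run scores zero by construction. For a test that passes four times out of five, three runs disagree only about 48% of the time, and four consecutive Mondays raise the chance of at least one exposure to about 93%, which is one more reason the weekly history matters.

```shell
#!/usr/bin/env bash
# Chance that n independent runs of a test with per-run pass rate p give a
# mixed result (at least one pass and one fail), i.e. expose the flake.
flake_detect_prob() {
  awk -v p="$1" -v n="$2" \
    'BEGIN { q = 1 - p^n - (1 - p)^n; if (q < 0) q = 0; printf "%.3f\n", q }'
}

# Chance the flake is exposed at least once across w weekly maintenance runs.
flake_detect_over_weeks() {
  awk -v p="$1" -v n="$2" -v w="$3" \
    'BEGIN { q = 1 - p^n - (1 - p)^n; if (q < 0) q = 0; printf "%.3f\n", 1 - (1 - q)^w }'
}
```

flake_detect_prob 0.8 3 prints 0.480 and flake_detect_over_weeks 0.8 3 4 prints 0.927. The independence assumption is optimistic for tests that fail because of shared state or time of day.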

claude -p "Run 'pnpm test' three times sequentially. Capture pass/fail and
duration per test for each run.
Produce a report at maintenance/tests-$(date +%F).md with three sections:
- Newly failing: tests that failed in all three runs and passed in the previous
  maintenance report.
- Flaky: tests whose pass/fail state differed across the three runs.
- Slow drift: tests whose average duration increased more than 25% versus the
  previous maintenance report.
Do not modify any test files. If newly failing tests exist, add the label
'maint-critical' to the PR."

The previous maintenance report becomes a baseline. This is where the discipline of committing every Monday's output as a PR pays compound interest. Claude Code reads last week's report from the maintenance/ directory and has something to diff against. Without that historical record the flaky-test detection cannot work.
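
A minimal sketch of the comparison itself, assuming each week's results have already been flattened into tab-separated files. Both formats in the comments are hypothetical; many test runners emit JUnit XML or JSON that you would convert first. It prints one line per finding, using the three buckets and the 25% slow-drift threshold from the prompt.

```shell
#!/usr/bin/env bash
# Classify three test runs against last week's baseline, mirroring the three
# report sections. File formats are assumptions (tab-separated):
#   previous.tsv: name  status  avg_ms
#   current.tsv:  name  status1 status2 status3 ms1 ms2 ms3
classify_runs() {
  awk -F'\t' '
    FILENAME == ARGV[1] { prev_status[$1] = $2; prev_avg[$1] = $3; next }
    {
      fails = ($2 == "fail") + ($3 == "fail") + ($4 == "fail")
      if (fails == 3 && prev_status[$1] == "pass")
        printf "newly-failing\t%s\n", $1
      else if (fails > 0 && fails < 3)
        printf "flaky\t%s\n", $1
      avg = ($5 + $6 + $7) / 3
      if (($1 in prev_avg) && prev_avg[$1] > 0 && avg > 1.25 * prev_avg[$1])
        printf "slow-drift\t%s\n", $1
    }
  ' "$1" "$2"
}
```

Usage: classify_runs previous.tsv current.tsv. Tests that did not exist last week are never reported as newly failing or drifting, which matches the prompt's wording but means a brand-new broken test only shows up as failing in the raw run output.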

Task Four: Log Review Across Sentry and Axiom

Task four leaves the repository and inspects runtime behaviour. The two tools most of our clients end up on are Sentry for application errors and Axiom for structured logs from Vercel functions, Supabase, and any worker queues. Claude Code runs with API tokens scoped to read-only access on both.

What we want from this task is not a list of every error; Sentry already produces that. We want the differences from last week: new error signatures, errors whose frequency jumped past a threshold, and errors that correlate with a specific deploy.

claude -p "Query Sentry for all issues in the last 7 days. Compare against the
report from the previous maintenance run.
Flag: new issue signatures, issues whose event count increased more than 3x
week-over-week, and issues whose first_seen falls within 2 hours of a Vercel
deploy.
Query Axiom for Vercel function errors and Supabase auth failures in the same
window. Correlate any spikes with the Sentry findings.
Output a report at maintenance/logs-$(date +%F).md. Prioritise items that
correlate with a specific deploy. Do not dedupe or resolve anything in Sentry."

The deploy-correlation step is the one that pays for itself. A 3x spike in an error that started twelve minutes after a Friday evening push is almost always a real regression, and it is exactly the kind of thing a founder shipping fast will not notice for a week.
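
Once the numbers are in hand, the three flags reduce to simple rules. A sketch, assuming the weekly event counts for one Sentry issue and the deploy timestamps have already been fetched, timestamps are Unix epoch seconds, and "within 2 hours of a deploy" means up to two hours after it. A previous-week count of zero stands in for a new issue signature.

```shell
#!/usr/bin/env bash
# Flags for one Sentry issue, mirroring the prompt's three rules.
# Arguments: last_week_count this_week_count first_seen deploy_ts...
# Timestamps are Unix epoch seconds (assumed to be fetched beforehand).
issue_flags() {
  local prev="$1" curr="$2" first_seen="$3" deploy delta
  shift 3
  local -a flags=()
  if (( prev == 0 )); then
    flags+=(new)                       # not seen last week
  elif (( curr > 3 * prev )); then
    flags+=(spike)                     # more than 3x week-over-week
  fi
  for deploy in "$@"; do
    delta=$(( first_seen - deploy ))
    if (( delta >= 0 && delta <= 7200 )); then
      flags+=(deploy-correlated)       # first seen within 2h after a deploy
      break
    fi
  done
  printf '%s\n' "${flags[*]:-none}"
}
```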

Task Five: Cost Review of the Vibe Coder Stack

The last task is cost. Vercel, Supabase, Anthropic, and Resend are the four dashboards that cover most client spend. Each of them exposes either a CSV export or an API that Claude Code can hit with a scoped token. The question we want answered is always the same: what changed this week versus the rolling four-week average.

claude -p "Pull usage metrics for the past 7 days from Vercel (bandwidth,
function invocations, function duration), Supabase (database CPU, egress,
edge-function invocations), Anthropic (tokens in/out by model), and Resend
(emails sent).
Compare each metric to its rolling 4-week average.
Flag any metric that is more than 40% above average or that projects to exceed
the current billing tier's included allowance before month end.
For Anthropic spend specifically, break down by API key label so we can see
which product feature is driving tokens.
Output a report at maintenance/cost-$(date +%F).md with a one-paragraph
executive summary at the top."

The Anthropic breakdown matters because most runaway cost in AI-enabled vibe-coded products comes from a single feature where a prompt expanded unexpectedly or a retry loop misfired. Without per-feature attribution you cannot fix the cause, only scale the budget.
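
The two cost flags from the prompt are plain arithmetic too. A sketch, assuming each metric has already been reduced to one number per week plus a month-to-date total, and that the "rolling 4-week average" means the four weeks before this one. The month-end projection is a straight-line extrapolation, so it will under-read a metric that is accelerating.

```shell
#!/usr/bin/env bash
# Succeeds (exit 0) when this week's value is more than 40% above the mean
# of the previous four weeks. Arguments: this_week w1 w2 w3 w4.
over_rolling_avg() {
  awk -v x="$1" -v a="$2" -v b="$3" -v c="$4" -v d="$5" \
    'BEGIN { avg = (a + b + c + d) / 4; exit !(x > 1.4 * avg) }'
}

# Succeeds when a straight-line projection of month-to-date usage passes the
# plan allowance. Arguments: used_so_far day_of_month days_in_month allowance.
projects_over_allowance() {
  awk -v used="$1" -v day="$2" -v days="$3" -v cap="$4" \
    'BEGIN { exit !(used / day * days > cap) }'
}
```

Usage: if over_rolling_avg 1520 1010 980 1100 1050; then echo "flag: function invocations"; fi. The units are whatever the provider reports; the functions only compare numbers.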

Scheduled Task Versus Manual Kickoff

There is a real trade-off in how the routine gets triggered, and it is worth being explicit about. The scheduled approach is a GitHub Actions cron workflow that runs at, say, 06:00 UTC every Monday, invokes Claude Code in each repo, and opens the five PRs before the operator starts their day. The manual approach is a single CLI command the operator runs when they sit down at the desk on Monday.

Scheduled is reliable. It runs when the operator is sick, on holiday, or distracted. But scheduled has a failure mode: if the operator does not show up, the PRs pile up unread, and by the third unread week the routine is worse than useless because it has created a backlog of stale maintenance branches that nobody trusts. The automation quietly becomes noise.

Manual is the opposite. It only runs when a human is at the keyboard, which means the output gets read. But it only runs when a human is at the keyboard, which means on any week the human is not there, the routine does not happen. For a solo maintainer this is fine. For an agency running 250+ sites it is not.

The answer we landed on is hybrid. The scheduled task runs every Monday, but it does not open PRs directly into the client repos. It opens a single Linear issue per client in a shared ops board, attaches the five reports, and assigns it to the on-duty engineer. The engineer opens the repo, reviews the reports, and either promotes them to PRs or closes the issue. This forces a human pass on every site every week, while guaranteeing that the automation runs regardless of who is in the office.
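
A sketch of the packaging step, assuming the five reports follow the maintenance/<kind>-YYYY-MM-DD.md naming used in the prompts. It builds one markdown body in routine order and says so explicitly when a report is missing, so a failed task shows up in the issue instead of silently disappearing. Posting the body to Linear is left out because it depends on your workspace and API setup.

```shell
#!/usr/bin/env bash
# Assemble one client's five reports into a single issue body, in routine
# order, flagging any report that was not produced. Layout is an assumption.
issue_body() {
  local client="$1" day="$2" dir="${3:-maintenance}" kind f
  printf '# %s: weekly maintenance %s\n' "$client" "$day"
  for kind in deps security tests logs cost; do
    f="$dir/$kind-$day.md"
    printf '\n## %s\n\n' "$kind"
    if [[ -f "$f" ]]; then
      cat "$f"
    else
      printf 'MISSING: %s report was not produced this week.\n' "$kind"
    fi
  done
}
```

Usage: issue_body acme "$(date +%F)" > acme-issue.md, then attach or paste the file into the client's issue.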

The Retain-As-PR Discipline

The single rule that separates maintenance automation that helps from maintenance automation that destroys is this: every output is retained as a pull request or tracked issue, and every merge of that output is made by a human. The only merges the routine makes itself are the safe-minor Renovate and Dependabot bumps from task one, and only when CI is green. Claude Code never merges its own work to main. It never runs with --dangerously-skip-permissions against a client repo. It never pushes directly to a protected branch.
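
One way to make those rules mechanical rather than aspirational, sketched under the assumption that the GitHub CLI (gh) is authenticated and main is the protected base branch: the routine may push only a dated maint/ branch, and the script contains no merge command at all. Branch protection on main should still be enforced on the server; a client-side check only catches honest mistakes.

```shell
#!/usr/bin/env bash
# Only dated maintenance branches (maint/YYYY-MM-DD) may be pushed.
is_maint_branch() {
  [[ "$1" =~ ^maint/[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]
}

# Push the current maint/ branch and open a PR for a human to review.
# Assumes gh is authenticated and the base branch is main.
open_maint_pr() {
  local body_file="$1" branch
  branch="$(git rev-parse --abbrev-ref HEAD)"
  if ! is_maint_branch "$branch"; then
    echo "refusing to push: '$branch' is not a maint/YYYY-MM-DD branch" >&2
    return 1
  fi
  git push --set-upstream origin "$branch"
  gh pr create --base main --head "$branch" \
    --title "maint: weekly report ${branch#maint/}" \
    --body-file "$body_file"
  # Deliberately no 'gh pr merge' here: merging stays a human decision.
}
```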

The reason is not that Claude Code is unsafe. It is that maintenance changes, by their nature, touch the surfaces of a system that the original author has forgotten. Dependency bumps expose behavioural drift in packages nobody has looked at for six months. Security patches touch auth, which is exactly the thing you cannot afford to break silently. Log-driven fixes change error handlers that were working well enough. None of these are good candidates for an agent to apply unilaterally. They are all excellent candidates for an agent to prepare and annotate so that a human review can take ten minutes instead of two hours.

The artefact that makes this discipline stick is the maintenance/ directory in every client repo. Every Monday's five reports get committed as dated markdown files. Over a year, that directory becomes the operational memory of the project. You can scroll back and see exactly when a specific error started appearing, when a dependency crossed into major-version risk, when the cost of a particular feature began to drift. No dashboard gives you that view because every dashboard resets its window. A directory of weekly reports does not.
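
Because the file names carry ISO dates, finding last week's baseline is a string comparison, not a search. A minimal sketch, assuming the maintenance/<kind>-YYYY-MM-DD.md layout: it returns the newest report of one kind dated strictly before a given day, and prints nothing if there is none.

```shell
#!/usr/bin/env bash
# Print the newest maintenance report of one kind dated before $2.
# Arguments: kind (deps, security, tests, logs, cost), YYYY-MM-DD, [dir].
previous_report() {
  local kind="$1" before="$2" dir="${3:-maintenance}" f day best="" best_day=""
  for f in "$dir/$kind"-????-??-??.md; do
    [[ -e "$f" ]] || continue            # glob matched nothing
    day="${f##*/"$kind"-}"; day="${day%.md}"
    # ISO dates order correctly as plain strings
    if [[ "$day" < "$before" && "$day" > "$best_day" ]]; then
      best="$f"; best_day="$day"
    fi
  done
  [[ -n "$best" ]] && printf '%s\n' "$best"
}
```

Usage: previous_report tests 2026-04-20 returns the tests report from the previous Monday, which is exactly the baseline the flaky-test and log-diff tasks need.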

How WitsCode Runs This for Clients

We run this exact routine, every Monday, across every project on a WitsCode maintenance retainer. The retainer covers the scheduled run, the human review pass, any safe-minor merges, and the triage of anything that lands in the review-needed or critical buckets. Real remediation work, beyond triage, bills against a small monthly pool of engineering hours that rolls over if unused.

The effect, from the client's side, is that the project simply does not rot. They ship features, we keep the floor clean. No quarterly panic when a major CVE drops. No surprise tripled Vercel bill. No test suite that nobody dares to touch because half the skips are a mystery. If you shipped a vibe-coded product and you are starting to feel the maintenance load, talk to us about the retainer. The Monday routine runs whether you are watching or not.
