Dispatch and Delegation: Running Agents After-Hours Safely
Scheduled agents, computer-use agents, and dispatch agents. What is safe to run while you sleep and what is not, and the three safety guards every overnight agent needs before you trust it with your...
The fantasy is obvious. You go to bed, an agent cleans your inbox, drafts your follow ups, updates your CRM, queues your social posts, and greets you in the morning with a tidy list of what it did. The reality for most founders who actually try this is that they wake up to a two hundred dollar API bill, an email thread where the agent replied to a customer with a half finished sentence, and a Notion database that somehow lost a column. The agent was not malicious. It was simply unsupervised, and nothing in the setup was built to contain the cost of a bad decision made at three in the morning.
This piece is about running agents after hours without that class of disaster. You will see the split between scheduled agents and computer use agents and why the risk profile is different for each. You will see the three safety guards that every overnight agent needs, which are a cost cap with auto pause, an action allowlist that excludes destructive operations, and a morning report that summarises what actually happened. You will also see the rollback window pattern that most teams miss, which is the discipline of keeping every overnight action reversible for twenty four hours. By the end you will know exactly which jobs to let run while you sleep and which ones still need a human in the chair.
The two kinds of overnight agent and why they fail differently
A scheduled agent is the simpler animal. It wakes up at a fixed time, runs a defined task against an API or a structured data source, and goes back to sleep. Think of a nightly job that summarises yesterday's support tickets, drafts a daily sales digest from your CRM, or rewrites tomorrow's calendar into a priority list. The blast radius is narrow because the agent only touches the systems you explicitly wired it to touch, and usually through API calls that return structured responses. When a scheduled agent fails, it usually fails quietly, meaning it writes a bad summary or skips a step, and you catch it in the morning.
A computer use agent is a different category of risk. This is the class of agent that drives a browser or an operating system like a person would, clicking and typing its way through real interfaces. Claude's computer use capability, OpenAI's Operator, and the various open source browser agents all fall in this bucket. When this kind of agent runs overnight, it can log in to a dashboard you own, edit a live record, send a message, make a purchase, and close the tab, all without the schema level guarantees that an API gives you. The blast radius is whatever that agent can reach with a mouse, which is usually more than you think. Most failure modes here are not quiet. They are expensive, visible, and sometimes customer facing.
Dispatch agents sit in between. A dispatch agent is one that receives work from a queue, a webhook, or another agent, and then routes it to a specific handler. They are usually safer than computer use agents because the work they do is well scoped, but they can chain into other agents, which means a bad input at the top of the chain can fan out into many bad actions at the leaves. The safety rules that follow apply to all three, but the tightness of the controls should scale with the blast radius of the tool the agent is actually holding.
The first guard, a hard cost cap that auto pauses the agent
Every overnight agent needs a dollar ceiling that is enforced before the next API call goes out, not after the bill arrives. The pattern is simple. Before the agent begins its run, it records the starting cost for the current billing window. After every tool call or model response, it checks the running total against the cap. If the running total crosses the cap, the agent stops, writes a pause record to its log, and does not resume until a human explicitly restarts it in the morning.
The specific number matters less than the fact that it exists. For a scheduled summarisation agent, five dollars a night is usually far more than the job will ever need. For a computer use agent that might loop on a broken page, ten dollars is a sensible first cap. The mistake almost every founder makes is setting the cap only in the platform billing dashboard, which evaluates usage in near real time but not inside your loop. By the time the platform notices you spent sixty dollars in an hour, the agent has already spent it. The cap has to live inside the agent itself, which usually means a wrapper around your model client that counts tokens and multiplies by the posted price, and a short circuit in your tool loop that refuses to call another tool once the running total crosses the threshold.
The auto pause is the other half. An agent that hits its cap and keeps going is not capped. An agent that hits its cap and silently finishes its current loop but does not start the next one is capped. The pause record is the final ingredient, because it is what tells your morning self that the agent stopped on purpose and not because of a crash. Most overnight agents will never touch the cap, and the day one of them does is the day you find out whether your setup actually works.
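As a rough illustration of the wrapper and short circuit described above, here is a minimal Python sketch. The class name `CostGuard`, the per-token prices, and the `run_agent` loop are all assumptions made for the example, not any provider's API. Each step stands in for a model call that reports its token usage.

```python
from dataclasses import dataclass, field


@dataclass
class CostGuard:
    cap_usd: float
    # Hypothetical prices in USD per million tokens; use your provider's posted rates.
    input_price_per_mtok: float = 3.0
    output_price_per_mtok: float = 15.0
    spent_usd: float = 0.0
    paused: bool = False
    log: list = field(default_factory=list)

    def record_usage(self, input_tokens: int, output_tokens: int) -> None:
        """Add one response's cost to the running total and pause at the cap."""
        self.spent_usd += (
            input_tokens * self.input_price_per_mtok
            + output_tokens * self.output_price_per_mtok
        ) / 1_000_000
        if not self.paused and self.spent_usd >= self.cap_usd:
            self.paused = True
            # The pause record shows the stop was deliberate, not a crash.
            self.log.append(
                {"event": "paused", "reason": "cost_cap",
                 "spent_usd": round(self.spent_usd, 4), "cap_usd": self.cap_usd}
            )


def run_agent(steps, guard: CostGuard) -> int:
    """Run steps in order and return how many completed before any pause."""
    completed = 0
    for step in steps:
        if guard.paused:
            break  # short circuit: never start another call once paused
        input_tokens, output_tokens = step()  # stand-in for a real model call
        guard.record_usage(input_tokens, output_tokens)
        completed += 1
    return completed
```

The step that crosses the cap is allowed to finish, and the next one never starts, which matches the definition of capped given above. In this sketch a morning restart would simply construct a fresh `CostGuard` for the new run.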
The second guard, an action allowlist that excludes destructive operations
The second guard is a strict list of what the agent is allowed to do, expressed as tool names, API endpoints, or URL patterns rather than as prose instructions in a system prompt. System prompts are advice. Allowlists are law. An agent that has been told in its prompt to not delete anything will sometimes delete something anyway because a user message, a retrieved document, or a chained agent convinced it that deletion was the right move. An agent whose tool registry does not contain a delete function cannot delete anything, no matter what it was told.
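A minimal sketch of that idea in Python, assuming a hand-rolled registry rather than any particular agent framework. The tool names `read_ticket` and `draft_reply` are hypothetical. The point is that a delete tool is never registered, so a request for one has nowhere to go.

```python
class ToolNotAllowed(Exception):
    """Raised when the agent asks for a tool that was never registered."""


class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, name, fn):
        self._tools[name] = fn

    def call(self, name, **kwargs):
        # The allowlist is the registry itself: unknown names are refused.
        if name not in self._tools:
            raise ToolNotAllowed(f"tool {name!r} is not registered for this agent")
        return self._tools[name](**kwargs)


# Hypothetical safe tools for a support agent.
def read_ticket(ticket_id: int) -> dict:
    return {"id": ticket_id, "status": "open"}


def draft_reply(ticket_id: int, body: str) -> dict:
    # Drafts are held for review rather than sent.
    return {"ticket_id": ticket_id, "body": body, "state": "held"}


registry = ToolRegistry()
registry.register("read_ticket", read_ticket)
registry.register("draft_reply", draft_reply)
# No delete tool is registered, so no prompt can make the agent call one.
```

Calling `registry.call("delete_ticket", ticket_id=7)` raises `ToolNotAllowed` regardless of what the model was told, which is the difference between advice and law.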
The operations that belong on the block side of the list are almost always the same for a business. Deleting records, cancelling orders, issuing refunds over a small threshold, sending emails to anyone outside a defined domain, publishing to public channels, changing access controls, running migrations, and any action that touches money or identity. These are the actions that cost more to undo than they save by being automated. An overnight agent almost never needs to perform any of them to be useful. The moment you find yourself wanting to enable one, put that specific action behind a morning approval step instead.
The allowlist works best when it is enforced at two layers. The agent's tool registry only exposes the safe tools, and the underlying service account that the agent uses to reach those tools only has permissions for the safe actions. That way a prompt injection in a retrieved document that tries to get the agent to call a delete function fails at the tool layer, and a cleverer injection that goes around the tool registry fails at the permission layer. Defence in depth is the cheapest form of peace of mind for an overnight run.
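The two layers can be sketched in Python as follows. Everything here is illustrative: `CrmBackend` stands in for a real service that checks permissions on its own side, and the scope strings such as `crm:update` are made up for the example.

```python
from dataclasses import dataclass


class ToolNotAllowed(Exception):
    """Layer 1: the tool is not in the agent's registry."""


class PermissionDenied(Exception):
    """Layer 2: the service account lacks the required scope."""


@dataclass(frozen=True)
class ServiceAccount:
    name: str
    scopes: frozenset


class CrmBackend:
    """Stand-in for the real service, which enforces scopes independently."""

    def __init__(self):
        self.records = {1: {"stage": "lead"}}

    def _require(self, account, scope):
        if scope not in account.scopes:
            raise PermissionDenied(f"{account.name} lacks {scope}")

    def update(self, account, record_id, fields):
        self._require(account, "crm:update")
        self.records[record_id].update(fields)

    def delete(self, account, record_id):
        self._require(account, "crm:delete")
        del self.records[record_id]


backend = CrmBackend()
# The overnight account is never granted crm:delete.
night_agent = ServiceAccount("night-agent", frozenset({"crm:read", "crm:update"}))

ALLOWED_TOOLS = {
    "update_record": lambda record_id, fields: backend.update(night_agent, record_id, fields),
}


def call_tool(name, **kwargs):
    if name not in ALLOWED_TOOLS:
        raise ToolNotAllowed(name)
    return ALLOWED_TOOLS[name](**kwargs)
```

A call to `call_tool("delete_record", record_id=1)` fails at the registry. A direct `backend.delete(night_agent, 1)`, which models an injection that goes around the registry, fails at the permission check, and the record survives.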
The third guard, a morning report that summarises overnight activity
The third guard is the discipline of producing a single readable document at the end of every overnight run that lists what the agent did, what it skipped, what it flagged, and what it cost. The morning report is not a log file. Log files are for debugging. The report is for trust. It should fit on one screen, be written in plain sentences, and be the first thing you read with your coffee.
A good report has four sections. First, a one line headline that says whether the run completed, paused, or failed, and the total cost. Second, a list of actions taken, grouped by type, with counts and links to the affected records. Third, a list of things the agent considered but did not do, with the reason, which is the section that catches mistakes before they become patterns. Fourth, a list of items flagged for human attention, which is where the agent admits it was not sure. The fourth section is the one that earns the agent more autonomy over time. An agent that flags the right things gets trusted with more. An agent that flags nothing is either perfect, which is unlikely, or not paying attention, which is far more common.
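One way to keep the four sections honest is to build the report from structured data rather than free text. The Python sketch below assumes a hypothetical `RunReport` class and made-up record links. It renders plain text with one headline and three lists.

```python
from dataclasses import dataclass, field


@dataclass
class RunReport:
    status: str                                   # "completed", "paused", or "failed"
    cost_usd: float
    actions: list = field(default_factory=list)   # (action_type, link) pairs
    skipped: list = field(default_factory=list)   # (what, reason) pairs
    flagged: list = field(default_factory=list)   # plain-sentence descriptions

    def render(self) -> str:
        lines = [f"Run {self.status}. Total cost: ${self.cost_usd:.2f}"]

        def section(title, rows):
            lines.append(title)
            lines.extend(rows if rows else ["  (none)"])

        # Group actions by type so the reader sees counts before details.
        grouped = {}
        for action_type, link in self.actions:
            grouped.setdefault(action_type, []).append(link)
        section("Actions taken:", [
            f"  {action_type} ({len(links)}): {', '.join(links)}"
            for action_type, links in grouped.items()
        ])
        section("Considered but not done:",
                [f"  {what}: {reason}" for what, reason in self.skipped])
        section("Flagged for human attention:",
                [f"  {item}" for item in self.flagged])
        return "\n".join(lines)


report = RunReport(
    status="completed",
    cost_usd=0.42,
    actions=[("update_crm_field", "crm/101"),
             ("update_crm_field", "crm/102"),
             ("draft_email", "queue/7")],
    skipped=[("reply to ticket 88", "outside allowlist")],
)
print(report.render())
```

An empty flagged section prints `(none)` rather than disappearing, so a run that flags nothing is visible at a glance.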
The report is also your audit trail when something goes wrong two weeks later. A customer will eventually email to ask why they received a message at four in the morning that sounded off. Without a report you will be reconstructing events from raw logs. With a report you open that night's entry, see the action the agent took, follow the link to the underlying reasoning, and either apologise with specifics or confirm that nothing actually went out. Either way you answer the customer in five minutes instead of fifty.
The rollback window, or why every overnight action should be reversible for twenty four hours
The safeguard that almost no article on this topic discusses is the rollback window. The rule is that every action an overnight agent takes should be reversible by a single human command for at least twenty four hours after it happens. This is not a backup policy. It is a design constraint that shapes what the agent is allowed to do in the first place.
In practice it means a few concrete things. Emails the agent drafts go to a held queue and release after a delay, or they send immediately but a suppression and recall path is defined before the agent is switched on. Records the agent creates are tagged with a run identifier so that a single query can undo the whole run. Records the agent edits are written with a soft update pattern that preserves the previous value for the rollback window. Payments, publications, and outbound messages that cannot be recalled are simply not on the allowlist, which is why the second guard and the rollback window reinforce each other.
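The run tag and soft update ideas can be sketched in Python with an in-memory store. `SoftStore`, its journal layout, and the choice to refuse a rollback once the window has passed are assumptions for illustration. A real system would keep the journal in the database next to the records.

```python
from datetime import datetime, timedelta, timezone

ROLLBACK_WINDOW = timedelta(hours=24)


class RollbackExpired(Exception):
    """Raised when a run is older than the rollback window."""


class SoftStore:
    def __init__(self):
        self.records = {}
        self.journal = []  # one entry per write, tagged with the run identifier

    def _log(self, run_id, record_id, previous, now):
        self.journal.append(
            {"run_id": run_id, "record_id": record_id, "previous": previous, "at": now}
        )

    def create(self, run_id, record_id, fields, now):
        self._log(run_id, record_id, None, now)  # None marks a created record
        self.records[record_id] = dict(fields)

    def update(self, run_id, record_id, fields, now):
        # Soft update: keep a copy of the previous value before changing it.
        self._log(run_id, record_id, dict(self.records[record_id]), now)
        self.records[record_id].update(fields)

    def rollback(self, run_id, now):
        """Undo every write from one run; returns the number of writes undone."""
        entries = [e for e in self.journal if e["run_id"] == run_id]
        if any(now - e["at"] > ROLLBACK_WINDOW for e in entries):
            raise RollbackExpired(run_id)
        for e in reversed(entries):  # newest first, so earlier values win
            if e["previous"] is None:
                self.records.pop(e["record_id"], None)
            else:
                self.records[e["record_id"]] = e["previous"]
        self.journal = [e for e in self.journal if e["run_id"] != run_id]
        return len(entries)


store = SoftStore()
store.records[1] = {"stage": "lead"}
t0 = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
store.update("run-1", 1, {"stage": "qualified"}, now=t0)
store.create("run-1", 2, {"stage": "lead"}, now=t0)
undone = store.rollback("run-1", now=t0 + timedelta(hours=5))
```

This sketch restores previous values wholesale, so an edit a human made to the same record after the run would also be overwritten. A production version would need to detect that case before reverting.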
The window matters because the morning report is the moment you discover most problems, and you need the power to fix them quickly. A bot that made fifty bad CRM edits overnight is a non event if a single button reverses all fifty. The same bot without a rollback window is a ten hour cleanup job. The cost of designing for reversibility is small at the start and enormous once the agent is live, which is the exact shape of every engineering decision you wish you had made earlier.
Which jobs are actually safe to run while you sleep
Putting the guards together gives you a clear sorting function. A job is safe to run overnight if it touches structured APIs rather than live interfaces, if every action it performs is on the allowlist, if the running cost can be capped inside the agent, if the outcome is summarised in a morning report, and if everything it does is reversible for a day. Summarising yesterday's support tickets passes. Drafting tomorrow's sales follow ups into a held queue passes. Generating a daily competitor watch digest passes. Updating CRM fields with a soft update and a run tag passes.
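The sorting function can be written down literally. In this Python sketch the `Job` fields mirror the five conditions above. The field names and the two example jobs are illustrative, and the answers still have to come from an honest review of the job.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    uses_structured_api: bool
    all_actions_allowlisted: bool
    cost_capped_in_agent: bool
    has_morning_report: bool
    reversible_for_a_day: bool


def failed_checks(job: Job) -> list:
    """Return the conditions a job fails; an empty list means it can run overnight."""
    checks = {
        "touches structured APIs, not live interfaces": job.uses_structured_api,
        "every action is on the allowlist": job.all_actions_allowlisted,
        "cost is capped inside the agent": job.cost_capped_in_agent,
        "outcome lands in a morning report": job.has_morning_report,
        "everything is reversible for a day": job.reversible_for_a_day,
    }
    return [label for label, ok in checks.items() if not ok]


def safe_overnight(job: Job) -> bool:
    return not failed_checks(job)


ticket_summary = Job("summarise support tickets", True, True, True, True, True)
ad_purchase = Job("trigger paid ads", False, False, True, True, False)
```

Returning the failed conditions, not just a yes or no, makes it clear what would have to change before a job graduates to overnight.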
Jobs that fail the test almost always involve a computer use agent doing something that cannot be undone. Making purchases, sending cold outreach to new addresses, publishing to public channels, responding to live customer threads, running database migrations, and triggering paid ads all belong in the daytime, with a human in the loop. The question to ask is not whether the agent is capable of the task, because modern agents will attempt almost anything you point them at. The question is whether the worst plausible version of that task at three in the morning is something you can reverse before breakfast. If the honest answer is no, that job is not an overnight job yet.
Start with one overnight job and earn the rest
The last piece of advice is to resist the temptation to roll out five overnight agents at once. Pick one job, the smallest useful one, and run it nightly for two weeks with all three guards and the rollback window in place. Read the morning report every day. Tune the allowlist when the report shows the agent wanted to do something it could not. Raise the cost cap only when the data says the cap is the real limit. After two weeks of clean runs the agent has earned the right to take on a second job, and the pattern you built for the first one drops straight onto the second.
Most founders who successfully delegate to overnight agents end up with a handful of small, well bounded, well instrumented jobs rather than one large autonomous agent. The small jobs are boring to describe and reliable in practice, which is exactly the profile of work you want happening while you sleep.
→ If you want help wiring up an overnight agent with the three guards, a rollback window, and a morning report you will actually read, the WitsCode overnight agent safety engagement sets this up end to end for your specific stack so you can delegate with confidence instead of with hope.