
The Sub-Agents Pattern for Long-Running Claude Code Tasks

Breaking a large task into sub-agents that each have narrow context. The pattern we use for 8-hour refactors, data migrations, and cross-repo changes. Includes actual prompt templates.

By WitsCode · 10 min read

Most advice about Claude Code assumes the task fits in one agent run. Ask a question, get a diff, move on. That model works until the task runs past roughly forty minutes, at which point the single-agent approach starts to break in specific and predictable ways. The context window fills with file reads and tool output, early decisions get paraphrased and degraded in compaction, the agent re-reads files it has already seen, and the quality of the work quietly drops without the agent or the operator noticing. Sub-agents are the pattern that fixes this, and once you have run a few long tasks with them the single-agent workflow starts to feel like writing code without functions. This article covers what sub-agents actually are, why context isolation is the load-bearing property, when to dispatch in parallel and when to chain, the prompt template we use on every dispatch, and the anti-patterns that quietly produce worse results than no sub-agents at all.

What a sub-agent actually is

A sub-agent in Claude Code is a full, separate agent instance that the parent spawns through the Task tool. It has its own context window, its own tool set, and its own conversation history, and it knows nothing about the parent except the prompt the parent wrote for it. When it finishes, it returns a single text report. The parent sees that report and nothing else. File reads, tool output, intermediate reasoning: none of it lands in the parent's context.

This is different from a helper function, a chained prompt, or a "step two" instruction inside the same agent. Those all share a context window with whatever came before. A sub-agent does not. It is more like calling a subprocess than calling a method. The subprocess runs, it prints to stdout, and you read what it printed.

That single property, context isolation, is the reason sub-agents work on long tasks. Everything else in this article is a consequence of it.
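A minimal sketch of that isolation, as a toy model rather than the Task tool's real interface. The `Agent` class and `run_sub_agent` function are hypothetical names invented for illustration; the point is only that the sub-agent's eighty file reads never reach the parent's list.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Toy agent: its context is just a list of strings it has seen."""
    name: str
    context: list[str] = field(default_factory=list)

    def see(self, item: str) -> None:
        self.context.append(item)


def run_sub_agent(prompt: str, files: list[str]) -> str:
    """Hypothetical stand-in for a dispatch: fresh context in, one report out."""
    sub = Agent("sub")
    sub.see(prompt)                  # the prompt is all it knows about the parent
    for path in files:
        sub.see(f"read {path}")      # file reads stay in the sub-agent's context
    return f"Reviewed {len(files)} files; no changes outside scope."


parent = Agent("parent")
prompt = "Audit billing/ for legacy client calls."
parent.see(prompt)
report = run_sub_agent(prompt, [f"billing/file_{i}.ts" for i in range(80)])
parent.see(report)

print(len(parent.context))  # 2: the prompt it wrote and the report it got back
```

The sub-agent's context is a local variable that goes out of scope when the function returns, which is the subprocess analogy in miniature.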

Why context-window preservation is the load-bearing benefit

Consider what happens to a single-agent run on an eight-hour refactor. The agent plans, reads the repository, makes a list of files, starts editing. Every file read, every grep, every test run, every bash command appears in the context window. By hour two the window is at fifty percent. By hour four it is at eighty and the runtime triggers compaction, which summarises the earlier conversation into a paragraph. That paragraph is lossy by construction: it keeps conclusions and drops the reasoning behind them. Decisions the agent made carefully at hour one are now rendered as "we decided to use the new ORM" with no record of why, and the next time a similar choice comes up the agent re-decides it, often differently.

Sub-agents avoid this by moving the bulk of the reading and editing out of the parent's window entirely. The parent reads the plan, dispatches a sub-agent to do the work, and receives a report. The report is a few hundred words. The eighty files the sub-agent read, the forty tool calls it made, the failed test it debugged: none of it lands in the parent. The parent's context stays small across the whole run, which means the parent's reasoning stays sharp. On our internal runs, a parent that never compacts does not need the goal re-explained, while a single agent that compacts at hour three usually does.
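The arithmetic behind that claim can be sketched with a toy cost model. Every constant below is an illustrative assumption (window size, tokens per tool call, prompt and report sizes), not a measurement from our runs or a published figure.

```python
# All numbers are illustrative assumptions, not measurements.
WINDOW = 200_000               # assumed context window, in tokens
TOKENS_PER_TOOL_CALL = 2_500   # assumed average cost of one file read or command
REPORT_TOKENS = 400            # assumed size of a few-hundred-word report
PROMPT_TOKENS = 300            # assumed size of the dispatch prompt


def single_agent_usage(tool_calls: int) -> int:
    """Every tool call lands in the one shared window."""
    return tool_calls * TOKENS_PER_TOOL_CALL


def parent_usage(dispatches: int) -> int:
    """The parent pays only for each prompt it writes and each report it reads."""
    return dispatches * (PROMPT_TOKENS + REPORT_TOKENS)


calls = 120  # eighty file reads plus forty other tool calls
print(single_agent_usage(calls))  # 300000, past the assumed window
print(parent_usage(4))            # 2800
```

Under these assumptions the single agent has to compact at least once, while a parent that dispatched the same work in four pieces has used about one percent of its window.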

There is a second-order benefit worth naming. Because the parent only sees the report, the sub-agent's mistakes are easier to catch. A single-agent run buries errors inside a wall of tool output that the operator scrolls past. A sub-agent report is a discrete artefact you can read in thirty seconds, and if the report is vague or contradicts what you asked for, you catch it immediately.

When to dispatch in parallel and when to chain

The second structural benefit of sub-agents is that the parent can dispatch several of them in a single message, and the runtime executes them concurrently. A well-structured refactor that touches four independent modules can run four sub-agents in parallel and finish in the time of the slowest one, rather than the sum of all four.
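To make "the time of the slowest one" concrete, here is a small simulation using `asyncio.gather`. It is an analogy only: `fake_sub_agent` is a hypothetical stand-in that sleeps instead of working, the module names and durations are made up, and we are not claiming the Claude Code runtime is built on asyncio.

```python
import asyncio
import time


async def fake_sub_agent(name: str, seconds: float) -> str:
    """Hypothetical stand-in for one dispatch; sleeping simulates the work."""
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def fan_out(durations: dict[str, float]) -> tuple[list[str], float]:
    """Start every sub-agent at once and wait for all reports."""
    start = time.perf_counter()
    reports = await asyncio.gather(
        *(fake_sub_agent(name, secs) for name, secs in durations.items())
    )
    return list(reports), time.perf_counter() - start


durations = {"billing": 0.1, "notifications": 0.2, "search": 0.3, "auth": 0.4}
reports, elapsed = asyncio.run(fan_out(durations))

# elapsed tracks the slowest sub-agent (about 0.4s), not the 1.0s sum
print(reports, round(elapsed, 1))
```

`gather` returns the reports in dispatch order regardless of which finished first, which is a convenient property to want from any fan-out mechanism.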

The test for parallelisability is whether any sub-agent needs the output of another. If the database schema change must land before the API code is written against it, those two tasks are sequential and you dispatch them one after the other. If the billing module and the notifications module each need their own test suite updated and neither depends on the other, those are parallel and you dispatch them in the same message.
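That test can be written down as a dependency graph and turned into dispatch batches. This is a sketch using Python's standard `graphlib`; the task names are hypothetical, and deciding which edges exist is still a judgment the parent has to make.

```python
from graphlib import TopologicalSorter


def dispatch_batches(needs: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches; each batch only needs output from earlier ones."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()                        # raises CycleError on circular needs
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything here can share one message
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical task names: each key maps to the tasks whose reports it needs.
needs = {
    "schema_change": set(),
    "api_code": {"schema_change"},
    "billing_tests": set(),
    "notification_tests": set(),
}
print(dispatch_batches(needs))
# [['billing_tests', 'notification_tests', 'schema_change'], ['api_code']]
```

Each inner list is one parallel dispatch; the outer list is the serial chain.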

The subtle case is feature work where the contract is still being decided. If the API sub-agent is still figuring out what the response shape will be when the UI sub-agent starts consuming it, the UI sub-agent will guess wrong and redo its work. The rule we follow is that parallel dispatch requires a written contract at the boundary. If the contract exists, parallelise. If the contract is what the work is deciding, serialise and let the first sub-agent's report become an input to the second sub-agent's prompt.

In practice this means the parent agent often does a small amount of serial work up front to freeze contracts, then fans out in parallel, then collects the reports and does any integration work that needs to reason across all the outputs. That shape, serial kickoff, parallel middle, serial finish, is the most common productive structure for a multi-hour run. It maps cleanly onto how a human tech lead would run the same work across a team of four engineers, and for the same reason: the expensive resource is the coordination context, and the cheap resource is the independent execution. Sub-agents make independent execution genuinely cheap, which shifts the economics toward doing more of it in parallel than a single-agent run ever allows.

The prompt template we use on every dispatch

Every sub-agent dispatch at WitsCode follows the same five-part template. The structure matters because the sub-agent has no other source of context, so anything you forget to include in the prompt is simply missing from its world.

ROLE
You are the sub-agent responsible for the billing module in the
monorepo at /repo. You do not modify any file outside billing/
or its test directory.

CONTEXT
The parent agent is migrating the codebase from Stripe's legacy
API to the 2024-06-20 pinned version. The migration plan lives at
docs/migrations/stripe-2024.md. Read that file first. The parent
has already updated shared/stripe-client.ts; do not change it.

TASK
Update every file under billing/ to use the new client's
subscription.retrieve signature, which now requires an expand
parameter for line_items. Adjust callers accordingly.

EXIT CRITERIA
pnpm test billing/ passes with no skipped tests, and grep -r
"subscription.retrieve" billing/ shows no unmigrated call sites.

REPORT FORMAT
Return no more than 300 words covering: what you changed, any
call sites where the new expand parameter changed semantics in a
way the parent should review, and any tests you had to rewrite
rather than fix.

Every section has a job. The role statement fences the sub-agent's blast radius, which matters because a sub-agent that wanders outside its module is exactly the kind of failure that is hard to catch from a report. The context section is the minimum information the parent knows that the sub-agent needs, and writing it well is the skill that separates a clean dispatch from one that produces rework. The task is a verb-first description of the outcome. The exit criteria are verifiable by command, so the sub-agent knows when it is done without asking. The report format, and specifically the word limit, is what keeps the parent's context clean.
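If you generate dispatch prompts from code rather than typing them, the five sections can be enforced mechanically. The `Dispatch` class below is a hypothetical helper written for this article, not part of Claude Code; it only refuses to render a prompt with an empty section.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Dispatch:
    """The five sections of the template; the field names are our own choice."""
    role: str
    context: str
    task: str
    exit_criteria: str
    report_format: str

    def render(self) -> str:
        parts = []
        for f in fields(self):
            body = getattr(self, f.name).strip()
            if not body:
                # An empty section is simply missing from the sub-agent's world.
                raise ValueError(f"section {f.name!r} is empty")
            heading = f.name.replace("_", " ").upper()
            parts.append(f"{heading}\n{body}")
        return "\n\n".join(parts)


prompt = Dispatch(
    role="You own billing/ and its tests. Touch nothing else.",
    context="Read docs/migrations/stripe-2024.md first.",
    task="Update every file under billing/ to the new client signature.",
    exit_criteria="pnpm test billing/ passes with no skipped tests.",
    report_format="Return no more than 300 words.",
).render()
print(prompt)
```

A check like this catches a forgotten section, not a badly written one. The quality of the context section still depends on the author.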

That last instruction is worth naming explicitly. Without a word limit, sub-agents narrate. They describe what they read, what they considered, what they rejected, and a three-line change comes back as a thousand-word essay. That essay lands in the parent's context and partially defeats the isolation benefit. With a word limit, the sub-agent compresses its report into decisions and outcomes, which is what the parent needs.

The four-agent pattern for a full feature

The canonical shape of a long-running task at WitsCode is a parent agent that reads the spec and dispatches four sub-agents: database, API, UI, and tests. The parent's job is to hold the plan and the contract between layers. Each sub-agent's job is to own its layer.

The parent prompt looks roughly like this:

You are the coordinator for implementing the "scheduled exports"
feature described in docs/specs/scheduled-exports.md. Read the
spec, produce a contract document at
docs/contracts/scheduled-exports.md that pins the table schema,
the API surface, and the UI state shape, then dispatch four
sub-agents in parallel:

1. database sub-agent: create the migration and ORM models
2. api sub-agent: implement the endpoints against the contract
3. ui sub-agent: build the settings page against the contract
4. tests sub-agent: write the integration tests

Each sub-agent reads the contract document at start. After all
four report, review the reports, flag any cross-layer issues,
and produce a final summary.

The contract document is the key. Writing it is the parent's serial work, and it is what lets the four sub-agents run concurrently without stepping on each other. When the contract is good, the sub-agents come back with clean diffs. When the contract is vague, the sub-agents disagree about the shape of the same field and the parent spends the next hour reconciling.
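One way to picture the reconciliation step is as a comparison of each layer against the contract. The sketch below assumes the field shapes have already been extracted from the sub-agent reports into dictionaries, which is itself an assumption, since reports come back as text; the field names and types are invented for illustration.

```python
# Hypothetical contract: field name -> type, as pinned by the parent.
contract = {"export_id": "uuid", "cron": "string", "next_run_at": "timestamp"}

# Hypothetical shapes each layer ended up using, gathered from its report.
layers = {
    "database": {"export_id": "uuid", "cron": "string", "next_run_at": "timestamp"},
    "api": {"export_id": "uuid", "cron": "string", "next_run_at": "string"},
    "ui": {"export_id": "uuid", "cron": "string", "next_run_at": "timestamp"},
}


def contract_violations(contract, layers):
    """List (layer, field, expected, actual) wherever a layer departs from the contract."""
    found = []
    for layer, shape in layers.items():
        for name, expected in contract.items():
            actual = shape.get(name, "missing")
            if actual != expected:
                found.append((layer, name, expected, actual))
    return found


print(contract_violations(contract, layers))
# [('api', 'next_run_at', 'timestamp', 'string')]
```

With a pinned contract a disagreement has an unambiguous owner: the layer that departed from the document. Without one, the same mismatch is just two sub-agents with different opinions.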

The anti-patterns the docs do not cover

The first failure mode is not obvious until you have hit it a few times: delegating work that needs the parent's context. The sub-agent does not know what the user asked, does not know the constraints discovered in the earlier part of the conversation, does not know the half-finished reasoning the parent has been doing about an edge case. If the parent dispatches a sub-agent to handle that edge case without writing all of that context into the dispatch prompt, the sub-agent will do something reasonable and wrong.

The tell is that after the sub-agent reports, the parent reads the report and says "that is not quite what I meant." The fix is not to retry the dispatch with a longer prompt; it is to recognise that the task was not actually delegable. Work that lives in the ongoing conversation between the parent and the user belongs with the parent. Work that can be specified in writing, in a self-contained brief, belongs with a sub-agent. The test is whether you could hand the dispatch prompt to a new engineer who had never seen the project and expect a reasonable result. If yes, dispatch. If no, do the work in the parent.

A related anti-pattern is dispatching sub-agents for tasks smaller than the dispatch overhead. A sub-agent starts cold. It has to re-read the file, re-establish the structure, re-decide the approach, and write a report. For a three-file edit the parent finishes faster than the sub-agent can spin up. The rough threshold we use is ten minutes of expected work. Below that, do it in the parent. Above that, the context-preservation benefit starts to pay for the startup cost.
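The two tests so far, the self-contained brief and the ten-minute threshold, combine into a small decision rule. This is a sketch of our rule of thumb, not a formula; both inputs are estimates the operator or parent has to supply, and the function name is our own.

```python
DISPATCH_THRESHOLD_MINUTES = 10  # the rough threshold quoted above


def should_dispatch(expected_minutes: float, brief_is_self_contained: bool) -> bool:
    """Delegate only work that is both big enough and fully specifiable in writing."""
    if not brief_is_self_contained:
        return False  # needs the parent's conversation, so keep it in the parent
    return expected_minutes > DISPATCH_THRESHOLD_MINUTES


print(should_dispatch(3, True))     # False: a three-file edit is under the threshold
print(should_dispatch(45, True))    # True: big enough and self-contained
print(should_dispatch(45, False))   # False: big, but not delegable
```

The order of the checks is deliberate: a task that fails the new-engineer test stays with the parent no matter how long it is expected to take.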

The third anti-pattern we see often is using sub-agents as a second opinion rather than as a delegation mechanism. An operator dispatches a sub-agent to "review" the parent's plan, the sub-agent produces a polite summary of the plan, and the operator treats that summary as validation. It is not. The sub-agent has less context than the parent, not more, and its review is structurally weaker than the parent's own reasoning. Sub-agents are for doing work the parent does not have room to do, not for checking work the parent has already done. If you need review, run a dedicated review pass at the end with the full diff as input, or hand it to a human.

How this shifts what you can ship in a day

The concrete effect of this pattern on WitsCode engagements is that tasks that used to need a human in the loop all day now run unattended in the background. An ORM migration across a forty-service monorepo, which in a single-agent run would have compacted twice and required three operator nudges, runs as a parent with one sub-agent per service dispatched in staggered batches of eight. The parent's context stays under fifty thousand tokens across the whole run. We come back after lunch, read the parent's summary, spot-check three sub-agent reports, and ship.
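The batching itself is simple chunking. A sketch, assuming "staggered" means one batch is dispatched after another; the service names are placeholders.

```python
from itertools import islice


def batched(items, size):
    """Yield consecutive chunks of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


services = [f"service-{n:02d}" for n in range(1, 41)]  # hypothetical names
batches = list(batched(services, 8))
print(len(batches), [len(b) for b in batches])  # 5 [8, 8, 8, 8, 8]
```

Forty services in batches of eight means the parent reads five rounds of reports, eight at a time, rather than forty at once.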

That workflow is not magic. It is the direct consequence of treating the Task tool as the primary unit of composition and writing every prompt like a contract rather than a conversation. If you are running into context-window walls on long tasks, if your agents are compacting halfway through jobs, or if you are supervising runs you should be able to dispatch and walk away from, the fix is almost always to break the work into sub-agents with narrow context, clean contracts between them, and word-limited reports back.

