AI Agents in Production: A Non-Technical Founder's Framework
What an agent actually is versus a prompt versus an automation, what production means for an agent, and the framework we use to decide if a task is agent-shaped or automation-shaped.
The word agent is doing too much work right now
Every vendor deck you have seen in the last twelve months promises an agent. The word has collapsed under its own weight. A chatbot that answers a question is called an agent. A Zapier flow that moves a row from a spreadsheet to a CRM is called an agent. A research assistant that opens a browser, reads three pages, and writes a memo is called an agent. When one word covers that much ground, it stops being useful for decisions. Founders who are not engineers end up signing contracts for capabilities they do not need and skipping capabilities they actually do need, because the vocabulary failed them at the point of purchase.
This article is the framework we walk our advisory clients through before they spend a dollar on agent infrastructure. It is built around three questions. What is an agent, really. What does production mean when the thing inside the system is non-deterministic. And which tasks inside your business are actually shaped like agent problems rather than automation problems. If you can answer those three questions in plain English for a given workflow, you will make the right build-or-buy call most of the time, and you will stop paying vendor premiums for things that a cron job and a prompt could have handled.
The definition triangle: prompt, automation, agent
The cleanest mental model we have found is a triangle with three corners. Each corner is a different category of software, and the boundaries between them are not about how smart the system feels. They are about who decides the next step.
A prompt is a single request to a language model with a single response. You type a question, the model produces text, and the interaction ends there. A prompt has no memory of prior runs, no access to your systems, and no ability to take action in the world. The human is the orchestrator. The model is a function that turns input text into output text. Most of what people call AI in the consumer sense is this. It is powerful, but it is bounded. The model never decides what happens next because there is no next.
An automation is a deterministic workflow. Someone, usually an operator or a developer, has drawn a directed graph of steps ahead of time. When a new lead comes in, send them a welcome email. When the email bounces, tag the contact as invalid. When the tag is applied, remove them from the nurture sequence. The path is knowable at design time. You can draw it on a whiteboard. The system follows the edges between nodes and never invents a new edge. If a step involves an AI call, it is a tool the automation uses at a specific node, not a decision-maker. The human designed the graph. The machine walks it.
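For readers who want to see what "the human designed the graph, the machine walks it" looks like, here is a minimal sketch. The node names and the `run_workflow` helper are hypothetical, taken from the lead example above, and real workflow tools store the graph in their own format.

```python
# A fixed workflow: every edge is drawn at design time.
# Node and outcome names are illustrative assumptions, not a real tool's schema.
WORKFLOW = {
    "new_lead": {"ok": "send_welcome_email"},
    "send_welcome_email": {"delivered": "done", "bounced": "tag_invalid"},
    "tag_invalid": {"ok": "remove_from_nurture"},
    "remove_from_nurture": {"ok": "done"},
}


def run_workflow(start, outcomes):
    """Walk the graph. `outcomes` maps a node to the result observed there."""
    path = [start]
    node = start
    while node != "done":
        result = outcomes.get(node, "ok")
        if result not in WORKFLOW[node]:
            # The machine never invents a new edge; it stops instead.
            raise ValueError(f"no edge for {result!r} at {node!r}")
        node = WORKFLOW[node][result]
        path.append(node)
    return path


bounced_path = run_workflow("new_lead", {"send_welcome_email": "bounced"})
```

Every path the system can take is visible in `WORKFLOW` before it runs, which is what makes it an automation.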
An agent is a system where a language model decides the next step at runtime using tools. The tools are bounded, the goal is specified, and the budget is capped, but the path from the starting state to the goal state is not known ahead of time. The agent looks at the current state, picks an action from the tools available to it, observes the result, and decides what to do next. It might search, then read, then write, then call an API, then revise its plan, then try again. The sequence is discovered as the work happens. This is the corner of the triangle that people mean when they say the word agent in the strong sense, and it is the corner where almost all of the new risk lives.
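The loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not any vendor's implementation: `choose_action` stands in for the language model, `stub_policy` is a hard-coded placeholder for it, and the two tools are fakes.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, str, str]            # (tool name, argument, observation)
Decision = Optional[Tuple[str, str]]   # (tool name, argument), or None when done


def run_agent(goal: str,
              choose_action: Callable[[str, List[Step]], Decision],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> List[Step]:
    """Let the policy pick each next step until it stops or the budget runs out."""
    history: List[Step] = []
    for _ in range(max_steps):          # the budget is capped outside the policy
        decision = choose_action(goal, history)
        if decision is None:            # the policy decided the goal is met
            break
        tool_name, argument = decision
        if tool_name not in tools:      # the tools are bounded
            raise ValueError(f"unknown tool: {tool_name}")
        history.append((tool_name, argument, tools[tool_name](argument)))
    return history


# Hypothetical stand-ins: a real agent would call a language model here.
def stub_policy(goal: str, history: List[Step]) -> Decision:
    if not history:
        return ("search", goal)
    if history[-1][0] == "search":
        return ("read", history[-1][2])  # next step depends on what was found
    return None


TOOLS = {
    "search": lambda query: f"url-for:{query}",
    "read": lambda url: f"text-of:{url}",
}

steps = run_agent("pricing", stub_policy, TOOLS)
```

Unlike a fixed workflow, nothing in this code lists the sequence of steps in advance. The sequence exists only in `steps` after the run.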
The distinction matters because each corner of the triangle has a different cost profile, a different failure mode, and a different governance requirement. A prompt fails by producing a bad paragraph. An automation fails by running the wrong step or stopping at a broken node, and you can fix it by editing the graph. An agent fails by doing something you did not anticipate, because the whole premise of the agent is that you did not anticipate the path. Treating these three as the same thing is why so many founders end up with tools that either overreach or underdeliver.
What production means when the system is non-deterministic
Most founders have an intuitive sense of what production means for a normal piece of software. It is deployed, it is monitored, it handles real users without breaking, and there is a pager that rings if it does. That definition transfers cleanly to automations, because automations are deterministic and the failure surface is bounded. It does not transfer to agents. When the system decides its own path, production needs five additional properties that traditional checklists either omit or treat far more lightly.
The first is a cost cap. Every time an agent takes a step, it spends money. Tokens, API calls, third-party tool fees, compute time. Unlike a cron job, the number of steps is not fixed. A badly prompted agent will loop, retry, or explore until it burns through a budget that a founder never agreed to. Production-grade agents have a hard ceiling on cost per run and a hard ceiling on total cost per day, enforced outside the agent itself. The agent cannot be trusted to police its own spend because the same model that is spending the money is the one deciding whether to keep spending it.
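A minimal sketch of a cap enforced outside the agent might look like the following. The `CostGuard` class and its method names are hypothetical, amounts are tracked in whole cents to avoid rounding surprises, and the daily reset is left out for brevity.

```python
class BudgetExceeded(Exception):
    """Raised by the guard, not by the agent, when a cap would be crossed."""


class CostGuard:
    def __init__(self, per_run_cap_cents: int, per_day_cap_cents: int):
        self.per_run_cap_cents = per_run_cap_cents
        self.per_day_cap_cents = per_day_cap_cents
        self.run_spend_cents = 0
        self.day_spend_cents = 0   # assumption: something resets this daily

    def start_run(self) -> None:
        self.run_spend_cents = 0

    def charge(self, amount_cents: int) -> None:
        """Call before every paid step; refuses the step if a cap would be exceeded."""
        if self.run_spend_cents + amount_cents > self.per_run_cap_cents:
            raise BudgetExceeded("per-run cap reached")
        if self.day_spend_cents + amount_cents > self.per_day_cap_cents:
            raise BudgetExceeded("daily cap reached")
        self.run_spend_cents += amount_cents
        self.day_spend_cents += amount_cents


guard = CostGuard(per_run_cap_cents=100, per_day_cap_cents=150)
guard.start_run()
guard.charge(60)
guard.charge(40)   # exactly at the per-run cap, still allowed
```

The design point is that the code running the agent calls `charge` before each step, so a looping agent is stopped by the guard rather than by its own judgment.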
The second is output validation. Because the agent is producing novel artifacts, you cannot write a traditional test that says the output equals a known value. What you can do is define structural and semantic checks that every output must pass before it is considered done. If the agent is writing an email, the output must parse as a valid email, must have a subject line, must not contain placeholder text, and must pass a toxicity filter. These checks are independent of the agent and run on every output. If a check fails, the result is discarded, not retried blindly.
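As a rough illustration of the email checks above, here is a sketch that validates a draft independently of whatever produced it. The draft shape (a dict with `to`, `subject`, and `body`), the address pattern, and the placeholder patterns are all simplifying assumptions, and the toxicity filter is omitted because it would need an external model or service.

```python
import re

# Assumed patterns for leftover template text; tune these for your own templates.
PLACEHOLDER_PATTERNS = [r"\{\{.*?\}\}", r"\[[A-Z_ ]+\]", r"\bTODO\b", r"lorem ipsum"]
# Deliberately simple address check, not a full RFC 5322 parser.
ADDRESS_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_email_draft(draft: dict) -> list:
    """Return a list of problems; an empty list means the draft passed."""
    problems = []
    if not ADDRESS_RE.match(draft.get("to", "")):
        problems.append("invalid recipient")
    if not draft.get("subject", "").strip():
        problems.append("missing subject")
    text = draft.get("subject", "") + "\n" + draft.get("body", "")
    if any(re.search(p, text, flags=re.IGNORECASE) for p in PLACEHOLDER_PATTERNS):
        problems.append("placeholder text")
    return problems


good = {"to": "ana@example.com", "subject": "Welcome", "body": "Hi Ana, thanks for signing up."}
bad = {"to": "ana@example", "subject": " ", "body": "Hi [FIRST_NAME]"}
good_problems = validate_email_draft(good)
bad_problems = validate_email_draft(bad)
```

A draft with any problems would be discarded rather than sent, which matches the rule that a failed check is not retried blindly.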
The third is a human in the loop at the right altitude. For reversible, low-stakes actions the agent runs autonomously. For actions that touch money, customers, or public-facing content, the agent drafts and a human approves. The mistake founders make is putting the human either nowhere or everywhere. Nowhere is how you end up with an agent that emailed your entire list a draft that was never meant to ship. Everywhere is how you end up with an agent that is slower than a human doing the task manually, because a reviewer has to sign off on every file read. Production means drawing the approval line at exactly the altitude where the cost of a mistake exceeds the cost of review.
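One way to make the approval line explicit is a small routing rule that runs before every action. The sketch below is hypothetical: the `SENSITIVE` tags and the `route_action` function are illustrative names, and where the line sits is a business decision rather than something code can settle.

```python
# Assumed tags for what an action touches; mirrors money, customers, public content.
SENSITIVE = {"money", "customers", "public_content"}


def route_action(touches: set, reversible: bool) -> str:
    """Decide whether an action runs on its own or waits for a human."""
    if touches & SENSITIVE or not reversible:
        return "draft_for_approval"
    return "run_autonomously"


file_read = route_action(touches=set(), reversible=True)
refund = route_action(touches={"money", "customers"}, reversible=True)
hard_delete = route_action(touches=set(), reversible=False)
```

Here a file read runs without review, while a refund or an irreversible delete waits for a person.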
The fourth is rollback. When the agent does something wrong, and it will, you need a clean way to undo it. This means every write action the agent takes has to leave a trail that is reversible. If the agent updates a record, the prior value is logged. If the agent sends a message, there is a sent-items record that can be referenced for a follow-up correction. If the agent creates a file, it lives in a sandbox until it is promoted. An agent without rollback is a liability in production no matter how good its average output is, because the average does not protect you from the bad tail.
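The "prior value is logged" idea can be sketched with an in-memory store. The `ReversibleStore` class is a hypothetical stand-in for a real database or CRM, where the same pattern would usually be an audit table or a versioned record.

```python
class ReversibleStore:
    """Wraps record writes so every change can be undone in reverse order."""

    def __init__(self, records: dict):
        self.records = dict(records)
        self.undo_log = []   # entries: (key, existed_before, prior_value)

    def update(self, key, value) -> None:
        existed = key in self.records
        self.undo_log.append((key, existed, self.records.get(key)))
        self.records[key] = value

    def rollback(self) -> None:
        """Undo every logged write, most recent first."""
        while self.undo_log:
            key, existed, prior = self.undo_log.pop()
            if existed:
                self.records[key] = prior
            else:
                self.records.pop(key, None)


store = ReversibleStore({"lead-1": "new"})
store.update("lead-1", "qualified")   # overwrite an existing record
store.update("lead-2", "new")         # create a record that did not exist
after_writes = dict(store.records)
store.rollback()
```

After `rollback`, the store is back to its starting state, including removal of the record the agent created.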
The fifth is observability. You need to be able to answer, after the fact, three questions for any given run. What did the agent believe about the state of the world when it started. What steps did it take and why. What was the output at each step. Without that trail, debugging an agent is impossible. With it, you can tell the difference between a bad prompt, a bad tool, and a real edge case in the data. Observability for agents is not the same as logs for a normal service. It is a step-by-step trace of reasoning and action, stored long enough to be useful when something goes wrong a week later.
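A trace that answers those three questions can be as simple as the sketch below. The `RunTrace` and `Step` names and their fields are assumptions for illustration. A production system would also persist the trace somewhere durable.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Step:
    action: str   # what the agent did
    reason: str   # why it says it did it
    output: str   # what came back


@dataclass
class RunTrace:
    run_id: str
    initial_state: dict                         # what the agent believed at the start
    steps: list = field(default_factory=list)   # ordered record of every step

    def record(self, action: str, reason: str, output: str) -> None:
        self.steps.append(Step(action, reason, output))

    def to_json(self) -> str:
        """Serialize the whole run so it can be stored and read a week later."""
        return json.dumps(asdict(self), indent=2)


trace = RunTrace(run_id="run-001", initial_state={"open_tickets": 3})
trace.record("search", "need ticket history", "2 results")
trace.record("draft_reply", "history shows a known issue", "reply drafted")
stored = json.loads(trace.to_json())
```

The stored record keeps the starting beliefs, each action with its stated reason, and each output, which is the minimum needed to separate a bad prompt from a bad tool or a real edge case.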
If your vendor cannot show you how they handle all five of these properties, the product is not in production. It is in demo. The demo may be impressive, and the demo may even work for ninety percent of your runs, but the remaining ten percent is where the damage lives.
The agent-shaped versus automation-shaped rule
Here is the single question that will save most founders the most money. Is the path to the goal knowable ahead of time, or does it depend on what you discover at runtime.
If the path is knowable, the task is automation-shaped. You should build or buy an automation. The cost is lower, the failure modes are smaller, the governance is simpler, and the output is predictable. Most of what gets sold as an agent today is actually an automation dressed up with a language model at one node. That is fine, but you should pay automation prices for it, not agent prices.
If the path depends on runtime discovery, the task is agent-shaped. You genuinely need a system that decides the next step based on what it just saw. The cost is higher, the failure modes are larger, the governance is heavier, but the work can only be done this way. Research tasks, triage tasks, open-ended customer support for novel questions, and code exploration are the canonical examples. The reason they need agents is that the action at step three depends on the content discovered at step two, which depends on the query that made sense at step one, and no one can draw that graph ahead of time.
A simple heuristic for applying the rule. Sit with the person who currently does the work and ask them to describe how they do it. If their description includes phrases like "always do this next" or "if it is a yes, do X; if it is a no, do Y," you are listening to an automation. If their description includes phrases like "it depends on what I find" or "I usually have to go look and then decide," you are listening to an agent. The vocabulary of the person doing the work tells you the shape of the problem.
The most expensive error is picking an agent for an automation-shaped task. You pay for runtime reasoning you do not need, you inherit a bigger failure surface than necessary, and you have to build the five production properties around a system that could have been a workflow diagram. The second most expensive error is picking an automation for an agent-shaped task. You end up with a brittle system that works for the three cases you drew and breaks the moment the input looks different, and you blame the technology when you actually chose the wrong tool.
How to run the framework on your own business
Take any workflow in your company that someone has proposed using AI for. Write down the starting state, the goal state, and the sequence of steps a human would take today. Then ask, in order, the three questions from this framework.
First, is this a prompt, an automation, or an agent. A prompt is a one-shot conversation. An automation is a fixed graph. An agent is a runtime decision-maker. Be honest about which one you actually need, not which one sounds most impressive in a board update.
Second, if it is an agent, is the path genuinely unknowable ahead of time, or are you describing a fixed workflow in fuzzy language because you have not sat down to draw it. Most founders who think they need an agent actually need an automation with a model call at one step. That is cheaper, safer, and usually faster to ship.
Third, if it really is agent-shaped, can you afford the five production properties. Cost cap, output validation, human in the loop at the right altitude, rollback, and observability. If the task is not valuable enough to justify building all five, it is not valuable enough to run as an agent in production. It can run as a prototype, it can run with a human babysitting every run, but it cannot run on its own.
Most workflows, when you run them through this framework honestly, come out the same way. The high-value work is mostly automation-shaped and wants a tight integration between a deterministic workflow and a few well-scoped model calls. The genuinely agent-shaped work is rarer than the vendors want you to believe, but when you find it, the production bar is higher than the vendors want you to believe. Both of those truths cut against the current market narrative, which is why the framework is useful. It gives you a way to think that is independent of whatever is being pitched to you this week.
Where WitsCode comes in
We run this framework with founders every week, usually in the first thirty minutes of an advisory call. Most of the time the answer is that the team has been sold an agent when an automation would have shipped faster and cheaper. Sometimes the answer is the opposite and the team has been trying to hand-code a workflow that genuinely needs runtime reasoning. Either way, the decision gets made before anyone writes a line of code or signs a contract, which is where the leverage is.
If you want a second opinion on whether a workflow in your business is agent-shaped or automation-shaped before you commit budget, WitsCode offers a focused agent versus automation advisory. We review the workflow, return a recommendation with reasoning, and give you the decision criteria you can reuse on the next one. That is the whole engagement. No retainer, no implementation obligation, just the right call at the right altitude.