
The Support Data Cleanup That Makes Your AI Bot Actually Useful

Your AI bot is as good as the docs you train it on. The four-step cleanup: merge duplicates, archive outdated policies, add intent tags, rewrite in clear answer form.

By WitsCode · 10 min read

The AI support bot that returns wrong answers is almost never a model problem. You will read long threads about temperature settings, chunk sizes, and which embedding model performs best on a particular benchmark. None of that is the reason your bot confidently tells customers the return window is sixty days when it has been thirty for the last eighteen months. The reason is that your knowledge base has three return policy pages, two of them are old, and retrieval grabbed the wrong one because nothing in the data told it which page was canonical.

This article is about the unglamorous work that separates a useful AI support bot from a liability. It is a four-step cleanup of the source docs the bot reads from, and it has nothing to do with prompt engineering. Most of the guides you find online skip this entirely and jump straight to vector databases and chunking strategies, which is like choosing a kitchen layout before you have bought the ingredients. The ingredients are the docs. If the docs are wrong, the kitchen does not matter.

Why your model is not the problem

When a non-technical founder sets up a support bot using any of the current wrappers on top of retrieval-augmented generation, the mental model they are sold is that the bot reads their help center like a smart human would. It does not. The bot performs a similarity search against chunks of your text, picks the top few, and asks the language model to answer using only what those chunks contain. If the chunks are contradictory, outdated, or phrased in a way that does not match how customers ask questions, the model produces confident nonsense. The nonsense is not a hallucination in the technical sense. It is a faithful reading of bad source material.
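To make that retrieval step concrete, here is a minimal sketch in Python. It assumes a plain word-overlap score as a stand-in for the embedding similarity a real bot platform would use, and the function names and sample chunks are hypothetical. The point is only that the old and the current return policy both land in the top results, and nothing in the score says which one is canonical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, chunks, top_k=2):
    """Return the top_k chunks most similar to the question."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]

# Hypothetical indexed chunks: one stale policy, one current, one unrelated.
chunks = [
    "You have 60 days to return an item.",   # old policy, still indexed
    "You have 30 days to return an item.",   # current policy
    "Standard shipping takes two to three business days.",
]

top = retrieve("How many days do I have to return an item?", chunks)
for chunk in top:
    print(chunk)
```

Both return-policy chunks score identically against the question, so the model is handed two contradictory statements and asked to answer from them.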

This is why the advice that tops the search results, "just use a better model," almost never fixes the specific failure mode most founders see. Upgrading from a smaller model to a larger one improves reasoning on ambiguous prompts, but it cannot produce a reliably correct answer from conflicting or outdated source documents. The improvement you actually need is upstream of the model. It lives in your help center, your shared Notion folder, your old Intercom macros, and the PDF policy document that someone on the ops team updated in March but never replaced in the public docs.

Step one: merge the duplicates so retrieval stops guessing

The first and most common failure mode is that you have more than one document covering the same topic. Most support knowledge bases grew organically. Someone wrote a return policy page when the store launched. A year later, a new hire on customer service wrote a clearer version and put it in the help center. Someone else wrote a third version as a blog post explaining the policy change. All three are still live. All three are retrievable. All three say slightly different things.

When a customer asks "how long do I have to return something," the retrieval layer scores all three pages as highly relevant. It does not know which is current. It may return the oldest one because the phrasing happens to match the customer question more closely, or it may return a blended set of chunks where the policy wording conflicts within the same context window. The model is then asked to answer from contradictory source material, and whatever it produces will often be wrong.

The rule is that every distinct support topic should have exactly one canonical document. Not a primary and a secondary. One. When you find duplicates, pick the most accurate version, copy anything useful from the others into it, and delete or redirect the rest. If you cannot delete them because they live in a public blog archive, move them to a folder the bot is not indexing. This is boring work and it is the single highest-leverage thing you can do to improve answer quality. Merge duplicates first and most bot quality complaints disappear before you have touched any settings.
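If the list of documents is too long to scan for duplicates by eye, a rough word-overlap check can produce a shortlist for a human to review. The sketch below is one way to do that in Python, assuming documents are available as a title-to-text mapping. The function names, the 0.5 threshold, and the sample documents are all illustrative assumptions. It only flags candidates; deciding which version is canonical is still a human call.

```python
import re
from itertools import combinations

def word_set(text):
    """Unique lowercase word tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Shared words divided by total distinct words across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def find_duplicate_candidates(docs, threshold=0.5):
    """docs maps title -> body. Returns (title_a, title_b, score) for
    every pair whose word overlap meets the threshold."""
    sets = {title: word_set(body) for title, body in docs.items()}
    pairs = []
    for a, b in combinations(sorted(sets), 2):
        score = jaccard(sets[a], sets[b])
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs

# Hypothetical sample documents.
docs = {
    "Return policy (launch)": "You can return any item within 60 days of delivery for a full refund.",
    "Return policy (help center)": "You can return any unworn item within 30 days of delivery for a full refund.",
    "Shipping times": "Standard shipping takes two to three business days.",
}

candidates = find_duplicate_candidates(docs)
for a, b, score in candidates:
    print(f"Possible duplicates: {a!r} and {b!r} (overlap {score})")
```

The threshold is a judgment call. A lower value catches more reworded duplicates and produces more false alarms, so it is worth tuning on a handful of known duplicate pairs first.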

Step two: archive the outdated policies so old answers stop surfacing

The second failure mode is more insidious because the documents are not duplicates. They are just old. Your shipping policy from 2023 might still be sitting in a Google Drive folder that the bot was pointed at during setup. Nobody linked to it from the current help center, so your human team forgot it existed. But the retrieval layer does not care about your internal link structure. It only cares about the folder path it was given. If the old document contains the phrase "standard shipping takes five to seven business days" and your current policy is two to three, the retrieval layer may prefer the older phrasing because it contains more of the customer's keywords.

Archiving, in this context, does not mean moving the file to a folder called archive and leaving it in the same drive. The bot will still find it. Archiving means physically removing the document from the indexed source, either by deleting it, moving it outside the watched folder, or marking it in a way the indexing pipeline filters out before embedding. Every document that the bot can retrieve should represent current policy. There is no such thing as a safely retrievable old policy. If the bot can reach it, the bot will eventually cite it to a customer.
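For teams whose indexing pipeline can be scripted, the "filter before embedding" option might look like the following sketch. It assumes each document carries a status field with the values current, outdated, or unclear; the Doc class and the sample paths are hypothetical, not part of any particular bot platform.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    title: str
    path: str
    status: str  # assumed values: "current", "outdated", "unclear"

def indexable(docs):
    """Keep only documents explicitly marked current.
    Anything outdated or unclear never reaches the embedding step."""
    return [doc for doc in docs if doc.status == "current"]

# Hypothetical sample documents.
docs = [
    Doc("Shipping policy", "help/shipping.md", "current"),
    Doc("Shipping policy (2023)", "drive/old/shipping-2023.md", "outdated"),
    Doc("Holiday shipping notes", "drive/notes/holiday.md", "unclear"),
]

kept = indexable(docs)
print([doc.title for doc in kept])
```

Treating "unclear" the same as "outdated" is a deliberate choice in this sketch: a document nobody has confirmed is safer out of the index until someone does.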

A practical habit that prevents this from recurring is to add a single review step to every policy change. Whoever updates the current policy is also responsible for finding and removing the old version from the indexed source on the same day. If you treat this as a separate task that will get done later, it will never get done, and three months from now a customer will be quoted shipping terms that have not applied since last year.

Step three: add intent tags so the bot understands context

The third step is where most founders stop reading tutorials because it sounds technical. It is not. Intent tags are small pieces of metadata you attach to each document that tell the retrieval layer what kind of question this document answers, what state the user is likely in when they ask it, and when the document was last verified. The three tags worth caring about are category, user state, and last updated date.

Category means the topic area. Returns, shipping, account, billing, product care, sizing. Pick a vocabulary of ten to fifteen categories and tag every document with exactly one. This lets the retrieval layer filter by category when the incoming question is clearly about one topic, which dramatically reduces cross-topic contamination. A question about "how long does it take" can mean shipping or returns or a refund timeline, and a well-tagged system can use context from earlier in the conversation to narrow it down before retrieval even runs.

User state is about where the customer is in their journey. Pre-purchase, active order, post-delivery, long-term owner, churned, considering return. The same question has different correct answers depending on state. A pre-purchase customer asking about returns needs the policy summary. A customer with an active return in progress needs status-specific guidance. Tagging documents with their target user state lets you route retrieval toward the relevant version when your front end can detect the state from order data or chat context.
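As a rough illustration of how category and user state tags narrow the search, the sketch below filters the candidate documents before any similarity scoring happens. It assumes the tags are stored as simple string fields and that the front end has already worked out the category and state; the class, function, and tag values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TaggedDoc:
    title: str
    category: str
    user_state: str

def filter_candidates(docs, category=None, user_state=None):
    """Narrow the candidate set before similarity search.
    A filter left as None is not applied."""
    result = []
    for doc in docs:
        if category is not None and doc.category != category:
            continue
        if user_state is not None and doc.user_state != user_state:
            continue
        result.append(doc)
    return result

# Hypothetical tagged documents.
docs = [
    TaggedDoc("Return policy summary", "returns", "pre-purchase"),
    TaggedDoc("Tracking your return", "returns", "considering return"),
    TaggedDoc("Shipping times", "shipping", "pre-purchase"),
]

returns_only = filter_candidates(docs, category="returns")
pre_purchase_returns = filter_candidates(docs, category="returns", user_state="pre-purchase")
print([doc.title for doc in pre_purchase_returns])
```

With both filters applied, a pre-purchase shopper asking about returns can only be answered from the policy summary, never from the shipping page or the in-progress return guide.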

Last updated is the most important and most neglected tag. Every document should carry the date it was last reviewed for accuracy. Not the date it was created. Not the date of the last typo fix. The date someone who owns that policy sat down and confirmed it is still correct. If your retrieval layer is set up to use the tag, recency can break the tie when two documents score about equally, which gives you a safety net for any old version the archive step missed. It is a backstop, not a substitute for removing outdated documents. It also makes the quarterly review obvious. Sort the list by last updated, work down from the oldest, and nothing ages past ninety days without a human looking at it.
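Both uses of the last updated tag can be sketched in a few lines of Python. This assumes the retrieval layer exposes a relevance score per document and that the review date is stored as a real date; the function names and sample values are hypothetical. Real similarity scores are rarely exactly equal, so a production version would more likely compare scores within a small tolerance.

```python
from datetime import date

def rank(scored_docs):
    """scored_docs is a list of (score, last_verified, title).
    Highest score first; on an equal score, the newer date wins."""
    return sorted(scored_docs, key=lambda d: (d[0], d[1]), reverse=True)

def overdue(docs, today, max_age_days=90):
    """docs maps title -> last_verified date. Returns titles not
    verified within max_age_days, oldest first."""
    stale = [
        (verified, title)
        for title, verified in docs.items()
        if (today - verified).days > max_age_days
    ]
    return [title for _, title in sorted(stale)]

# Hypothetical scores and review dates.
ranked = rank([
    (0.8, date(2024, 1, 10), "Return policy (old)"),
    (0.8, date(2025, 3, 1), "Return policy (current)"),
    (0.5, date(2025, 5, 1), "Shipping times"),
])

review_queue = overdue(
    {
        "Returns": date(2025, 5, 1),
        "Shipping": date(2024, 12, 1),
        "Sizing": date(2025, 1, 15),
    },
    today=date(2025, 6, 1),
)
print(ranked[0][2])
print(review_queue)
```

The review queue is the same sort-by-oldest habit described above, just automated so the list is ready when the quarterly review comes around.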

Step four: rewrite in clear answer form so the model can actually use the chunks

The fourth step is the one that produces the biggest jump in perceived quality. Most support documentation was written for humans browsing a help center. It opens with context, sets up the scenario, explains the reasoning, and eventually arrives at the answer in paragraph four. Retrieval splits that document into chunks, and the chunk containing the actual answer may not contain the keywords from the customer question, because those keywords were in the setup paragraphs that got split into a different chunk. The model then receives the setup without the answer and produces a response that circles the topic without resolving it.

The rewrite rule is that every document should either use explicit question-and-answer format or lead with the direct answer in the first sentence. Question and answer format means the document contains literal text like "Q: How long do I have to return an item? A: You have thirty days from the delivery date to return any unworn item in original packaging." That structure survives chunking because the question and answer almost always end up in the same chunk, and the question text matches the phrasing customers actually use.
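The chunking effect is easy to see with a toy splitter. The sketch below assumes fixed-size chunks of thirty words, which is far smaller than most real pipelines use and is chosen only to keep the example short. The narrative sample text is made up for illustration; the question-and-answer text is the example from the paragraph above.

```python
def chunk_words(text, size=30):
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Narrative style: context first, answer last (hypothetical sample text).
narrative = (
    "Customers often ask how long they have to return an item after it arrives. "
    "We reviewed our process last year and made several changes to keep things fair for everyone. "
    "The window is thirty days from delivery."
)

# Question-and-answer style, using the example from the article.
qa_doc = (
    "Q: How long do I have to return an item? "
    "A: You have thirty days from the delivery date to return any unworn item in original packaging."
)

narrative_chunks = chunk_words(narrative)
qa_chunks = chunk_words(qa_doc)

print(len(narrative_chunks), "chunks for the narrative version")
print(len(qa_chunks), "chunk for the Q&A version")
```

In the narrative version the word "return" lands in the first chunk and "thirty days" lands in the second, so no single chunk both matches the question and contains the answer. The question-and-answer version stays in one piece.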

The direct-answer-first format works for documents that cover multiple related questions. The opening sentence gives the answer. The next paragraph gives the context. The last paragraph gives the exceptions. A returns policy written this way starts with "You have thirty days from delivery to return unworn items for a full refund." Everything after that paragraph is elaboration. If the retrieval layer grabs only the opening chunk, the customer still gets the correct answer. If it grabs a later chunk about exceptions, the exception context is self-contained and still useful.

Rewriting existing docs is work, but it is work you would have done for a new hire anyway. The version of your help center that answers customer questions clearly in the first sentence is also the version that converts better, reduces ticket volume through self-service, and makes your human team faster. The AI bot just happens to be the most unforgiving reader of all, and building for that reader forces the kind of clarity that helps everyone else too.

What this looks like in practice over a weekend

The full cleanup for a typical store with fifty to two hundred support documents takes a focused weekend if one person owns it, or about two weeks if you spread it across a team on the side. Start by exporting the full list of documents the bot currently has access to into a spreadsheet. One row per document. Columns for title, category, last updated, status, and notes. Sort by category and work through duplicates first. For each group of similar docs, pick the winner, merge useful content into it, and mark the rest for removal. Once duplicates are resolved, go through everything remaining and mark each document as current, outdated, or unclear. Remove the outdated ones from the indexed source the same day.
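If someone on the team is comfortable running a script, the inventory spreadsheet can be generated rather than typed. The sketch below writes the five columns named above to CSV text using Python's standard csv module. The underscore in last_updated, the sample rows, and the function name are assumptions for illustration; how you get the document list out of your bot platform will vary.

```python
import csv
import io

COLUMNS = ["title", "category", "last_updated", "status", "notes"]

def build_inventory(rows):
    """Return CSV text with one row per document, sorted by category
    and then title so likely duplicates sit next to each other."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for row in sorted(rows, key=lambda r: (r["category"], r["title"])):
        writer.writerow(row)
    return buffer.getvalue()

# Hypothetical sample rows.
rows = [
    {"title": "Shipping times", "category": "shipping", "last_updated": "2025-02-10", "status": "current", "notes": ""},
    {"title": "Return policy (launch)", "category": "returns", "last_updated": "2023-04-01", "status": "outdated", "notes": "remove from index"},
    {"title": "Return policy (help center)", "category": "returns", "last_updated": "2025-03-01", "status": "current", "notes": "canonical"},
]

inventory_csv = build_inventory(rows)
print(inventory_csv)
```

The resulting text can be saved as a .csv file and opened in any spreadsheet tool.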

Then add the intent tags. This is usually fastest in the platform that hosts your docs, whether that is Notion, a help center tool, or a plain folder of markdown files. If the tool supports properties or frontmatter, use them. If not, put the tags as a small block at the top of each document. Structured properties are easier for a retrieval layer to filter on, but tags written into the text still travel with the document and can still influence what gets matched. Finish by rewriting the top twenty documents by traffic in question-and-answer or answer-first form. You will not get to all of them. Start with the ones that drive the most questions, measure whether bot accuracy improves on those topics, and come back to the long tail over the following month.
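For a plain folder of markdown files, the tag block at the top of each document could be read with something like the sketch below. It assumes a simple convention of key: value lines between two --- lines and uses a hand-rolled parser rather than a YAML library. The tag names category, user_state, and last_verified are hypothetical choices, not a standard.

```python
def parse_front_matter(text):
    """Split a document into (tags, body). Tags come from a leading
    block of 'key: value' lines fenced by '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no tag block at the top
    tags = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[i + 1:]).strip()
            return tags, body
        if ":" in line:
            key, value = line.split(":", 1)
            tags[key.strip()] = value.strip()
    return {}, text  # no closing fence: treat everything as body

# Hypothetical tagged document.
doc = """---
category: returns
user_state: pre-purchase
last_verified: 2025-03-01
---
You have thirty days from delivery to return unworn items for a full refund.
"""

tags, body = parse_front_matter(doc)
print(tags)
print(body)
```

Keeping the tags in a fixed, machine-readable block means the same file works for a human reader, a spreadsheet export, and an indexing script.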

The quality signal you should track after cleanup

The last thing worth naming is how you measure whether the cleanup worked. Bot accuracy is hard to eyeball because you do not see most of the conversations. What you can measure is escalation rate, which is the percentage of bot conversations that end with the customer asking for a human, and citation accuracy, which is whether the documents the bot referenced in its answer actually support what it said. Most bot platforms expose both metrics in some form. Before cleanup, these numbers tell you where the bot is failing most. After cleanup, they tell you whether the work paid off. If escalation rate drops and citation accuracy rises, the docs are doing their job. If they do not move, the problem is deeper than retrieval and you have earned the right to go look at the model settings.
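If your platform only offers a raw export rather than ready-made metrics, both numbers are simple ratios. The sketch below assumes each conversation record has an escalated flag and that a human has reviewed a sample of answers and marked whether the cited document supports each one. The field names and sample data are hypothetical.

```python
def escalation_rate(conversations):
    """Share of bot conversations where the customer asked for a human."""
    if not conversations:
        return 0.0
    escalated = sum(1 for c in conversations if c["escalated"])
    return escalated / len(conversations)

def citation_accuracy(reviewed_answers):
    """Share of reviewed answers whose cited documents support the answer."""
    if not reviewed_answers:
        return 0.0
    supported = sum(1 for a in reviewed_answers if a["citation_supports_answer"])
    return supported / len(reviewed_answers)

# Hypothetical sample data.
conversations = [
    {"id": 1, "escalated": False},
    {"id": 2, "escalated": True},
    {"id": 3, "escalated": False},
    {"id": 4, "escalated": False},
]
reviewed_answers = [
    {"id": 1, "citation_supports_answer": True},
    {"id": 2, "citation_supports_answer": True},
    {"id": 3, "citation_supports_answer": False},
    {"id": 4, "citation_supports_answer": True},
]

print(f"Escalation rate: {escalation_rate(conversations):.0%}")
print(f"Citation accuracy: {citation_accuracy(reviewed_answers):.0%}")
```

Run the same calculation on a sample from before the cleanup and a sample from after, so the comparison is like for like.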

The reason we work with founders on knowledge-base cleanup before touching any bot configuration is that cleanup is where almost all the gain lives, and it is the part most teams skip because it looks like content work rather than AI work. If you would rather have someone walk your docs with you and do the merge, archive, tag, and rewrite pass as a single engagement, that is a service we offer.
