21 Jun 2026

Train Your Chatbot From Your Website

Train your chatbot from your website with a full-domain crawl, a sitemap, or a single URL. See how grounded answers get built and kept fresh, with no autolearning.

Your website already holds most of the answers your visitors are looking for: the product pages, the help docs, the pricing page, the FAQ buried three clicks deep. The problem is not that the content is missing. It is that a visitor will not read six pages to find one line. So they open the chat instead, and now the bot needs to know everything the site knows.

The fastest fix is to train your chatbot from your website directly: point it at the pages you already publish and let it build a knowledge base from your site. No retyping your docs into a new tool, no copy-paste marathon. The bot reads your real content and answers from it, which is the whole point. We built it to answer from what you actually wrote, and to decline when the question falls outside that.

This guide covers the three ways to feed your site into the bot, how the knowledge gets turned into grounded answers, and the part nobody likes to admit: keeping it current is a human job, not magic.

Three ways to pull your site into the knowledge base

There is no single right way to train a chatbot from a website, because not every site is shaped the same. So the bot gives you three.

The first is a full-domain crawl. You hand over your root domain and the crawler walks the public pages it can reach, following internal links the way a visitor would. A website crawl is where most teams start, because one input covers the whole site. Before anything gets ingested, a free discovery pass shows you which pages were found, so you can see the scope before you commit.
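To make the idea of "following internal links the way a visitor would" concrete, here is a minimal sketch of a discovery pass. It is not the product's crawler. The function name `discover_pages`, the `fetch_links` callable, the `max_pages` cap, and the in-memory `SITE` map are all assumptions made for illustration, and the example.com URLs are placeholders. The sketch walks same-host links breadth-first and only lists pages, without ingesting anything.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def discover_pages(root_url, fetch_links, max_pages=100):
    """Breadth-first walk of same-host links, starting at root_url.

    fetch_links is a hypothetical callable (url -> list of href strings).
    A real crawler would download the page and extract its links instead.
    """
    root_host = urlparse(root_url).netloc
    seen = {root_url}
    queue = deque([root_url])
    found = []
    while queue and len(found) < max_pages:
        url = queue.popleft()
        found.append(url)
        for href in fetch_links(url):
            # Resolve relative links and drop #fragments so each page counts once.
            absolute = urljoin(url, href).split("#", 1)[0]
            if urlparse(absolute).netloc != root_host:
                continue  # external link: outside the domain being crawled
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return found


# Stand-in for a real site: each page maps to the links it contains.
SITE = {
    "https://example.com/": ["/pricing", "/docs", "https://other.example.org/"],
    "https://example.com/pricing": ["/", "/docs#faq"],
    "https://example.com/docs": ["/docs/setup"],
    "https://example.com/docs/setup": [],
}

pages = discover_pages("https://example.com/", lambda url: SITE.get(url, []))
for page in pages:
    print(page)
```

The external link is skipped and the `#faq` fragment collapses into the docs page, so the pass lists four pages. That list is the kind of scope preview you would check before committing to an ingest.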

The second is a sitemap. If you already publish an XML sitemap, that is a clean, ordered list of exactly the URLs you want indexed. Building a chatbot from a sitemap skips the guesswork of a crawl and pulls the precise set of pages you maintain on purpose.
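As a rough illustration of why a sitemap removes the guesswork, the sketch below reads the `<loc>` entries out of a standard XML sitemap using Python's standard library. The function name `urls_from_sitemap` and the sample sitemap are assumptions for the example, not part of the product. The namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the page URLs listed in a <urlset> sitemap, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Illustrative sitemap with placeholder URLs.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing</loc><lastmod>2026-06-01</lastmod></url>
  <url><loc>https://example.com/docs/setup</loc></url>
</urlset>"""

sitemap_urls = urls_from_sitemap(SAMPLE_SITEMAP)
print(sitemap_urls)
```

Nothing here follows links. The list is exactly what the sitemap declares, which is the point of this option.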

The third is a single URL. Sometimes you do not want the whole site, just one page: a new pricing page, a single long-form guide, an updated returns policy. Drop in one address and only that page joins the knowledge base. It is the surgical option when a broad crawl would be overkill.

All three read public pages only. The bot does not log into gated areas or scrape content behind a paywall, and it does not touch anything that is not already on the open web. What your visitors can read, the bot can train on. What they cannot, it will not.

The website is one source, not the only one

A crawl gets you a long way, but a website is rarely the complete picture of what your bot should know.

Some answers live in files nobody bothered to turn into web pages: a PDF, a returns policy in a Word doc, a setup walkthrough as a text file, a spec sheet in Markdown. For those, you upload the file directly and it joins the same knowledge base as your crawled pages. The bot does not care whether an answer came from a URL or a document. It treats them as one body of knowledge. If most of your real answers live in files rather than on the site, the companion guide on building a chatbot from your docs walks through that path.

You can also write knowledge by hand. When a question keeps coming up and the answer is not written down anywhere yet, you type it straight into the dashboard and it becomes part of what the bot can say. Crawl, upload, and manual entry stack on top of each other. Together they form the knowledge base the bot answers from.

On top of all that content, the system builds a knowledge graph you can see in the dashboard, so the connections between your pages, documents, and manual notes are not a black box.

From crawled pages to grounded answers

Pulling in your pages is only half of it. When you train AI from web pages, the harder promise is that the bot answers from those pages and nothing else.

Here is how that holds up in practice. When a visitor asks a question, the bot searches the knowledge base built from your site, finds the passages that match, and writes its reply from them. The answer is grounded in your content, not invented from a general model's guesswork. If the bot cannot find anything relevant, it says so and declines, instead of filling the gap with a confident-sounding fabrication.
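The search, answer, or decline flow can be sketched in a few lines. This is a toy under stated assumptions, not the bot's actual retrieval. Keyword overlap stands in for whatever search the product really uses, and the names `grounded_answer`, `keywords`, `min_overlap`, and `DECLINE`, along with the sample passages, are invented for the example. What it shows is the shape of the rule: reply only from a matching passage, and decline when nothing matches well enough.

```python
import re

# Filler words ignored when matching; a deliberately tiny, illustrative list.
STOPWORDS = {"a", "an", "the", "is", "are", "do", "you", "what", "how", "of", "to"}
DECLINE = "I could not find that in the available content."


def keywords(text):
    """Lowercased word set with filler words removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def grounded_answer(question, passages, min_overlap=2):
    """Return (reply, source_url), or (DECLINE, None) when no passage matches.

    passages is a list of (source_url, text) pairs from the knowledge base.
    """
    asked = keywords(question)
    best_score, best_passage = 0, None
    for source_url, text in passages:
        score = len(asked & keywords(text))
        if score > best_score:
            best_score, best_passage = score, (source_url, text)
    if best_passage is None or best_score < min_overlap:
        return DECLINE, None  # nothing relevant enough: decline, do not guess
    source_url, text = best_passage
    return text, source_url


# Placeholder passages standing in for ingested pages.
PASSAGES = [
    ("https://example.com/returns", "The refund window is 30 days from delivery."),
    ("https://example.com/shipping", "Orders ship within two business days."),
]

print(grounded_answer("What is the refund window?", PASSAGES))
print(grounded_answer("Do you offer gift wrapping?", PASSAGES))
```

The first question matches the returns passage and comes back with its source. The second matches nothing, so the function declines rather than producing an answer the content does not support.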

We think this is the line that separates a useful support bot from a liability. A bot that makes up a refund window or a shipping time creates a problem your team has to clean up later. One that answers strictly from your pages, and stops when the pages run out, keeps you out of that mess. You can dig into the mechanics in the write-up on the no-hallucination approach, which is the same grounding logic applied to every answer.

The bot can also cite where an answer came from, so a visitor sees which page backed the reply. That citation behavior is a toggle: on when you want the receipts visible, off when you want a cleaner chat. Either way the answer is sourced from your real content.
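One way to picture the toggle is as a display setting applied after the answer is already sourced. In the sketch below, `format_reply` and `show_citations` are hypothetical names, not the product's settings. The reply text is identical either way, and only the visible source line changes.

```python
def format_reply(text, source_url, show_citations=True):
    """Append the source page to a reply only when citations are switched on."""
    if show_citations and source_url:
        return f"{text}\n\nSource: {source_url}"
    return text


reply = "The refund window is 30 days from delivery."
source = "https://example.com/returns"  # placeholder URL

print(format_reply(reply, source, show_citations=True))
print(format_reply(reply, source, show_citations=False))
```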

What it takes to set this up

Setting up a website-crawl knowledge base is not a project. It is a few minutes in the dashboard.

You open your workspace, paste your domain, sitemap, or single URL, and run the discovery pass to confirm the pages. Then you ingest, and the bot starts answering from your content. Once it is trained, you embed the widget on your site by dropping in a single snippet, as covered in the website chatbot setup guide. The training and the widget are two separate steps: one teaches the bot, the other puts it where your visitors are.

How much you can crawl, how large the knowledge base can grow, and how often you can rebuild it depend on your plan. The free tier is enough to train a small site and see grounded answers in action. Larger sites with more pages and more frequent refreshes sit on the higher tiers. The pricing guide lays out the trainable-URL limits and knowledge-base size per plan.

Keeping it current is your move, not the bot's

Here is the honest part. The bot does not improve on its own. It will not quietly re-read your site overnight and patch its own gaps.

When your content changes, you refresh the knowledge. Re-scrape a page or your whole domain, and change detection flags what actually changed, so a refresh updates only those pages instead of re-ingesting everything. Re-upload a document when its file changes. Edit a manual note by hand when the answer needs a tweak. This is the human-in-the-loop part, and it is deliberate.
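Change detection can be pictured as comparing a fingerprint of each page against the one stored at the last ingest. The sketch below assumes a SHA-256 hash of the page text. The product's real mechanism is not documented here, and `fingerprint`, `detect_changes`, and the sample pages are illustrative names and data.

```python
import hashlib


def fingerprint(text):
    """Stable content hash used to tell whether a page's text changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_changes(stored, fresh):
    """Compare the last ingest against a fresh scrape.

    stored: url -> fingerprint saved at the last ingest
    fresh:  url -> page text from the current scrape
    """
    changes = {"new": [], "changed": [], "removed": []}
    for url, text in fresh.items():
        if url not in stored:
            changes["new"].append(url)
        elif stored[url] != fingerprint(text):
            changes["changed"].append(url)
    changes["removed"] = [url for url in stored if url not in fresh]
    return changes


# Placeholder data: what was ingested last time vs. what the site says now.
stored = {
    "https://example.com/": fingerprint("Welcome"),
    "https://example.com/pricing": fingerprint("Old pricing text"),
    "https://example.com/old": fingerprint("Retired page"),
}
fresh = {
    "https://example.com/": "Welcome",
    "https://example.com/pricing": "New pricing text",
    "https://example.com/new": "Brand new page",
}

changes = detect_changes(stored, fresh)
print(changes)
```

Only the pricing page and the new page would need re-ingesting. The unchanged home page is left alone, which is what keeps a refresh cheap.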

The way you catch what needs fixing is the Response Monitor. You review what the bot actually answered, spot the questions where it declined or got something stale, and then update the knowledge to close the gap. The loop is simple: read the real conversations, find the holes, feed the bot the missing or updated content. There is no autolearning step hiding in the middle, and we would rather be straight about that than promise a bot that teaches itself.
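If you wanted to triage that review by hand, the "find the holes" step might look like the sketch below. The record shape (a `question` string and a `declined` flag) and the function `knowledge_gaps` are assumptions for illustration, not an export format the Response Monitor is known to provide.

```python
from collections import Counter


def knowledge_gaps(conversations, top_n=3):
    """Count the questions the bot declined, most frequent first."""
    declined = Counter(
        record["question"].strip().lower()
        for record in conversations
        if record["declined"]
    )
    return declined.most_common(top_n)


# Hypothetical review data: what visitors asked and whether the bot declined.
conversations = [
    {"question": "Do you ship to Canada?", "declined": True},
    {"question": "What is the refund window?", "declined": False},
    {"question": "do you ship to canada?", "declined": True},
    {"question": "Is there an API?", "declined": True},
]

gaps = knowledge_gaps(conversations)
print(gaps)
```

The most-declined question is the first candidate for a new manual entry or a refreshed page. Deciding what to add still falls to a person.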

FAQ

How do I train my chatbot from my website?

You point the bot at your site in one of three ways: a full-domain crawl, an XML sitemap, or a single URL. A free discovery pass shows the pages found before you ingest. Once ingested, the bot answers from those pages, grounded in your real content.

Will the bot crawl pages behind a login?

No. The crawler reads public pages only. It does not sign in to gated areas, scrape paywalled content, or touch anything that is not already on the open web. What your visitors can read, the bot can learn from.

Does the chatbot keep learning from my site automatically?

No. There is no autolearning. When your site changes, you refresh the knowledge yourself: re-scrape with change detection so only changed pages update, re-upload a document, or edit a note by hand. You catch what needs fixing in the Response Monitor, then update the knowledge.

Can I train the bot from my website and my documents at the same time?

Yes. A website crawl, uploaded files, and manual entries all feed the same knowledge base. The bot treats a crawled page and an uploaded PDF as one body of knowledge and answers from whichever holds the match.

What happens when the bot cannot find an answer on my site?

It declines instead of inventing one. The reply is grounded in your pages, and when nothing relevant exists, the bot says so rather than fabricating a number or a policy. It can also cite the page an answer came from when you turn citations on.

Ready to train your chatbot from your website? Start in the knowledge base to point the bot at your domain, then check the pricing guide for the trainable-URL limits on each plan.