sitemap.xml and llms.txt both help external systems understand what's on your site, but they serve different audiences and answer different questions. Using one doesn't replace the other.

What sitemap.xml does

The XML sitemap protocol (sitemaps.org, 2005) is consumed by search engine crawlers — Googlebot, Bingbot, and others. It lists the canonical URLs on your site with optional metadata: lastmod to signal freshness, changefreq as a hint for recrawl scheduling, and priority to indicate relative importance within the site. (In practice, Google has said it ignores changefreq and priority, and uses lastmod only when it's consistently accurate.)

A sitemap is exhaustive by design. Its job is to enumerate every URL you want indexed so crawlers don't miss any of them through link discovery alone. The protocol supports up to 50,000 URLs (and 50 MB uncompressed) per file, with sitemap index files tying multiple files together for larger sites. Submission isn't required — crawlers can discover the file through a Sitemap: line in robots.txt — but submitting it in Google Search Console and Bing Webmaster Tools speeds discovery and unlocks indexing reports. Crawlers use it to find pages — they don't use it to understand the site.
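The 50,000-URL cap is why large sites ship several sitemap files plus an index file referencing them. A minimal sketch of that split in Python (the file names and base URL are illustrative, not part of the protocol):

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit from the sitemaps.org protocol

def write_sitemaps(urls, base="https://example.com"):
    """Split a URL list into <=50,000-URL sitemap files plus an index."""
    files = []
    for i in range(0, len(urls), MAX_URLS):
        chunk = urls[i : i + MAX_URLS]
        name = f"sitemap-{i // MAX_URLS + 1}.xml"
        body = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in chunk)
        files.append((name, f'<urlset xmlns="{SITEMAP_NS}">\n{body}\n</urlset>'))
    index_body = "\n".join(
        f"  <sitemap><loc>{base}/{name}</loc></sitemap>" for name, _ in files
    )
    index = f'<sitemapindex xmlns="{SITEMAP_NS}">\n{index_body}\n</sitemapindex>'
    return files, index
```

The index file itself goes in robots.txt or Search Console; the numbered files only need to be reachable at the URLs the index lists.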

What llms.txt does

llms.txt is a community standard proposed by Jeremy Howard (fast.ai / Answer.AI) in 2024. It's consumed by AI language model clients and agents, not search crawlers. Where a sitemap is exhaustive, llms.txt is intentionally opinionated: a short, curated Markdown index of the surfaces you most want an AI assistant to read first, before it spends tokens exploring elsewhere.

There's no URL limit in the spec, but the intent is a file short enough to read in under a minute. The structure is fixed: a level-one heading naming the site, an optional blockquote summary, then level-two headings grouping related surfaces, with bullet list items pointing at canonical URLs (each optionally followed by a short note after a colon). A companion /llms-full.txt variant inlines the full content of the linked pages into a single Markdown file, which is more useful for agents doing deep research than for quick context-loading.
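Because the format is just Markdown with a fixed shape, parsing it takes very little code. A regex-based sketch (assuming a well-formed file; a hardened parser would handle edge cases like links outside any section):

```python
import re

# Matches bullet items of the form: - [Title](https://example.com/page)
LINK = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)")

def parse_llms_txt(text):
    """Group bullet-list links under their level-two section headings."""
    sections, current = {}, None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif (m := LINK.match(line)) and current:
            sections[current].append((m["title"], m["url"]))
    return sections
```

Feed it the raw file text and you get a dict mapping each H2 section name to its list of (title, url) pairs — a convenient shape for the alignment checks discussed below.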

A side-by-side comparison

Both files reference the same site, but they look and behave very differently:

<!-- sitemap.xml — exhaustive -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/getting-started</loc></url>
  <url><loc>https://example.com/docs/api-reference</loc></url>
  <url><loc>https://example.com/blog/post-1</loc></url>
  <!-- ... 2,000 more URLs ... -->
</urlset>

# Example — curated llms.txt
> Example: the tool that does X.

## Docs
- [Getting Started](https://example.com/docs/getting-started)
- [API Reference](https://example.com/docs/api-reference)

The sitemap tells crawlers "here are all the pages." The llms.txt tells AI assistants "here are the pages that matter most." Both reference canonical production URLs — and that overlap is where alignment problems show up.

Where they interact

A sitemap ensures all your pages get crawled and indexed; llms.txt ensures the right pages get cited in AI answers. They don't compete, but they must agree. If llms.txt links to a URL that isn't in the sitemap, that URL may never be indexed — the AI assistant would be citing a page that search engines don't rank. If llms.txt links to a URL that returns a non-200 status, the assistant hits a dead end. Both failures are audit red flags.
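Both failure modes are mechanical to check. A sketch of the audit, with the HTTP status lookup injected as a function so you can plug in any client (or a stub in tests) — the regexes assume well-formed inputs:

```python
import re

def audit_llms_links(llms_text, sitemap_xml, fetch_status):
    """Flag llms.txt links that are missing from the sitemap or return non-200.

    fetch_status(url) -> int HTTP status code; injected so the audit
    can be run against a stub without touching the network.
    """
    sitemap_urls = set(re.findall(r"<loc>(.*?)</loc>", sitemap_xml))
    llms_urls = re.findall(r"\]\((https?://[^)]+)\)", llms_text)
    problems = []
    for url in llms_urls:
        if url not in sitemap_urls:
            problems.append((url, "not in sitemap"))
        if fetch_status(url) != 200:
            problems.append((url, "non-200 response"))
    return problems
```

An empty result means every curated link is both indexed-eligible and live; anything else is exactly the kind of red flag described above.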

The canonical URL listed in llms.txt should match the <loc> in the sitemap and the <link rel="canonical"> tag on the page itself. Any disagreement across those three signals creates ambiguity about which URL is authoritative.
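That three-way agreement is also scriptable. A rough sketch — regex scraping of HTML is brittle (it misses attribute reorderings, for one), so a real audit would use a proper HTML parser:

```python
import re

# Naive match for <link rel="canonical" href="...">; assumes rel comes
# before href and standard quoting — an illustration, not production code.
CANONICAL = re.compile(
    r'<link[^>]*rel=["\']canonical["\'][^>]*href=["\']([^"\']+)["\']', re.I
)

def canonical_agrees(page_html, sitemap_loc, llms_url):
    """True when the page's canonical tag, the sitemap <loc>,
    and the llms.txt link all name the same URL."""
    m = CANONICAL.search(page_html)
    return bool(m) and m.group(1) == sitemap_loc == llms_url
```

Run it for each URL that llms.txt lists; any False is the ambiguity described above.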

When to use each

Every production site with public pages should have a sitemap.xml. Submit it in Google Search Console and Bing Webmaster Tools, or reference it with a Sitemap: line in robots.txt. Generate it dynamically or at build time — just make sure it lists only production canonical URLs, never staging or preview origins.
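A build-time generator can be as small as mapping your route list to url entries. A sketch using only the standard library — the route list, lastmod dates, and origin are placeholders for whatever your build pipeline knows:

```python
from xml.etree.ElementTree import Element, SubElement, tostring

def build_sitemap(pages, origin="https://example.com"):
    """Render (path, lastmod_iso) pairs as sitemap XML.

    lastmod should be the page's real modification date — Google ignores
    lastmod values it finds to be inaccurate. The origin must be the
    production canonical host, never a staging or preview URL.
    """
    urlset = Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for path, lastmod in pages:
        url = SubElement(urlset, "url")
        SubElement(url, "loc").text = origin + path
        SubElement(url, "lastmod").text = lastmod
    return tostring(urlset, encoding="unicode")
```

Write the returned string to sitemap.xml in your build output, next to llms.txt, so both are deployed from the same route list and can't drift apart.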

llms.txt matters most for sites where AI answer visibility is a priority: documentation sites, developer tools, SaaS products, anything where users research in ChatGPT or Claude before or instead of running a search query. If your site is purely transactional and you don't care about AI citations, it's lower priority — but it costs almost nothing to add.

Verify before shipping

isitready.dev checks both files, validates URL alignment between them, and flags when your llms.txt links to URLs that aren't in your sitemap or return non-200 responses. Run it after any URL restructuring or deploy that changes your canonical configuration.