sitemap.xml and llms.txt both help external systems understand what's on your site, but they serve different audiences and answer different questions. Using one doesn't replace the other.

What sitemap.xml does

The XML sitemap protocol (sitemaps.org, 2005) is consumed by search engine crawlers — Googlebot, Bingbot, and others. It lists the canonical URLs on your site with optional metadata: lastmod to signal freshness, changefreq as a hint for recrawl scheduling, and priority to indicate relative importance within the site. (In practice, Google has said it ignores changefreq and priority, and uses lastmod only when it's consistently accurate.)

A sitemap is exhaustive by design. Its job is to enumerate every URL you want indexed so crawlers don't miss any of them through link discovery alone. The protocol supports up to 50,000 URLs (and 50 MB uncompressed) per file, with sitemap index files tying multiple files together for larger sites. Submission isn't required — crawlers can discover the file through a Sitemap: line in robots.txt — but submitting it in Google Search Console and Bing Webmaster Tools speeds discovery and unlocks indexing reports. Crawlers use it to find pages — they don't use it to understand the site.
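The 50,000-URL cap is why large sites ship several sitemap files plus an index file referencing them. A minimal sketch of that split in Python (the file names and base URL are illustrative, not part of the protocol):

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit from the sitemaps.org protocol

def write_sitemaps(urls, base="https://example.com"):
    """Split a URL list into <=50,000-URL sitemap files plus an index."""
    files = []
    for i in range(0, len(urls), MAX_URLS):
        chunk = urls[i : i + MAX_URLS]
        name = f"sitemap-{i // MAX_URLS + 1}.xml"
        body = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in chunk)
        files.append((name, f'<urlset xmlns="{SITEMAP_NS}">\n{body}\n</urlset>'))
    index_body = "\n".join(
        f"  <sitemap><loc>{base}/{name}</loc></sitemap>" for name, _ in files
    )
    index = f'<sitemapindex xmlns="{SITEMAP_NS}">\n{index_body}\n</sitemapindex>'
    return files, index
```

The index file itself goes in robots.txt or Search Console; the numbered files only need to be reachable at the URLs the index lists.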

What llms.txt does

llms.txt is a community standard proposed by Jeremy Howard (fast.ai / Answer.AI) in 2024. It's consumed by AI language model clients and agents, not search crawlers. Where a sitemap is exhaustive, llms.txt is intentionally opinionated: a short, curated Markdown index of the surfaces you most want an AI assistant to read first, before it spends tokens exploring elsewhere.

There's no URL limit in the spec, but the intent is a file short enough to read in under a minute. The structure is fixed: a level-one heading naming the site, an optional blockquote summary, then level-two headings grouping related surfaces, with bullet list items pointing at canonical URLs (each optionally followed by a short note after a colon). A companion /llms-full.txt variant inlines the full content of the linked pages into a single Markdown file, which is more useful for agents doing deep research than for quick context-loading.
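Because the format is just Markdown with a fixed shape, parsing it takes very little code. A regex-based sketch (assuming a well-formed file; a hardened parser would handle edge cases like links outside any section):

```python
import re

# Matches bullet items of the form: - [Title](https://example.com/page)
LINK = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)")

def parse_llms_txt(text):
    """Group bullet-list links under their level-two section headings."""
    sections, current = {}, None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif (m := LINK.match(line)) and current:
            sections[current].append((m["title"], m["url"]))
    return sections
```

Feed it the raw file text and you get a dict mapping each H2 section name to its list of (title, url) pairs — a convenient shape for the alignment checks discussed below.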

A side-by-side comparison

Both files reference the same site, but they look and behave very differently:

<!-- sitemap.xml — exhaustive -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/getting-started</loc></url>
  <url><loc>https://example.com/docs/api-reference</loc></url>
  <url><loc>https://example.com/blog/post-1</loc></url>
  <!-- ... 2,000 more URLs ... -->
</urlset>

# Example — curated llms.txt
> Example: the tool that does X.

## Docs
- [Getting Started](https://example.com/docs/getting-started)
- [API Reference](https://example.com/docs/api-reference)

The sitemap tells crawlers "here are all the pages." The llms.txt tells AI assistants "here are the pages that matter most." Both reference canonical production URLs — and that overlap is where alignment problems show up.

Where they interact

A sitemap ensures all your pages get crawled and indexed; llms.txt ensures the right pages get cited in AI answers. They don't compete, but they must agree. If llms.txt links to a URL that isn't in the sitemap, that URL may never be indexed — the AI assistant would be citing a page that search engines don't rank. If llms.txt links to a URL that returns a non-200 status, the assistant hits a dead end. Both failures are audit red flags.
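Both failure modes are mechanical to check. A sketch of the audit, with the HTTP status lookup injected as a function so you can plug in any client (or a stub in tests) — the regexes assume well-formed inputs:

```python
import re

def audit_llms_links(llms_text, sitemap_xml, fetch_status):
    """Flag llms.txt links that are missing from the sitemap or return non-200.

    fetch_status(url) -> int HTTP status code; injected so the audit
    can be run against a stub without touching the network.
    """
    sitemap_urls = set(re.findall(r"<loc>(.*?)</loc>", sitemap_xml))
    llms_urls = re.findall(r"\]\((https?://[^)]+)\)", llms_text)
    problems = []
    for url in llms_urls:
        if url not in sitemap_urls:
            problems.append((url, "not in sitemap"))
        if fetch_status(url) != 200:
            problems.append((url, "non-200 response"))
    return problems
```

An empty result means every curated link is both indexed-eligible and live; anything else is exactly the kind of red flag described above.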

The canonical URL listed in llms.txt should match the <loc> in the sitemap and the <link rel="canonical"> tag on the page itself. Any disagreement across those three signals creates ambiguity about which URL is authoritative.
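That three-way agreement is also scriptable. A rough sketch — regex scraping of HTML is brittle (it misses attribute reorderings, for one), so a real audit would use a proper HTML parser:

```python
import re

# Naive match for <link rel="canonical" href="...">; assumes rel comes
# before href and standard quoting — an illustration, not production code.
CANONICAL = re.compile(
    r'<link[^>]*rel=["\']canonical["\'][^>]*href=["\']([^"\']+)["\']', re.I
)

def canonical_agrees(page_html, sitemap_loc, llms_url):
    """True when the page's canonical tag, the sitemap <loc>,
    and the llms.txt link all name the same URL."""
    m = CANONICAL.search(page_html)
    return bool(m) and m.group(1) == sitemap_loc == llms_url
```

Run it for each URL that llms.txt lists; any False is the ambiguity described above.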

When to use each

Every production site with public pages should have a sitemap.xml. Submit it in Google Search Console and Bing Webmaster Tools, or reference it with a Sitemap: line in robots.txt. Generate it dynamically or at build time — just make sure it lists only production canonical URLs, never staging or preview origins.
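A build-time generator can be as small as mapping your route list to url entries. A sketch using only the standard library — the route list, lastmod dates, and origin are placeholders for whatever your build pipeline knows:

```python
from xml.etree.ElementTree import Element, SubElement, tostring

def build_sitemap(pages, origin="https://example.com"):
    """Render (path, lastmod_iso) pairs as sitemap XML.

    lastmod should be the page's real modification date — Google ignores
    lastmod values it finds to be inaccurate. The origin must be the
    production canonical host, never a staging or preview URL.
    """
    urlset = Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for path, lastmod in pages:
        url = SubElement(urlset, "url")
        SubElement(url, "loc").text = origin + path
        SubElement(url, "lastmod").text = lastmod
    return tostring(urlset, encoding="unicode")
```

Write the returned string to sitemap.xml in your build output, next to llms.txt, so both are deployed from the same route list and can't drift apart.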

llms.txt matters most for sites where AI answer visibility is a priority: documentation sites, developer tools, SaaS products, anything where users research in ChatGPT or Claude before or instead of running a search query. If your site is purely transactional and you don't care about AI citations, it's lower priority — but it costs almost nothing to add.

Verify before shipping

isitready.dev checks both files, validates URL alignment between them, and flags when your llms.txt links to URLs that aren't in your sitemap or return non-200 responses. Run it after any URL restructuring or deploy that changes your canonical configuration.