Robots and Sitemap Checker
Robots.txt and sitemap.xml are small files, but they decide which URLs crawlers can discover and which canonical pages they trust.
- Surface: Free tool
- Scope: Public web evidence
- Auth: None required
- Schema: SoftwareApplication
What it checks
The scan fetches robots.txt and sitemap.xml, verifies response status and content type, samples listed URLs, and checks canonical agreement.
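A minimal sketch of those checks in Python, assuming the standard sitemap namespace and page markup where `rel="canonical"` appears before `href`; the origin is a placeholder, not a real target:

```python
import random
import re
import urllib.request
from xml.etree import ElementTree

ORIGIN = "https://example.com"  # placeholder; substitute the site under test
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def fetch(url):
    """Return (status, content type, body) for a URL."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status, resp.headers.get_content_type(), resp.read()

# robots.txt should answer 200 with a plain-text body.
status, ctype, _ = fetch(ORIGIN + "/robots.txt")
print("robots.txt:", status, ctype)

# sitemap.xml should answer 200 with an XML content type.
status, ctype, body = fetch(ORIGIN + "/sitemap.xml")
print("sitemap.xml:", status, ctype)

# Sample a few listed URLs and confirm each page's canonical tag
# points back at the URL the sitemap advertised.
urls = [loc.text for loc in ElementTree.fromstring(body).iter(NS + "loc")]
for url in random.sample(urls, min(3, len(urls))):
    _, _, html = fetch(url)
    m = re.search(rb'rel="canonical"[^>]*href="([^"]+)"', html)  # naive pattern; real markup varies
    canonical = m.group(1).decode() if m else "(none)"
    print(url, "->", canonical, "OK" if canonical == url else "MISMATCH")
```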
Common blockers
The usual launch failures are leftover staging Disallow rules, stale sitemap entries, URLs listed on the wrong host, a homepage missing from the sitemap, and discovery files blocked by the robots policy itself.
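Two of these, a blanket staging Disallow and a discovery file hidden by robots policy, can be caught with the standard library's robot parser. A sketch against a placeholder origin:

```python
import urllib.robotparser

ORIGIN = "https://example.com"  # placeholder origin

rp = urllib.robotparser.RobotFileParser(ORIGIN + "/robots.txt")
rp.read()

# A leftover staging policy ("Disallow: /") fails both of these checks.
for path in ("/", "/sitemap.xml"):
    allowed = rp.can_fetch("*", ORIGIN + path)
    print(path, "allowed" if allowed else "BLOCKED by robots policy")
```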
How to fix
Keep robots rules intentional, list only canonical production URLs in the sitemap, and make the sitemap, canonical tags, and llms.txt all point at the same preferred origin.
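One way to enforce that agreement is a small origin check over every discovery signal. The signal URLs below are hypothetical, and the preferred origin is an assumption:

```python
from urllib.parse import urlsplit

PREFERRED = "https://example.com"  # assumed preferred origin

def origin_of(url: str) -> str:
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"

# Hypothetical discovery signals gathered from a crawl; every one
# should agree on the preferred origin.
signals = {
    "sitemap <loc>": "https://example.com/pricing",
    "canonical tag": "https://www.example.com/pricing",  # wrong host
    "llms.txt link": "https://example.com/docs",
}
for source, url in signals.items():
    ok = origin_of(url) == PREFERRED
    print(source, "OK" if ok else f"MISMATCH ({url})")
```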
Common questions
- Do I need both robots.txt and sitemap.xml?
- Yes for most public sites. Robots.txt declares access policy; sitemap.xml declares canonical URLs and update hints. They solve different problems.
- Should the sitemap include private pages?
- No. Sitemaps should list canonical public URLs that you want crawlers to discover and evaluate.