Technical SEO

robots.txt policy

robots.txt policy is one of the public readiness signals included in isitready.dev reports.

What we check

How the scan observes this signal.

The scanner fetches /robots.txt, confirms it responds with a 2xx status and a text/plain content type, parses the User-agent groups, and reports whether each AI crawler token (GPTBot, ClaudeBot, Google-Extended, ChatGPT-User, PerplexityBot) is explicitly allowed, explicitly disallowed, or missing.
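
The parsing and reporting step can be sketched roughly as follows. This is a minimal illustration, not the actual isitready.dev scanner: the names parse_groups and classify are hypothetical, the input is a robots.txt body that has already been fetched as a string, and any non-empty Disallow rule is counted as "disallowed" for simplicity.

```python
AI_TOKENS = ["GPTBot", "ClaudeBot", "Google-Extended", "ChatGPT-User", "PerplexityBot"]


def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercased User-agent token to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current: list[str] = []   # agents named by the group being read
    reading_agents = False    # True while consecutive User-agent lines continue
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                current = []  # a rule line closed the previous group
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
            reading_agents = True
        elif field in ("allow", "disallow"):
            reading_agents = False
            for agent in current:
                groups[agent].append((field, value))
    return groups


def classify(robots_txt: str) -> dict[str, str]:
    """Label each AI token as allowed, disallowed, or missing (assumed labels)."""
    groups = parse_groups(robots_txt)
    report: dict[str, str] = {}
    for token in AI_TOKENS:
        rules = groups.get(token.lower())
        if rules is None:
            report[token] = "missing"  # falls back to the * group
        elif any(d == "disallow" and path for d, path in rules):
            report[token] = "disallowed"
        else:
            report[token] = "allowed"
    return report


SAMPLE = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Disallow: /private

User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
"""

report = classify(SAMPLE)
print(report)
```

For the sample body, GPTBot is reported as allowed, ClaudeBot as disallowed (it has a path restriction), and the other three tokens as missing.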

Why it matters

Why this shows up on the report card.

Crawler policy should be easy to fetch, syntactically valid, and an accurate statement of what you want AI crawlers to do.

Sample evidence

What a passing row looks like.

GET /robots.txt
200 OK · text/plain · 412 B
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Disallow: /private
User-agent: Google-Extended
missing — inherits * policy
Sitemap reference
https://example.com/sitemap.xml

How to improve

Steps in the remediation brief.

  1. Serve /robots.txt from the canonical origin with a text/plain content type and no redirects through www.

  2. Explicitly name each AI crawler you want to allow or block (GPTBot, ClaudeBot, Google-Extended, PerplexityBot, ChatGPT-User) instead of leaving them to fall back on the User-agent: * group.

  3. Reference your sitemap with a Sitemap: line and llms.txt with a standard # comment so AI and search crawlers pick up both discovery surfaces.

  4. Re-run the scan to confirm the evidence row reports each AI bot's resolved policy.
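
Before re-running the scan, a draft file can be checked locally. The sketch below uses Python's standard urllib.robotparser on an illustrative robots.txt. The allow and block choices are arbitrary placeholders rather than a recommendation, the llms.txt URL is an assumed location, and the helper name resolved is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Illustrative draft: every policy choice here is a placeholder.
ROBOTS_TXT = """\
# AI-readable summary: https://example.com/llms.txt
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Disallow: /private

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from memory, no network fetch


def resolved(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the draft policy."""
    return parser.can_fetch(agent, "https://example.com" + path)


for agent in ("GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot"):
    print(agent, resolved(agent, "/"), resolved(agent, "/private/report"))

print(parser.site_maps())  # Sitemap: lines found in the file (Python 3.8+)
```

This only checks rule resolution. The content type and redirect behavior from step 1 still need to be verified against the live URL.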

Common questions

Questions people ask about this check.

Should I block GPTBot, ClaudeBot, and Google-Extended?
That is a policy decision, and isitready.dev does not recommend either direction. What we do check is that the policy is explicit. With only a User-agent: * group, AI crawlers inherit whatever the wildcard rules say, which usually means they treat your site as fully crawlable whether you want that or not.
What format does isitready.dev expect for robots.txt?
Standard Robots Exclusion Protocol — User-agent groups, Allow/Disallow rules, and an optional Sitemap: line. We validate syntax loosely (the way crawlers actually parse) and flag common gotchas like unescaped spaces, case mismatches, and ordering traps.
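
As an illustration of two of these gotchas, the sketch below compares an order-sensitive "first match" evaluator with the "longest match" rule described in RFC 9309. Treating this difference as the "ordering trap" is an assumption about what the check flags. The function names are hypothetical, and wildcard patterns (* and $) are deliberately left out.

```python
Rule = tuple[str, str]  # (directive, path prefix), e.g. ("disallow", "/private")


def first_match(rules: list[Rule], path: str) -> bool:
    """Apply the first rule whose prefix matches, so rule order matters."""
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            return directive == "allow"
    return True  # no rule matched: crawling is allowed


def longest_match(rules: list[Rule], path: str) -> bool:
    """Apply the most specific matching rule; Allow wins ties (RFC 9309)."""
    best = None
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            key = (len(prefix), directive == "allow")
            if best is None or key > best:
                best = key
    return True if best is None else best[1]


# Ordering trap: a broad Disallow listed before a narrower Allow.
RULES = [("disallow", "/"), ("allow", "/public")]
print(first_match(RULES, "/public/page"))    # False: the broad rule is hit first
print(longest_match(RULES, "/public/page"))  # True: the longer prefix wins

# Case mismatch: paths are case-sensitive, so this rule does not cover /private.
print(longest_match([("disallow", "/Private")], "/private/x"))  # True
```

Parsers that evaluate rules in file order and parsers that follow the longest-match rule can disagree on the same file, so listing the more specific rule first keeps both interpretations aligned.
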
Does robots.txt need to reference llms.txt?
It does not have to, but doing so is the cheapest way to raise discovery coverage for AI assistants. Add a comment or Sitemap-style line pointing at /llms.txt so agents that check robots.txt first get a direct handoff.