AI readiness isn't a single check. Machine-readable files, crawler policy, structured data, security posture, performance — each surface has its own failure modes, and a clean score on one doesn't buy you anything on the others. This template gives you a repeatable checklist to run on any site before each release.

1. Machine-readable surfaces

These are the files AI assistants and crawlers may use to orient themselves before fetching deeper pages. Missing or broken files here make the rest of the audit harder to interpret.

  • /llms.txt exists at the canonical origin, is served with Content-Type: text/plain, and contains valid Markdown; every URL it links to returns HTTP 200.

  • /llms-full.txt is optional, but if present it must be linked from /llms.txt — orphaned full files don't get discovered.

  • sitemap.xml exists, is referenced in robots.txt, and has been submitted to Google Search Console; every URL in it returns 200 and carries a canonical tag pointing at that same URL.

  • robots.txt is present, syntactically valid, and explicitly names the major crawler or product tokens you care about — for example OAI-SearchBot, GPTBot, ClaudeBot, Claude-SearchBot, and Google-Extended — with a deliberate Allow or Disallow directive for each.
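
The link check on /llms.txt can be sketched in a few lines of Python. This is a minimal illustration, not a full validator: it assumes links are written as inline Markdown links, and the sample file, the example.com origin, and the helper names are all hypothetical. It only builds the list of URLs to test; fetching each one and confirming HTTP 200 is left to whatever HTTP client you already use.

```python
import re
from urllib.parse import urljoin

# Matches inline Markdown links of the form [title](target).
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def extract_llms_links(llms_text: str, origin: str) -> list[str]:
    """Return absolute URLs for every Markdown link found in llms.txt."""
    urls = []
    for _title, target in LINK_RE.findall(llms_text):
        absolute = urljoin(origin, target)
        if absolute not in urls:  # keep the first occurrence, preserve order
            urls.append(absolute)
    return urls


def links_full_file(llms_text: str, origin: str) -> bool:
    """True if llms.txt links to /llms-full.txt on the same origin."""
    return urljoin(origin, "/llms-full.txt") in extract_llms_links(llms_text, origin)


# Hypothetical llms.txt body used only for illustration.
SAMPLE = """# Example Co

> Hypothetical summary line.

## Docs

- [Quickstart](/docs/quickstart.md): first steps
- [API reference](https://example.com/docs/api.md)
- [Full text](/llms-full.txt)
"""

if __name__ == "__main__":
    for url in extract_llms_links(SAMPLE, "https://example.com"):
        print(url)
```

Resolving every target against the canonical origin first means relative and absolute links go through the same status check.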

2. Crawler policy

A robots.txt that doesn't explicitly address AI crawlers is an ambiguity bug. Most AI systems will crawl unless told otherwise, but "probably crawlable" isn't a policy.

  • Every AI crawler you care about has its own User-agent group with an explicit Allow: / or Disallow: / line. Don't rely on the wildcard User-agent: * fallback for crawlers you want to permit explicitly.

  • Disallow rules don't accidentally block canonical discovery paths like /sitemap.xml or /llms.txt. This happens more often than you'd expect when rules are copy-pasted.

  • Crawl-delay directives, if present, are only used for crawlers that document support for the non-standard directive. Google ignores it; Anthropic documents support for it when appropriate.
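
The first two checks above can be approximated with Python's standard library. This is a rough sketch under stated assumptions: the robots.txt body is a made-up sample, the function names are hypothetical, and a crawler that is blocked from / is treated as deliberately blocked rather than flagged. Note that urllib.robotparser applies the first matching rule in a group and does not expand path wildcards, so its answers can differ from crawlers that use longest-match semantics.

```python
from urllib.robotparser import RobotFileParser

AI_TOKENS = ["OAI-SearchBot", "GPTBot", "ClaudeBot", "Claude-SearchBot", "Google-Extended"]
DISCOVERY_PATHS = ["/sitemap.xml", "/llms.txt"]


def named_agents(robots_text: str) -> set[str]:
    """Collect every User-agent token that robots.txt names, lowercased."""
    agents = set()
    for line in robots_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "user-agent" and value.strip():
            agents.add(value.strip().lower())
    return agents


def audit_robots(robots_text: str) -> list[str]:
    """Flag AI tokens with no explicit group or with blocked discovery paths."""
    named = named_agents(robots_text)
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    findings = []
    for token in AI_TOKENS:
        if token.lower() not in named:
            findings.append(f"{token}: no explicit group")
            continue
        if not parser.can_fetch(token, "/"):
            continue  # full block; assumed to be a deliberate policy
        for path in DISCOVERY_PATHS:
            if not parser.can_fetch(token, path):
                findings.append(f"{token}: {path} is disallowed")
    return findings


# Hypothetical robots.txt used only for illustration.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /llms
Allow: /

User-agent: OAI-SearchBot
Allow: /
"""

if __name__ == "__main__":
    for finding in audit_robots(SAMPLE_ROBOTS):
        print(finding)
```

In the sample, the copy-pasted Disallow: /llms prefix rule also catches /llms.txt, which is the kind of accidental block the second bullet describes.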

3. Structured data

Structured data is how you give AI systems unambiguous facts about your entity, content type, and authorship.

  • Every page has at minimum Organization and WebSite JSON-LD in the <head>, with url pointing at the canonical origin.

  • Software products include SoftwareApplication schema with applicationCategory, operatingSystem, and a stable url.

  • Blog posts and documentation pages include Article schema with author, datePublished, and dateModified. Of the two dates, dateModified is the clearest freshness signal you can give AI systems.

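  • FAQ sections use FAQPage schema with one Question/Answer pair per item. This gives answer engines (AEO) and generative engines (GEO) clean question-and-answer pairs to extract, though it doesn't guarantee a featured snippet or an AI Overview citation.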

  • Validate all structured data with the Schema Markup Validator (validator.schema.org) and Google's Rich Results Test before shipping.
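
Before reaching for the external validators, a quick local check can confirm that the required types are present at all. The sketch below uses only the standard library; the class and function names and the sample HTML are hypothetical, and it checks for the presence of @type values only, not for the validity of the properties inside each node.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def schema_types(html: str) -> set[str]:
    """Return every @type declared in the page's JSON-LD blocks."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    types = set()
    for raw in collector.blocks:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed block; a real audit would report it
        if isinstance(doc, dict):
            nodes = doc.get("@graph", [doc])
        elif isinstance(doc, list):
            nodes = doc
        else:
            continue
        for node in nodes:
            if isinstance(node, dict):
                declared = node.get("@type", [])
                types.update([declared] if isinstance(declared, str) else declared)
    return types


def missing_types(html: str, required=("Organization", "WebSite")) -> list[str]:
    """List the required schema types the page does not declare."""
    return sorted(set(required) - schema_types(html))


# Hypothetical page used only for illustration.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization",
 "name": "Example Co", "url": "https://example.com"}
</script>
</head><body></body></html>"""

if __name__ == "__main__":
    print(missing_types(SAMPLE_HTML))
```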

4. Security posture

Security headers affect trust signals for both crawlers and users. They rank high in the priority matrix (a missing HSTS header is a P1) because the fix is a few lines of config, and the cost of leaving them out is automated scanners flagging your origin as low-hygiene.

  • Strict-Transport-Security header present with a max-age of at least 31536000 seconds (one year); add includeSubDomains only if every subdomain is also served over HTTPS.

  • X-Content-Type-Options: nosniff on every response — prevents MIME-sniffing attacks and signals correct content-type hygiene.

  • X-Frame-Options: DENY or SAMEORIGIN unless you explicitly need cross-origin iframe embedding.

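  • Referrer-Policy: strict-origin-when-cross-origin. This is already the default in current major browsers; setting it explicitly covers older clients, sending only the origin on cross-origin requests while keeping analytics working.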

  • Content-Security-Policy present and not dependent on 'unsafe-inline'. A CSP that permits everything inline is worse than no CSP because it creates false confidence.
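
The header checks above are mechanical enough to sketch as a single function over a response's headers. This is an illustrative approximation, not a complete scanner: the function names and finding strings are hypothetical, the Referrer-Policy check accepts only the exact value named above, and the CSP check looks only for 'unsafe-inline' in the directive that governs scripts without accounting for nonces or hashes.

```python
import re

ONE_YEAR = 31536000  # seconds


def script_sources(csp: str) -> list[str]:
    """Return the source list that governs scripts: script-src, else default-src."""
    directives = {}
    for part in csp.split(";"):
        tokens = part.split()
        if tokens:
            directives.setdefault(tokens[0].lower(), tokens[1:])
    return directives.get("script-src", directives.get("default-src", []))


def audit_headers(headers: dict[str, str]) -> list[str]:
    """Return one finding per header that misses the checklist."""
    h = {name.lower(): value for name, value in headers.items()}
    findings = []

    match = re.search(r"max-age=(\d+)", h.get("strict-transport-security", ""), re.I)
    if not match:
        findings.append("HSTS missing")
    elif int(match.group(1)) < ONE_YEAR:
        findings.append("HSTS max-age below one year")

    if h.get("x-content-type-options", "").strip().lower() != "nosniff":
        findings.append("X-Content-Type-Options is not nosniff")

    if h.get("x-frame-options", "").strip().upper() not in {"DENY", "SAMEORIGIN"}:
        findings.append("X-Frame-Options is not DENY or SAMEORIGIN")

    if h.get("referrer-policy", "").strip().lower() != "strict-origin-when-cross-origin":
        findings.append("Referrer-Policy is not strict-origin-when-cross-origin")

    csp = h.get("content-security-policy", "")
    if not csp:
        findings.append("CSP missing")
    elif "'unsafe-inline'" in script_sources(csp):
        findings.append("CSP allows 'unsafe-inline' scripts")

    return findings


if __name__ == "__main__":
    # Hypothetical response headers used only for illustration.
    weak = {
        "Strict-Transport-Security": "max-age=86400",
        "Content-Security-Policy": "default-src 'self' 'unsafe-inline'",
    }
    for finding in audit_headers(weak):
        print(finding)
```

Header names are lowercased before lookup because HTTP header names are case-insensitive.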

5. Performance signals

Performance affects user experience and how efficiently crawlers fetch a site. The numbers below are thresholds, not targets.

  • TTFB under 200 ms at the 75th percentile of real users, measured at the edge, where practical. Slower responses increase crawl cost and make pages feel unreliable, even if they still get indexed.

  • LCP under 2.5 seconds — Google's "Good" threshold; above 4s is flagged as "Poor" in CrUX data.

  • CLS below 0.1 — layout shifts above this threshold hurt Core Web Vitals scores and create bad experiences on mobile.

  • No render-blocking scripts on docs and landing pages — these delay above-the-fold paint and are detectable in a Lighthouse audit.
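
The three numeric thresholds can be checked against field samples with a small helper. This is a sketch under assumptions: the sample values are invented, the metric keys are hypothetical names, the 75th percentile uses the nearest-rank method (real-user monitoring tools may interpolate differently), and a value exactly at the limit is treated as passing.

```python
import math

# Limits from the checklist above: TTFB and LCP in milliseconds, CLS unitless.
THRESHOLDS = {"ttfb_ms": 200, "lcp_ms": 2500, "cls": 0.1}


def p75(samples: list[float]) -> float:
    """Nearest-rank 75th percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("p75 needs at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def failing_metrics(field_data: dict[str, list[float]]) -> dict[str, float]:
    """Return {metric: p75} for each metric whose p75 exceeds its limit."""
    failing = {}
    for metric, limit in THRESHOLDS.items():
        samples = field_data.get(metric)
        if not samples:
            continue  # no data for this metric; nothing to judge
        value = p75(samples)
        if value > limit:
            failing[metric] = value
    return failing


# Invented field samples used only for illustration.
SAMPLE_FIELD_DATA = {
    "ttfb_ms": [120, 150, 180, 420],
    "lcp_ms": [1800, 2100, 2400, 3900],
    "cls": [0.02, 0.05, 0.18, 0.30],
}

if __name__ == "__main__":
    print(failing_metrics(SAMPLE_FIELD_DATA))
```

Judging the 75th percentile rather than the mean keeps one slow outlier (the 420 ms TTFB sample here) from failing a metric that most users experience as fast.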

6. Remediation priority matrix

Not all findings are equal. Use this ordering to sequence fixes.

P0 — Fix before the site goes live:

  • Missing robots.txt (crawlers treat everything as allowed, so you have no stated policy)

  • Missing sitemap.xml (canonical URL set is unknown)

  • HTTP origin without redirect to HTTPS (browsers ignore HSTS sent over plain HTTP, so the header never takes effect, and trust suffers)

P1 — Fix in the current sprint:

  • No structured data on any page

  • AI crawlers blocked unintentionally by wildcard or specific Disallow

  • Missing HSTS header

P2 — Fix in the next sprint:

  • No /llms.txt file

  • CSP absent or dependent on 'unsafe-inline'

  • CLS issues on key landing pages

P3 — Fix when llms.txt or content is updated:

  • /llms.txt URLs have drifted from the live sitemap

  • /llms-full.txt exists but isn't linked from /llms.txt

  • Missing Article schema on blog posts
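
If your audit emits findings as identifiers, the matrix above can be encoded as a lookup table and used to order the fix list. The identifiers below are hypothetical labels invented for this sketch, one per bullet in the matrix; anything not in the table sorts last as "unranked" rather than being dropped.

```python
# Hypothetical finding ids mapped to the priorities in the matrix above.
PRIORITY = {
    "missing_robots_txt": "P0",
    "missing_sitemap": "P0",
    "no_https_redirect": "P0",
    "no_structured_data": "P1",
    "ai_crawlers_blocked_unintentionally": "P1",
    "missing_hsts": "P1",
    "no_llms_txt": "P2",
    "weak_or_absent_csp": "P2",
    "cls_on_landing_pages": "P2",
    "llms_txt_drift": "P3",
    "llms_full_not_linked": "P3",
    "missing_article_schema": "P3",
}


def sequence_fixes(found: list[str]) -> list[tuple[str, str]]:
    """Order finding ids by priority, keeping input order within a priority."""
    tagged = [(PRIORITY.get(finding, "unranked"), finding) for finding in found]
    # sorted() is stable, so ties keep the order in which findings were reported.
    return sorted(tagged, key=lambda pair: pair[0])


if __name__ == "__main__":
    report = ["no_llms_txt", "missing_hsts", "missing_sitemap", "custom_check"]
    for priority, finding in sequence_fixes(report):
        print(priority, finding)
```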

Run it before each release

isitready.dev automates this entire checklist in a single HTTP-level audit against your canonical origin. It returns a scored report with per-finding severity tags and remediation notes — the same priority matrix above, pre-populated with your site's actual results. Run it before each release, not just at setup.