AI readiness isn't a single check — it's a set of overlapping surfaces: machine-readable files, crawler policy, structured data, security posture, and performance signals. This template gives you a repeatable checklist to run on any site before each release.

1. Machine-readable surfaces

These are the files AI assistants and crawlers look at first. Missing or broken files here mean the rest of the audit doesn't matter much.

  • /llms.txt exists at the canonical origin, is served as Content-Type: text/plain, contains valid Markdown, and every linked URL returns HTTP 200.

  • /llms-full.txt is optional, but if present it must be linked from /llms.txt — orphaned full files don't get discovered.

  • sitemap.xml exists, is referenced in robots.txt, has been submitted to Google Search Console, and all URLs in it return 200 with correct canonical tags.

  • robots.txt is present, syntactically valid, and explicitly names the major AI crawlers — at minimum GPTBot, ClaudeBot, and Google-Extended — with a deliberate Allow or Disallow directive for each.
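
These file checks are easy to script. A minimal Python sketch (function names are my own, and the actual HTTP fetching is omitted so the URL-building and llms.txt link-extraction logic can run offline):

```python
import re

# Markdown inline-link pattern: captures the URL inside [text](url).
MARKDOWN_LINK = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")

def required_surfaces(origin: str) -> list[str]:
    """The machine-readable files the audit should fetch first."""
    origin = origin.rstrip("/")
    return [f"{origin}{path}" for path in ("/llms.txt", "/sitemap.xml", "/robots.txt")]

def linked_urls(llms_txt: str) -> list[str]:
    """Every Markdown-linked URL in an llms.txt body — each should return HTTP 200."""
    return MARKDOWN_LINK.findall(llms_txt)
```

In a real audit you would fetch each URL from required_surfaces, verify the status code and Content-Type, then fetch every URL that linked_urls extracts from the llms.txt body.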

2. Crawler policy

A robots.txt that doesn't explicitly address AI crawlers is an ambiguity bug. Most AI systems will crawl unless told otherwise, but "probably crawlable" isn't a policy.

  • Every AI crawler you care about has an explicit Allow: / or Disallow: / line — don't rely on the wildcard User-agent: * fallback for crawlers you intend to permit.

  • Disallow rules don't accidentally block canonical discovery paths like /sitemap.xml or /llms.txt — this happens more often than you'd expect when rules are copy-pasted.

  • Crawl-delay directives, if present, are set to 10 seconds or less for indexing crawlers. Higher values can throttle crawling so aggressively that new content takes days to index.
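
Python's stdlib urllib.robotparser can evaluate a robots.txt body offline. Note that can_fetch alone can't distinguish an explicit Allow from the wildcard fallback, so a separate check for an explicit User-agent entry is included (helper names are illustrative):

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "Google-Extended"]

def crawl_policy(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Effective crawlability per AI bot, including wildcard fallback."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in AI_CRAWLERS}

def has_explicit_entry(robots_txt: str, bot: str) -> bool:
    """True only if the bot is named in its own User-agent line."""
    return any(
        line.split(":", 1)[1].strip().lower() == bot.lower()
        for line in robots_txt.splitlines()
        if line.lower().startswith("user-agent:")
    )
```

A bot for which crawl_policy says True but has_explicit_entry says False is exactly the "probably crawlable" ambiguity this section flags.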

3. Structured data

Structured data is how you give AI systems unambiguous facts about your entity, content type, and authorship.

  • Every page has at minimum Organization and WebSite JSON-LD in the <head>, with url pointing at the canonical origin.

  • Software products include SoftwareApplication schema with applicationCategory, operatingSystem, and a stable url.

  • Blog posts and documentation pages include Article schema with author, datePublished, and dateModified — the modified date is the one AI systems use to assess freshness.

  • FAQ sections use FAQPage schema with one Question/Answer pair per item — this powers both featured snippets (AEO) and AI Overview extraction (GEO).

  • Validate all structured data with the schema.org validator and Google's Rich Results Test before shipping.
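
As a sketch of what the minimum Article markup looks like when generated in code, the keys below mirror the checklist; all values are placeholders:

```python
import json

def article_jsonld(headline: str, author: str, published: str,
                   modified: str, url: str) -> dict:
    """Minimal Article JSON-LD with the fields the checklist requires."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": published,   # ISO 8601
        "dateModified": modified,     # the freshness signal
        "url": url,
    }

def jsonld_script_tag(data: dict) -> str:
    """Serialize for embedding in the page <head>."""
    return '<script type="application/ld+json">' + json.dumps(data) + "</script>"
```

The same shape extends to Organization, WebSite, SoftwareApplication, and FAQPage by swapping @type and the required properties; always run the output through the validators mentioned above before shipping.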

4. Security posture

Security headers affect trust signals for both crawlers and users. Missing headers are a P1 finding because they're trivially fixable and signal poor hygiene to automated scanners.

  • Strict-Transport-Security header present with max-age of at least 31,536,000 seconds (one year); add includeSubDomains if your subdomains are also served over HTTPS.

  • X-Content-Type-Options: nosniff on every response — prevents MIME-sniffing attacks and signals correct content-type hygiene.

  • X-Frame-Options: DENY or SAMEORIGIN unless you explicitly need cross-origin iframe embedding.

  • Referrer-Policy: strict-origin-when-cross-origin — leaks less than the browser default while keeping analytics working.

  • Content Security Policy present and not unsafe-inline-only. A CSP that permits everything inline is worse than no CSP because it creates false confidence.
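
The header checks above reduce to a small audit function. A sketch assuming the response headers have already been fetched into a dict (the finding labels are invented for illustration):

```python
def hsts_ok(value: str) -> bool:
    """True if Strict-Transport-Security has max-age >= one year."""
    for part in value.split(";"):
        part = part.strip().lower()
        if part.startswith("max-age="):
            try:
                return int(part.split("=", 1)[1]) >= 31_536_000
            except ValueError:
                return False
    return False

def audit_headers(headers: dict[str, str]) -> list[str]:
    """Return a list of findings; an empty list means all checks passed."""
    h = {k.lower(): v for k, v in headers.items()}
    findings = []
    if not hsts_ok(h.get("strict-transport-security", "")):
        findings.append("missing-or-weak-hsts")
    if h.get("x-content-type-options", "").strip().lower() != "nosniff":
        findings.append("missing-nosniff")
    if h.get("x-frame-options", "").strip().upper() not in ("DENY", "SAMEORIGIN"):
        findings.append("missing-frame-options")
    if "referrer-policy" not in h:
        findings.append("missing-referrer-policy")
    if "content-security-policy" not in h:
        findings.append("missing-csp")
    return findings
```

A fuller version would also parse the CSP value and flag a policy that is effectively unsafe-inline-only, per the bullet above.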

5. Performance signals

Performance affects crawl budget and ranking signals directly. These numbers are thresholds, not targets.

  • TTFB under 200ms at the edge, measured at the 75th percentile of real users — above this, crawlers begin to throttle request rates.

  • LCP under 2.5 seconds — Google's "Good" threshold; above 4s is flagged as "Poor" in CrUX data.

  • CLS below 0.1 — layout shifts above this threshold hurt Core Web Vitals scores and create bad experiences on mobile.

  • No render-blocking scripts on docs and landing pages — these delay above-the-fold paint and are detectable in a Lighthouse audit.
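
Given p75 measurements, the thresholds above map onto the familiar Good / Needs improvement / Poor bands. The LCP and CLS cut-offs below are Google's published Core Web Vitals thresholds; the 200ms TTFB budget is this checklist's own, and the 600ms "poor" cut-off for TTFB is an illustrative assumption:

```python
def rating(value: float, good: float, poor: float) -> str:
    """Band a p75 measurement: at or below `good`, between, or above `poor`."""
    if value <= good:
        return "good"
    if value <= poor:
        return "needs-improvement"
    return "poor"

def vitals_report(ttfb_ms: float, lcp_s: float, cls: float) -> dict[str, str]:
    return {
        "ttfb": rating(ttfb_ms, 200, 600),   # 200ms budget from this checklist
        "lcp": rating(lcp_s, 2.5, 4.0),      # Core Web Vitals thresholds
        "cls": rating(cls, 0.1, 0.25),       # Core Web Vitals thresholds
    }
```

Feed it CrUX field data where available; lab numbers from Lighthouse are a fallback, not a substitute.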

6. Remediation priority matrix

Not all findings are equal. Use this ordering to sequence fixes.

P0 — Fix before the site goes live:

  • Missing robots.txt (crawlers operate in undefined state)

  • Missing sitemap.xml (canonical URL set is unknown)

  • HTTP origin without redirect to HTTPS (breaks HSTS and trust)

P1 — Fix in the current sprint:

  • No structured data on any page

  • AI crawlers blocked unintentionally by wildcard or specific Disallow

  • Missing HSTS header

P2 — Fix in the next sprint:

  • No /llms.txt file

  • CSP absent or unsafe-inline-only

  • CLS issues on key landing pages

P3 — Fix when llms.txt or content is updated:

  • /llms.txt URLs have drifted from the live sitemap

  • No /llms-full.txt linked from /llms.txt

  • Missing Article schema on blog posts
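
The matrix is straightforward to encode so tooling can sort findings automatically. The finding slugs below are invented for illustration; the priorities mirror the matrix above:

```python
PRIORITY = {
    "missing-robots-txt": "P0",
    "missing-sitemap": "P0",
    "no-https-redirect": "P0",
    "no-structured-data": "P1",
    "ai-crawlers-blocked": "P1",
    "missing-hsts": "P1",
    "missing-llms-txt": "P2",
    "weak-or-missing-csp": "P2",
    "cls-on-landing-pages": "P2",
    "llms-txt-sitemap-drift": "P3",
    "no-llms-full-txt": "P3",
    "missing-article-schema": "P3",
}

def remediation_order(findings: list[str]) -> list[str]:
    """Sort findings most-urgent first; unknown slugs sink to P3."""
    return sorted(findings, key=lambda f: PRIORITY.get(f, "P3"))
```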

Run it before each release

isitready.dev automates this entire checklist in a single HTTP-level audit against your canonical origin. It returns a scored report with per-finding severity tags and remediation notes — the same priority matrix above, pre-populated with your site's actual results. Run it before each release, not just at setup.