AI readiness isn't a single check — it's a set of overlapping surfaces: machine-readable files, crawler policy, structured data, security posture, and performance signals. This template gives you a repeatable checklist to run on any site before each release.
1. Machine-readable surfaces
These are the files AI assistants and crawlers look at first. Missing or broken files here mean the rest of the audit doesn't matter much.
- `/llms.txt` exists at the canonical origin, is served as `Content-Type: text/plain`, contains valid Markdown, and every linked URL returns HTTP 200.
- `/llms-full.txt` is optional, but if present it must be linked from `/llms.txt` — orphaned full files don't get discovered.
- `sitemap.xml` exists, is referenced in `robots.txt`, has been submitted to Google Search Console, and all URLs in it return 200 with correct canonical tags.
- `robots.txt` is present, syntactically valid, and explicitly names the major AI crawlers — at minimum GPTBot, ClaudeBot, and Google-Extended — with a deliberate Allow or Disallow directive for each.
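The `/llms.txt` link check above can be partly automated. Below is a minimal sketch of the link-extraction step, assuming links use standard Markdown `[text](url)` syntax; actually requesting each URL and asserting HTTP 200 (and checking the `Content-Type` header) is left out to keep the example offline. The regex is illustrative, not a full Markdown parser.

```python
import re

# Illustrative pattern: capture absolute http(s) targets of Markdown links.
LINK_RE = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")

def extract_links(llms_txt: str) -> list[str]:
    """Return all absolute http(s) URLs linked from an llms.txt body."""
    return LINK_RE.findall(llms_txt)

# Hypothetical llms.txt content for demonstration.
sample = """# Example Project
> One-line summary of the project.

## Docs
- [Quickstart](https://example.com/docs/quickstart)
- [API reference](https://example.com/docs/api)
"""

print(extract_links(sample))
# → ['https://example.com/docs/quickstart', 'https://example.com/docs/api']
```

Each extracted URL would then be fetched and its status code recorded; anything other than 200 is a finding.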
2. Crawler policy
A robots.txt that doesn't explicitly address AI crawlers is an ambiguity bug. Most AI systems will crawl unless told otherwise, but "probably crawlable" isn't a policy.
- Every AI crawler you care about has an explicit `Allow: /` or `Disallow: /` line — no relying on the wildcard `User-agent: *` fallback for crawlers you want to explicitly permit.
- Disallow rules don't accidentally block canonical discovery paths like `/sitemap.xml` or `/llms.txt` — this happens more often than you'd expect when rules are copy-pasted.
- `Crawl-delay` directives, if present, are set to 10 seconds or less for indexing crawlers. Higher values cause crawlers to throttle so aggressively that new content takes days to index.
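The explicit-agent check can be sketched as a small parser. This is illustrative, not a full robots.txt implementation: it only records which user-agent tokens get their own group, which is enough to flag crawlers still riding the wildcard fallback.

```python
# The minimum set named in the checklist above.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "Google-Extended"]

def explicit_agents(robots_txt: str) -> set[str]:
    """Return the set of user-agent tokens named in their own group."""
    agents = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.lower().startswith("user-agent:"):
            agents.add(line.split(":", 1)[1].strip())
    return agents

# Hypothetical robots.txt for demonstration.
robots = """User-agent: *
Disallow: /admin/

User-agent: GPTBot
Allow: /
"""

named = explicit_agents(robots)
missing = [bot for bot in AI_CRAWLERS if bot not in named]
print(missing)  # → ['ClaudeBot', 'Google-Extended']
```

Anything in `missing` has no deliberate policy and should be added with an explicit Allow or Disallow line.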
3. Structured data
Structured data is how you give AI systems unambiguous facts about your entity, content type, and authorship.
- Every page has at minimum `Organization` and `WebSite` JSON-LD in the `<head>`, with `url` pointing at the canonical origin.
- Software products include `SoftwareApplication` schema with `applicationCategory`, `operatingSystem`, and a stable `url`.
- Blog posts and documentation pages include `Article` schema with `author`, `datePublished`, and `dateModified` — the modified date is the one AI systems use to assess freshness.
- FAQ sections use `FAQPage` schema with one `Question`/`Answer` pair per item — this powers both featured snippets (AEO) and AI Overview extraction (GEO).
- Validate all structured data with the schema.org validator and Google's Rich Results Test before shipping.
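As a sketch, the required `Article` fields can be assembled in code before being serialized into a `<script type="application/ld+json">` tag. The headline, author, dates, and URL below are placeholder values, not real data.

```python
import json

def article_jsonld(headline, author, published, modified, url):
    """Build Article JSON-LD with the fields the checklist treats as required."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": published,
        "dateModified": modified,  # the freshness signal AI systems read
        "url": url,
    }

doc = article_jsonld(
    "AI readiness checklist",
    "Jane Doe",                    # placeholder author
    "2024-01-15",
    "2024-06-02",
    "https://example.com/blog/ai-readiness",
)

# Embed the result in the page <head>:
# <script type="application/ld+json"> ... </script>
print(json.dumps(doc, indent=2))
```

Generating the block from page metadata rather than hand-writing it keeps `dateModified` from silently going stale.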
4. Security posture
Security headers affect trust signals for both crawlers and users. Missing headers are a P1 finding because they're trivially fixable and signal poor hygiene to automated scanners.
- `Strict-Transport-Security` header present with `max-age` of at least 31,536,000 (one year); include `includeSubDomains` if your subdomains are also HTTPS.
- `X-Content-Type-Options: nosniff` on every response — prevents MIME-sniffing attacks and signals correct content-type hygiene.
- `X-Frame-Options: DENY` or `SAMEORIGIN` unless you explicitly need cross-origin iframe embedding.
- `Referrer-Policy: strict-origin-when-cross-origin` — leaks less than the browser default while keeping analytics working.
- Content Security Policy present and not `unsafe-inline`-only. A CSP that permits everything inline is worse than no CSP because it creates false confidence.
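A hedged sketch of how these header checks might look in code, operating on a dict of lowercased response-header names. The `max-age` parsing is deliberately simplistic and the finding strings are illustrative.

```python
def audit_headers(headers: dict[str, str]) -> list[str]:
    """Return one finding string per failed check from the list above."""
    findings = []
    hsts = headers.get("strict-transport-security", "")
    if "max-age=" not in hsts or int(hsts.split("max-age=")[1].split(";")[0]) < 31536000:
        findings.append("HSTS missing or max-age below one year")
    if headers.get("x-content-type-options", "").lower() != "nosniff":
        findings.append("X-Content-Type-Options: nosniff missing")
    if headers.get("x-frame-options", "").upper() not in ("DENY", "SAMEORIGIN"):
        findings.append("X-Frame-Options missing")
    if "referrer-policy" not in headers:
        findings.append("Referrer-Policy missing")
    csp = headers.get("content-security-policy", "")
    if not csp or "'unsafe-inline'" in csp:
        findings.append("CSP absent or permits unsafe-inline")
    return findings

# Hypothetical response headers for demonstration.
findings = audit_headers({
    "strict-transport-security": "max-age=63072000; includeSubDomains",
    "x-content-type-options": "nosniff",
})
print(findings)
```

Here HSTS and `nosniff` pass, so the three remaining checks produce the findings list.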
5. Performance signals
Performance affects crawl budget and ranking signals directly. These numbers are thresholds, not targets.
- TTFB under 200ms at the edge for at least the 75th percentile of real users — above this, crawlers begin to throttle request rates.
- LCP under 2.5 seconds — Google's "Good" threshold; above 4s is flagged as "Poor" in CrUX data.
- CLS below 0.1 — layout shifts above this threshold hurt Core Web Vitals scores and create bad experiences on mobile.
- No render-blocking scripts on docs and landing pages — these delay above-the-fold paint and are detectable in a Lighthouse audit.
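These thresholds can be encoded as a small classifier using Google's "good" / "needs improvement" / "poor" buckets. The LCP and CLS cutoffs match the published Core Web Vitals thresholds; the 600ms upper TTFB bound is an illustrative assumption, since the checklist only fixes the 200ms bar.

```python
# (good, poor) cutoffs per metric; values <= good are "good",
# values > poor are "poor", everything between "needs improvement".
THRESHOLDS = {
    "ttfb_ms": (200, 600),   # 600ms upper bound is an assumption, not from the checklist
    "lcp_s":   (2.5, 4.0),   # Google's Good / Poor cutoffs
    "cls":     (0.1, 0.25),  # 0.25 is Google's published Poor cutoff
}

def classify(metric: str, value: float) -> str:
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    return "poor" if value > poor else "needs improvement"

print(classify("lcp_s", 2.1))  # → good
print(classify("cls", 0.15))   # → needs improvement
```

As the checklist notes, treat these as thresholds to stay under, not targets to converge toward.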
6. Remediation priority matrix
Not all findings are equal. Use this ordering to sequence fixes.
P0 — Fix before the site goes live:
- Missing `robots.txt` (crawlers operate in an undefined state)
- Missing `sitemap.xml` (canonical URL set is unknown)
- HTTP origin without redirect to HTTPS (breaks HSTS and trust)
P1 — Fix in the current sprint:
- No structured data on any page
- AI crawlers blocked unintentionally by wildcard or specific Disallow
- Missing HSTS header
P2 — Fix in the next sprint:
- No `/llms.txt` file
- CSP absent or `unsafe-inline`-only
- CLS issues on key landing pages
P3 — Fix when llms.txt or content is updated:
- `/llms.txt` URLs have drifted from the live sitemap
- No `/llms-full.txt` linked from `/llms.txt`
- Missing `Article` schema on blog posts
Run it before each release
isitready.dev automates this entire checklist in a single HTTP-level audit against your canonical origin. It returns a scored report with per-finding severity tags and remediation notes — the same priority matrix above, pre-populated with your site's actual results. Run it before each release, not just at setup.