AI readiness isn't a single check. Machine-readable files, crawler policy, structured data, security posture, performance — each surface has its own failure modes, and a clean score on one doesn't buy you anything on the others. This template gives you a repeatable checklist to run on any site before each release.
1. Machine-readable surfaces
These are the files AI assistants and crawlers may use to orient themselves before fetching deeper pages. Missing or broken files here make the rest of the audit harder to interpret.
/llms.txt exists at the canonical origin, is served as Content-Type: text/plain, contains valid Markdown, and every linked URL returns HTTP 200.
/llms-full.txt is optional, but if present it must be linked from /llms.txt; orphaned full files don't get discovered.
sitemap.xml exists, is referenced in robots.txt, has been submitted to Google Search Console, and all URLs in it return 200 with correct canonical tags.
robots.txt is present, syntactically valid, and explicitly names the major crawler or product tokens you care about (for example OAI-SearchBot, GPTBot, ClaudeBot, Claude-SearchBot, and Google-Extended) with a deliberate Allow or Disallow directive for each.
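
The link checks above can be sketched in a few lines of Python. This is a minimal illustration, not a full audit: the function names, the sample file, and the example.com origin are all hypothetical, and the regex only handles plain inline Markdown links.

```python
import re
import urllib.error
import urllib.request
from urllib.parse import urljoin

# Matches Markdown inline links of the form [label](target).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def extract_links(llms_txt: str, origin: str) -> list[str]:
    """Return an absolute URL for every Markdown link in an llms.txt body."""
    return [urljoin(origin, target) for target in LINK_RE.findall(llms_txt)]


def links_full_file(llms_txt: str, origin: str) -> bool:
    """True if /llms-full.txt is linked from /llms.txt."""
    return urljoin(origin, "/llms-full.txt") in extract_links(llms_txt, origin)


def returns_200(url: str, timeout: float = 10.0) -> bool:
    """Fetch a URL and report whether the final response is HTTP 200.

    urlopen follows redirects, so a 301 that lands on a 200 page passes here.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, TimeoutError):
        return False


# Hypothetical llms.txt content, used only for illustration.
SAMPLE = """# Example Site
> Short summary of the site.

## Docs
- [Quickstart](https://example.com/docs/quickstart)
- [Full text](/llms-full.txt)
"""

links = extract_links(SAMPLE, "https://example.com")
print(links)
print(links_full_file(SAMPLE, "https://example.com"))
```

In a real run you would fetch /llms.txt from your own origin and call returns_200 on each extracted link; the sample stays offline so it can be run anywhere.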
2. Crawler policy
A robots.txt that doesn't explicitly address AI crawlers is an ambiguity bug. Most AI systems will crawl unless told otherwise, but "probably crawlable" isn't a policy.
Every AI crawler you care about has an explicit Allow: / or Disallow: / line; don't rely on the wildcard User-agent: * fallback for crawlers you want to explicitly permit.
Disallow rules don't accidentally block canonical discovery paths like /sitemap.xml or /llms.txt. This happens more often than you'd expect when rules are copy-pasted.
Crawl-delay directives, if present, are only used for crawlers that document support for the non-standard directive. Google ignores it; Anthropic documents support for it when appropriate.
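
One way to check the first two items is with Python's standard urllib.robotparser. The sketch below is an assumption-laden illustration: the agent list is copied from section 1, while the sample robots.txt, the example.com origin, and the function names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens named in section 1; adjust to the ones you care about.
AI_AGENTS = ["OAI-SearchBot", "GPTBot", "ClaudeBot", "Claude-SearchBot", "Google-Extended"]
DISCOVERY_PATHS = ["/sitemap.xml", "/llms.txt"]


def named_agents(robots_txt: str) -> set[str]:
    """Collect every User-agent token that appears in the file, lowercased."""
    agents = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, _, value = line.partition(":")
        if key.strip().lower() == "user-agent" and value.strip():
            agents.add(value.strip().lower())
    return agents


def audit_policy(robots_txt: str, origin: str) -> dict[str, list[str]]:
    """Report crawlers with no explicit group and discovery paths they cannot fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    named = named_agents(robots_txt)
    unnamed = [agent for agent in AI_AGENTS if agent.lower() not in named]
    blocked = [
        f"{agent} {path}"
        for agent in AI_AGENTS
        for path in DISCOVERY_PATHS
        if not parser.can_fetch(agent, origin + path)
    ]
    return {"unnamed": unnamed, "blocked_discovery": blocked}


# Hypothetical robots.txt, used only for illustration.
SAMPLE = """User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /private/
"""

result = audit_policy(SAMPLE, "https://example.com")
print(result["unnamed"])
print(result["blocked_discovery"])
```

Treat the blocked list as items to review rather than confirmed bugs, since a full Disallow may be deliberate. Note also that urllib.robotparser applies the first matching rule in file order, which can differ from crawlers that use longest-match precedence.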
3. Structured data
Structured data is how you give AI systems unambiguous facts about your entity, content type, and authorship.
Every page has at minimum Organization and WebSite JSON-LD in the <head>, with url pointing at the canonical origin.
Software products include SoftwareApplication schema with applicationCategory, operatingSystem, and a stable url.
Blog posts and documentation pages include Article schema with author, datePublished, and dateModified; the modified date is the one AI systems use to assess freshness.
FAQ sections use FAQPage schema with one Question/Answer pair per item; this powers both featured snippets (AEO) and AI Overview extraction (GEO).
Validate all structured data with the schema.org validator and Google's Rich Results Test before shipping.
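
As a pre-validator sanity check, the field requirements above can be expressed as a small lookup. This is a sketch under assumptions: the REQUIRED table mirrors this checklist rather than any official validator, the FAQPage entry assumes questions live under mainEntity as schema.org models them, and the sample JSON-LD is hypothetical.

```python
import json

# Fields this checklist asks for, keyed by schema.org type.
REQUIRED = {
    "Organization": ["url"],
    "WebSite": ["url"],
    "SoftwareApplication": ["applicationCategory", "operatingSystem", "url"],
    "Article": ["author", "datePublished", "dateModified"],
    "FAQPage": ["mainEntity"],  # assumption: Question items sit under mainEntity
}


def missing_fields(jsonld_text: str) -> list[str]:
    """Return 'Type.field' for each checklist field absent from a JSON-LD block."""
    data = json.loads(jsonld_text)
    if isinstance(data, list):
        nodes = data
    else:
        nodes = data.get("@graph", [data])
    missing = []
    for node in nodes:
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]  # @type may be a single string or a list
        for node_type in types:
            for field in REQUIRED.get(node_type, []):
                if not node.get(field):
                    missing.append(f"{node_type}.{field}")
    return missing


# Hypothetical JSON-LD: the Article node lacks dateModified.
SAMPLE = json.dumps({
    "@context": "https://schema.org",
    "@graph": [
        {"@type": "Organization", "name": "Example", "url": "https://example.com"},
        {
            "@type": "Article",
            "headline": "Hello",
            "author": {"@type": "Person", "name": "A. Writer"},
            "datePublished": "2024-01-15",
        },
    ],
})

findings = missing_fields(SAMPLE)
print(findings)
```

A check like this only catches absent fields; it does not replace the schema.org validator or the Rich Results Test.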
4. Security posture
Security headers affect trust signals for both crawlers and users. A missing HSTS header is a P1 because the fix is a few lines of config, and the cost of leaving it out is automated scanners flagging your origin as low-hygiene.
Strict-Transport-Security header present with max-age of at least 31,536,000 seconds (one year); include includeSubDomains if your subdomains are also HTTPS.
X-Content-Type-Options: nosniff on every response; prevents MIME-sniffing attacks and signals correct content-type hygiene.
X-Frame-Options: DENY or SAMEORIGIN unless you explicitly need cross-origin iframe embedding.
Referrer-Policy: strict-origin-when-cross-origin; leaks less than the legacy no-referrer-when-downgrade default while keeping analytics working. Current browsers already default to this value, so setting it makes the policy explicit for older clients.
Content Security Policy present and not unsafe-inline-only. A CSP that permits everything inline is worse than no CSP because it creates false confidence.
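
The header checks above are easy to script against a captured response. The following is a minimal sketch: the function name, finding strings, and sample headers are hypothetical, and the CSP test is deliberately stricter than the checklist, flagging any use of 'unsafe-inline' rather than only policies that consist of nothing else.

```python
import re

ONE_YEAR_SECONDS = 31_536_000


def audit_headers(headers: dict[str, str]) -> list[str]:
    """Return a finding for each checklist header that is missing or too weak."""
    h = {name.lower(): value.strip() for name, value in headers.items()}
    findings = []

    hsts = h.get("strict-transport-security", "")
    match = re.search(r"max-age\s*=\s*\"?(\d+)", hsts, re.IGNORECASE)
    if not match or int(match.group(1)) < ONE_YEAR_SECONDS:
        findings.append("HSTS missing or max-age below one year")

    if h.get("x-content-type-options", "").lower() != "nosniff":
        findings.append("X-Content-Type-Options is not nosniff")

    if h.get("x-frame-options", "").upper() not in {"DENY", "SAMEORIGIN"}:
        findings.append("X-Frame-Options is not DENY or SAMEORIGIN")

    if h.get("referrer-policy", "").lower() != "strict-origin-when-cross-origin":
        findings.append("Referrer-Policy is not strict-origin-when-cross-origin")

    csp = h.get("content-security-policy", "")
    if not csp:
        findings.append("CSP missing")
    elif "'unsafe-inline'" in csp:
        # Simplification: any 'unsafe-inline' is flagged for manual review.
        findings.append("CSP allows 'unsafe-inline'")

    return findings


# Hypothetical response headers, used only for illustration.
SAMPLE = {
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "Content-Security-Policy": "default-src 'self'; script-src 'self' 'unsafe-inline'",
}

header_findings = audit_headers(SAMPLE)
print(header_findings)
```

Header names are compared case-insensitively because HTTP treats them that way; values are matched against the exact strings this checklist recommends, so a stricter policy you chose on purpose will show up as a finding to dismiss.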
5. Performance signals
Performance affects user experience and how efficiently crawlers fetch a site. The numbers below are thresholds, not targets.
TTFB under 200ms at the edge for at least the 75th percentile of real users where practical. Slower responses increase crawl cost and make pages feel unreliable, even if they still get indexed.
LCP under 2.5 seconds — Google's "Good" threshold; above 4s is flagged as "Poor" in CrUX data.
CLS below 0.1 — layout shifts above this threshold hurt Core Web Vitals scores and create bad experiences on mobile.
No render-blocking scripts on docs and landing pages — these delay above-the-fold paint and are detectable in a Lighthouse audit.
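
To turn raw field samples into pass/fail results against these thresholds, a sketch like the one below may help. It assumes nearest-rank percentiles and applies the 75th percentile to LCP and CLS as well as TTFB, which is how Core Web Vitals are usually assessed; the function names and sample values are hypothetical.

```python
import math

# Thresholds from the checklist above.
TTFB_MAX_MS = 200
LCP_MAX_S = 2.5
CLS_MAX = 0.1


def p75(samples: list[float]) -> float:
    """75th percentile using the nearest-rank method."""
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def check_vitals(ttfb_ms: list[float], lcp_s: list[float], cls: list[float]) -> dict[str, bool]:
    """True means the metric's 75th percentile is under its threshold."""
    return {
        "ttfb": p75(ttfb_ms) < TTFB_MAX_MS,
        "lcp": p75(lcp_s) < LCP_MAX_S,
        "cls": p75(cls) < CLS_MAX,
    }


# Hypothetical real-user samples, used only for illustration.
vitals = check_vitals(
    ttfb_ms=[120, 150, 180, 190, 210, 250, 300, 400],
    lcp_s=[1.8, 2.0, 2.1, 2.4],
    cls=[0.02, 0.05, 0.08, 0.3],
)
print(vitals)
```

In this sample the TTFB check fails because the 75th percentile is 250 ms, even though half of the requests come in under 200 ms, which is why the percentile matters more than the average.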
6. Remediation priority matrix
Not all findings are equal. Use this ordering to sequence fixes.
P0 — Fix before the site goes live:
Missing robots.txt (crawlers operate in undefined state)
Missing sitemap.xml (canonical URL set is unknown)
HTTP origin without redirect to HTTPS (breaks HSTS and trust)
P1 — Fix in the current sprint:
No structured data on any page
AI crawlers blocked unintentionally by wildcard or specific Disallow
Missing HSTS header
P2 — Fix in the next sprint:
No /llms.txt file
CSP absent or unsafe-inline-only
CLS issues on key landing pages
P3 — Fix when llms.txt or content is updated:
/llms.txt URLs have drifted from the live sitemap
No /llms-full.txt linked from /llms.txt
Missing Article schema on blog posts
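
If you collect findings programmatically, the matrix above can be encoded as a lookup so fixes come out in priority order. This is a small sketch; the finding identifiers are hypothetical labels invented for illustration, and only the priority assignments come from the matrix.

```python
# Priority for each finding, following the matrix above.
# The identifiers on the left are hypothetical labels.
PRIORITY = {
    "missing_robots_txt": "P0",
    "missing_sitemap": "P0",
    "no_https_redirect": "P0",
    "no_structured_data": "P1",
    "ai_crawlers_blocked": "P1",
    "missing_hsts": "P1",
    "no_llms_txt": "P2",
    "weak_or_missing_csp": "P2",
    "cls_on_landing_pages": "P2",
    "llms_txt_drift": "P3",
    "llms_full_not_linked": "P3",
    "missing_article_schema": "P3",
}


def sequence(findings: list[str]) -> list[tuple[str, str]]:
    """Sort findings by priority, then by name for a stable order."""
    return sorted((PRIORITY[finding], finding) for finding in findings)


ordered = sequence(["missing_hsts", "llms_txt_drift", "no_llms_txt", "missing_sitemap"])
for priority, finding in ordered:
    print(priority, finding)
```

An unknown identifier raises KeyError on purpose, so a new kind of finding cannot slip through without someone assigning it a priority.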
Run it before each release
isitready.dev automates this entire checklist in a single HTTP-level audit against your canonical origin. It returns a scored report with per-finding severity tags and remediation notes — the same priority matrix above, pre-populated with your site's actual results. Run it before each release, not just at setup.