Cloudflare Workers run JavaScript at the edge, which means every response — HTML, robots.txt, sitemap.xml, llms.txt, headers — flows through your code before reaching crawlers or users. That's a lot of power, but it creates alignment problems when different surfaces are configured in different places and end up saying inconsistent things.
HTML metadata must be server-rendered
<title>, <meta name="description">, and <link rel="canonical"> must appear in the initial HTML response returned by the Worker, not injected by client-side JavaScript. Google's crawler executes JavaScript inconsistently and deprioritizes metadata it has to wait for. AI crawlers (GPTBot, ClaudeBot) often don't execute JavaScript at all. If your framework renders metadata client-side, move that logic into the Worker's SSR pass.
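In practice that means the head is assembled inside the Worker before the response is built. A minimal sketch — `renderHead` is an illustrative helper, not part of any Cloudflare or framework API:

```javascript
// Assemble metadata server-side so crawlers see it in the initial HTML
// response rather than waiting on client-side JavaScript.
function renderHead({ title, description, canonical }) {
  return [
    `<title>${title}</title>`,
    `<meta name="description" content="${description}">`,
    `<link rel="canonical" href="${canonical}">`,
  ].join('\n');
}
```

The same object that feeds `renderHead` can feed your sitemap generator, which keeps the canonical URL defined in exactly one place.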
Canonical URLs: the most common alignment failure
The canonical URL for a page must match in three places: the <link rel="canonical"> tag in the HTML, the <loc> entry in sitemap.xml, and any reference to the page in llms.txt. A single mismatch causes crawlers to treat the page as having duplicate or ambiguous authority. On Workers deployments, this most often happens because the sitemap is generated against a staging origin or .workers.dev subdomain and never updated for production.
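A build-time check can catch this drift before it ships. A sketch, assuming regex extraction is good enough for your markup — the helper names here are illustrative, not from any library:

```javascript
// Pull the canonical URL out of a page's HTML.
function extractCanonical(html) {
  const m = html.match(/<link\s+rel="canonical"\s+href="([^"]+)"/i);
  return m ? m[1] : null;
}

// Collect every <loc> entry from sitemap.xml.
function extractSitemapLocs(xml) {
  return [...xml.matchAll(/<loc>([^<]+)<\/loc>/g)].map((m) => m[1]);
}

// Return canonical URLs that appear in page HTML but not in the sitemap.
function findCanonicalMismatches(pages, sitemapXml) {
  const locs = new Set(extractSitemapLocs(sitemapXml));
  return pages
    .map(extractCanonical)
    .filter((url) => url !== null && !locs.has(url));
}
```

Failing the build on a non-empty result is usually enough to keep the three surfaces aligned.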
robots.txt and sitemap.xml as Worker routes
Serve robots.txt from your Workers router at exactly /robots.txt with Content-Type: text/plain. Don't offload it to a CDN origin with different cache rules than your Worker — crawlers that hit a stale or incorrect robots.txt during a deploy window will make decisions based on the old policy.
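In a Worker router, that can be as simple as handling the path before anything else. A sketch — the sitemap URL is a placeholder for your production origin:

```javascript
// Serve robots.txt directly from the Worker so it deploys atomically
// with the rest of the routing logic.
const ROBOTS_TXT = [
  'User-agent: *',
  'Allow: /',
  'Sitemap: https://example.com/sitemap.xml',
  '',
].join('\n');

function handleRobots() {
  return new Response(ROBOTS_TXT, {
    headers: {
      'Content-Type': 'text/plain; charset=utf-8',
      // Short cache so a policy change propagates quickly after a deploy.
      'Cache-Control': 'public, max-age=300',
    },
  });
}
```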
Sitemap.xml can be generated dynamically in a Worker, which is useful for keeping URLs fresh without a build step. When you do this, make sure the response returns Content-Type: application/xml and lists only canonical production URLs. Preview .workers.dev subdomains should never appear in a sitemap.
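A sketch of a dynamic generator, assuming the list of paths comes from wherever your routes live — a KV namespace, a build manifest, or a hardcoded array as here. The origin constant is a hypothetical production host:

```javascript
// Build the sitemap against the canonical production origin,
// never the request's own hostname (which may be a preview host).
const CANONICAL_ORIGIN = 'https://example.com';

function buildSitemap(paths) {
  const urls = paths
    .map((p) => `  <url><loc>${CANONICAL_ORIGIN}${p}</loc></url>`)
    .join('\n');
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    urls,
    '</urlset>',
  ].join('\n');
}

function handleSitemap(paths) {
  return new Response(buildSitemap(paths), {
    headers: { 'Content-Type': 'application/xml' },
  });
}
```

Hardcoding the origin, rather than deriving it from the incoming request, is what keeps .workers.dev hosts out of the sitemap.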
Cache-Control and crawlers
Cloudflare's edge cache will serve stale pages to crawlers if you don't set Cache-Control deliberately. For indexable pages, use:
Cache-Control: public, max-age=3600, stale-while-revalidate=86400

For pages you want excluded from indexing, pair a short max-age with an explicit noindex signal:
Cache-Control: no-store
X-Robots-Tag: noindex

Setting X-Robots-Tag in the response header is equivalent to a <meta name="robots" content="noindex"> tag — you don't need both, but the header works even when the crawler doesn't parse the HTML.
Block preview URLs from crawlers
Every Workers deploy gets a .workers.dev subdomain. Crawlers that index preview URLs create duplicate content problems that are annoying to clean up. Block them at two levels: add a Disallow: / rule for all user-agents in the robots.txt served from the .workers.dev origin, and return an X-Robots-Tag: noindex, nofollow header on every response from that origin.
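Both levels can branch on the request hostname inside one Worker. A sketch:

```javascript
// Preview origins get a blanket noindex policy; production is untouched.
function isPreviewHost(hostname) {
  return hostname.endsWith('.workers.dev');
}

// Clone the response with an X-Robots-Tag header added on preview hosts.
function withCrawlerPolicy(response, hostname) {
  if (!isPreviewHost(hostname)) return response;
  const headers = new Headers(response.headers);
  headers.set('X-Robots-Tag', 'noindex, nofollow');
  return new Response(response.body, { status: response.status, headers });
}

// robots.txt body also branches: preview origins disallow everything.
function robotsBody(hostname) {
  return isPreviewHost(hostname)
    ? 'User-agent: *\nDisallow: /\n'
    : 'User-agent: *\nAllow: /\n';
}
```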
Security headers belong in Worker code, not static asset config
Workers static assets support a _headers file, but those rules only apply to static assets served from the Assets binding. They do not apply to responses generated by your Worker handler — API routes, SSR pages, or any route your Worker code handles explicitly. Set security headers in the Worker response itself:
return new Response(html, {
  headers: {
    'Content-Type': 'text/html; charset=utf-8',
    'Cache-Control': 'public, max-age=3600, stale-while-revalidate=86400',
    'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload',
    'X-Content-Type-Options': 'nosniff',
    'Referrer-Policy': 'strict-origin-when-cross-origin',
  },
});

This ensures every crawlable response carries the correct policy headers, regardless of whether it comes from the assets layer or the Worker handler.
llms.txt on Workers
Serve /llms.txt either as a static asset in your Assets binding or as an explicit Worker route. Either way, the response must return Content-Type: text/plain; charset=utf-8, require no authentication, and reference only canonical production URLs — not preview or staging origins. If your llms.txt is generated at build time, add a CI check that validates every URL it contains returns HTTP 200 on the production origin before deploying.
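The CI check can be a short script that extracts the URLs from llms.txt and fetches each one. A sketch — the function names are illustrative, and `fetchFn` is injectable so the check can be tested without network access:

```javascript
// Extract absolute URLs from the llms.txt body.
function extractUrls(llmsTxt) {
  return [...llmsTxt.matchAll(/https?:\/\/[^\s)]+/g)].map((m) => m[0]);
}

// HEAD-request each URL; collect anything that doesn't return 200.
async function validateUrls(urls, fetchFn = fetch) {
  const failures = [];
  for (const url of urls) {
    const res = await fetchFn(url, { method: 'HEAD' });
    if (res.status !== 200) failures.push({ url, status: res.status });
  }
  return failures;
}
```

Fail the deploy if `validateUrls` returns a non-empty list; a 404 in llms.txt is exactly the kind of rot that never shows up in manual review.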
Verify before shipping
The isitready.dev scanner checks all of these surfaces against your live origin — HTML metadata, canonical alignment, robots.txt content type, sitemap URL validity, cache headers, and llms.txt format — and flags alignment failures. Run it after every Workers deploy, not just at launch.