Time to First Byte sets the ceiling for every other paint metric. Google's web.dev guidance puts the "good" TTFB threshold at 800ms at the 75th percentile — but that's the ceiling for any backend, not a target for an edge runtime. A page with 800ms TTFB has already spent a third of its 2.5s LCP budget before the browser parses a single byte. On Cloudflare Workers you should aim much lower.

What good actually looks like on Workers

Hold yourself to two numbers: median TTFB under 100ms and p95 under 200ms, measured from real user POPs. That's achievable because Workers do not pay container startup cost. The runtime uses V8 isolates rather than per-function processes, and the Cloudflare docs on how Workers works note that an isolate starts roughly a hundred times faster than a Node process on a container or VM. Cold start on the JavaScript runtime is paid once per machine, not per request.

A Worker that returns a static string with no bindings should clock 5-15ms TTFB from a nearby POP. A Worker that reads a hot KV key and renders HTML from a session lookup should land in the 30-80ms range. Anything past a 200ms median deserves a Server-Timing breakdown; the regression is almost always one of four culprits.

The four regressions that account for most slow Workers

KV cold reads. Workers KV hot reads run 500µs to 10ms because the value is served from a cache tier local to the POP. A cold read has to climb the hierarchy: regional cache, then central cache, then central storage. The KV concepts page is explicit that the first access in a new POP is slow; cold reads historically ran 50-100ms, and recent backend improvements brought European and Asian cold-read p95 down to roughly 50ms. If your traffic is geographically scattered and your keys are rarely read, every request hits a cold cache. Coalesce small keys into a super-object so cold keys ride along with hot ones, or set a longer cacheTtl; the sketch below shows both.
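A minimal sketch of both mitigations, assuming a hypothetical CONFIG KV binding and illustrative key names: several small, rarely-read keys are coalesced into one JSON super-object, and a long cacheTtl keeps it warm in the POP-local tier.

// Inside your fetch handler. CONFIG is a hypothetical KV binding;
// the object shape is illustrative.
interface SiteConfig {
  featureFlags: Record<string, boolean>;
  redirects: Record<string, string>;
}

// One read warms one cache entry instead of several. cacheTtl keeps
// the value in the POP-local tier for a day (the minimum allowed is 60s).
const config = await env.CONFIG.get<SiteConfig>("site-config", {
  type: "json",
  cacheTtl: 86400,
});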

D1 cross-region queries. D1 ships read replicas across regions, but only when you opt in via the Sessions API. The D1 read replication docs are clear that without withSession, every query goes to the primary — often on the other side of the planet from your user. A Worker in Frankfurt querying a primary in Western North America pays 150ms of round-trip on every read.
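Opting in is a small change. A hedged sketch of the Sessions API, assuming a DB binding; the x-d1-bookmark header name is an invention for illustration:

// Resume from the client's bookmark if it sent one; otherwise let
// "first-unconstrained" route the first query to the nearest replica.
const session = env.DB.withSession(
  request.headers.get("x-d1-bookmark") ?? "first-unconstrained",
);
const article = await session
  .prepare("SELECT slug, title FROM articles WHERE slug = ?")
  .bind(slug)
  .first();
// Hand the bookmark back to the client so its next request observes
// at least this point in time, even if it lands on a different replica.
const bookmark = session.getBookmark();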

Skipping Hyperdrive. Direct Postgres from a Worker means establishing a new TLS connection per invocation, plus the auth handshake, plus the actual query: three round-trips before any data moves. Hyperdrive pools connections in regions close to your origin database. The March 2025 regional pooling rollout cut uncached query latency by up to 90%, and the Hyperdrive concepts page gives the numbers: a round-trip from a distant region costs 20-30ms, versus 1-3ms from nearby.
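Adopting it is mostly a connection-string swap. A sketch assuming a hypothetical HYPERDRIVE binding and the postgres.js driver; the table name is illustrative:

import postgres from "postgres";

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    // Hyperdrive hands the driver a pooled endpoint near the origin
    // database, so the Worker skips the per-request TLS and auth trips.
    const sql = postgres(env.HYPERDRIVE.connectionString, {
      max: 5,             // keep the per-isolate pool small
      fetch_types: false, // skip an extra round-trip on connect
    });
    const rows = await sql`SELECT id, title FROM articles LIMIT 10`;
    ctx.waitUntil(sql.end()); // close after the response has been sent
    return Response.json(rows);
  },
} satisfies ExportedHandler<Env>;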

Uncached fetch-to-origin. A Worker that proxies to an origin without a cf.cacheEverything directive or a properly configured cache key turns every request into a long-haul fetch. Edge cache hit TTFB sits in the 10-30ms range. A miss-to-origin on a 200ms-distant origin pushes TTFB past 250ms before your Worker even sees the response.
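A sketch of the fix, with an illustrative origin hostname; cacheEverything and cacheTtl are standard cf options on the Workers fetch, though custom cache keys require an Enterprise zone:

export default {
  async fetch(request: Request): Promise<Response> {
    // Rewrite the incoming URL onto the origin host.
    const originUrl = new URL(request.url);
    originUrl.hostname = "origin.example.com";

    // cacheEverything caches regardless of Content-Type; cacheTtl sets
    // the edge TTL in seconds. The response carries cf-cache-status so
    // you can verify HIT versus MISS.
    return fetch(originUrl.toString(), {
      cf: { cacheEverything: true, cacheTtl: 300 },
    });
  },
} satisfies ExportedHandler;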

Instrument with Server-Timing

The W3C Server-Timing header gives you per-component TTFB breakdown without any client code. Browsers expose it via the Performance API and DevTools surfaces it in the Network panel automatically. Set it on every response from your Worker:

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Note: Workers only advance Date.now() across I/O boundaries, so
    // these timings capture awaited I/O, not CPU time spent rendering.
    const start = Date.now();

    const url = new URL(request.url);
    const slug = url.pathname.slice(1); // e.g. /my-article -> "my-article"
    const sessionKey =
      request.headers.get("Cookie")?.match(/session=([^;]+)/)?.[1] ?? "anonymous";

    const kvStart = Date.now();
    const session = await env.SESSIONS.get(sessionKey, { cacheTtl: 3600 });
    const kvDur = Date.now() - kvStart;

    const dbStart = Date.now();
    const dbSession = env.DB.withSession("first-unconstrained");
    const result = await dbSession
      .prepare("SELECT slug, title FROM articles WHERE slug = ?")
      .bind(slug)
      .first();
    const dbDur = Date.now() - dbStart;

    const html = render(result, session); // render() is your app's template function
    const total = Date.now() - start;

    return new Response(html, {
      headers: {
        "Content-Type": "text/html; charset=utf-8",
        "Cache-Control": "public, max-age=60, stale-while-revalidate=86400",
        "Server-Timing": [
          `kv;dur=${kvDur};desc="session lookup"`,
          `d1;dur=${dbDur};desc="article read"`,
          `worker;dur=${total};desc="total handler"`,
        ].join(", "),
      },
    });
  },
} satisfies ExportedHandler<Env>;

The header syntax, metric;dur=N;desc="text", is defined in the W3C Server Timing spec. One header gets you a per-component breakdown in the DevTools Timing tab and a queryable signal for any RUM tool that reads PerformanceServerTiming. Cloudflare also sets a cf-ray header on every response and cf-cache-status on responses that passed through the cache, so you can correlate Server-Timing with edge cache state.
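On the client, the same entries are readable without DevTools. A small sketch against the standard Performance API:

// Pull the Server-Timing entries off the navigation entry; serverTiming
// is part of the standard PerformanceResourceTiming interface.
const [nav] = performance.getEntriesByType(
  "navigation",
) as PerformanceNavigationTiming[];
for (const { name, duration, description } of nav.serverTiming) {
  console.log(`${name}: ${duration}ms (${description})`);
}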

Field measurement, not lab measurement

Lab numbers from wrangler dev or a single-region curl are nearly useless. TTFB varies by user location, device class, network path, and cache state; none of that is captured by a synthetic check from one machine. Use two field sources together.

The first is Cloudflare Web Analytics, which collects real-user measurements via a lightweight beacon and reports TTFB by country and POP. The second is the web-vitals library on your client, which reports TTFB to your analytics endpoint via onTTFB. The number it sends matches what Google's CrUX dataset reports to Search Console, so you get one source of truth across the lab/field/SEO boundary. Send both to the same store so you can join on session ID.
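Wiring up the web-vitals side takes a few lines. A sketch in which the /rum endpoint is an illustrative placeholder:

import { onTTFB } from "web-vitals";

// Reports the same TTFB definition CrUX uses. sendBeacon survives
// page unload, so late-arriving metrics still get through.
onTTFB((metric) => {
  navigator.sendBeacon(
    "/rum",
    JSON.stringify({
      name: metric.name,     // "TTFB"
      value: metric.value,   // milliseconds
      rating: metric.rating, // "good" | "needs-improvement" | "poor"
      id: metric.id,         // unique per page load; use as a join key
    }),
  );
});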

Pull the numbers weekly. Set p75 and p95 alerts. Watch for regressions after deploys: a new KV namespace, a route that calls D1 without withSession, or a fetch that silently bypasses the cache will all show up as a step change in p75 within a day.

Verify before shipping

The isitready.dev scanner measures TTFB from edge POPs against the targets above. It surfaces cf-cache-status for every checked URL and flags responses missing Server-Timing breakdowns. Run it after every Worker deploy that touches a binding or a fetch path — the regressions described here rarely show up in lab tests but always show up in field data.