Google rebranded its Search Generative Experience to AI Overviews at Google I/O on May 14, 2024, and rolled the feature out to hundreds of millions of US searchers that same week (Google blog announcement, May 2024). Eighteen months in, one finding from independent citation studies keeps showing up: the sites cited in an AI Overview are often not the sites ranked in the organic top 10 for the same query. BrightEdge's 16-month tracking puts overall overlap at 54.5% as of September 2025, but the top-10 overlap specifically has been hovering around 17%. That means roughly 5 out of 6 AI Overview citations come from pages that do not sit on page one of classic results (BrightEdge, 2025).

That gap is the entire story. Classic SEO optimizes for one signal: where you rank. AI Overviews optimize for a different one: whether your prose is usable as a citation. Both matter. They are not the same job.

The pipeline, as far as Google has documented it

Google has not published the AI Overviews architecture in detail. What the company has confirmed, paired with the broader retrieval-augmented generation literature, gives a five-stage shape:

  1. Query expansion (fan-out). A single user query is decomposed into related sub-queries covering adjacent topics, alternative phrasings, and entities likely tied to the underlying intent.

  2. Retrieval. Each sub-query hits Google's index. Candidate pages are pulled from the union of those result sets, not just the top 10 of the original query.

  3. Reranking. A semantic reranker scores candidates for relevance to the synthesized intent. This is where pages outside the organic top 10 routinely beat pages inside it.

  4. Generation. A custom Gemini model writes the answer using the reranked context as its prompt input.

  5. Grounding. The model attaches citations to claims it can tie back to a source passage. Pages that cannot be cleanly extracted often get used as background but never linked.
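The five stages above can be sketched as a toy pipeline. Every function name, the expansion rules, the relevance scores, and the sample domains below are hypothetical illustrations of the documented shape, not Google's implementation:

```python
# Toy sketch of the five-stage shape described above. All names, expansion
# rules, and scores are invented for illustration -- Google has not
# published this architecture.

def fan_out(query: str) -> list[str]:
    # Stage 1: decompose one query into related sub-queries (toy rules).
    return [query, f"best {query}", f"{query} pricing"]

def retrieve(sub_queries: list[str], index: dict[str, list[str]]) -> set[str]:
    # Stage 2: candidates come from the UNION of all sub-query result sets,
    # not just the top 10 of the original query.
    candidates: set[str] = set()
    for q in sub_queries:
        candidates.update(index.get(q, []))
    return candidates

def rerank(candidates: set[str], relevance: dict[str, float]) -> list[str]:
    # Stage 3: order by semantic relevance to the intent, not organic rank.
    return sorted(candidates, key=lambda url: relevance.get(url, 0.0), reverse=True)

def generate_and_ground(ranked: list[str], passages: dict[str, str]) -> list[tuple[str, str]]:
    # Stages 4-5: only sources with a cleanly extractable passage get cited;
    # the rest may inform the answer but never receive a link.
    return [(url, passages[url]) for url in ranked if url in passages]

index = {
    "crm software": ["a.com", "b.com"],
    "best crm software": ["b.com", "c.com"],
    "crm software pricing": ["d.com"],
}
relevance = {"a.com": 0.2, "b.com": 0.9, "c.com": 0.7, "d.com": 0.5}
passages = {"b.com": "B costs $20/seat.", "d.com": "D launched v3 in 2025."}

cited = generate_and_ground(rerank(retrieve(fan_out("crm software"), index), relevance), passages)
print(cited)
```

Note that c.com ranks second in the rerank but never appears in `cited`: it has no extractable passage, which is exactly the background-but-unlinked failure mode stage 5 describes.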

The fan-out step is the one to internalize. SE Ranking found that the Gemini 3 upgrade replaced about 42% of previously cited domains and returned roughly 32% more source URLs per response (SE Ranking, 2025). When the source pool widens, organic ranking matters less and citation fitness matters more.

Classic ranking vs AI Overview citation — how the signals differ

| Signal | Classic ranking weight | AI Overview citation weight |
| --- | --- | --- |
| Backlink authority (PageRank-style) | High | Moderate, indirect |
| Top-10 organic position | Defines visibility | Loose correlation only (~17% top-10 overlap, BrightEdge 2025) |
| Entity disambiguation (Organization, sameAs) | Helpful | High (the model needs to know who you are) |
| Schema for content type (Article, FAQPage, HowTo) | Helpful for rich results | High (gives the reranker a typed handle) |
| dateModified freshness | Moderate | High for queries with temporal intent |
| Snippet extractability (clean H2 + one-sentence answer) | Optional | Required (hard to cite a four-paragraph buildup) |
| Information density (claims per paragraph) | Indirect | High (sparse prose loses to dense prose) |
| nosnippet / max-snippet:0 | Removes snippet | Removes citation eligibility entirely |

That last row is the one most teams miss. Google's official AI Features documentation states: to be eligible as a supporting link in AI Overviews or AI Mode, a page must be indexed and eligible to be shown in Search with a snippet (Google Search Central, 2026). Set nosnippet, set max-snippet:0, or use data-nosnippet on the relevant block, and the page exits the citation pool. The same directive that hides featured snippets hides AI Overview links.
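A minimal eligibility check for those directives can be done on the robots meta content value (or an X-Robots-Tag header value). This sketch covers only the disqualifiers named above plus noindex, since an unindexed page is ineligible too; it does not fetch pages or reproduce Google's full directive parsing:

```python
# Check a robots meta/X-Robots-Tag value for the silent citation
# disqualifiers discussed above. Simplified: no per-bot prefixes,
# no data-nosnippet (which is an HTML attribute, not a directive here).

def snippet_eligible(robots_value: str) -> bool:
    directives = [d.strip().lower() for d in robots_value.split(",")]
    # nosnippet kills the snippet; noindex/none kill indexing entirely.
    if {"nosnippet", "noindex", "none"} & set(directives):
        return False
    for d in directives:
        # max-snippet:0 is equivalent to nosnippet for eligibility.
        if d.startswith("max-snippet:") and d.split(":", 1)[1].strip() == "0":
            return False
    return True

print(snippet_eligible("index, follow"))         # True
print(snippet_eligible("index, max-snippet:0"))  # False
print(snippet_eligible("nosnippet"))             # False
```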

What predicts being cited

Across the BrightEdge, SE Ranking, and Authoritas datasets, the same patterns recur for pages that punch above their organic rank:

  • Entity clarity in JSON-LD. Organization schema with sameAs pointing to LinkedIn, Crunchbase, GitHub, and Wikidata gives the model a confident handle on who's making the claim.

  • Typed content blocks. FAQPage and HowTo schema package the Q-and-A or step-by-step structure the generator needs. The reranker sees a typed answer, not a paragraph it has to chunk.

  • Answer-first prose. The first sentence under each H2 should resolve the question that H2 implies. Buried answers do not get extracted.

  • Fact density. Pages cited by AI Overviews tend to pack named entities, dates, percentages, and version numbers into short passages. Sparse, throat-clearing prose ranks but does not get quoted.

  • Freshness signals. Article schema's dateModified should be updated on substantive revisions, formatted in ISO 8601 with timezone, and consistent with the visible on-page date. Google cross-references schema dates against visible dates and sitemap lastmod; conflicts cause the schema date to be ignored (Google News best practices). Bumping dateModified for typo fixes is against Google's News guidelines.
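The entity-clarity and freshness points above combine into one JSON-LD block. The schema.org types and property names here are real; the organization name, URLs, and dates are placeholders you would replace with your own:

```python
# Illustrative Article + Organization JSON-LD: sameAs for entity clarity,
# dateModified in ISO 8601 with timezone. All values are placeholders.
import json
from datetime import datetime, timezone

article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example headline",  # placeholder
    # ISO 8601 with explicit timezone; wire this to real revisions,
    # not the build timestamp, and keep it consistent with the visible date.
    "dateModified": datetime(2026, 1, 12, 9, 30, tzinfo=timezone.utc).isoformat(),
    "author": {
        "@type": "Organization",
        "name": "Example Co",  # placeholder
        "sameAs": [
            "https://www.linkedin.com/company/example",  # placeholder URLs
            "https://github.com/example",
            "https://www.wikidata.org/wiki/Q00000000",   # placeholder Wikidata ID
        ],
    },
}
print(json.dumps(article, indent=2))
```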

The honest caveat: Google has not confirmed the weighting on any of these. The list is inferred from independent citation tracking, not from a Google whitepaper. Treat it as the best read of the evidence in early 2026, not as a leaked spec.

What the divergence looks like by vertical

Authoritas measured a 79% drop in CTR for the top organic link when an AI Overview appears, and Seer Interactive saw informational-query CTR fall from 1.76% to 0.61% between June 2024 and September 2025. Brands cited inside the AI Overview saw 35% higher organic CTR than uncited peers (Search Engine Land, 2025). The asymmetry is brutal.

Vertical divergence varies widely. BrightEdge's 16-month data shows finance with only 11% top-10 overlap, ecommerce around the same, healthcare and education at 68 to 75%. The lower the overlap, the more your AI visibility is decoupled from your rank. A finance site cannot reach the citation pool by climbing one position; it has to become extractable.

What to actually change

Five concrete moves, ordered by leverage:

  1. Audit Organization and Article JSON-LD on every template. Confirm sameAs is current. Confirm dateModified is wired to real revisions, not the build timestamp.

  2. Rewrite the first sentence under every H2 as a standalone answer. If a reranker pulled only that sentence, would it still resolve the heading?

  3. Add FAQPage schema to any page with Q-and-A. Use the verbatim phrasing users search for, not your internal product vocabulary.

  4. Check robots meta and HTTP headers for nosnippet or max-snippet:0. These are silent disqualifiers. If a page carries them by accident (common when staging templates leak to production), it is not in the citation pool at all.

  5. Densify thin sections. Every paragraph should carry a verifiable fact, a named entity, a date, or a number. Padding ranks. It does not get cited.
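Move #2 can be spot-checked mechanically. The sketch below pairs each H2 with the first sentence that follows it; the "looks answer-first" heuristic (a concrete number plus a minimum length) is an assumption for illustration, not a Google rule, and the regex-based HTML handling is deliberately naive:

```python
# Rough audit for the answer-first rule: pull the first sentence after each
# H2 and apply a toy extractability heuristic. Regex HTML parsing is a
# simplification; a real audit would use a proper parser.
import re

def first_sentences_after_h2(html: str) -> list[tuple[str, str]]:
    pairs = []
    for m in re.finditer(r"<h2[^>]*>(.*?)</h2>\s*<p[^>]*>(.*?)</p>", html, re.S | re.I):
        heading = re.sub(r"<[^>]+>", "", m.group(1)).strip()
        para = re.sub(r"<[^>]+>", "", m.group(2)).strip()
        first = re.split(r"(?<=[.!?])\s", para, maxsplit=1)[0]
        pairs.append((heading, first))
    return pairs

def looks_answer_first(sentence: str) -> bool:
    # Toy heuristic (an assumption): an extractable answer tends to carry
    # a concrete figure and enough words to stand alone.
    return bool(re.search(r"\d", sentence)) and len(sentence.split()) >= 6

html = """
<h2>How much does the API cost?</h2>
<p>Pricing starts at $49 per month for 10,000 requests. Volume tiers exist.</p>
<h2>Is it fast?</h2>
<p>Well, speed depends on many things. Latency is p50 120 ms.</p>
"""
for heading, first in first_sentences_after_h2(html):
    # One line per H2; True means the opening sentence resolves the heading.
    print(heading, "->", looks_answer_first(first))
```

In the sample, the pricing section passes (the answer leads) while the speed section fails: its concrete latency figure is buried in sentence two, exactly the buildup pattern that does not get extracted.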

None of these replace SEO. They sit on top. The work compounds because the same Organization block, clean H2 structure, and accurate dateModified improve featured-snippet eligibility and AEO performance at the same time.

Verify before shipping

isitready.dev audits the citation-eligibility surface explicitly: it checks that your nosnippet and max-snippet directives are not silently disqualifying pages, validates Organization, Article, and FAQPage JSON-LD against schema.org requirements, flags missing sameAs and stale dateModified values, and confirms your AI crawler allowlist permits Google-Extended, GPTBot, and ClaudeBot. Run it against your canonical origin before your next content push.