The Anatomy of a Page That AI Cites
When ChatGPT or Perplexity recommends a product, it links to specific pages. We analyzed 22,785 cited pages from Sill's monitoring pipeline, spanning 26,257 total citations across 11,405 unique domains. The data reveals which pages get cited, which get ignored, and what separates the two. Most of the distinguishing factors have nothing to do with domain authority.
TL;DR
We analyzed 22,785 cited pages from Sill's monitoring pipeline spanning 26,257 citations across 11,405 domains. 91.5% of pages are cited by only one AI platform. Only 8.5% achieve multi-platform citation, and just 27 pages (0.1%) are cited by all four platforms. Subcategory coverage is a 4.1x citation multiplier. YouTube leads all domains in citation efficiency at 3.4 citations per page. Domain authority shows near-zero or negative correlation with AI citation. The structural traits that predict citation have nothing to do with traditional SEO metrics.
The Dataset: 22,785 Pages Across Four AI Platforms
Sill's monitoring pipeline queries actual chat interfaces daily across ChatGPT, Perplexity, Gemini, and Google AI Overviews. For every page cited in an AI response, we record provenance metadata: which platforms cited it, how frequently, in response to which queries, and across which product subcategories.
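To make the analyses below concrete, here is a minimal sketch of what a single provenance record could look like. The schema is hypothetical, assembled from the fields described above; it is not Sill's actual data model.

```python
from dataclasses import dataclass

@dataclass
class CitationRecord:
    """One observed citation of a page in an AI response (hypothetical schema)."""
    url: str          # the cited page
    domain: str       # e.g. "youtube.com"
    platform: str     # "chatgpt" | "perplexity" | "gemini" | "google_aio"
    query: str        # the prompt that triggered the citation
    subcategory: str  # e.g. "RV Parts & Accessories"
    observed_at: str  # ISO-8601 date of the monitoring run
```

A dataset is then just a list of such records, one per (page, platform, query) hit; the code sketches later in this post assume this shape.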
The foundational GEO paper (Aggarwal et al., KDD 2024) established the principle that content-level optimizations affect AI citation rates, finding 30-40% improvements from adding statistics. The question our data answers is practical: across 22,785 real-world pages being cited by real AI platforms, what specific patterns emerge?
| Metric | Value |
|---|---|
| Total unique pages cited | 22,785 |
| Total citations | 26,257 |
| Unique domains | 11,405 |
| Brands monitored | 141 |
ChatGPT Dominates Citation Volume. The Edges Reveal More.
Across 22,785 cited pages, ChatGPT cites 60.6% of all pages in the dataset. Perplexity and Google AI Overviews each account for roughly 22%, while Gemini trails at 4.8%. (The percentages sum to more than 100% because some pages are cited by multiple platforms.) This distribution reflects platform market share (ChatGPT holds roughly 79% of global generative AI web traffic, per Similarweb's GenAI Index), but the absolute numbers mask the more important finding.
| Platform | Pages Cited | % of All Pages |
|---|---|---|
| ChatGPT | 13,806 | 60.6% |
| Perplexity | 5,167 | 22.7% |
| Google AI Overviews | 4,961 | 21.8% |
| Gemini | 1,100 | 4.8% |
The important finding is not which platform cites the most pages. It is which pages get cited by multiple platforms. That is where the real quality signal lives.
Only 8.5% of Pages Get Cited by More Than One Platform
Platform diversity is the number of distinct AI engines that independently cite a given page. In our dataset of 22,785 pages, 91.5% are cited by exactly one platform. Only 1,945 pages (8.5%) achieve citation from two or more platforms. Just 277 pages (1.2%) reach three or more. And only 27 pages out of 22,785 (0.1%) are cited by all four platforms we track.
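Computed from records like the sketch above, platform diversity is a small grouping operation. This is an illustrative implementation against the hypothetical schema, not Sill's pipeline code:

```python
from collections import Counter, defaultdict

def diversity_distribution(records):
    """Count how many pages are cited by exactly 1, 2, 3, or 4 platforms."""
    platforms_by_url = defaultdict(set)
    for r in records:
        platforms_by_url[r.url].add(r.platform)
    # Histogram over the number of distinct citing platforms per page
    return Counter(len(p) for p in platforms_by_url.values())

# On the dataset described here, this would yield
# Counter({1: 20840, 2: 1668, 3: 250, 4: 27}).
```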
This matters because each AI platform has different citation DNA. ChatGPT relies on Bing's index and favors Wikipedia. Perplexity runs real-time web searches and favors Reddit. Google AI Overviews pull from organic top-10 results. Gemini favors authoritative lists and local reviews. When a page crosses all of these different retrieval systems, it has demonstrated a quality that transcends any single platform's bias.
| Platform Diversity | Pages | % of Total | Interpretation |
|---|---|---|---|
| 1 platform | 20,840 | 91.5% | Single-platform retrieval match |
| 2 platforms | 1,668 | 7.3% | Cross-platform authority signal |
| 3 platforms | 250 | 1.1% | Strong universal citation signal |
| 4 platforms | 27 | 0.1% | Elite cross-platform authority |
The most common combination among pages cited by three or more platforms is ChatGPT + Google AI Overviews + Perplexity (28 pages), followed by the full four-platform set of ChatGPT + Gemini + Google AI Overviews + Perplexity (27 pages). These pages are worth studying because they represent the content that passes every major AI retrieval system's quality filter independently.
If your goal is to understand what makes a page AI-citable, platform diversity is the metric to optimize for. A page cited by four platforms is more informative than a page cited ten times by one platform, because the former has demonstrated universal relevance.
89% of Pages Are Cited Exactly Once
The citation frequency distribution follows a steep power law. Out of 22,785 pages, 20,315 (89.2%) are cited exactly once. Only 2,470 pages (10.8%) are cited two or more times. Just 579 pages (2.5%) reach three or more citations. At the far end, only 10 pages in the entire dataset achieved 11 or more citations.
This distribution has a direct strategic implication. Most pages that get cited by AI are one-time retrievals for a specific query. The pages that get cited repeatedly are the ones that answer multiple related queries. Breadth of relevance within a topic area is what drives repeat citation, and our subcategory data confirms this.
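The frequency distribution comes from the same records, counted along the other axis; again a sketch assuming the hypothetical schema above:

```python
from collections import Counter

def citation_frequency(records):
    """Histogram mapping citation count -> number of pages with that count."""
    citations_per_url = Counter(r.url for r in records)
    return Counter(citations_per_url.values())

# Here: 20,315 pages at count 1, a long thin tail above it,
# and only 10 pages at 11 or more citations.
```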
Subcategory Coverage Is a 4x Citation Multiplier
For each cited page, Sill tracks which product subcategories triggered the citation. A page cited only for "Contact Center as a Service" has a subcategory count of 1. A page cited for "RV Maintenance & Repair Services," "RV Parts & Accessories," and "Recreational Vehicles" has a subcategory count of 3. The relationship between subcategory coverage and citation frequency is the strongest signal in our data.
| Subcategory Coverage | Pages | Avg Citations | Multiplier vs. Baseline |
|---|---|---|---|
| 1 subcategory | 18,095 | 1.1 | 1.0x (baseline) |
| 2 subcategories | 181 | 2.6 | 2.4x |
| 3 subcategories | 18 | 4.5 | 4.1x |
| 4 subcategories | 3 | 5.3 | 4.8x |
Pages relevant to three subcategories average 4.5 citations, compared to 1.1 for single-subcategory pages. That is a 4.1x multiplier. The mechanism is straightforward: a comprehensive comparison page that covers multiple product angles provides relevant answer fragments for a wider range of buyer queries. AI retrieval systems pull up the same page for "best X for Y," "X vs Z," and "how to choose X" queries alike.
This is the content strategy takeaway. Narrow pages that answer a single query get cited once and forgotten. Comprehensive pages that span adjacent subcategories become citation magnets. The data shows a near-linear relationship: each additional subcategory a page is relevant to adds roughly 1.4 citations on average.
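As a sketch of how the multiplier table falls out of raw records (same hypothetical schema as above): collect each URL's distinct subcategories and total citations, then average within coverage buckets.

```python
from collections import defaultdict

def subcategory_multipliers(records):
    """Average citations per page, bucketed by distinct-subcategory count."""
    subcats, citations = defaultdict(set), defaultdict(int)
    for r in records:
        subcats[r.url].add(r.subcategory)
        citations[r.url] += 1

    # Group total citations by how many subcategories each page spans
    buckets = defaultdict(list)
    for url, cats in subcats.items():
        buckets[len(cats)].append(citations[url])

    averages = {k: sum(v) / len(v) for k, v in sorted(buckets.items())}
    baseline = averages.get(1) or 1.0
    return {k: (round(avg, 1), round(avg / baseline, 1))
            for k, avg in averages.items()}

# Here: {1: (1.1, 1.0), 2: (2.6, 2.4), 3: (4.5, 4.1), 4: (5.3, 4.8)}
```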
The Domains AI Cites Most
Across our full dataset, Reddit leads in raw page count with 408 cited pages. YouTube sits well behind on page count (104 pages) but leads in a more important metric: citations per page. YouTube pages average 3.4 citations each, compared to 1.0-1.1 for most other domains. A single YouTube video review generates more total citations than three average Reddit threads.
| Domain | Pages Cited | Total Citations | Citations/Page |
|---|---|---|---|
| reddit.com | 408 | 448 | 1.1 |
| youtube.com | 104 | 352 | 3.4 |
| en.wikipedia.org | 212 | 221 | 1.0 |
| linkedin.com | 160 | 170 | 1.1 |
| rtings.com | 85 | 135 | 1.6 |
| forbes.com | 68 | 96 | 1.4 |
| tomsguide.com | 84 | 94 | 1.1 |
| pcgamer.com | 59 | 90 | 1.5 |
| gartner.com | 63 | 76 | 1.2 |
| tomshardware.com | 38 | 73 | 1.9 |
YouTube's 3.4x citation efficiency stands out. A YouTube video review is cited across multiple queries and often by multiple platforms (YouTube dominates the multi-platform citation list, with 26 of the 100 highest-diversity pages). The Ahrefs study reported a 0.737 correlation between YouTube presence and AI visibility. Our data confirms this: YouTube pages are cited more frequently, across more platforms, than pages from any other domain.
The data also shows that specialized review sites (rtings.com at 1.6, tomshardware.com at 1.9, pcgamer.com at 1.5) outperform general-purpose platforms in citations per page. Depth of expertise within a category correlates with citation efficiency. Chen et al. (University of Toronto, 2025) found that 69-82% of AI search citations come from earned media. Our domain distribution confirms this: the top cited domains are overwhelmingly third-party sites, not brand-owned properties.
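Citations per page by domain is the same aggregation keyed on domain. In this sketch, the min_pages floor is our illustrative threshold for filtering out tiny domains, not something from the study:

```python
from collections import defaultdict

def domain_efficiency(records, min_pages=30):
    """(domain, pages, total citations, citations/page), sorted by efficiency."""
    pages, citations = defaultdict(set), defaultdict(int)
    for r in records:
        pages[r.domain].add(r.url)
        citations[r.domain] += 1

    rows = [
        (d, len(pages[d]), citations[d], citations[d] / len(pages[d]))
        for d in pages
        if len(pages[d]) >= min_pages  # illustrative floor, not from the study
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)

# Here youtube.com tops the efficiency ranking: (104 pages, 352 citations, 3.4).
```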
What Does Not Predict AI Citation
The absence of certain traditional SEO signals in highly cited pages is as informative as the presence of the traits above.
| Traditional SEO Signal | Correlation with AI Citation | Evidence |
|---|---|---|
| Domain authority | Slightly negative (r = -0.12 to -0.18) | SearchAtlas, 21,767 domains |
| Backlink count | Weak (r = 0.218) | Ahrefs, 75,000 brands |
| Keyword density | Negative (10% worse than baseline) | Aggarwal et al., KDD 2024 |
| FAQ schema markup | Negative (3.6 vs 4.2 citations/query) | Aggarwal et al., KDD 2024 |
| Branded search volume | Weak | Similarweb GenAI Index |
The SearchAtlas study of 21,767 domains measured domain authority correlation with AI visibility at r = -0.12 for ChatGPT and -0.18 for Perplexity. These correlations are not merely weak; they point in the wrong direction. Our data is consistent: the top cited domains are a mix of high-authority (Wikipedia, Forbes, Gartner) and moderate-authority (rtings.com, soundguys.com, techbloat.com) sites. Domain authority does not predict which pages get cited. Content relevance and structure do.
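For readers who want to run the same check on their own data, the Pearson r behind figures like SearchAtlas's is straightforward to compute. A standard-library sketch, assuming paired per-domain arrays of authority scores and AI-visibility scores:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# r = -0.12 to -0.18 means domain authority explains roughly 1-3% of the
# variance (r**2), and what little signal exists points downward.
```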
What the Top-Cited Pages Have in Common
Synthesizing the patterns across our dataset, the pages with the highest citation frequency and platform diversity share a consistent set of traits. These compound: a page with all of these traits is cited more frequently than a page with any subset.
- Multi-subcategory breadth. Pages spanning multiple product subcategories average 4.5 citations vs. 1.1 for single-subcategory pages. Comprehensive comparison and review content that covers adjacent use cases generates the broadest citation surface area.
- Specific statistics. The foundational GEO paper found that adding statistics is the single highest-impact content optimization across 10,000 queries. Pages with vague claims ("significant improvement") are cited less than pages with specific numbers ("30-40% improvement, Aggarwal et al. 2024"). Research from Wan et al. (ACL 2024, UC Berkeley) confirms that LLMs favor factual density over stylistic authority signals.
- Video content. YouTube pages in our dataset average 3.4 citations each, the highest of any domain. YouTube also dominates multi-platform citations, with 26 of the 100 highest-platform-diversity pages being YouTube videos. AI engines retrieve video content for product reviews, comparisons, and tutorials at disproportionate rates.
- Extraction-friendly structure. Comparison tables provide ready-made answer fragments for AI retrieval. The specialized review sites among our top-cited domains (rtings.com at 1.6 citations/page, tomshardware.com at 1.9) are built around structured product comparisons. Their citation efficiency confirms that extraction-friendly formats outperform narrative content.
- Freshness. Research shows that updating content within a 90-day window increases AI citations by 67%. AI retrieval systems that perform web searches (Perplexity, ChatGPT with browsing, Google AI Overviews) filter for recency. Stale content gets deprioritized regardless of its structural quality.
- Third-party presence. The top cited domains in our dataset are overwhelmingly third-party: Reddit, YouTube, Wikipedia, LinkedIn, Forbes, Gartner, PCGamer, Tom's Hardware. Off-site brand mention frequency (r = 0.664) outperforms backlinks (r = 0.218) as a predictor of AI visibility by a factor of three. A brand with no YouTube reviews and no Reddit presence is relying on a fraction of the available citation surface area.
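For teams auditing their own pages, a minimal, entirely illustrative checklist sketch follows. The trait names and the pass/fail framing are ours, not a model fitted on the dataset:

```python
# Hypothetical trait checklist derived from the six patterns above
TRAITS = {
    "covers_3plus_subcategories": "4.1x average citation multiplier",
    "cites_specific_statistics":  "highest-impact GEO optimization",
    "has_video_companion":        "YouTube averages 3.4 citations/page",
    "uses_comparison_tables":     "extraction-friendly answer fragments",
    "updated_within_90_days":     "recency filters in AI retrieval",
    "earns_third_party_mentions": "earned media drives most citations",
}

def citability_gaps(page):
    """Return the traits a page is missing; `page` maps trait name -> bool."""
    return [t for t in TRAITS if not page.get(t, False)]
```

A page that comes back missing, say, the video and third-party traits is competing with only its on-site content, which the data above suggests is the hardest way to win.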
Measuring Your Citation Footprint
Knowing what makes a page citable is useful. Knowing which of your pages are actually being cited, by which platforms, in response to which queries, is actionable.
Sill's monitoring pipeline tracks citation provenance at the page level. For every page cited in an AI response about your brand, we record the citing platforms, the queries that triggered the citation, the citation frequency over time, and the subcategory associations. Across 22,785 pages and 11,405 domains, this creates a dataset that reveals not just whether your brand appears, but which specific pages are doing the work.
The three-layer attribution model we described previously starts with this data. Simulated visibility (daily SOV tracking) tells you whether your overall presence is growing. Citation provenance data tells you which specific pages are driving that growth and which need attention.
The data in this post demonstrates the scale of the challenge. With 89% of pages cited only once and only 8.5% achieving multi-platform visibility, the difference between a page that contributes to AI presence and a page that does not comes down to the structural and topical traits described above. Without page-level citation data, content optimization is directionless. With it, every content investment can be targeted at the specific gaps that matter most.
See which of your pages AI engines actually cite
Sill tracks citation provenance across 22,785+ pages and four AI platforms daily. See your citation footprint, identify structural gaps, and optimize the pages that matter most.
References
- Sill Internal Data. "Citation Provenance Analysis: 22,785 pages, 26,257 citations, 11,405 domains." Sill Monitoring Pipeline, March 2026.
- Aggarwal, P., et al. "GEO: Generative Engine Optimization." KDD 2024, Princeton/Georgia Tech/IIT Delhi. arxiv.org/abs/2311.09735
- Wan, Y., et al. "Evidence-based evaluation of LLM persuasion." ACL 2024, UC Berkeley. arxiv.org/abs/2407.13008
- Ahrefs. "LLM Brand Visibility Study (75,000 brands)." ahrefs.com
- SearchAtlas. "Domain Authority vs. LLM Visibility (21,767 domains)." searchatlas.com
- Similarweb. "GenAI Brand Visibility Index 2026." Similarweb Research.
- Chen, Z., et al. "AI Search Engines and Earned Media Citations." University of Toronto, 2025.
Get Your Report
Request your first analysis today to see where you stand.
