
12 GEO Tactics Ranked by Scientific Evidence

Generative Engine Optimization has a research problem. There are dozens of tactics circulating in blog posts, conference talks, and agency pitch decks. Most are anecdotal. A few have been tested in peer-reviewed studies with specific effect sizes. We synthesized 10 academic papers, 15 large-scale industry studies covering 680 million citations, and our own content audit data to rank the 12 tactics with the strongest empirical backing. Each tactic includes the measured effect, the source, and how widely it is actually adopted.

TL;DR

We ranked the 12 strongest Generative Engine Optimization tactics by measured effect size, drawing on 10 academic papers, 15 industry studies covering 680M citations, and our own content audit data. The top three: branded web mentions (r = 0.664), YouTube presence (r = 0.737, 3.4x citation efficiency), and statistics addition (+93% citations, +30-40% visibility). The largest adoption gaps are answer capsules (0% of pages), schema markup (0%), and comparison tables (1%), all on-site changes with strong evidence and near-zero implementation. Keyword stuffing, FAQ schema, and llms.txt files are confirmed ineffective or harmful.

Why Rankings Matter More Than Lists

Most GEO advice treats all tactics as equally valid. Add statistics. Get on Reddit. Update your content. Optimize for Bing. These recommendations appear side by side with no indication of which ones have been tested rigorously and which are educated guesses.

The difference matters. The foundational GEO paper (Aggarwal et al., KDD 2024) tested nine optimization methods across 10,000 queries in controlled experiments. Three methods delivered 30-40% relative visibility improvement. One (keyword stuffing) performed 10% worse than baseline. Without rankings, a marketing team could spend months on the tactic that actively hurts them.

We classify tactics into three evidence tiers. Tier 1 tactics have been validated in multiple independent studies with specific, reproducible effect sizes. Tier 2 tactics have single-study evidence or strong practitioner consensus. Tier 3 tactics have theoretical support but limited testing. This post covers the 12 strongest Tier 1 and Tier 2 tactics. The full codex of 47 tactics, including what definitively does not work, is the basis for Sill's content audit scoring.

The Evidence Base

The rankings draw on two types of evidence: published research and our own proprietary data. The published research includes peer-reviewed work from Princeton, CMU, UC Berkeley, Harvard, Columbia, MIT, and the University of Toronto, alongside industry studies that together cover 680 million citations. Our own data comes from content audits and thousands of AI platform queries monitored through Sill's pipeline.

| Source | Study Type | Scale |
|---|---|---|
| Aggarwal et al., KDD 2024 | RCT (9 methods) | 10,000 queries |
| Wu et al., CMU 2025 | AutoGEO framework | 35.99% avg improvement |
| Wan et al., ACL 2024, UC Berkeley | Perturbation study | LLM selection rates |
| SE Ranking | Observational | 129,000 domains |
| Ahrefs | Correlation analysis | 75,000 brands |
| Profound | Citation analysis | 680M citations |
| Sill proprietary data | Content audits + monitoring | Observational study |

The 12 Tactics, Ranked

Rankings weight three factors: measured effect size, number of independent studies confirming the effect, and consistency across AI platforms. Off-site tactics and on-site tactics are ranked together because AI engines evaluate them together. A page with perfect structure but no off-site brand presence will underperform a mediocre page from a well-mentioned brand.

| Rank | Tactic | Measured Effect | Type | Tier |
|---|---|---|---|---|
| 1 | Branded web mentions | r = 0.664 (strongest predictor) | Off-site | 1 |
| 2 | YouTube presence | r = 0.737; 3.4x citation efficiency | Off-site | 1 |
| 3 | Statistics and quantitative data | +93% citations; +30-40% visibility | On-site | 1 |
| 4 | Answer capsules | 87% of cited posts had capsules | On-site | 1 |
| 5 | Expert quotes and source citations | +70% citations; +30-40% visibility | On-site | 1 |
| 6 | Content freshness (90-day updates) | +67% citations | On-site | 1 |
| 7 | Reddit mentions | 46.7% of Perplexity top-10 share | Off-site | 1 |
| 8 | Wikipedia page | 47.9% of ChatGPT top-10 citations | Off-site | 1 |
| 9 | Comparison tables | +47% citation rate; 96% extraction accuracy | On-site | 1 |
| 10 | Review platform listings | 3x ChatGPT citation; 47.6% B2B share | Off-site | 1 |
| 11 | Major publication coverage | 4.2x sustained citation at 5+ mentions | Off-site | 1 |
| 12 | Long-form content (2,900+ words) | +59% citations | On-site | 1 |

1. Branded Web Mentions (r = 0.664)

The single strongest predictor of whether an AI engine recommends your brand is how often your brand is mentioned across the web. The Ahrefs study of 75,000 brands found a correlation of 0.664 between branded web mentions and AI visibility. That is three times stronger than backlinks (r = 0.218) and dramatically stronger than domain authority, which shows near-zero or slightly negative correlation.

Top-quartile brands by mention frequency receive 10x more AI citations than bottom-quartile brands. Chen et al. (University of Toronto, 2025) confirmed the mechanism: 69-82% of AI citations come from earned media, compared to Google's 40-45%. AI engines systematically prefer third-party mentions over brand-owned content.

This is the most important finding in GEO research and the least actionable in the short term. Building brand mentions requires PR, partnerships, community engagement, and content that earns coverage. It cannot be done in a sprint. Brands that start now will compound this advantage over the next 12-24 months.

2. YouTube Presence (r = 0.737, 3.4x Citation Efficiency)

YouTube has the highest correlation with AI visibility of any single factor (r = 0.737, Ahrefs). In our analysis of 22,785 cited pages, YouTube pages average 3.4 citations each, the highest of any domain. YouTube is also the most cited domain in Google AI Overviews at 29.5% share (BrightEdge).

YouTube citations have grown 25% since January 2025. As of January 2026, YouTube content appears in 16% of all LLM answers, compared to 10% for Reddit (Profound, 680M citations). Video reviews and tutorials have the highest citation rates within the platform. YouTube also dominates multi-platform citations: 26 of the 100 highest-platform-diversity pages in our dataset are YouTube videos.

The mechanism is straightforward. Video content is transcribed, indexed, and treated as authoritative user-generated review content by AI retrieval systems. A single detailed product review video generates citations across multiple query types: "best X," "X vs Y," "X review," and "how to choose X."

3. Statistics and Quantitative Data (+93% Citations)

Adding specific statistics to content is the highest-impact on-site optimization tested in controlled experiments. The KDD 2024 paper found a 30-40% relative visibility improvement from statistics addition across 10,000 queries. SE Ranking's observational study of 129,000 domains found that pages with 19 or more data points averaged 5.4 citations, compared to 2.8 for pages with fewer. That is a 93% increase.

Wan et al. (ACL 2024, UC Berkeley) confirmed the mechanism. LLMs largely ignore stylistic authority signals like scientific tone or appeals to expertise. They respond to textual relevance and factual density. A page with specific numbers and named sources provides higher information density per token, which makes it more useful as a retrieval result.

| Metric | Published Research | Sill Audit Data |
|---|---|---|
| Pages with statistics | Avg 5.4 citations (SE Ranking) | 80.2% of audited pages have some stats |
| 19+ data points target | Threshold for top-tier citation | Only 3.9% of pages meet this target |
| Visibility boost | +30-40% (KDD 2024 RCT) | Mean audit score: 30.4/100 |

The gap between awareness and implementation is striking. While 80.2% of the pages we audited include some statistical content, only 3.9% reach the 19+ data point threshold associated with top-tier citation performance. Most pages include a few numbers. Few pages are genuinely data-dense. This is one of the largest untapped opportunities in GEO.
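To make the threshold concrete, here is a minimal Python sketch of a data-density check of the kind Sill's audit performs. The regex definition of a "data point" is our own illustrative assumption; the cited studies do not publish their exact counting rules.

```python
import re

# Illustrative definition of a "data point": integers, decimals, and
# percentages. This is an assumption, not SE Ranking's methodology.
DATA_POINT = re.compile(r"\d+(?:[.,]\d+)?\s*%?")

def count_data_points(text: str) -> int:
    """Count numeric tokens that could serve as citable statistics."""
    return len(DATA_POINT.findall(text))

def meets_density_target(text: str, threshold: int = 19) -> bool:
    """19+ data points is the top-tier threshold reported by SE Ranking."""
    return count_data_points(text) >= threshold

sample = "Pages with 19 or more data points averaged 5.4 citations vs 2.8."
print(count_data_points(sample))     # 3
print(meets_density_target(sample))  # False
```

Run over a full page body, a check like this separates "a few numbers" from genuinely data-dense content in seconds.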

4. Answer Capsules (87% of Cited Posts)

An answer capsule is a 120-150 character self-contained answer placed immediately after a question-framed H2 heading. Search Engine Land's analysis found this to be the number one predictor of ChatGPT citation: 87% of cited posts had either answer capsules or proprietary data.

The mechanism maps directly to how RAG (retrieval-augmented generation) systems work. When an AI engine retrieves a page, it needs to extract a concise answer fragment. Answer capsules are pre-extracted answers. They reduce the work the AI model needs to do to generate a useful response, which increases the probability of citation.

Our audit data shows a significant adoption gap: 0% of the pages we audited use structured answer capsules. This is the single largest gap between evidence strength and real-world implementation. Every on-site content optimization effort should start here.
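As a sketch of what a structured answer capsule means in markup terms, the following Python snippet (using BeautifulSoup) flags question-framed H2 headings whose first paragraph falls outside the 120-150 character capsule range. Treating the first `<p>` after the heading as the capsule is our simplifying assumption; real audit tooling would be more forgiving.

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def find_capsule_gaps(html: str, lo: int = 120, hi: int = 150) -> list[str]:
    """Return question-framed H2 headings missing a 120-150 char capsule."""
    soup = BeautifulSoup(html, "html.parser")
    gaps = []
    for h2 in soup.find_all("h2"):
        heading = h2.get_text(strip=True)
        if not heading.endswith("?"):
            continue  # only question-framed headings are expected to have capsules
        first_p = h2.find_next_sibling("p")  # assumption: capsule = first <p>
        answer = first_p.get_text(strip=True) if first_p else ""
        if not lo <= len(answer) <= hi:
            gaps.append(heading)
    return gaps

html = (
    "<h2>What is an answer capsule?</h2>"
    "<p>An answer capsule is a 120-150 character self-contained answer "
    "placed right after a question-framed H2, ready for an AI engine "
    "to lift verbatim.</p>"
)
print(find_capsule_gaps(html))  # [] -> this capsule passes the length check
```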

5. Expert Quotes and Source Citations (+70% Citations)

The KDD 2024 paper found that citing authoritative sources delivered a 30-40% visibility improvement, comparable to statistics addition. SE Ranking's observational study found a 70% increase in citations for pages with expert quotes compared to those without (4.1 vs 2.4 citations per query).

Source citations serve a dual purpose. They increase the factual density that LLMs respond to (Wan et al., ACL 2024), and they create a citation chain that AI retrieval systems can verify. A page that cites a specific study is more useful as a retrieval result than a page that makes the same claim without attribution.

In our audits, 57.5% of pages include some form of expert quotes or source citations. The adoption rate is higher than answer capsules but still leaves substantial room for improvement, particularly in the depth and specificity of citations used.

6. Content Freshness: The 90-Day Window (+67% Citations)

Content updated within 90 days receives 67% more AI citations than stale content (SE Ranking, Seer Interactive, AirOps). Pages not updated quarterly are 3x more likely to lose citations over time. Perplexity is the most freshness-sensitive platform: 50% of its citations come from content published in the current year (Profound, 680M citations). ChatGPT similarly skews recent: 60.5% of its top-cited pages were published within two years.

Only 15.4% of the pages we audited showed evidence of recent updates. This means 84.6% of brand content is losing citation potential through staleness alone. A quarterly content refresh schedule, even modest updates adding recent data or removing outdated claims, is one of the highest-ROI GEO maintenance activities.
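A refresh schedule is straightforward to operationalize. Here is a minimal sketch, assuming last-updated dates are available from your CMS; the page paths and dates below are made up:

```python
from datetime import date, timedelta

def needs_refresh(last_updated: date, window_days: int = 90) -> bool:
    """Flag pages outside the 90-day freshness window linked to +67% citations."""
    return date.today() - last_updated > timedelta(days=window_days)

# Hypothetical inventory; in practice these dates come from your CMS.
pages = {
    "/blog/geo-tactics": date(2026, 1, 10),
    "/blog/old-guide": date(2024, 6, 2),
}
stale = [url for url, updated in pages.items() if needs_refresh(updated)]
print(stale)  # pages due for a quarterly refresh
```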

7. Reddit Mentions (46.7% of Perplexity Top-10)

Reddit holds 46.7% of Perplexity's top-10 citation share, making it the single most influential source for that platform (Profound). Reddit appears in 40.1% of all LLM responses (Statista/Semrush). Google has a $60M/year licensing deal with Reddit; OpenAI has a similar ~$70M/year agreement. Reddit content with 3+ upvotes enters Tier 2 training data sources.

The tactic requires genuine community participation. Promotional posts are flagged and removed. Two to three months of authentic engagement in relevant subreddits is the minimum investment before brand mentions carry weight. But the payoff is significant: Reddit is the dominant citation source for Perplexity, the AI platform that skews most heavily toward purchase-intent queries.

8. Wikipedia Page (47.9% of ChatGPT Top-10)

Wikipedia holds 47.9% of ChatGPT's top-10 citation share (Profound). Approximately 22% of LLM training data comes from Wikipedia (Quoleady). Among ChatGPT-mentioned tools, 78.8% have a Wikipedia page. Wikipedia is effectively a non-negotiable prerequisite for ChatGPT visibility.

Wikipedia notability requirements are strict: companies need significant coverage in independent, reliable sources. This creates a compounding effect with tactic #1 (branded web mentions) and tactic #11 (major publication coverage). The brands that earn enough press coverage to qualify for a Wikipedia page are the same brands that accumulate the off-site mentions that drive AI visibility across all platforms. Wikipedia is both a direct citation source and an indicator of the broader earned media presence that AI engines reward.

9. Comparison Tables (+47% Citation Rate)

Properly structured HTML comparison tables increase AI citation rates by 47% (Search Engine Land). AI extraction accuracy from well-formatted tables reaches 96% (Am I Cited). The specialized review sites that dominate our citation data (rtings.com at 1.6 citations/page, tomshardware.com at 1.9) are built around structured product comparisons.

Only 1% of the pages we audited include structured comparison content. The requirements are specific: proper HTML tables with `<thead>` elements, descriptive column headers, and consistent data formatting. Tables embedded as images or built with CSS grids are not extracted at the same rate. This is a structural implementation detail with measurable impact.
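A quick way to verify those requirements is to parse the table the way an extractor would. This Python sketch (BeautifulSoup again) checks only the two requirements named above, a `<thead>` and non-empty headers; it is a heuristic, not a guarantee of the 96% extraction figure.

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def is_extractable_table(table_html: str) -> bool:
    """Heuristic: a real <table> with a <thead> and non-empty <th> headers.
    Image-based tables and CSS grids fail because there is no tabular
    markup to parse."""
    soup = BeautifulSoup(table_html, "html.parser")
    table = soup.find("table")
    thead = table.find("thead") if table else None
    if thead is None:
        return False
    headers = [th.get_text(strip=True) for th in thead.find_all("th")]
    return len(headers) >= 2 and all(headers)

good = ("<table><thead><tr><th>Model</th><th>Battery life</th></tr></thead>"
        "<tbody><tr><td>X100</td><td>12 h</td></tr></tbody></table>")
print(is_extractable_table(good))                             # True
print(is_extractable_table('<div class="grid">A | B</div>'))  # False
```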

10. Review Platform Listings (3x ChatGPT Citation)

Brands listed on review platforms (G2, Capterra, GetApp, Clutch) are 3x more likely to be cited by ChatGPT (SE Ranking). The dominance is vertical-specific: GetApp captures 47.6% of B2B software citations in ChatGPT (Hall, 456K citations). Clutch holds 72-84.5% for agencies. TripAdvisor holds 72.9-94.2% for travel.

An important nuance: not all review platforms are indexed equally by AI. Yelp and Trustpilot block AI crawlers. GetApp, Clutch, and SourceForge allow full access. 100% of ChatGPT-mentioned tools have Capterra reviews and 99% have G2 reviews (Quoleady). For B2B brands, review platform presence is table stakes.

11. Major Publication Coverage (4.2x Sustained Citation)

Editorial media accounts for 61% of AI reputation responses (We Are Bottle). That rises to 72% for value perception queries, the kind buyers ask when evaluating vendors. Brands with 5 or more mentions in major publications within a 6-month window achieve 4.2x higher sustained citation rates compared to those with fewer (Profound).

The threshold matters. Sporadic press coverage produces minimal effect. Consistent coverage over time, hitting the 5-mention threshold, creates a compounding citation advantage. This aligns with Chen et al.'s finding that AI engines exhibit systematic bias toward earned media. The brands that invest in sustained PR rather than one-off announcements are the ones AI engines learn to recommend.

12. Long-Form Content, 2,900+ Words (+59% Citations)

Pages with 2,900 or more words average 5.1 citations compared to 3.2 for shorter pages, a 59% increase (SE Ranking, 129K domains). Optimal section length is 120-180 words per heading, which produces 4.6 vs 2.7 citations (a 70% increase).

Only 31.8% of the pages we audited meet the 2,900+ word count target. Only 15.7% structure their content with the 120-180 word sections that maximize per-heading citation probability. Long-form content works because it increases the number of unique answer fragments a page can provide. Each well-structured section is a potential retrieval result for a different query. Length alone is not the goal. Structured depth with specific, extractable sections is the goal.
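Both thresholds are easy to audit. Here is a minimal sketch of the per-section word count check, assuming flat markup where body copy sits in sibling elements between headings (nested layouts would need more care):

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def section_word_counts(html: str) -> dict[str, int]:
    """Words of body copy under each H2/H3, up to the next heading."""
    soup = BeautifulSoup(html, "html.parser")
    counts = {}
    for heading in soup.find_all(["h2", "h3"]):
        words = 0
        for sibling in heading.find_next_siblings():
            if sibling.name in ("h2", "h3"):
                break  # next section starts here
            words += len(sibling.get_text().split())
        counts[heading.get_text(strip=True)] = words
    return counts

def sections_off_target(html: str, lo: int = 120, hi: int = 180) -> list[str]:
    """Headings whose sections miss the 120-180 word range from the studies."""
    return [h for h, n in section_word_counts(html).items() if not lo <= n <= hi]
```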

The Adoption Gap

We scored pages across hundreds of brands against these 12 tactics. The mean content audit score was 30.4 out of 100 (median 29.2). Zero pages scored above 80. The distribution is concentrated between 20 and 40, with 49.2% of all audited pages falling in that range.

The gap between what the research shows works and what brands actually implement is the largest competitive opportunity in GEO.

| Tactic | Adoption Rate | Gap Assessment |
|---|---|---|
| Statistics (any) | 80.2% | High adoption, but only 3.9% meet the 19+ data point threshold |
| Author bios / E-E-A-T signals | 79.3% | Well adopted, but credential depth varies |
| Expert quotes and citations | 57.5% | Moderate adoption; specificity is often lacking |
| Long-form (2,900+ words) | 31.8% | Two-thirds of pages are too thin |
| 120-180 word sections | 15.7% | Most pages have irregular section lengths |
| Fresh content (updated recently) | 15.4% | 84.6% of content is stale |
| Comparison content | 1.0% | Near-zero adoption despite +47% citation rate |
| Answer capsules | 0% | Zero adoption despite being the #1 citation predictor |
| Schema markup | 0% | Zero adoption despite 81% presence on cited pages |

The three tactics with the widest adoption gap, answer capsules (0%), schema markup (0%), and comparison tables (1%), are all on-site structural changes that a content team can implement without waiting for external coverage. They are the fastest path from current state to measurably better AI citation performance.

What Definitively Does Not Work

An evidence-based ranking would be incomplete without the negative results. These tactics have been tested and found to be ineffective or actively harmful.

| Tactic | Measured Effect | Source |
|---|---|---|
| Keyword stuffing | 10% worse than baseline | Aggarwal et al., KDD 2024 |
| FAQ schema markup | 3.6 vs 4.2 citations (negative) | SE Ranking, 129K domains |
| llms.txt files | Zero measurable impact | SE Ranking, 300K domains |
| Domain authority alone | r = -0.12 to -0.18 (negative) | SearchAtlas, 21,767 domains |
| Backlinks (direct effect) | r = 0.218 (weak) | Ahrefs, 75,000 brands |

The keyword stuffing result is particularly important because it is the most intuitive tactic for teams trained in traditional SEO. The exact search query appears in only approximately 5% of AI answers. AI engines reformulate queries, retrieve by semantic similarity, and generate novel responses. Optimizing for exact-match keywords is counterproductive.

The FAQ schema result may be the most counterintuitive. Q&A format content helps (it is a Tier 2 tactic). But FAQ schema markup specifically hurts citation rates. The likely explanation is that FAQ schema triggers featured snippet treatment in traditional search, which can reduce the content available for AI extraction. Content structure and markup structure are separate levers, and they do not always pull in the same direction.

Platform-Specific Priorities

The 12 tactics above apply universally, but their relative weight shifts by platform. Only 11% of domains are cited by both ChatGPT and Perplexity (Profound). Each platform has distinct citation DNA. Wu et al. (CMU, 2025) found that engine-specific optimization rules consistently outperform generic strategies.

| Platform | Top Citation Sources | Priority Tactics |
|---|---|---|
| ChatGPT | Wikipedia (47.9%), Forbes, G2 | Wikipedia, review platforms, branded mentions |
| Perplexity | Reddit (46.7%), YouTube, Gartner | Reddit presence, content freshness, YouTube |
| Google AI Overviews | Reddit (21%), YouTube (18.8%), Quora | Organic rankings, YouTube, answer capsules |
| Gemini | Authoritative lists (49%), Google authority | List inclusions, local reviews, structured data |
| Claude | Databases/directories (68%), awards | Directory presence, awards, longevity signals |

The practical implication: if your audience primarily uses Perplexity, investing in Reddit presence and content freshness will produce faster results than optimizing Wikipedia. If ChatGPT is the dominant platform in your vertical, Wikipedia and review platform listings are non-negotiable. Monitoring platform-specific visibility is the prerequisite for platform-specific optimization.

Where to Start: A Priority Framework

The 12 tactics fall into two categories by implementation timeline. On-site tactics (3, 4, 5, 6, 9, 12) can be implemented within weeks. Off-site tactics (1, 2, 7, 8, 10, 11) require months of sustained effort.

For on-site, start with the widest adoption gaps: add answer capsules (0% adoption), implement schema markup (0% adoption), and add comparison tables (1% adoption). These three changes have strong evidence backing and near-zero current implementation. They represent the highest-impact, lowest-competition optimizations available.
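Of the three, schema markup is the least self-explanatory, so here is a minimal sketch of what implementing it can look like: a schema.org Article block emitted as JSON-LD. The values are placeholders, and the property set is a common baseline rather than a prescribed list.

```python
import json

# Placeholder values; dateModified doubles as a freshness signal (tactic #6).
article_schema = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "12 GEO Tactics Ranked by Scientific Evidence",
    "datePublished": "2026-03-01",
    "dateModified": "2026-03-15",
    "author": {"@type": "Person", "name": "Daniel Wang"},
}

# Embed the output inside the page's <head>.
print('<script type="application/ld+json">\n'
      + json.dumps(article_schema, indent=2)
      + "\n</script>")
```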

For off-site, prioritize the platform where your audience is. Check your AI visibility across platforms to identify where you are weakest. If you have zero Reddit presence and your buyers use Perplexity, that is your highest-ROI investment. If you lack review platform listings and your category is dominated by B2B queries, G2 and Capterra listings should come first. The data tells you where to start. The rankings tell you what to do when you get there.

See which GEO tactics your pages are missing

Sill audits your content against these 12 tactics, scores each page, and prioritizes the changes with the highest evidence-backed impact on AI visibility.

References

  1. Aggarwal, P., et al. "GEO: Generative Engine Optimization." KDD 2024, Princeton/Georgia Tech/IIT Delhi. arxiv.org/abs/2311.09735
  2. Wu, Y., et al. "AutoGEO: Automated Generative Engine Optimization." CMU, 2025.
  3. Wan, Y., et al. "Evidence-based evaluation of LLM persuasion and factual density." ACL 2024, UC Berkeley. arxiv.org/abs/2407.13008
  4. Chen, Z., et al. "AI Search Engines and Earned Media Citations (69-82%)." University of Toronto, 2025.
  5. Ahrefs. "LLM Brand Visibility Study (75,000 brands, mention/backlink correlations)." ahrefs.com
  6. SE Ranking. "AI Citation Analysis (129,000 domains)."
  7. SearchAtlas. "Domain Authority vs. LLM Visibility (21,767 domains)." searchatlas.com
  8. Profound. "AI Citation Analysis (680M citations across platforms)."
  9. Sill Internal Data. "Content Audit Analysis." Sill Monitoring Pipeline, March 2026.
  10. Kumar, A. & Lakkaraju, H. "Strategic text sequences in product descriptions." Harvard, 2024.
  11. Kamruzzaman, M., et al. "LLM bias toward global brands and established companies." EMNLP 2024.
  12. Search Engine Land. "ChatGPT Citation Predictors (answer capsules, original research)." November 2025.


Daniel Wang

Founder · UC Berkeley MIDS

Previously at Nordstrom, Bloomberg, Hexagon (now Octave)
