Generative Engine Optimization has a research problem. There are dozens of tactics circulating in blog posts, conference talks, and agency pitch decks. Most are anecdotal. A few have been tested in peer-reviewed studies with specific effect sizes. We synthesized 10 academic papers, 15 large-scale industry studies covering 680 million citations, Otterly.AI's March 2026 controlled experiments, and our own content audit data across 139 brands to rank the 12 tactics with the strongest empirical backing. Each tactic includes the measured effect, the source, and how widely it is actually adopted. For a definition of the broader category these tactics optimize for, see our explainer on what LLM visibility is. For implementation details and engine-specific playbooks, see our companion post on GEO tactics in practice.
TL;DR
We ranked the 12 strongest Generative Engine Optimization tactics by measured effect size, drawing on 10 academic papers, 15 industry studies covering 680M citations, Otterly.AI's March 2026 controlled experiments, and our own content audit data across 139 brands. The top three: branded web mentions (r = 0.664), YouTube presence (r = 0.737, 3.4x citation efficiency), and statistics addition (+93% citations, +30-40% visibility). The largest adoption gaps are answer capsules (0% of pages), schema markup (0%), and comparison tables (1%), all on-site changes with strong evidence and near-zero implementation. Otterly's experiments confirmed listicle inclusion and footer text repetition as highly effective, while keyword stuffing, FAQ schema, and llms.txt files are confirmed ineffective.
The key tactics involved in GEO are entity signals (branded mentions, YouTube, Wikipedia), content patterns (answer capsules, statistics, quotes, tables), and freshness within 90 days.
Generative Engine Optimization splits into three tactic families ranked by evidence strength. Entity signals are off-site and work at the brand level: branded web mentions correlate with AI visibility at 0.664, YouTube presence at 0.737, and Wikipedia coverage appears in 47.9% of ChatGPT's top-10 citations (Ahrefs 75K brands; Profound). Content patterns are on-site and work at the page level: answer capsules appear in 87% of ChatGPT-cited posts, statistics addition boosts visibility 30-40% (Aggarwal et al., KDD 2024), expert quotes earn 70% more citations (SE Ranking), and comparison tables are extracted at 96% accuracy.
Freshness is the temporal lever: pages updated within 90 days earn 67% more citations. Keyword stuffing, FAQ schema, and llms.txt files are confirmed ineffective and sometimes counterproductive. The 12 highest-evidence tactics are ranked in the sections below with their measured effect sizes, sources, and current adoption rates.
The KDD 2024 GEO paper found three tactics deliver 30-40% visibility improvement while keyword stuffing performs 10% worse than baseline.
Most GEO advice treats all tactics as equally valid. Add statistics. Get on Reddit. Update your content. Optimize for Bing. These recommendations appear side by side with no indication of which ones have been tested rigorously and which are educated guesses.
The difference matters. The foundational GEO paper (Aggarwal et al., KDD 2024) tested nine optimization methods across 10,000 queries in controlled experiments. Three methods delivered 30-40% relative visibility improvement. One (keyword stuffing) performed 10% worse than baseline. Without rankings, a marketing team could spend months on the tactic that actively hurts them.
We classify tactics into three evidence tiers. Tier 1 tactics have been validated in multiple independent studies with specific, reproducible effect sizes. Tier 2 tactics have single-study evidence or strong practitioner consensus. Tier 3 tactics have theoretical support but limited testing. This post covers the 12 strongest Tier 1 and Tier 2 tactics. The full codex of 47 tactics, including what definitively does not work, is the basis for Sill's content audit scoring.
Rankings draw on 10 academic papers, 15 industry studies covering 680 million citations, and Sill's content audit data across 139 brands.
The rankings draw on two types of evidence: published research and our own proprietary data. The published research spans 680 million citations across studies from Princeton, CMU, UC Berkeley, Harvard, Columbia, MIT, and the University of Toronto. Our own data comes from content audits and thousands of AI platform queries monitored through Sill's pipeline.
| Source | Study Type | Scale / Key Result |
|---|---|---|
| Aggarwal et al., KDD 2024 | RCT (9 methods) | 10,000 queries |
| Wu et al., CMU 2025 | AutoGEO framework | 35.99% avg improvement |
| Wan et al., ACL 2024, UC Berkeley | Perturbation study | LLM selection rates |
| SE Ranking | Observational | 129,000 domains |
| Ahrefs | Correlation analysis | 75,000 brands |
| Profound | Citation analysis | 680M citations |
| Otterly.AI | Controlled experiments | 8 tactics tested (March 2026) |
| Sill proprietary data | Content audits + monitoring | 139 brands, 86 industries |
Top three by effect size: branded web mentions (r=0.664), YouTube presence (r=0.737), and statistics addition (+93% citations, +30-40% visibility).
Rankings weight three factors: measured effect size, number of independent studies confirming the effect, and consistency across AI platforms (a toy scoring sketch follows the rankings table below). Off-site tactics and on-site tactics are ranked together because AI engines evaluate them together. A page with perfect structure but no off-site brand presence will underperform a mediocre page from a well-mentioned brand.
| Rank | Tactic | Measured Effect | Type | Tier |
|---|---|---|---|---|
| 1 | Branded web mentions | r = 0.664 (strongest predictor) | Off-site | 1 |
| 2 | YouTube presence | r = 0.737; 3.4x citation efficiency | Off-site | 1 |
| 3 | Statistics and quantitative data | +93% citations; +30-40% visibility | On-site | 1 |
| 4 | Answer capsules | 87% of cited posts had capsules | On-site | 1 |
| 5 | Expert quotes and source citations | +70% citations; +30-40% visibility | On-site | 1 |
| 6 | Content freshness (90-day updates) | +67% citations | On-site | 1 |
| 7 | Reddit mentions | 46.7% of Perplexity top-10 share | Off-site | 1 |
| 8 | Wikipedia page | 47.9% of ChatGPT top-10 citations | Off-site | 1 |
| 9 | Comparison tables | +47% citation rate; 96% extraction accuracy | On-site | 1 |
| 10 | Review platform listings | 3x ChatGPT citation; 47.6% B2B share | Off-site | 1 |
| 11 | Major publication coverage | 4.2x sustained citation at 5+ mentions | Off-site | 1 |
| 12 | Long-form content (2,900+ words) | +59% citations | On-site | 1 |
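To make the weighting concrete, here is a toy scoring sketch. The weights, the normalization, and the example inputs are illustrative assumptions, not Sill's actual ranking model:

```python
# Illustrative only: a toy composite score over the three ranking factors.
# Weights and example inputs are assumptions, not Sill's actual methodology.
from dataclasses import dataclass

@dataclass
class Tactic:
    name: str
    effect_size: float           # normalized 0-1 (e.g., scaled lift or correlation)
    study_count: int             # independent studies confirming the effect
    platform_consistency: float  # 0-1: share of AI platforms where the effect holds

def composite_score(t: Tactic, w_effect: float = 0.5,
                    w_studies: float = 0.3, w_consistency: float = 0.2) -> float:
    # Cap replication at 5 studies so it saturates instead of dominating.
    studies_norm = min(t.study_count, 5) / 5
    return (w_effect * t.effect_size
            + w_studies * studies_norm
            + w_consistency * t.platform_consistency)

tactics = [
    Tactic("Branded web mentions", effect_size=0.66, study_count=3, platform_consistency=0.9),
    Tactic("Keyword stuffing", effect_size=0.0, study_count=1, platform_consistency=0.1),
]
for t in sorted(tactics, key=composite_score, reverse=True):
    print(f"{t.name}: {composite_score(t):.2f}")
```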
Branded web mentions show a 0.664 correlation with AI visibility, three times stronger than backlinks (0.218), per Ahrefs' study of 75,000 brands.
The single strongest predictor of whether an AI engine recommends your brand is how often your brand is mentioned across the web. The Ahrefs study of 75,000 brands found a correlation of 0.664 between branded web mentions and AI visibility. That is three times stronger than backlinks (r = 0.218) and dramatically stronger than domain authority, which shows near-zero or slightly negative correlation.
Top-quartile brands by mention frequency receive 10x more AI citations than bottom-quartile brands. Chen et al. (University of Toronto, 2025) confirmed the mechanism: 69-82% of AI citations come from earned media, compared to Google's 40-45%. AI engines systematically prefer third-party mentions over brand-owned content.
This is the most important finding in GEO research and the least actionable in the short term. Building brand mentions requires PR, partnerships, community engagement, and content that earns coverage. It cannot be done in a sprint. Brands that start now will compound this advantage over the next 12-24 months.
YouTube shows the strongest AI visibility correlation (r=0.737) and is the #1 cited domain in Google AI Overviews at 29.5% share.
YouTube has the highest correlation with AI visibility of any single factor (r = 0.737, Ahrefs). In our analysis of 22,785 cited pages, YouTube pages average 3.4 citations each, the highest of any domain. YouTube is also the most cited domain in Google AI Overviews at 29.5% share (BrightEdge).
YouTube citations have grown 25% since January 2025. As of January 2026, YouTube content appears in 16% of all LLM answers, compared to 10% for Reddit (Profound, 680M citations). Video reviews and tutorials have the highest citation rates within the platform. YouTube also dominates multi-platform citations: 26 of the 100 highest-platform-diversity pages in our dataset are YouTube videos.
The mechanism is straightforward. Video content is transcribed, indexed, and treated as authoritative user-generated review content by AI retrieval systems. A single detailed product review video generates citations across multiple query types: "best X," "X vs Y," "X review," and "how to choose X."
Pages with 19+ data points average 5.4 AI citations versus 2.8 without, a 93% increase per SE Ranking's study of 129,000 domains.
Adding specific statistics to content is the highest-impact on-site optimization tested in controlled experiments. The KDD 2024 paper found a 30-40% relative visibility improvement from statistics addition across 10,000 queries. SE Ranking's observational study of 129,000 domains found that pages with 19 or more data points averaged 5.4 citations, compared to 2.8 for pages with fewer. That is a 93% increase.
Wan et al. (ACL 2024, UC Berkeley) confirmed the mechanism. LLMs largely ignore stylistic authority signals like scientific tone or appeals to expertise. They respond to textual relevance and factual density. A page with specific numbers and named sources provides higher information density per token, which makes it more useful as a retrieval result.
| Metric | Published Research | Sill Audit Data |
|---|---|---|
| Pages with statistics | Avg 5.4 citations (SE Ranking) | 80.2% of audited pages have some stats |
| 19+ data points target | Threshold for top-tier citation | Only 3.9% of pages meet this target |
| Visibility boost | +30-40% (KDD 2024 RCT) | Mean audit score: 30.4/100 |
The gap between awareness and implementation is striking. While 80.2% of the pages we audited include some statistical content, only 3.9% reach the 19+ data point threshold associated with top-tier citation performance. Most pages include a few numbers. Few pages are genuinely data-dense. This is one of the largest untapped opportunities in GEO.
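For teams auditing their own pages against that threshold, a rough count can be automated. A minimal sketch in Python; the regex heuristic is our assumption, not SE Ranking's counting method:

```python
# Rough heuristic for counting data points on a page against the 19+ threshold.
# The regex is an assumption; a real audit needs a stricter definition.
import re

DATA_POINT = re.compile(r"\b\d+(?:[.,]\d+)*(?:\s*(?:%|x|percent|million|billion))?", re.I)

def count_data_points(text: str) -> int:
    return len(DATA_POINT.findall(text))

sample = "Pages with 19+ data points average 5.4 citations vs 2.8 without, a 93% increase."
print(count_data_points(sample))  # 4: counts 19, 5.4, 2.8, and 93%
```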
Answer capsules are the #1 predictor of ChatGPT citation: 87% of cited posts had capsules or proprietary data per Search Engine Land.
An answer capsule is a 120-150 character self-contained answer placed immediately after a question-framed H2 heading. Search Engine Land's analysis found this to be the number one predictor of ChatGPT citation: 87% of cited posts had either answer capsules or proprietary data.
The mechanism maps directly to how RAG (retrieval-augmented generation) systems work. When an AI engine retrieves a page, it needs to extract a concise answer fragment. Answer capsules are pre-extracted answers. They reduce the work the AI model needs to do to generate a useful response, which increases the probability of citation.
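Checking a page for capsules is mechanical enough to automate. A minimal sketch, with helper names and extraction logic of our own choosing:

```python
# Flags question-framed H2 headings whose first following sentence is not
# capsule-length (120-150 characters). Helper names and logic are ours.
from bs4 import BeautifulSoup

def find_missing_capsules(html: str) -> list[str]:
    soup = BeautifulSoup(html, "html.parser")
    missing = []
    for h2 in soup.find_all("h2"):
        heading = h2.get_text(strip=True)
        if not heading.endswith("?"):
            continue  # only question-framed headings need a capsule
        p = h2.find_next_sibling("p")
        first_sentence = p.get_text(strip=True).split(". ")[0] if p else ""
        if not 120 <= len(first_sentence) <= 150:
            missing.append(heading)
    return missing

html = "<h2>What is an answer capsule?</h2><p>Too short to stand alone.</p>"
print(find_missing_capsules(html))  # ['What is an answer capsule?']
```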
Our audit data shows a significant adoption gap: 0% of the pages we audited use structured answer capsules. This is the single largest gap between evidence strength and real-world implementation. Every on-site content optimization effort should start here.
Pages with expert quotes earn 70% more AI citations (4.1 vs 2.4) per SE Ranking; source citations add 30-40% visibility per the KDD 2024 paper.
The KDD 2024 paper found that citing authoritative sources delivered a 30-40% visibility improvement, comparable to statistics addition. SE Ranking's observational study found a 70% increase in citations for pages with expert quotes compared to those without (4.1 vs 2.4 citations per query).
Source citations serve a dual purpose. They increase the factual density that LLMs respond to (Wan et al., ACL 2024), and they create a citation chain that AI retrieval systems can verify. A page that cites a specific study is more useful as a retrieval result than a page that makes the same claim without attribution.
In our audits, 57.5% of pages include some form of expert quotes or source citations. The adoption rate is higher than answer capsules but still leaves substantial room for improvement, particularly in the depth and specificity of citations used.
Content updated within 90 days receives 67% more AI citations; pages not updated quarterly are 3x more likely to lose citations over time.
Content updated within 90 days receives 67% more AI citations than stale content (SE Ranking, Seer Interactive, AirOps). Pages not updated quarterly are 3x more likely to lose citations over time. Perplexity is the most freshness-sensitive platform: 50% of its citations come from content published in the current year (Profound, 680M citations). ChatGPT similarly skews recent: 60.5% of its top-cited pages were published within two years.
Only 15.4% of the pages we audited showed evidence of recent updates. This means 84.6% of brand content is losing citation potential through staleness alone. A quarterly refresh schedule, even with modest updates that add recent data or remove outdated claims, is one of the highest-ROI GEO maintenance activities.
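Staleness can be monitored from a sitemap. A minimal sketch that flags pages whose lastmod falls outside the 90-day window; the sitemap URL is a placeholder and lastmod values are assumed accurate:

```python
# Flags pages whose sitemap <lastmod> is older than the 90-day freshness
# window. The sitemap URL is a placeholder; lastmod is assumed accurate.
from datetime import datetime, timezone
import urllib.request
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def stale_pages(sitemap_url: str, max_age_days: int = 90) -> list[str]:
    with urllib.request.urlopen(sitemap_url) as resp:
        root = ET.fromstring(resp.read())
    now = datetime.now(timezone.utc)
    stale = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if lastmod is None:
            stale.append(loc)  # no lastmod at all: treat as stale
            continue
        modified = datetime.fromisoformat(lastmod.replace("Z", "+00:00"))
        if modified.tzinfo is None:  # date-only lastmod values parse as naive
            modified = modified.replace(tzinfo=timezone.utc)
        if (now - modified).days > max_age_days:
            stale.append(loc)
    return stale

# Usage (placeholder URL): print(stale_pages("https://example.com/sitemap.xml"))
```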
Reddit holds 46.7% of Perplexity's top-10 citation share and appears in 40.1% of all LLM responses (Statista/Semrush).
Reddit holds 46.7% of Perplexity's top-10 citation share, making it the single most influential source for that platform (Profound). Across all LLM responses, Reddit appears in 40.1% (Statista/Semrush). Google has a $60M/year licensing deal with Reddit; OpenAI has a similar ~$70M/year agreement. Reddit content with 3+ upvotes enters Tier 2 training data sources.
The tactic requires genuine community participation. Promotional posts are flagged and removed. Two to three months of authentic engagement in relevant subreddits is the minimum investment before brand mentions carry weight. But the payoff is significant: Reddit is the dominant citation source for the AI platform (Perplexity) that skews most heavily toward purchase-intent queries.
Wikipedia holds 47.9% of ChatGPT's top-10 citation share; 78.8% of ChatGPT-mentioned tools have Wikipedia pages.
Wikipedia holds 47.9% of ChatGPT's top-10 citation share (Profound). Approximately 22% of LLM training data comes from Wikipedia (Quoleady). Among ChatGPT-mentioned tools, 78.8% have a Wikipedia page. Wikipedia is effectively a non-negotiable prerequisite for ChatGPT visibility.
Wikipedia notability requirements are strict: companies need significant coverage in independent, reliable sources. This creates a compounding effect with tactic #1 (branded web mentions) and tactic #11 (major publication coverage). The brands that earn enough press coverage to qualify for a Wikipedia page are the same brands that accumulate the off-site mentions that drive AI visibility across all platforms. Wikipedia is both a direct citation source and an indicator of the broader earned media presence that AI engines reward.
HTML comparison tables increase AI citation rates by 47% with 96% extraction accuracy; only 1% of audited pages include them.
Properly structured HTML comparison tables increase AI citation rates by 47% (Search Engine Land). AI extraction accuracy from well-formatted tables reaches 96% (Am I Cited). The specialized review sites that dominate our citation data (rtings.com at 1.6 citations/page, tomshardware.com at 1.9) are built around structured product comparisons.
Only 1% of the pages we audited include structured comparison content. The requirements are specific: proper HTML tables with thead elements, descriptive column headers, and consistent data formatting. Tables embedded as images or built with CSS grids are not extracted at the same rate. This is a structural implementation detail that has measurable impact.
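These requirements can be checked programmatically. A minimal sketch; the specific checks encode our reading of the findings above, not a published standard:

```python
# Checks that a comparison table is extraction-friendly: real <table> markup
# with a <thead>, descriptive headers, and at least one data row.
from bs4 import BeautifulSoup

def table_is_extractable(fragment: str) -> bool:
    soup = BeautifulSoup(fragment, "html.parser")
    table = soup.find("table")
    if table is None:
        return False  # images and CSS grids never reach the parser as tables
    thead = table.find("thead")
    headers = [th.get_text(strip=True) for th in thead.find_all("th")] if thead else []
    return bool(headers) and all(headers) and len(table.find_all("tr")) > 1

good = ("<table><thead><tr><th>Model</th><th>Price</th></tr></thead>"
        "<tbody><tr><td>X100</td><td>$499</td></tr></tbody></table>")
print(table_is_extractable(good))                                       # True
print(table_is_extractable("<div class='grid'>Model X100 $499</div>"))  # False
```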
Brands on review platforms are 3x more likely to be cited by ChatGPT; GetApp captures 47.6% of B2B software citations.
Brands listed on review platforms (G2, Capterra, GetApp, Clutch) are 3x more likely to be cited by ChatGPT (SE Ranking). The dominance is vertical-specific: GetApp captures 47.6% of B2B software citations in ChatGPT (Hall, 456K citations). Clutch holds 72-84.5% for agencies. TripAdvisor holds 72.9-94.2% for travel.
An important nuance: not all review platforms are indexed equally by AI. Yelp and Trustpilot block AI crawlers. GetApp, Clutch, and SourceForge allow full access. 100% of ChatGPT-mentioned tools have Capterra reviews and 99% have G2 reviews (Quoleady). For B2B brands, review platform presence is table stakes.
Brands with 5+ mentions in major publications within 6 months achieve 4.2x higher sustained citation rates; editorial media is 61% of AI reputation responses.
Editorial media accounts for 61% of AI reputation responses (We Are Bottle). That rises to 72% for value perception queries, the kind buyers ask when evaluating vendors. Brands with 5 or more mentions in major publications within a 6-month window achieve 4.2x higher sustained citation rates compared to those with fewer (Profound).
The threshold matters. Sporadic press coverage produces minimal effect. Consistent coverage over time, hitting the 5-mention threshold, creates a compounding citation advantage. This aligns with Chen et al.'s finding that AI engines exhibit systematic bias toward earned media. The brands that invest in sustained PR rather than one-off announcements are the ones AI engines learn to recommend.
Pages with 2,900+ words average 5.1 citations versus 3.2 for shorter pages; optimal section length is 120-180 words per heading.
Pages with 2,900 or more words average 5.1 citations compared to 3.2 for shorter pages, a 59% increase (SE Ranking, 129K domains). Optimal section length is 120-180 words per heading, which produces 4.6 vs 2.7 citations (a 70% increase).
Only 31.8% of the pages we audited meet the 2,900+ word count target. Only 15.7% structure their content with the 120-180 word sections that maximize per-heading citation probability. Long-form content works because it increases the number of unique answer fragments a page can provide. Each well-structured section is a potential retrieval result for a different query. Length alone is not the goal. Structured depth with specific, extractable sections is the goal.
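Measuring section lengths works the same way as the capsule check above. A minimal sketch that counts words between consecutive H2 headings; helper names are ours:

```python
# Counts the words in each H2 section against the 120-180 word target.
# "Section" here means everything up to the next H2.
from bs4 import BeautifulSoup

def section_word_counts(html: str) -> dict[str, int]:
    soup = BeautifulSoup(html, "html.parser")
    counts = {}
    for h2 in soup.find_all("h2"):
        words = 0
        for sib in h2.find_next_siblings(True):  # True matches tags only
            if sib.name == "h2":
                break  # the next section starts here
            words += len(sib.get_text(" ", strip=True).split())
        counts[h2.get_text(strip=True)] = words
    return counts

html = "<h2>A</h2><p>" + "word " * 150 + "</p><h2>B</h2><p>short</p>"
for heading, n in section_word_counts(html).items():
    print(heading, n, "on target" if 120 <= n <= 180 else "outside 120-180")
```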
Mean content audit score is 30.4/100 with 0% adoption of answer capsules and schema markup, the two highest-evidence on-site tactics.
We scored pages across 139 brands against these 12 tactics. The mean content audit score was 30.4 out of 100 (median 29.2). Zero pages scored above 80. The distribution is concentrated between 20 and 40, with 49.2% of all audited pages falling in that range.
The gap between what the research shows works and what brands actually implement is the largest competitive opportunity in GEO.
| Tactic | Adoption Rate | Gap Assessment |
|---|---|---|
| Statistics (any) | 80.2% | High adoption, but only 3.9% meet the 19+ data point threshold |
| Author bios / E-E-A-T signals | 79.3% | Well adopted, but credential depth varies |
| Expert quotes and citations | 57.5% | Moderate adoption, specificity is often lacking |
| Long-form (2,900+ words) | 31.8% | Two-thirds of pages are too thin |
| 120-180 word sections | 15.7% | Most pages have irregular section lengths |
| Fresh content (updated recently) | 15.4% | 84.6% of content is stale |
| Comparison content | 1.0% | Near-zero adoption despite +47% citation rate |
| Answer capsules | 0% | Zero adoption despite being #1 citation predictor |
| Schema markup | 0% | Zero adoption despite 81% presence on cited pages |
The three tactics with the widest adoption gap, answer capsules (0%), schema markup (0%), and comparison tables (1%), are all on-site structural changes that a content team can implement without waiting for external coverage. They are the fastest path from current state to measurably better AI citation performance.
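For the schema gap specifically, implementation means standard JSON-LD in the page head. A minimal sketch that emits an Article block; the type and fields are an illustrative choice, since the studies above do not specify which schema types appear on cited pages (and FAQ schema, covered in the next section, is the one type shown to hurt):

```python
# Emits a minimal JSON-LD Article block for the page <head>. The fields are an
# illustrative choice; the cited studies do not say which schema types drive
# the 81% presence figure, and FAQPage is the type to avoid.
import json

def article_schema(headline: str, modified: str, author: str) -> str:
    block = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "dateModified": modified,  # also reinforces the freshness signal above
        "author": {"@type": "Person", "name": author},
    }
    return '<script type="application/ld+json">' + json.dumps(block) + "</script>"

print(article_schema("12 GEO Tactics Ranked by Evidence", "2026-03-15", "Jane Doe"))
```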
Keyword stuffing, FAQ schema, and llms.txt files are confirmed ineffective; domain authority shows slightly negative correlation with AI citations.
An evidence-based ranking would be incomplete without the negative results. These tactics have been tested and found to be ineffective or actively harmful. Otterly.AI's March 2026 controlled experiments confirmed the llms.txt finding independently: zero measurable impact on AI traffic.
| Tactic | Measured Effect | Source |
|---|---|---|
| Keyword stuffing | 10% worse than baseline | Aggarwal et al., KDD 2024 |
| FAQ schema markup | 3.6 vs 4.2 citations (negative) | SE Ranking, 129K domains |
| llms.txt files | Zero measurable impact | SE Ranking, 300K domains |
| Domain authority alone | r = -0.12 to -0.18 (negative) | SearchAtlas, 21,767 domains |
| Backlinks (direct effect) | r = 0.218 (weak) | Ahrefs, 75,000 brands |
The keyword stuffing result is particularly important because it is the most intuitive tactic for teams trained in traditional SEO. The exact search query appears in only about 5% of AI answers. AI engines reformulate queries, retrieve by semantic similarity, and generate novel responses. Optimizing for exact-match keywords is counterproductive.
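A small demonstration makes the retrieval point concrete. In the sketch below, the embedding model and example texts are illustrative choices, not drawn from any cited study:

```python
# Shows retrieval by semantic similarity: a reformulated query still ranks the
# relevant passage first despite little exact-word overlap with it.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

query = "which affordable fitness watch has the most reliable GPS?"
passages = [
    "In our tests, the Pace 3 had the best GPS accuracy of any watch under $150.",
    "The Pace 3 ships in three colors and weighs 30 grams.",
]
scores = util.cos_sim(model.encode(query), model.encode(passages))[0]
for passage, score in sorted(zip(passages, scores), key=lambda x: -float(x[1])):
    print(f"{float(score):.2f}  {passage}")
# The accuracy passage wins on meaning, not on matching the query's exact words.
```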
The FAQ schema result may be the most counterintuitive. Q&A format content helps (it is a Tier 2 tactic). But FAQ schema markup specifically hurts citation rates. The likely explanation is that FAQ schema triggers featured snippet treatment in traditional search, which can reduce the content available for AI extraction. Content structure and markup structure are separate levers, and they do not always pull in the same direction.
Only 11% of domains are cited by both ChatGPT and Perplexity; each platform favors different content types and citation sources.
The 12 tactics above apply universally, but their relative weight shifts by platform. Only 11% of domains are cited by both ChatGPT and Perplexity (Profound). Each platform has distinct citation DNA. Wu et al. (CMU, 2025) found that engine-specific optimization rules consistently outperform generic strategies. We cover the channel-by-channel implementation playbook in our companion post.
| Platform | Top Citation Sources | Priority Tactics |
|---|---|---|
| ChatGPT | Wikipedia (47.9%), Forbes, G2 | Wikipedia, review platforms, branded mentions |
| Perplexity | Reddit (46.7%), YouTube, Gartner | Reddit presence, content freshness, YouTube |
| Google AI Overviews | Reddit (21%), YouTube (18.8%), Quora | Organic rankings, YouTube, answer capsules |
| Gemini | Authoritative lists (49%), Google authority | List inclusions, local reviews, structured data |
| Claude | Databases/directories (68%), awards | Directory presence, awards, longevity signals |
The practical implication: if your audience primarily uses Perplexity, investing in Reddit presence and content freshness will produce faster results than optimizing Wikipedia. If ChatGPT is the dominant platform in your vertical, Wikipedia and review platform listings are non-negotiable. Monitoring platform-specific visibility is the prerequisite for platform-specific optimization.
Start with on-site changes deployable in weeks: answer capsules (0% adoption), schema markup (0%), and comparison tables (1%) have the widest evidence-to-adoption gaps.
The 12 tactics fall into two categories by implementation timeline. On-site tactics (3, 4, 5, 6, 9, 12) can be implemented within weeks. Off-site tactics (1, 2, 7, 8, 10, 11) require months of sustained effort. Otterly.AI's March 2026 controlled experiments confirmed this split: the two "highly effective" results were listicle inclusion (off-site) and footer text repetition (on-site), with on-site changes being the faster path. For realistic expectations on how much SOV movement each tactic type produces, see our companion post on GEO tactics in practice.
For on-site, start with the widest adoption gaps: add answer capsules (0% adoption), implement schema markup (0% adoption), and add comparison tables (1% adoption). These three changes have strong evidence backing and near-zero current implementation. They represent the highest-impact, lowest-competition optimizations available.
For off-site, prioritize the platform where your audience is. Check your AI visibility across platforms to identify where you are weakest. If you have zero Reddit presence and your buyers use Perplexity, that is your highest-ROI investment. If you lack review platform listings and your category is dominated by B2B queries, G2 and Capterra listings should come first. The data tells you where to start. The rankings tell you what to do when you get there.
Sill audits your content against these 12 tactics, scores each page, and prioritizes the changes with the highest evidence-backed impact on AI visibility.
Tell us about your brand and we'll be in touch to walk you through Sill.