The academic GEO evidence base is now stronger than that of any other emerging marketing channel. Ten papers. Fifteen industry studies. 680 million citations analyzed. The foundational KDD 2024 paper reports 30-40% relative visibility improvement from statistics addition. SE Ranking reports 93% more citations for pages with 19+ data points. Ahrefs reports a 0.664 correlation between branded mentions and AI visibility across 75,000 brands. All valuable; none of them answer the question a marketing director actually needs answered: "How many percentage points will my SOV move if I implement this?" Between the lab and the boardroom lies a gap that no correlation study can bridge.
This post examines what happens when GEO tactics leave the research paper and enter production. We draw on the first controlled practitioner experiments published in early 2026, synthesize the best available estimates of real-world SOV impact, and map the implementation paths that platform-level data supports. It is the companion to our evidence-ranked GEO tactics list, which covers the 12 strongest tactics by measured effect size.
TL;DR
- The first controlled GEO experiments (Otterly.AI, March 2026) found listicle inclusion and footer text repetition highly effective, while llms.txt and author pages showed no measurable impact.
- Realistic SOV estimates: moderate on-site changes (statistics, answer capsules) produce 1-3pp; combined on-site and off-site overhauls produce 5-15pp. All estimates carry low-to-medium confidence because no controlled before-and-after studies with absolute pp effect sizes exist.
- The footer text discovery exploits how LLMs treat repeated content as confirmed fact.
- Two distinct time horizons operate: RAG retrieval reflects changes in days to weeks; training data updates take months to years.
- Only 11% of domains are cited by both ChatGPT and Perplexity; engine-specific optimization outperforms generic strategies (CMU AutoGEO, 35.99% improvement).
- Sill's experimentation platform addresses the measurement gap with hierarchical Bayesian estimation, 10x prompt sampling, and affected/comparison query controls to produce per-platform SOV effect sizes.

No published GEO study reports absolute percentage-point SOV changes from controlled before-and-after experiments as of March 2026.
The foundational GEO paper (Aggarwal et al., KDD 2024) reported relative improvements: 30-40% visibility gains from statistics addition and citation. SE Ranking's study of 129,000 domains reported cross-sectional correlations: pages with 19+ data points averaged 5.4 citations versus 2.8 for pages without. The Ahrefs study of 75,000 brands reported correlation coefficients: branded mentions at r = 0.664, YouTube at r = 0.737. All valuable. None of them answer the question a marketing director actually needs answered: "If I implement tactic X on my site, how many percentage points will my SOV move?"
Amos Weiskopf articulated the fundamental limitation: "There is no Search Console for LLMs. There is no index you can query. There is no crawl report." The systems are stochastic. SparkToro found less than a 1-in-100 chance that ChatGPT gives the same brand recommendations twice for the same prompt. Single observations are unreliable; rigorous measurement requires statistical sampling across many prompts run multiple times.
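To make the sampling point concrete, here is a minimal back-of-the-envelope sketch (plain Python; the 20% mention rate and the prompt counts are illustrative assumptions, not measurements) showing how the margin of error on an observed brand-mention rate shrinks as you add prompts and repeat runs:

```python
import math

def mention_rate_margin(mention_rate: float, prompts: int, runs_per_prompt: int) -> float:
    """Approximate 95% margin of error for an observed brand-mention rate,
    treating each prompt run as an independent Bernoulli observation."""
    n = prompts * runs_per_prompt
    se = math.sqrt(mention_rate * (1 - mention_rate) / n)
    return 1.96 * se  # normal approximation to the binomial

# Illustrative numbers only: a brand mentioned in roughly 20% of runs.
for prompts, runs in [(1, 1), (25, 1), (25, 10), (100, 10)]:
    moe = mention_rate_margin(0.20, prompts, runs)
    print(f"{prompts:>3} prompts x {runs:>2} runs -> ±{moe * 100:.1f}pp at 95% confidence")
```

A single observation tells you almost nothing (±78pp); 25 prompts run 10 times each narrows the band to roughly ±5pp. In practice, repeated runs of the same prompt are correlated, so real uncertainty is somewhat larger than this independent-runs approximation suggests, which is one reason the hierarchical models discussed later matter.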
This does not mean the research is wrong. It means the field has strong directional evidence with weak precision. Knowing that statistics addition correlates with 93% more citations is useful. Knowing whether that translates to 1pp or 5pp of SOV movement for your specific brand in your specific category is the question nobody has yet answered at scale.
Otterly.AI's March 2026 experiments found listicle inclusion and footer text repetition highly effective; llms.txt and author pages showed no measurable impact.
Otterly.AI ran the first published series of controlled GEO experiments in March 2026, testing specific tactics in isolation with before-and-after measurement. The results are qualitative rather than quantitative (no percentage-point numbers published), but the relative rankings are the closest thing the field has to experimental evidence.
| Tactic | Type | Difficulty | Experimental Result |
|---|---|---|---|
| Adding brand to listicles/rankings | Off-page | Medium | Highly effective |
| Footer text with unique factual info | On-page | Very easy | Highly effective |
| YouTube long-form video | Social/UGC | Hard | Confirmed effective |
| Product/service directory listing | Off-page | Easy | Moderate impact |
| AI-written content (structured) | On-page | Medium | Outperformed human-written |
| Reddit thread replies | Social/UGC | Medium | Effectiveness varies by industry |
| Author pages and bios | On-page | Very easy | Limited impact |
| llms.txt implementation | Technical | Very easy | No measurable impact |
Two findings stand out. Listicle inclusion (off-page) and footer text repetition (on-page) were the only tactics rated "highly effective." Both confirm patterns visible in the broader research: off-page mentions drive discovery, while structured on-site content drives extraction. The AI-written content result is notable as well; when optimized for structure and relevance, AI-generated content outperformed human-written content. Raw AI output did not. Structure is the variable, not authorship.
The llms.txt result deserves emphasis. SE Ranking had already found no correlation across 300,000 domains. Otterly's controlled experiment confirmed it: zero measurable impact on AI traffic. LLMs crawl these files, but crawling is not citing. Philipp Götza (Search Engine Land, January 2026) called this evidence a "ladder of misinference." Brands implementing llms.txt as a GEO strategy are investing in a tactic with two independent negative results and no positive ones.
LLMs treat repeated footer text as confirmed site-wide fact, making footer statements a low-effort, high-yield GEO tactic confirmed by Otterly's experiments.
Otterly's most surprising finding is also the one with the clearest mechanism. Adding unique factual information to a site footer was rated "highly effective" with "very easy" implementation difficulty. The reason is architectural: footer content appears on every page of a site. When an LLM crawls or indexes multiple pages from the same domain, the footer text appears repeatedly across the corpus. LLMs interpret this repetition as confirmed, site-wide factual information.
A footer stating "Founded in 2019. Serving 4,200 customers across 38 countries" gets indexed not once but hundreds or thousands of times, once per crawled page. The model treats this as high-confidence factual data about the entity. The tactic exploits a genuine property of how transformer models aggregate information across training and retrieval contexts.
This is not well-covered in academic GEO literature; no published paper tests footer content specifically. It emerged from practitioner experimentation. The implication is straightforward: every brand should audit what factual claims their footer makes. A footer that says only "Copyright 2026" is a missed opportunity. A footer with specific, verifiable facts about the company gives AI engines extractable brand signals on every indexed page.
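A minimal audit along these lines can be scripted. The sketch below (Python standard library only; the URLs and the fact pattern are hypothetical placeholders, not a real site) fetches a handful of pages and checks whether a specific factual footer claim actually appears on each of them:

```python
import re
import urllib.request

# Hypothetical pages and footer claim; substitute your own site and wording.
PAGES = [
    "https://example.com/",
    "https://example.com/pricing",
    "https://example.com/blog/some-post",
]
FOOTER_FACT = re.compile(r"Serving [\d,]+ customers across \d+ countries")

def page_has_fact(url: str) -> bool:
    """Download a page and check whether the footer fact appears in its HTML."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return bool(FOOTER_FACT.search(html))

hits = sum(page_has_fact(url) for url in PAGES)
print(f"Footer fact found on {hits}/{len(PAGES)} sampled pages")
```

If the count comes back lower than the number of pages sampled, the "repeated on every page" mechanism is not actually working, usually because templates differ across sections of the site.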
Off-site brand mentions (r = 0.664) are the strongest AI visibility predictor, but 87% of the GEO recommendations Sill generates for brands are on-site fixes.
The most important factors for AI visibility are overwhelmingly off-site. Brand web mentions show a correlation of 0.664 with AI citation (Ahrefs, 75,000 brands); that is three times stronger than backlinks at 0.218. AirOps found that 85% of brand mentions in AI answers come from third-party pages, not owned domains. Chen et al. (University of Toronto, 2025) documented that 69-82% of AI search citations are earned media, dwarfing brand-owned content. Pavel Israelsky (Search Engine Land, January 2026) stated it directly: "On-site optimization is the factor with least impact on the most important GEO KPI" for whether your brand appears at all.
And yet on-site changes are where teams should start. Across 748 GEO recommendations generated for 62 brands through Sill's recommendation engine, 87% were on-site structural fixes. The reason is not that on-site tactics have a higher impact ceiling; they do not. The reason is controllability and speed. A content team can add answer capsules, restructure headings, and implement schema markup in a single sprint. Building branded mentions across YouTube, Reddit, Wikipedia, and press outlets takes 6-12 months of sustained effort.
Think of it as a candidate pool problem. Off-site factors determine whether your brand enters the pool of entities an AI engine considers for a given query. On-site factors determine whether your pages are selected from that pool when retrieved. Both layers matter, but they operate on different timescales. On-site optimization in weeks; off-site authority-building over quarters. A brand that invests only in on-site work will have well-structured pages that no AI engine retrieves. A brand that invests only in off-site mentions will get retrieved but lose the citation to a competitor with better content structure.
Moderate on-site changes are estimated to produce 1-3pp of SOV improvement; combined on-site and off-site overhauls, 5-15pp. All estimates carry low-to-medium confidence.
Nobody publishes reliable before-and-after SOV measurements. But by combining the academic research (relative improvements), observational cross-sectional data (correlations), and the few practitioner experiments available, we can synthesize directional estimates. These should be treated as planning ranges, not predictions. Every brand operates in a different competitive context; a tactic that moves SOV by 5pp in a niche with thin competition might produce less than 1pp in a saturated category.
| Change Type | Est. SOV Impact | Confidence | Basis |
|---|---|---|---|
| Minor on-site edit (metadata, title, readability) | 0-1pp | Low | SE Ranking FAQ data (+11% relative); Otterly: llms.txt = no impact |
| Moderate on-site (statistics, capsules, restructure) | 1-3pp | Low | KDD 2024 30-40% relative; baseline-dependent |
| Major on-site overhaul (multi-page, full GEO) | 2-5pp | Low | Combination effects (+5.5% over single tactic) |
| Footer text optimization | 1-3pp | Low | Otterly: "highly effective"; no pp numbers published |
| Listicle/ranking inclusion (off-site) | 3-8pp | Medium | Otterly: "highly effective"; 43.8% of ChatGPT page types are listicles |
| Press in notable outlets (5+ mentions in 6 months) | 3-10pp | Medium | 4.2x sustained citation rate; 61% of reputation responses from earned media |
| YouTube long-form presence | 2-7pp | Medium | r = 0.737 (strongest factor); 31.8% of social citations |
| Combined on-site + multiple off-site | 5-15pp | Medium | Radiant Elephant: 59pp (extreme case, empty niche) |
The Radiant Elephant case study deserves specific context. Within 60 days of publishing their first data study, they appeared in 67% of AI responses on key topics versus 8% before, a 59pp increase. That number is real but unreproducible at scale. They filled a content vacuum in a low-competition niche. Established brands in saturated categories will see far smaller absolute movements. The 5-15pp range for combined efforts is a more realistic planning target for competitive markets.
We are transparent about a hard truth: GEO measurement is fundamentally more difficult than traditional SEO measurement. All estimates in this table carry "Low" or "Medium" confidence because no controlled before-and-after studies with absolute pp effect sizes have been published. Sill's data across 139 brands and 86 industries shows the scale of variance: 23% of brands score zero SOV across all platforms, while the median sits at 15 out of 100. A 3pp gain means something very different at a starting SOV of 5 versus 45.
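The baseline dependence is simple arithmetic, and it is worth making explicit. The sketch below is illustrative only: the 35% relative lift is the midpoint of the KDD 2024 range, the baselines are assumed, and treating a relative visibility lift as if it applied directly to SOV is itself an assumption.

```python
def absolute_gain_pp(baseline_sov: float, relative_lift: float) -> float:
    """Convert a relative visibility lift into an absolute SOV change in pp."""
    return baseline_sov * relative_lift

for baseline in (5.0, 15.0, 45.0):  # SOV expressed in percentage points
    gain = absolute_gain_pp(baseline, 0.35)  # ~35% relative lift (KDD 2024 midpoint)
    print(f"baseline {baseline:>4.0f} SOV -> +{gain:.1f}pp (new SOV ~{baseline + gain:.1f})")
```

The same relative lift is worth under 2pp at a baseline SOV of 5 and over 15pp at a baseline of 45, which is exactly why relative study findings cannot be quoted as planning numbers without knowing where a brand starts.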
RAG-based retrieval reflects content changes within days to weeks; training data updates operate on cycles spanning months to years.
GEO practitioners frequently conflate two distinct mechanisms through which content changes reach AI engines, and this confusion leads to misaligned expectations. The first mechanism is real-time retrieval (RAG). Perplexity, ChatGPT with search, and Google AI Overviews perform web searches for each query and retrieve fresh content. Content updated today can appear in Perplexity responses within days. Pages refreshed within 90 days are significantly more likely to appear in results across these platforms (SE Ranking: 67% more citations for recently updated content).
The second mechanism is training data incorporation. Base model knowledge comes from periodic training runs that happen on release cycles spanning months to years. SearchPilot noted that "the slow feedback loop and batch update nature of changes to models' learned information makes it largely impossible to run statistical tests" on this pathway. A Wikipedia edit or a new press mention may take months to influence a model's base knowledge layer.
The practical implication: on-site content optimizations (answer capsules, statistics, freshness) influence the RAG pathway quickly. Off-site reputation signals (brand mentions, press coverage, Wikipedia presence) influence both pathways but the training data pathway dominates their long-term effect. Teams should expect on-site changes to show impact in 2-6 weeks through RAG, while off-site authority-building requires 3-6 months for sustained visibility shifts.
Only 11% of domains are cited by both ChatGPT and Perplexity; engine-specific optimization outperforms generic strategies per CMU's AutoGEO research.
Wu et al. (CMU, 2025) demonstrated with AutoGEO that engine-specific optimization rules consistently outperform generic strategies, achieving 35.99% average improvement with tailored approaches. Sill's own monitoring data confirms the fragmentation: 55% of brands show a 10+ point SOV spread between their best and worst performing platforms, and 91.6% of cited URLs appear on only one platform. A brand visible on ChatGPT may be invisible on Perplexity, and the tactics that fix one do not automatically fix the other.
| Platform | Dominant Citation Sources | High-Priority Tactics | Implementation Notes |
|---|---|---|---|
| ChatGPT | Wikipedia (47.9%), Forbes, G2, TechRadar | Wikipedia page, review platforms, Bing optimization | Matches Bing top-10 results 87% of the time; 60.5% of cited pages from last 2 years |
| Perplexity | Reddit (46.7%), YouTube (13.9%), Gartner | Reddit participation, content freshness, YouTube | Real-time web search; 50% of citations from current-year content |
| Google AI Overviews | Reddit (21%), YouTube (18.8%), Quora (14.3%) | Organic rankings, YouTube, answer capsules | 93.67% from top-10 organic; overlap grew from 32.3% to 54.5% |
| Claude | Databases/directories (68%), awards (19%) | Directory presence, Wikidata, longevity signals | Strongest big-brand bias; skews toward businesses 50+ years old |
| Gemini | Authoritative lists (49%), Google authority (23%) | List inclusions, Google Business Profile, local reviews | Local reviews dominate at 38% for local searches |
YouTube deserves special emphasis. It is the single most impactful channel for AI visibility, holding the #1 cited domain position in Google AI Overviews at 29.5% (BrightEdge, September 2025) and showing the strongest correlation with AI visibility of any factor at r = 0.737 (Ahrefs). As of January 2026, YouTube content appears in 16% of all LLM answers versus Reddit's 10% (Profound, 680M citations). Long-form videos account for approximately 94% of YouTube citations; Shorts receive minimal citation.
Review platforms present an important tactical nuance. Yelp and Trustpilot block AI crawlers entirely, while GetApp, Clutch, and SourceForge allow full access. This explains why GetApp captures 47.6% of B2B software citations in ChatGPT despite being less well-known than Yelp or Trustpilot. For B2B brands, GetApp and Capterra listings are more valuable for AI visibility than Trustpilot reviews, regardless of review volume. The crawler access policy of the platform matters more than the platform's reputation with human buyers.
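One quick way to verify access policies yourself is to read a platform's robots.txt. A minimal sketch (Python standard library; the crawler names listed are commonly published AI user agents and the domains are the ones discussed above, but check each vendor's current documentation before relying on the result):

```python
import urllib.robotparser

# Commonly published AI crawler user agents; verify against each vendor's docs.
AI_CRAWLERS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended"]

def crawler_access(domain: str) -> dict[str, bool]:
    """Check a site's robots.txt to see which AI crawlers may fetch its homepage."""
    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(f"https://{domain}/robots.txt")
    parser.read()
    return {bot: parser.can_fetch(bot, f"https://{domain}/") for bot in AI_CRAWLERS}

for domain in ["www.getapp.com", "www.trustpilot.com"]:  # domains from the comparison above
    print(domain, crawler_access(domain))
```

Some platforms block crawlers at the network or CDN level rather than in robots.txt, so treat this check as a first pass, not a definitive answer.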
SparkToro found less than a 1-in-100 chance ChatGPT gives the same brand recommendations twice; GPU batching causes cascading output differences.
Traditional SEO has SearchPilot. GEO has nothing comparable. The structural challenge is that LLM outputs are non-deterministic. Thinking Machines Lab traced this to GPU batching: batch size changes floating-point calculation order, causing cascading output differences even for identical prompts. SparkToro's January 2026 research confirmed the practical consequence: less than a 1-in-100 chance that ChatGPT produces the same brand recommendation list twice.
Five specific factors make controlled GEO experimentation difficult. First, stochastic outputs require large sample sizes to achieve statistical power. Second, content changes often coincide with other brand activity (PR campaigns, product launches, competitor moves) that confounds attribution. Third, platform fragmentation means effects may appear on one engine and not others. Fourth, no standard methodology exists. Fifth, most practitioner experiments test one brand on one set of prompts, producing sample sizes too small for reliable inference.
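On the first point, the required sample sizes can be estimated with a standard two-proportion power calculation. A minimal sketch (plain Python with hard-coded z-values for 95% confidence and 80% power; the baseline SOV and target effects are assumptions):

```python
import math

def runs_needed(p_before: float, p_after: float) -> int:
    """Prompt runs needed per period to detect a change from p_before to p_after
    with ~95% confidence and ~80% power (two-proportion z-test approximation)."""
    z_alpha, z_beta = 1.96, 0.84
    p_bar = (p_before + p_after) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_before * (1 - p_before) + p_after * (1 - p_after))
    ) ** 2
    return math.ceil(numerator / (p_after - p_before) ** 2)

print(runs_needed(0.15, 0.18))  # 3pp lift from a 15% baseline -> roughly 2,400 runs
print(runs_needed(0.15, 0.25))  # 10pp lift from the same baseline -> roughly 250 runs
```

Detecting the small effects most on-site changes produce takes thousands of runs per period, which is far beyond what spot-checking a few prompts by hand can provide.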
SearchPilot has developed a preliminary framework using control and variant page groups, but even they acknowledge the fundamental limitation: LLM outputs are probabilistic, and the measurement infrastructure the field needs does not yet exist in the open market. Citation patterns drift 40-60% month over month (Profound data), meaning baseline instability compounds the measurement challenge.
Sill's experimentation platform uses hierarchical Bayesian estimation with affected and comparison query controls to produce per-platform SOV effect sizes.
The measurement vacuum is the defining problem of GEO in 2026. Every observation-only dashboard can tell you your current SOV. None can tell you which change caused it to move. Sill's experimentation platform is designed specifically to close this gap. The approach uses three mechanisms. First, affected and comparison query groups: when a content change targets specific topics, we monitor both the affected queries and unrelated queries from the same brand as controls. Second, 10x sampling per prompt per platform: where most tools run each prompt once, we run it ten times to account for stochastic variance. Third, hierarchical Bayesian estimation: a statistical model that produces posterior distributions of treatment effects, stated with credible intervals rather than point estimates.
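A heavily simplified sketch of the statistical idea (numpy only; a non-hierarchical Beta-Binomial stand-in for the production model, with made-up counts) shows how posterior draws for affected versus comparison queries combine into a difference-in-differences effect estimate with a credible interval:

```python
import numpy as np

rng = np.random.default_rng(0)
draws = 100_000

def posterior_rate(mentions: int, runs: int) -> np.ndarray:
    """Posterior samples of a mention rate under a Beta(1, 1) prior."""
    return rng.beta(1 + mentions, 1 + runs - mentions, size=draws)

# Hypothetical counts: mentions / total runs, before and after the content change.
affected_pre    = posterior_rate(30, 250)   # queries the change targets
affected_post   = posterior_rate(48, 250)
comparison_pre  = posterior_rate(55, 250)   # unrelated queries from the same brand
comparison_post = posterior_rate(58, 250)

# Difference-in-differences: change in affected queries minus change in controls.
effect_pp = 100 * ((affected_post - affected_pre) - (comparison_post - comparison_pre))
low, mid, high = np.percentile(effect_pp, [5, 50, 95])
print(f"estimated effect: {mid:.1f}pp (90% credible interval {low:.1f} to {high:.1f})")
```

The production model replaces these independent Beta-Binomial estimates with a hierarchical structure that pools information across prompts and platforms, but the output has the same shape: a per-platform effect size stated as a credible interval rather than a point estimate.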
The goal is to produce the first rigorous per-platform pp effect estimates in production GEO. Combined with CMS content change detection, which timestamps on-site changes against the SOV timeline, the platform creates a closed loop: detect a change, measure the effect, feed the result back into the recommendation engine, and generate the next highest-impact suggestion based on what actually worked for that specific brand.
This matters because GEO is not a set of universal truths. A tactic that moves SOV for a B2B SaaS company may do nothing for a consumer electronics brand. Engine-specific, category-specific, and brand-specific effects are the norm. The only way to build reliable GEO knowledge is to run controlled experiments at scale and accumulate the evidence base the field currently lacks. Every brand that runs an experiment contributes to a growing body of production GEO evidence that benefits the entire ecosystem.
Effective GEO implementation follows five principles: start on-site, build off-site, optimize per-platform, measure rigorously, and refresh continuously.
The evidence, from both academic research and the first wave of practitioner experiments, converges on a clear implementation framework. First, start with on-site structural changes. Answer capsules, statistics density, schema markup, comparison tables, and footer text optimization can all be deployed within weeks, and the adoption gaps remain enormous (0% for answer capsules and schema, 1% for comparison tables per our content audit data). Second, invest in off-site authority systematically. Listicle inclusion and press coverage are the highest-impact off-site tactics confirmed by controlled experiments; they compound over 6-12 months.
Third, optimize per-platform, not generically. A brand with strong ChatGPT visibility but weak Perplexity presence needs Reddit participation and content freshness, not more Wikipedia optimization. Fourth, measure with appropriate rigor. Single-query spot checks are unreliable; statistical sampling across prompt panels with control groups is the minimum standard for actionable conclusions. Fifth, refresh continuously. Content updated within 90 days earns 67% more AI citations. The tactics that are safe for SEO and beneficial for GEO simultaneously are the ones that earn compounding returns over time.
The GEO landscape shifts monthly. Citation patterns drift 40-60% month over month. YouTube overtook Reddit as the top social citation source in January 2026. But the structural principles remain consistent: earn mentions, structure for extraction, publish original data, stay fresh, and measure what you do. The brands that build this discipline now will compound their advantage as the measurement infrastructure matures and the competitive landscape intensifies.
Sill monitors your AI visibility across six platforms, scores your content against the evidence-ranked tactics, and runs controlled experiments to measure what actually moves your Share of Voice.
Request your first analysis today to see where you stand.