
Beyond SparkToro: What Actually Predicts LLM Visibility

In January 2026, Rand Fishkin and Patrick O'Donnell ran 2,961 prompts across ChatGPT, Claude, and Google AI Overviews using hundreds of volunteers. They asked the same questions repeatedly and measured how often the responses repeated. The probability that the same set of brands appeared twice: less than one in a hundred. The probability that they appeared in the same order: closer to one in a thousand. Fishkin's conclusion was blunt: "any tool that gives a 'ranking position in AI' is full of baloney." He is right about rank. He is also only telling half the story. The same research that demolished rank-based measurement validated something else entirely: frequency of appearance is statistically consistent and measurable. That frequency signal is what the industry should have been building around all along; it maps to a channel converting at 3-11x traditional search rates, growing 155% in eight months, with over 80% of its traffic invisible to standard analytics.

TL;DR

SparkToro's January 2026 research proved that AI brand recommendation lists repeat less than 1% of the time, but frequency of brand appearance IS consistent and statistically measurable. This maps to AI Share of Voice: the right metric for LLM visibility. Microsoft Clarity, Adobe, and Search Engine Land independently confirm AI-referred traffic converts at 3-11x traditional search rates. SparkToro's own Q4 2025 data shows Google desktop searches fell 20% YoY in the US while AI tool usage tripled. AirOps found 85% of AI brand discovery comes from third-party content. Loamly's 2026 benchmark reveals GA4 misses over 80% of AI traffic. The 0.13-1% AI traffic figures in most analytics dashboards are the visible fraction of a channel that is growing, converting, and largely invisible to standard measurement.

[Illustration: frequency patterns emerging from the noise of individual AI brand recommendations]

What SparkToro Actually Tested

SparkToro ran 2,961 prompts across ChatGPT, Claude, and Google AI Overviews; the same brand list repeated less than 1 in 100 times.

The study used hundreds of volunteers running identical prompts in November and December 2025. The methodology matters: these were real users on real accounts with personal browsing histories and geographic variation, not sterile API calls. SparkToro measured both the composition of brand lists (which brands appeared) and the ordering (the sequence in which they appeared). Composition repeatability was under 1%. Ordering repeatability was closer to 0.1%.
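
To make those two numbers concrete: composition repeatability asks how often two runs of the same prompt return the same set of brands, and ordering repeatability how often they return the same sequence. Here is a minimal sketch of how both could be computed from repeated runs, using invented brand lists rather than SparkToro's data and a hypothetical pairwise-match definition:

```python
from itertools import combinations

# Each run of the same prompt yields an ordered list of recommended brands.
# Sample data is invented for illustration; it is not SparkToro's dataset.
runs = [
    ["Asana", "Trello", "Monday", "ClickUp"],
    ["Trello", "Asana", "Notion", "ClickUp"],
    ["Asana", "ClickUp", "Trello", "Basecamp"],
    ["Notion", "Asana", "Trello", "Monday"],
]

def composition_repeatability(runs):
    """Share of run pairs that return exactly the same set of brands."""
    pairs = list(combinations(runs, 2))
    return sum(set(a) == set(b) for a, b in pairs) / len(pairs)

def ordering_repeatability(runs):
    """Share of run pairs that return the same brands in the same order."""
    pairs = list(combinations(runs, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

print(f"Composition repeatability: {composition_repeatability(runs):.1%}")
print(f"Ordering repeatability:    {ordering_repeatability(runs):.1%}")
```

Scaled up to thousands of runs, match rates of this kind are what the sub-1% and roughly 0.1% figures describe.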

AirOps' 2026 State of AI Search report reached a convergent finding through different methodology: only 30% of brands stay visible from one AI answer to the next, and just 20% remain present across five consecutive runs of the same prompt. Brands earning both citations and mentions were 40% more likely to resurface across multiple runs than citation-only brands.

The instability is real and measurable. SparkToro framed it as a warning against rank-based tools. The more consequential insight is what remained stable when individual responses did not.

Rank Is Noise. Frequency Is the Signal.

AI brand rank changes with nearly every query, but aggregate frequency of appearance is consistent and statistically measurable across repeated runs.

Fishkin's own data supports this distinction. While the specific list of brands changed between runs, the brands that appeared most frequently were consistent across the full sample. A brand showing up in 70% of runs for a given category held roughly that rate across measurement windows. The variance was in composition and order; the statistical distribution of appearances was stable.

This is the same principle behind media measurement in every other channel. Television advertisers do not track their exact position in an ad break. PR professionals do not measure their rank in a news cycle. Both track share of voice: the proportion of total category mentions their brand captures over a sustained period. That aggregate frequency metric smooths the noise that makes any individual instance unreliable. PR's measurement evolution from clipping services to layered evidence frameworks took 15 years. LLM visibility is compressing that timeline.

Sill's own data across 7,442 AI responses, 139 brands, and four platforms confirms this pattern. The median AI Share of Voice is 15 out of 100. The metric is not volatile at the aggregate level; individual responses are. The distinction matters: 55% of brands have a 10+ point SOV spread between their best and worst platform, but those spreads are stable over multi-week measurement windows. What SparkToro proved about instability at the instance level reinforces the case for frequency-based measurement at the aggregate level.
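
For a concrete picture of the frequency-based alternative, here is a minimal sketch of an AI Share of Voice calculation: the percentage of sampled responses that mention a brand, computed per platform, with the spread being the gap between the brand's best and worst platform. The platforms, brands, and counts below are invented; this is not Sill's production pipeline.

```python
from collections import defaultdict

# Illustrative sample: (platform, brands mentioned in one AI response).
responses = [
    ("chatgpt", ["BrandA", "BrandB"]),
    ("chatgpt", ["BrandA", "BrandC"]),
    ("chatgpt", ["BrandB"]),
    ("claude", ["BrandA"]),
    ("claude", ["BrandB", "BrandC"]),
    ("perplexity", ["BrandA", "BrandB", "BrandC"]),
    ("perplexity", ["BrandC"]),
]

def share_of_voice(responses, brand):
    """Per-platform percentage of responses that mention the brand."""
    totals, hits = defaultdict(int), defaultdict(int)
    for platform, brands in responses:
        totals[platform] += 1
        hits[platform] += brand in brands
    return {p: 100 * hits[p] / totals[p] for p in totals}

sov = share_of_voice(responses, "BrandA")
spread = max(sov.values()) - min(sov.values())
print({p: round(v, 1) for p, v in sov.items()})
print(f"Best-to-worst platform spread: {spread:.0f} points")
```

Individual responses in the sample disagree with one another, but the per-platform rates stabilize as the sample grows; that is the frequency signal.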

The Conversion Evidence Is No Longer Circumstantial

AI-referred traffic converts at 3-11x the rate of traditional search: sign-up conversion 1.66% vs. 0.15% for search across 1,200+ sites (Microsoft Clarity, 2025).

Three independent studies published between November 2025 and February 2026 converge on the same finding from different methodologies, sample sizes, and verticals. Microsoft Clarity analyzed over 1,200 publisher sites and found LLM traffic grew 155.6% over eight months, with sign-up conversion at 1.66% versus 0.15% for search traffic: an 11x difference. Subscription conversion followed the same pattern at 1.34% for LLM versus 0.55% for search.

Adobe's analysis of one trillion retail visits during the 2025 holiday season measured AI referral traffic surging 693% year over year. AI-referred shoppers converted 31% higher than other traffic sources, with revenue per visit up 254% and bounce rates 33% lower. These are not users casually browsing; they arrive with intent shaped by the AI recommendation.

| Source | AI Conversion | Search Baseline | Multiplier |
| --- | --- | --- | --- |
| Microsoft Clarity (sign-up) | 1.66% | 0.15% | 11x |
| Microsoft Clarity (subscription) | 1.34% | 0.55% | 2.4x |
| Search Engine Land (13 months) | ~18% | Paid, SEO, PPC | Highest source |
| Adobe (holiday retail) | +31% vs. other sources | All other referrals | 693% YoY growth |

Search Engine Land's 13-month analysis from January 2025 through February 2026 measured LLM referral conversion at approximately 18%: higher than paid shopping, SEO, or PPC. Per-platform conversion data shows the advantage is not uniform: Claude leads at 16.8%, ChatGPT at 14.2%, Perplexity at 12.4%. The platform variation is another reason single-platform LLM visibility measurement produces structurally misleading conclusions.
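
One way to see why single-platform measurement misleads: blend the per-platform conversion rates above by an assumed traffic mix. The 87.4% ChatGPT share comes from the Conductor referral data cited later in this piece; the remaining split is purely an assumption for illustration.

```python
# Per-platform LLM referral conversion rates (Search Engine Land data above).
conversion = {"claude": 0.168, "chatgpt": 0.142, "perplexity": 0.124}

# Assumed traffic mix: 87.4% ChatGPT (Conductor); the rest is a guess
# made only to complete the illustration.
traffic_share = {"chatgpt": 0.874, "claude": 0.080, "perplexity": 0.046}

blended = sum(conversion[p] * traffic_share[p] for p in conversion)
print(f"Blended AI referral conversion: {blended:.1%}")   # ~14.3%, dominated by ChatGPT
print(f"Claude-only view:               {conversion['claude']:.1%}")  # overstates the blend
```

A dashboard that samples only Claude reports the strongest rate in the set and quietly misstates the channel as a whole.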

80% of AI Traffic Is Invisible in GA4

GA4 misses over 80% of AI-referred traffic because mobile apps strip referrer headers and users copy-paste URLs into browsers (Loamly, 2026).

Loamly's 2026 State of AI Traffic benchmark quantified what SparkToro's dark traffic research had theorized: standard analytics tools miss the vast majority of AI-referred visits. The mechanism is straightforward. Mobile AI apps do not pass referrer headers. Users who receive a recommendation in ChatGPT or Claude frequently copy the URL and paste it into a browser, stripping the referral chain entirely. The visit registers as direct traffic in GA4.

The 0.13-1% AI traffic figures that appear in most analytics dashboards represent only the visits with intact attribution. The actual volume is five to ten times higher. Previsible's State of AI Discovery report, analyzing 1.96 million LLM sessions, found AI traffic concentrates on high-intent pages: industry pages at 1.14% visible share, tool pages at 0.95%, pricing pages at 0.46%. These are not casual browsing sessions; they are visitors arriving with a specific recommendation and comparing options.
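
A rough sketch of both halves of the measurement problem follows: classifying the AI referrals that arrive with intact referrer headers, then scaling the visible count by an assumed under-reporting factor in the 5-10x range the benchmarks above imply. The referrer patterns, function names, and sample sessions are illustrative, not a GA4 feature.

```python
import re

# Referrer substrings that identify AI platforms when the header survives.
AI_REFERRER_PATTERN = re.compile(
    r"(chatgpt\.com|chat\.openai\.com|perplexity\.ai|claude\.ai|"
    r"gemini\.google\.com|copilot\.microsoft\.com)",
    re.IGNORECASE,
)

def is_ai_referral(referrer: str) -> bool:
    """True only when the referrer header survived. App traffic and
    copy-pasted URLs arrive with no referrer and register as 'direct'."""
    return bool(referrer and AI_REFERRER_PATTERN.search(referrer))

def estimate_actual_ai_sessions(visible: int, underreporting_factor: float = 5.0) -> int:
    """Scale visible AI sessions by an assumed factor for the share
    analytics cannot see (roughly 5-10x per the benchmarks above)."""
    return round(visible * underreporting_factor)

sessions = [
    {"referrer": "https://chatgpt.com/"},
    {"referrer": ""},                          # mobile app or pasted URL -> 'direct'
    {"referrer": "https://www.google.com/"},
    {"referrer": "https://www.perplexity.ai/search"},
]

visible = sum(is_ai_referral(s["referrer"]) for s in sessions)
print(f"AI referrals visible in analytics: {visible}")
print(f"Estimated actual AI sessions:      {estimate_actual_ai_sessions(visible)}")
```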

The dark traffic problem compounds every other measurement challenge. A marketing team reviewing GA4 sees AI referral traffic as a rounding error. The same team's branded search trend, product demo requests, and direct traffic spikes may all contain AI-influenced visits that are structurally invisible to standard attribution. This is why the attribution gap is a measurement infrastructure problem, not a channel performance problem. The channel is performing; the analytics stack cannot see it.

The Volume Shift SparkToro's Own Data Confirms

Google desktop searches per US user fell 20% year over year in Q4 2025 while AI tool usage nearly tripled (SparkToro/Datos, Q4 2025).

SparkToro's partnership with Datos produced the Q4 2025 State of Search report, and the findings sharpen the urgency beyond what the brand recommendation study alone conveyed. Google desktop searches per US user declined nearly 20% year over year. AI tools nearly tripled their usage share over the same period. Google still accounts for 73.7% of desktop searches across the 41 domains analyzed, but the trend is unmistakable and accelerating.

Pew Research confirms the behavioral shift from the consumer side: 39% of US adults now use ChatGPT weekly for decision-making tasks. Similarweb's global tracker shows Gemini growing 643% year over year while ChatGPT has reached 883 million monthly users generating two billion queries daily. Conductor's November 2025 benchmarks found AI Overviews now appear in 25.11% of Google searches, up from 13.14% in March 2025; 87.4% of identifiable AI referral traffic comes from ChatGPT alone.

| Metric | Finding | Source |
| --- | --- | --- |
| Google desktop searches (US) | -20% YoY per user | SparkToro/Datos Q4 2025 |
| AI tool usage share | Nearly tripled YoY | SparkToro/Datos Q4 2025 |
| Weekly ChatGPT usage (US adults) | 39% | Pew Research, March 2026 |
| Gemini traffic growth | +643% YoY | Similarweb, Feb 2026 |
| AI Overviews in Google searches | 25.11% (up from 13.14%) | Conductor, Nov 2025 |
| ChatGPT monthly users | 883M | Similarweb, Feb 2026 |

The volume is there. The growth is there. The conversion advantage is documented across independent studies. The measurement infrastructure is what has not kept pace. Fishkin himself noted on the Near Media podcast in February 2026 that while AI responses are inconsistent at the individual level, aggregate brand presence can be measured statistically. The qualifier matters: it is the difference between a broken metric and a misapplied one.

85% of Discovery Happens on Content You Do Not Control

85% of AI brand discovery comes from third-party content: listicles, comparison guides, community discussions, and niche publisher reviews (AirOps, 2026).

AirOps' 2026 report found that the overwhelming majority of AI brand visibility originates from content the brand did not create. Listicles, comparison guides, community discussions on Reddit and Quora, and niche publisher reviews drive 85% of brand discovery in AI search. This finding reframes LLM visibility strategy: optimizing your own site is necessary but nowhere near sufficient.

BrightEdge's citation analysis reinforces this from a different angle: only 17% of sources cited in AI Overviews also rank in the organic top ten. Five out of six AI citations pull from pages not on the first page of traditional search results. SE Ranking's study of 2.3 million pages found that domains with Quora and Reddit mentions have approximately four times higher chances of being cited by AI platforms. Citation volume can differ by 615x between platforms for the same domain, with Grok and Claude at opposite ends of the range.

AirOps also found that pages not updated quarterly are three times more likely to lose their AI citations, adding a temporal dimension that static SEO content does not face. Sequential headings and rich schema correlate with 2.8x higher citation rates. The brands with the highest LLM visibility have not just optimized their own properties; they have built a presence across the third-party ecosystem that AI models draw from. That ecosystem is dynamic, platform-specific, and requires continuous measurement to track.

What LLM Visibility Measurement Should Look Like Now

Effective LLM visibility measurement requires frequency-based SOV across all major platforms, branded search correlation, and content change attribution with within-brand controls.

SparkToro established what not to measure: point-in-time rank on any single platform. The research published in the three months since points toward what to measure instead. Frequency-based Share of Voice, tracked daily across every major AI platform, captures the stable signal within the noise that Fishkin's own data validated. Branded search trend in Google Search Console provides a downstream proxy for AI influence on discovery behavior. Content change attribution, using a within-brand control group of queries unaffected by the change, isolates whether specific interventions moved the needle or whether the shift was platform noise.
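
To make the attribution layer concrete, here is a minimal sketch of the within-brand control logic, assuming SOV is already tracked for both the query set a content change targeted and a control set it did not touch. A simple difference-in-differences separates the intervention's effect from platform-wide drift; all numbers are invented for illustration.

```python
# Mean AI Share of Voice (%) before and after a content change, for the
# query set the change targeted and a within-brand control set it did not.
treated = {"before": 12.0, "after": 19.0}   # queries the updated content targets
control = {"before": 14.0, "after": 16.0}   # unrelated queries, same brand

treated_delta = treated["after"] - treated["before"]   # raw lift on targeted queries
control_delta = control["after"] - control["before"]   # platform drift / noise floor

# Difference-in-differences: the lift attributable to the intervention
# rather than to movement affecting the whole brand.
attributable_lift = treated_delta - control_delta

print(f"Raw lift on targeted queries: {treated_delta:+.1f} SOV points")
print(f"Drift on control queries:     {control_delta:+.1f} SOV points")
print(f"Attributable lift:            {attributable_lift:+.1f} SOV points")
```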

This is the same multi-layered evidence structure that PR, television, and podcast advertising have used for decades. None of those channels have perfect attribution. All of them justify billions in annual spend through frameworks that are rigorous without being complete. The four-metric framework for AI search ROI operationalizes this approach for marketing teams facing Q2 budget reviews.

| Metric | What It Captures | Minimum Window |
| --- | --- | --- |
| AI Share of Voice (frequency) | Stable brand presence across platforms | 4 weeks baseline |
| Branded search trend (GSC) | Downstream discovery influence | 8-12 weeks |
| GA4 AI referral conversion | Visible portion of AI traffic quality | Ongoing (29.4% visible) |
| Content change attribution | Causal intervention effect | 4+ weeks post-change |

The brands that assemble this evidence infrastructure first will have the measurement foundation their competitors are still deferring. Forrester projects 25% of planned AI search spend pushed to 2027 for lack of proven ROI. That deferral is not a channel judgment; it is a measurement gap. The data to close it exists today. The question is whether the tools and frameworks are in place to collect it.

Measure What SparkToro Proved Matters: Frequency, Not Rank

Sill tracks AI Share of Voice daily across ChatGPT, Gemini, Perplexity, Claude, Grok, and Google AI Overviews. Frequency-based measurement across all six platforms, starting at $0/month.

References

  1. Fishkin, R. and O'Donnell, P. "New Research: AIs Are Highly Inconsistent When Recommending Brands or Products." SparkToro Blog, January 2026. sparktoro.com
  2. AirOps. "The 2026 State of AI Search." AirOps Report, 2026. airops.com
  3. Microsoft Clarity. "AI Traffic Converts at 3x the Rate of Other Channels." Microsoft Clarity Blog, November 2025. clarity.microsoft.com
  4. Adobe. "AI-Driven Traffic Surges Across Industries." Adobe Business Blog, January 2026. business.adobe.com
  5. Search Engine Land. "What 13 Months of Data Reveals About LLM Traffic Growth and Conversions." Search Engine Land, February 2026. searchengineland.com
  6. Loamly. "State of AI Traffic 2026 Benchmark Report." Loamly Blog, 2026. loamly.ai
  7. SparkToro and Datos. "State of Search Q4 2025." Datos Report, Q4 2025. datos.live
  8. Pew Research Center. "Key Findings About How Americans View Artificial Intelligence." Pew Research, March 2026. pewresearch.org
  9. Similarweb. "Gen AI Stats 2026." Similarweb Marketing Blog, February 2026. similarweb.com
  10. Conductor. "AEO/GEO Benchmarks Report." Conductor Academy, November 2025. conductor.com
  11. BrightEdge. "AI Search Visits Surging 2025." BrightEdge Research, September 2025. brightedge.com
  12. SE Ranking. "ChatGPT vs Perplexity vs Google vs Bing: Comparison Research." SE Ranking Blog, 2025. seranking.com
  13. Previsible. "AI SEO Study 2025: State of AI Discovery." Previsible, 2025. previsible.io
  14. Near Media. "EP 244: AI Visibility vs Google Rankings." Near Media Podcast, February 2026. nearmedia.co
  15. Am I Cited. "Conversion Rate of AI Traffic." Am I Cited FAQ, 2026. amicited.com


Daniel Wang

Founder · UC Berkeley MIDS

Previously at Nordstrom, Bloomberg, Hexagon (now Octave)
