Spotlight's February 2026 benchmark across 2.4 million AI responses found that only 30% of brands persist from one answer to the next, and that earning a citation alongside a mention makes a brand 40% more likely to reappear. We tested both claims against Sill's daily monitoring data: 2,759 day-pairs across 10 brands, 6 AI platforms, and 36 days of consecutive tracking. We trained XGBoost and logistic regression models to identify which features of a day N response predict whether the brand appears again on day N+1. The overall persistence rate in our data matched Spotlight's almost exactly. The citation effect showed a different pattern in our daily data, though with a smaller sample this is a preliminary observation worth investigating further.
TL;DR
We trained XGBoost and logistic regression models on 2,759 day-pairs from Sill's daily monitoring data across 10 brands and 6 AI platforms to predict next-day brand persistence. The overall persistence rate is 29%, matching Spotlight's 30% benchmark almost exactly. Among brands that appeared on day N, 82.6% persist to day N+1; absent brands break through only 6% of the time. Position is a steep gradient: primary brands persist at 93%, secondary at 77%, mentioned at 54%. Platform divergence is extreme: Google AI Overviews at 37% versus Grok at 0.6%. Rolling 7-day SOV is the dominant predictor (8.3x odds); removing temporal features, excerpt length becomes #1, followed by competitor count and SOV score. Spotlight found citation presence increases persistence by 40%; Sill's daily data shows a different pattern (mention + citation at 81.5% vs. mention-only at 87.0%), though the smaller sample size means this is a preliminary observation. Shopping card presence does increase persistence (46% vs 27%). Momentum compounds: brands on 8-14 day streaks persist at 94%, and 15+ day streaks persist at 100%.

XGBoost and logistic regression trained on 2,759 day-pairs across 10 brands and 6 AI platforms to predict next-day brand persistence.
Sill monitors six AI platforms daily through their actual chat interfaces: ChatGPT, Gemini, Google AI Overviews, Perplexity, Copilot, and Grok. For each brand, the same queries run every day against every platform, producing a time series of visibility observations. We extracted 2,759 consecutive day-pairs from this data (day N features paired with day N+1 outcomes) spanning February 28 to April 5, 2026.
For each observation, we extracted 15 features from the response: the brand's position (primary, secondary, mentioned, or absent), sentiment, sentiment confidence, SOV score, excerpt length, number of source citations, shopping card presence, competitor count, competitive SOV gap, and platform identity. We derived two temporal features: streak length (consecutive days the brand appeared) and rolling 7-day SOV.
We trained two model variants: a full model with temporal features, and a model without temporal features to isolate what matters beyond momentum. Both used a temporal train/test split (75/25 by date) to prevent data leakage. The target variable: did the brand appear for the same query and platform on the next day?
Overall persistence rate is 29%, matching Spotlight's 30% benchmark; among brands that appeared on day N, 82.6% persist to day N+1.
Across all 2,759 day-pairs, 29.0% resulted in the brand appearing on the following day. This is within one percentage point of the Spotlight February 2026 benchmark (30% persistence across 2.4 million responses), validating their finding with an independent dataset at daily granularity.
The low headline number is driven by the 1,871 observations where the brand was absent on day N; those break through only 6% of the time. Among brands that did appear on day N, 82.6% persisted to day N+1. The persistence question is really two questions: how hard is it to get in (very hard; 6% breakthrough rate), and how hard is it to stay in once you are there (not very; 82.6% retention). The first appearance is the bottleneck.
Primary-position brands persist at 93% day-over-day; secondary at 77%; mentioned at 54%; absent brands break through only 6% of the time.
The position a brand holds in an AI response on day N is a strong predictor of whether it appears at all on day N+1. Primary-position brands (the first or most prominently recommended) persist at 92.5%. Secondary brands persist at 76.6%. Brands that received a passing mention persist at 53.7%. Brands mentioned negatively persist at 30.8%, meaning even a negative mention provides more persistence than absence. Absent brands break through at 6.0%.

The gradient from primary (93%) to absent (6%) is a 15x difference in persistence probability. In our data, brands in the primary position show high day-over-day stability; the secondary position still delivers strong retention but carries 16 percentage points more daily risk. If this pattern holds at larger scale, moving from secondary to primary would be worth more than the SOV difference alone suggests, because the stability benefit compounds over time.
Google AI Overview persistence is 37.2%; Grok is 0.6%. The logistic regression odds of persisting on Grok are 0.39x versus the baseline.
Platform identity is one of the strongest persistence predictors. Google AI Overviews show the highest day-over-day persistence at 37.2%, consistent with Sill's earlier finding that Gemini is the most stable platform at 28.5% daily top-pick change rate. Grok and Copilot are at the other extreme: Grok persists at 0.6% and Copilot at 3.1%, meaning brands recommended today are almost certainly absent tomorrow.
| Platform | Persistence Rate | Odds Ratio | Observations |
|---|---|---|---|
| Google AI Overview | 37.2% | 1.77x | 584 |
| Gemini | 34.3% | 1.57x | 607 |
| ChatGPT | 30.7% | baseline | 662 |
| Perplexity | 26.8% | 1.52x | 622 |
| Copilot | 3.1% | 0.57x | 130 |
| Grok | 0.6% | 0.39x | 154 |
This extends the platform divergence Sill has documented across 139 brands: 55% of brands have a 10+ point SOV spread between their best and worst platform. The persistence data adds a temporal dimension to that divergence. A brand may score identically on two platforms on any given day, but the stability of that score varies by an order of magnitude depending on which platform you measure.
Rolling 7-day SOV is the #1 predictor with 8.3x odds; brands appearing 8-14 consecutive days persist at 94%, and 15+ day streaks persist at 100%.
The full XGBoost model (AUC 0.931, accuracy 91.5%) placed rolling 7-day SOV as the dominant feature, with a mean SHAP value 7x larger than the second-most-important feature. The logistic regression confirmed the magnitude: each standard deviation increase in rolling 7-day SOV multiplies the odds of next-day persistence by 8.3x.
The streak data shows this momentum concretely. Brands that appeared today after being absent yesterday persist at 69.7%; brands on a 1-to-3-day streak persist at 79.2%; 4-to-7-day streaks reach 90.5%; 8-to-14-day streaks hit 94.0%; and every brand with a 15+ day streak persisted to the following day. Visibility begets visibility. Each consecutive day of presence increases the probability of the next day's presence, suggesting that AI platforms exhibit a form of recommendation inertia.

Removing temporal features, excerpt length becomes the #1 predictor; competitor count and SOV score follow. The model achieves 0.882 AUC.
Stripping out rolling SOV and streak length forces the model to use only per-response features. The XGBoost model without temporal features still achieves 0.882 AUC and 88.7% accuracy, meaning the non-temporal signal is substantial. The logistic regression performs even better at 0.896 AUC.
Excerpt length is the clear #1 non-temporal predictor (SHAP 1.45, nearly 3x the second feature). This measures how much text the AI platform generated about the brand in its response. Longer excerpts indicate a richer internal representation of the brand for that query context; the model has more to draw on and is more likely to produce a similar response the next day. Brands receiving detailed, multi-sentence descriptions persist at far higher rates than those receiving a passing one-line mention.

Competitor count is the second most important feature and acts as a negative predictor: more competitors in the response means less stability for each individual brand. Responses crowded with alternatives shift more between days. SOV score follows at #3, with 1.84x odds per the logistic regression; higher SOV on day N predicts higher persistence. The SOV gap to the category leader is #4, measuring how far behind the dominant brand in that response.
Very positive sentiment carries 1.64x odds in the logistic regression, and shopping card presence carries 1.36x. Both are meaningful but subordinate to the structural features of response depth and competitive density.
In Sill's daily data, mention + citation persists at 81.5% vs. mention-only at 87.0%; Spotlight reported +40% in session-level data. The difference may reflect measurement granularity.
The Spotlight benchmark reported that brands earning both a mention and a citation were 40% more likely to reappear in the next answer. In our daily monitoring data, the pattern is different: brands with a mention and citation persist at 81.5%, while mention-only brands persist at 87.0%. The number of citations does carry a 1.50x odds ratio in the logistic regression, but the effect is weaker than position, excerpt length, or competitive dynamics. With only 10 brands in our sample, this observation is preliminary rather than conclusive.
The difference may reflect measurement granularity rather than a true contradiction. Spotlight measured persistence across consecutive answers within a single session (prompt-to-prompt); Sill measures persistence across consecutive calendar days. Citation presence could predict within-session consistency while playing a different role in day-over-day stability. A larger brand sample will help clarify whether this pattern holds more broadly.
Shopping card presence tells a different story. Brands appearing with shopping cards on day N persist at 46.0% versus 27.2% without, a 1.7x difference. Commerce integration may create a persistence advantage, though this too needs validation at larger scale.
The study covers 10 brands and 2,759 day-pairs; findings are directional and will strengthen as Sill's monitoring base expands beyond 50 brands.
This study has meaningful limitations. Ten brands across 2,759 day-pairs is sufficient for descriptive statistics and directional model findings, but insufficient for definitive generalization. The brand mix includes consumer electronics, SaaS, and niche product categories; industries with different competitive dynamics may show different persistence patterns. We excluded brand-level features (off-site authority scores, content quality scores) from the model because with only ten brands these features act as brand identity proxies rather than generalizable predictors.
As Sill's monitoring base grows beyond 50 active brands, we will re-run this analysis with brand-level features included, industry-level breakdowns, and cross-validated temporal folds. The descriptive findings here (position gradient, platform divergence, momentum compounding, citation non-effect) are robust at current sample sizes. The model coefficients should be treated as early signals that will sharpen with scale.
Sill monitors ChatGPT, Gemini, Perplexity, Google AI Overviews, Copilot, and Grok daily through their actual chat interfaces, tracking position, sentiment, citations, and shopping cards.
Request your first analysis today to see where you stand.