In February 2024, a Canadian tribunal ordered Air Canada to pay damages after its own AI chatbot fabricated a bereavement discount policy that did not exist. The company argued the chatbot was a separate legal entity; the tribunal disagreed. A year earlier, Google had lost $100 billion in market cap after Bard presented a verifiably false claim in a live product demo. Those were first-party AI systems the companies controlled. The more consequential version of this problem is happening on third-party AI platforms, where brands have no editorial control at all. NP Digital tested 600 prompts across six major AI platforms in February 2026 and found that ChatGPT is fully correct about brands only 59.7% of the time; Grok manages 39.6%. The BBC independently tested AI engines on 100 news stories and found that 51% of responses contained significant issues, with 13% of attributed quotes altered or fabricated entirely. This is not an abstract research finding. Over 3 billion people interact with AI-generated answers monthly, and those answers are making factual claims about your company that you cannot review, edit, or correct.
TL;DR
NP Digital tested 600 prompts across six AI platforms: ChatGPT is fully correct about brands 59.7% of the time; Grok manages 39.6%. The BBC found 51% of AI answers contain significant issues, with 13% of attributed quotes fabricated. AI-cited content averages 1,064 days old (Ahrefs, 17M citations). BrightEdge found AI Overviews is 44% more likely than ChatGPT to surface negative brand sentiment, and engines flag different brands as negative 73% of the time. 85% of brand mentions come from third-party pages brands cannot control. Consumers trust AI at 3-4 out of 5 (Semrush, 1,030 respondents) while 50% have purchased after AI research. Structured data increases AI citation rates 3.1x; content updated within 90 days earns 67% more citations. Monitoring what AI platforms claim about your brand and verifying it against a fact profile is the defensive baseline.

NP Digital tested 600 prompts across six AI platforms: ChatGPT was fully correct 59.7% of the time; Grok scored 39.6%, with a 21.8% outright error rate.
NP Digital's AI Hallucinations and Accuracy Report (February 2026) tested 600 prompts across ChatGPT, Claude, Gemini, Perplexity, Copilot, and Grok. The results show a wide accuracy spread, with every platform producing a meaningful rate of outright incorrect responses. These are not edge cases triggered by adversarial prompts; they are the baseline error rate on standard brand-related queries.
| Platform | Fully Correct | Outright Incorrect |
|---|---|---|
| ChatGPT | 59.7% | 7.6% |
| Claude | 55.1% | 6.2% |
| Gemini | 51.3% | 8.0% |
| Perplexity | 49.3% | 12.2% |
| Copilot | 45.8% | 13.6% |
| Grok | 39.6% | 21.8% |
The gap between ChatGPT and Grok is 20 percentage points on full accuracy and 14 points on outright errors. Chad Gilbert, VP of Content at NP Digital, put it directly: "AI has become an incredible tool to accelerate efficiencies, but speed without accuracy creates real risk." The same study found that 47.1% of marketers encounter AI errors several times per week, and 36.5% admitted that hallucinated content has been published publicly. The platforms your buyers use to research your brand are less than fully accurate about it 40 to 60 percent of the time.
AI errors about brands include fabricated quotes, reversed policies, discontinued products listed as current, and competitor features attributed to the wrong company.
The BBC and the European Broadcasting Union tested AI accuracy on 100 news stories across ChatGPT, Gemini, Copilot, and Perplexity. Fifty-one percent of all answers contained significant issues. Nineteen percent contained outright factual errors: wrong dates, wrong numbers, wrong attributions. But the most striking finding was that 13% of attributed quotes were altered or fabricated entirely. Gemini reversed NHS vaping guidance, stating the opposite of actual policy. ChatGPT and Copilot falsely claimed political figures were still in office after they had left. As BBC News CEO Deborah Turness said: "We live in troubled times, and how long will it be before an AI-distorted headline causes significant real-world harm?"
For brands, the error categories are specific and consequential. AI platforms claim companies offer services they discontinued years ago. They quote pricing from 2+ years ago as current. They attribute competitor innovations to the wrong company. They fabricate partnership claims and product capabilities. The NP Digital survey found that brand-unsafe content was the most common type of publicly published error (53.9%), followed by false or hallucinated information (43.5%).
Sill's own monitoring data across 139 brands and 86 industries shows that each AI platform characterizes brands differently, with 55% of brands showing a 10+ point share-of-voice (SOV) spread across platforms. The accuracy problem compounds the divergence problem: your brand is not only perceived differently on each platform, it is perceived incorrectly in different ways on each platform.
Ahrefs found AI-cited pages average 1,064 days old (2.9 years); Seer Interactive found 65% of AI bot traffic targets content published within the past year.
Accuracy errors are not random. Many trace directly to the age of the content that AI platforms draw on when constructing responses. Ahrefs analyzed 17 million citations across seven AI search platforms and found the average age of AI-cited pages is 1,064 days, approximately 2.9 years. This is actually fresher than traditional organic search results (which average 1,432 days), but the gap illustrates the problem: even when AI platforms actively retrieve fresh sources, they are citing content that predates your most recent product launch, pricing change, or leadership transition.
Seer Interactive's study of 5,000+ cited URLs provides the other half of the picture. While 65% of AI bot hits target content published within the past year, platform behavior diverges significantly: Perplexity draws 50% of its citations from 2025 content, ChatGPT draws only 31%, and AI Overviews sits at 44%. ChatGPT, the platform with the most weekly active users, is the least fresh in its source selection.
This creates a structural accuracy lag. A company that rebranded, changed its pricing model, or expanded into new categories will find AI platforms confidently describing the version of their brand that existed two to three years ago. SE Ranking's research, which we covered in our GEO tactics analysis, found that pages updated within 90 days earn 67% more AI citations, confirming that freshness is not only a ranking signal for AI but a direct accuracy lever for brands.
The Columbia Journalism Review found ChatGPT Search was incorrect in 67% of test queries; over 50% of Gemini and Grok citations linked to fabricated or broken URLs.
A natural assumption is that citations provide a reliability signal: if an AI platform links to a source, the information should be more trustworthy. The Columbia Journalism Review's Tow Center study tested 1,600 queries across eight AI search engines and found that assumption does not hold. ChatGPT Search was incorrect in 134 of 200 queries (67%). Grok produced 154 error-page citations out of 200 queries. Over 50% of responses from Gemini and Grok cited fabricated or broken URLs. ChatGPT signaled uncertainty only 15 times out of 200 responses while being wrong 134 times.
The implication for brand accuracy is direct. When an AI platform cites a source for a claim about your company, the source may not exist, may not say what the AI claims it says, or may contain outdated information that the AI presents as current. As we documented in our analysis of 1,238 AI-cited pages, 91.5% of pages are cited by only one platform, and the structural traits that predict citation have nothing to do with accuracy. Being cited does not mean being correct. Monitoring must verify both.
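The Tow Center finding suggests one simple defensive check a monitoring workflow can run: before trusting a citation, confirm the URL actually resolves. A minimal sketch in Python (the function names, regex, and User-Agent string are illustrative, not any platform's API):

```python
import re
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

# Rough URL matcher: stops at whitespace, closing brackets, and quotes.
URL_PATTERN = re.compile(r"https?://[^\s)\]\"']+")

def extract_citation_urls(ai_response: str) -> list[str]:
    """Pull every cited URL out of an AI-generated answer."""
    return URL_PATTERN.findall(ai_response)

def check_url(url: str, timeout: float = 10.0) -> str:
    """Classify a cited URL as 'ok', 'broken' (4xx/5xx), or 'unreachable'."""
    try:
        req = Request(url, method="HEAD",
                      headers={"User-Agent": "citation-check/0.1"})
        status = urlopen(req, timeout=timeout).status
        return "ok" if status < 400 else "broken"
    except HTTPError:
        return "broken"
    except URLError:
        return "unreachable"
```

A HEAD request only confirms the page exists; verifying that the source actually supports the claim attributed to it still requires comparing the cited text against the page content.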
Semrush found 75% of consumers rate AI trust at 3-4 out of 5; 50% have purchased after AI research; 43% discovered a new brand through AI recommendations.
The accuracy data would be less urgent if consumers treated AI answers with appropriate skepticism. They do not. Semrush surveyed 1,030 U.S. consumers and found that 75% rate their trust in AI recommendations at 3 to 4 out of 5. Half have made a purchase after using AI for research. Forty-three percent discovered a new brand through AI recommendations. Sixty-nine percent expect AI to play a bigger role in their future shopping decisions. The trust-accuracy gap is the core of the problem: consumers are acting on information that is wrong 40-60% of the time as though it were reliable.
The verification data offers some nuance. Eighty-six percent of consumers say they verify AI brand recommendations at least sometimes, with Google (68%) and brand websites (48%) as the primary verification channels. But "sometimes" is not "always," and the conversion data suggests many buyers act before verifying. The Gartner consumer survey paints a more complex picture: 53% of consumers distrust AI-powered search results, and 61% wish they could toggle AI summaries off entirely.
What emerges is a bifurcated audience: a significant group that trusts AI and acts on its recommendations, and a skeptical group that distrusts AI but encounters it regardless. Both groups see AI-generated claims about your brand. The trusting group may act on incorrect information before verifying. The skeptical group may dismiss your brand entirely if the AI representation feels unreliable. Either way, inaccurate AI representation costs you.
BrightEdge found Google AI Overviews is 44% more likely than ChatGPT to surface negative brand sentiment; engines flag different brands as negative 73% of the time.
BrightEdge's March 2026 study quantified something that brand teams have suspected: AI platforms are not just inaccurate in different proportions, they are inaccurate in different directions. Google AI Overviews is 44% more likely than ChatGPT to surface negative brand sentiment (2.3% vs. 1.6% of brand mentions). More importantly, Google concentrates 85% of its negative sentiment during informational queries, the research and discovery phase where first impressions form.
| Dimension | Google AI Overviews | ChatGPT |
|---|---|---|
| Negative sentiment rate | 2.3% of mentions | 1.6% of mentions |
| Where negativity concentrates | 85% in informational queries | 19.4% at consideration-to-purchase |
| Negativity at purchase stage | 1.5% | 19.4% (13x higher) |
ChatGPT concentrates its criticism 13x more heavily near the point of purchase: 19.4% negativity at the consideration-to-purchase stage versus Google's 1.5%. The engines flag different brands as negative 73% of the time on identical queries. Jim Yu, founder and CEO of BrightEdge, stated: "For better or worse, AI is your brand's new editorialist. Each engine characterizes your brand differently, and CMOs must treat them as distinct, dynamic environments." This aligns with Sill's platform divergence findings: single-platform monitoring is a measurement error.
AirOps found 85% of AI brand mentions come from third-party pages; Bazaarvoice launched the Authentic Discovery API on April 2, 2026 to make UGC crawlable by AI.
The accuracy problem is compounded by a sourcing problem: 85% of brand mentions in AI responses come from third-party pages, not brand-owned content (AirOps). Your website, your pricing page, your product documentation: these are minority sources for AI-generated claims about your company. The majority comes from review sites, Reddit threads, YouTube videos, news articles, and forum posts. Each of these may contain outdated information, personal opinions presented as facts, or outright misinformation that AI platforms then synthesize into confident-sounding answers.
Bazaarvoice recognized this gap and launched the Authentic Discovery API on April 2, 2026, making UGC and product reviews crawlable by AI agents through server-side structured metadata delivery. Marissa Jones, SVP Product at Bazaarvoice, described the logic: "We don't just provide content; we provide signal density. Because we distribute verified human sentiment across the entire retail ecosystem at scale, we make brands' and retailers' best attributes front and center for AI models to ingest." The API covers their network of 2.3 billion monthly shoppers across 13,000+ brands and retailers.
Bazaarvoice reports that AI systems are 20-40% less likely to select products when key information is missing. The accuracy crisis is, in part, a data availability crisis: when AI platforms cannot find structured, authoritative data about a brand, they fall back on whatever unstructured third-party content they can retrieve. And that content is often wrong.
Air Canada paid damages for AI-fabricated policies; 729+ legal filings have cited AI-generated hallucinations in U.S. courts, with sanctions up to $30,000.
The Air Canada tribunal ruling (Moffatt v. Air Canada, 2024 BCCRT 149) established that companies are responsible for all information provided by their AI systems. The company's chatbot fabricated a bereavement refund policy and the airline was ordered to pay $812 CAD in damages. The dollar amount was small; the precedent was not. When an AI system speaks on behalf of your brand, you own the consequences even when the AI fabricated the claim.
The third-party platform question remains legally unsettled. When ChatGPT fabricates a product capability that a consumer relies on, liability is ambiguous. But the reputational damage is immediate and concrete. In Mark Walters v. OpenAI, ChatGPT fabricated an entire legal complaint accusing a radio host of embezzling funds from the Second Amendment Foundation. The case was dismissed in May 2025 because the plaintiff could not demonstrate actual malice and the output was only seen by one person. But the fabrication happened. The AI created something that never existed and presented it as fact.
In U.S. courts, 729+ documented cases involve AI-generated hallucinated content, with sanctions escalating: MyPillow attorneys were fined $3,000 each in July 2025; the Sixth Circuit levied $30,000 in sanctions in March 2026. When AI hallucinations enter legal proceedings, financial proceedings, or purchasing decisions, they carry real costs. The question is not whether AI will make false claims about your brand. The question is whether you will know about it when it happens.
Pages with structured data are cited 3.1x more frequently in AI Overviews; content updated within 90 days earns 67% more AI citations (SE Ranking).
The accuracy crisis is not one you can solve in a single sprint; AI platforms will continue making claims about your brand regardless of what you do. The defensive strategy is to shift the probability: increase the likelihood that AI platforms encounter accurate, current, structured information about your brand rather than outdated or third-party fragments. Three levers move that probability.
First, structured data. Pages with Schema.org markup are cited 3.1x more frequently in Google AI Overviews (BrightEdge). Schema-compliant pages see a 73% higher selection rate in AI systems. Google explicitly recommends JSON-LD for AI-optimized content. Organization schema, Product schema with current pricing, and SameAs properties for entity disambiguation all provide machine-readable facts that AI platforms can extract rather than hallucinate. This is GEO Tactic #6 in our evidence-ranked framework, and it is one of the lowest-effort, highest-impact changes brands can make.
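As a concrete illustration, Organization markup with SameAs links can be generated as a JSON-LD block and embedded in a page's head. This is a minimal Python sketch (the helper name and example values are hypothetical; the fields follow Schema.org's Organization type):

```python
import json

def organization_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Render an Organization JSON-LD <script> block ready to embed in <head>.
    sameAs links (Wikipedia, LinkedIn, Crunchbase) disambiguate the entity."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": same_as,
    }
    return ('<script type="application/ld+json">\n'
            + json.dumps(doc, indent=2)
            + "\n</script>")
```

The same pattern extends to Product schema, where keeping the price and availability fields current matters as much as having the markup at all.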
Second, content freshness. Pages updated within 90 days earn 67% more AI citations (SE Ranking). The staleness problem is an accuracy problem: outdated content produces outdated AI answers. Keeping pricing pages, product feature pages, and comparison content current is not just a GEO tactic that is safe for SEO, it is a factual accuracy intervention.
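The 90-day window can be turned into a standing audit: compare each page's last-modified date against the threshold and flag stale pages for review. A small sketch, assuming you already track ISO-8601 modification dates per URL (the function and data shape are illustrative):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The 90-day freshness threshold from the SE Ranking finding cited above.
STALE_AFTER = timedelta(days=90)

def stale_pages(pages: dict[str, str], now: Optional[datetime] = None) -> list[str]:
    """Return URLs whose last-modified ISO-8601 date is older than 90 days."""
    now = now or datetime.now(timezone.utc)
    stale = []
    for url, iso_date in pages.items():
        modified = datetime.fromisoformat(iso_date)
        if now - modified > STALE_AFTER:
            stale.append(url)
    return stale
```

Running a check like this weekly against the pricing, product, and comparison pages that AI platforms cite most keeps the freshness lever from silently decaying.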
Third, continuous monitoring. Sill's Watchdog is built for exactly this problem. It constructs a verified fact profile of your brand, extracts claims from every AI response where your brand appears, compares each claim against your facts using semantic matching, and generates alerts when AI platforms contradict your ground truth. You cannot prevent AI from making claims about your brand. You can know what those claims are and which ones are wrong.
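At small scale, the compare step can be approximated with plain string similarity. The sketch below is not Sill's implementation, just an illustration of the pipeline's shape: a production system would use embedding-based semantic matching, and the fact profile and threshold here are invented:

```python
from difflib import SequenceMatcher

# Hypothetical verified fact profile: canonical statements about the brand.
FACT_PROFILE = [
    "acme ships three pricing tiers: starter, pro, and enterprise",
    "acme discontinued its on-premise product in 2024",
]

def best_match(claim: str, facts: list[str]) -> float:
    """Score a claim against its closest fact (0.0-1.0). String ratio
    stands in for the semantic matching a real system would use."""
    return max(SequenceMatcher(None, claim.lower(), f).ratio() for f in facts)

def flag_claims(claims: list[str], threshold: float = 0.6) -> list[str]:
    """Return claims that match no verified fact closely enough."""
    return [c for c in claims if best_match(c, FACT_PROFILE) < threshold]
```

A flagged claim is not automatically false; it is a claim your ground truth cannot confirm, which is exactly the set a human reviewer needs to see.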
Sill's Watchdog monitors every AI claim about your company, verifies it against your fact profile, and alerts you when platforms fabricate claims, surface outdated information, or misattribute facts.
Request your first analysis today to see where you stand.