Study across 18 countries finds systemic attribution failures—and users blame both the AI and the outlet it cites
AI assistants are gaining ground as a gateway to the news while failing the basics of accuracy and attribution. A study coordinated by the European Broadcasting Union and led by the BBC found that 45% of answers about current events contained at least one significant problem, and Google’s Gemini showed serious sourcing flaws in 72% of cases. The audit examined more than 3,000 answers from ChatGPT, Copilot, Gemini, and Perplexity across 14 languages and 18 countries, with journalists at 22 public-service media organizations reviewing each response for accuracy, sourcing, and context.
The errors aren’t edge cases. Nearly a third of all answers had broken or misleading attribution, and one in five contained factual mistakes or outdated information. Gemini recorded significant issues in 76% of responses—more than double the rate of other assistants—driven largely by faulty sourcing. Reviewers also flagged a pattern of “ceremonial citations”: references that look rigorous but don’t actually support the claims when checked. It looks thorough. It isn’t.
Key Takeaways
• BBC-EBU study found 45% of AI news responses contained significant errors across 18 countries and 14 languages
• Google's Gemini showed sourcing problems in 72% of answers, more than double the rates recorded for ChatGPT, Copilot, and Perplexity
• When users spot errors, they blame both the AI assistant and the news organization it cited
• 15% of under-25s now use AI assistants for news, but adoption is outpacing reliability improvements
What’s actually new
This is the largest, most multilingual check of AI-news behavior to date. Earlier BBC work in February focused on English-language markets; the new study applied common prompts across Europe and North America, plus Ukraine and Georgia, and used native-language evaluators. The failures showed up regardless of language or market. That matters.
The audience is already there. According to separate survey work released alongside the audit, 7% of all online news consumers—and 15% of under-25s—now use AI assistants to get news summaries. Adoption is outpacing reliability. That’s the tension.
The reputational bind
When AI misreads or misquotes a story, users blame the assistant and the news brand it cites. The newsroom did the reporting, the assistant scrambled it, and the byline eats the fallout anyway. That is a perverse bind for publishers trying to maintain trust.
It’s also expensive to fix. Newsrooms cannot monitor or correct every AI summary in the wild, and “please verify” labels don’t undo a confident, wrong answer. The AI now sits between reporting and reader, and it adds friction where journalism needs clarity. The brand loses twice.
The sourcing collapse
The gap on attribution is stark. Gemini showed significant sourcing problems in 72% of tested answers; ChatGPT, Copilot, and Perplexity stayed below 25%. That suggests divergent approaches to citation construction and fallback behavior under uncertainty.
The factual errors are not subtle. Examples in the study include false claims about surrogacy law in Czechia and incorrect summaries of changes to UK disposable vape rules. Confident tone made the answers more misleading, not less. Tone isn’t truth.
The ceremonial citation problem
Good attribution lets readers verify claims and weigh credibility. AI assistants often supply citations that look legitimate but, on inspection, don’t support the accompanying text. Some point to real sources that don’t contain the cited fact. Others wave at “reports” that can’t be traced.
This is worse than no citation. A missing reference signals uncertainty; a spurious reference manufactures confidence. It also wastes reader time and makes independent verification harder. That’s a trust drain, not a trust bridge.
What AI companies say—and what the data shows
Model builders acknowledge hallucinations as a known failure mode and say they are working on it. But “working on it” doesn’t match the current adoption curve. Millions already use assistants for news, especially younger audiences, and they often assume the summaries are accurate.
The study’s recommendation is pragmatic: improve response scaffolding for news questions, fix sourcing logic, and publish regular quality results by market and language. It also calls for ongoing independent monitoring and stronger media-literacy cues inside assistants. Progress will show up in measurements, not demos. Ship the drop in error rates.
The trust erosion calculus
Trust falls faster than it is rebuilt. Assistants earn credibility through convenience and repetition; a handful of high-visibility mistakes can sour that trust for both the tool and the news brands it cites. Once users feel they’ve been misled, they don’t carefully apportion blame. They bounce.
Three signals to watch: do sourcing failures fall meaningfully in follow-up audits; do assistants add conspicuous verification prompts for hard news; and do regulators treat AI misattribution as an information-integrity problem, not just a product bug. If those needles don’t move, the audience will. Count on it.
Why this matters
- News brands face reputational damage from AI-generated distortions they didn’t create and can’t control.
- Millions already consume news through assistants, so unresolved accuracy gaps risk broad, compounding trust loss.
❓ Frequently Asked Questions
Q: Which AI assistant performed best in the study?
A: ChatGPT, Copilot, and Perplexity all showed significant sourcing issues in under 25% of responses, compared to Gemini's 72%. The study didn't rank the three better performers against each other, but all three had substantially fewer problems than Gemini across accuracy, sourcing, and context categories.
Q: What exactly are "ceremonial citations"?
A: Finnish broadcaster Yle coined this term to describe references that look legitimate but don't actually support the AI's claims when checked. The assistant adds footnotes to appear thorough, but the cited source either doesn't exist, doesn't contain the stated information, or says something different. This makes verification harder, not easier.
Q: How did they actually test the AI assistants?
A: Professional journalists from 22 public media organizations asked the same news questions to four AI assistants between late May and early June. They evaluated over 3,000 responses in 14 languages, checking each for accuracy, proper sourcing, ability to distinguish fact from opinion, and appropriate context. Native-language evaluators assessed responses in their own markets.
Q: Should I stop using AI assistants to get news?
A: The BBC and EBU say AI assistants "are still not a reliable way to access and consume news." If you do use them, verify any important information by checking the original source directly—and be skeptical if citations look vague or can't be traced. Better yet, go directly to trusted news sources for current events.
Q: Are the AI companies fixing these problems?
A: OpenAI and Microsoft have acknowledged hallucinations as a known issue they're working to resolve. Google says it welcomes feedback to improve Gemini. The BBC released a toolkit to help developers improve response quality, but no company has committed to specific timelines or published regular accuracy metrics by language and market.