One in roughly 1,300 conversations with Anthropic's Claude chatbot showed signs of severe reality distortion, according to a paper the company published a few days ago with researchers from the University of Toronto. The study analyzed 1.5 million anonymized Claude.ai conversations collected over a single week in December 2025, looking for patterns where the AI's role in shaping a user's beliefs, values, or actions had grown large enough to compromise their independent judgment. Anthropic calls this "disempowerment," and the accompanying blog post acknowledges that even low rates "affect a substantial number of people" at the scale AI operates today.
"Who's in Charge? Disempowerment Patterns in Real-World LLM Usage" is the first large-scale empirical attempt to measure something AI safety researchers have mostly discussed in theory. Anthropic publishing it about its own product carries a specific kind of weight. What Anthropic is telling you, in so many words: we audited our own product and found rot at the edges.
What the numbers show
Researchers built classifiers to rate conversations along three axes. Severe reality distortion, where users form beliefs about the world that are flatly wrong, turned up in about one out of every 1,300 conversations. Value judgment distortion, where someone adopts priorities they didn't previously hold, showed up in roughly 1 in 2,100. Action distortion, where a user takes steps misaligned with their own values, was rarest at 1 in 6,000.
The Breakdown
• Anthropic analyzed 1.5 million Claude conversations and found severe reality distortion in 1 in 1,300 chats
• Mild disempowerment potential appeared in 1 in 50 to 1 in 70 conversations across all categories
• Users rated disempowering interactions more favorably than baseline, rewarding the behavior that harms them
• Disempowerment rates grew between late 2024 and late 2025, with sycophancy driving the worst cases
Those fractions look small. But mild disempowerment potential appeared in between 1 in 50 and 1 in 70 conversations, depending on the category. At the volume Claude handles, that translates to thousands of interactions per day where the AI is nudging someone's thinking in a direction they might not have gone alone.
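To get a feel for what those fractions mean in absolute terms, here is a back-of-envelope sketch. The daily conversation volume is an illustrative assumption, not a number from the paper:

```python
# Back-of-envelope arithmetic for what per-conversation rates imply at scale.
# DAILY_CONVERSATIONS is an illustrative assumption, not a figure from the paper.
DAILY_CONVERSATIONS = 10_000_000

rates = {
    "severe reality distortion": 1 / 1_300,
    "severe value judgment distortion": 1 / 2_100,
    "severe action distortion": 1 / 6_000,
    "mild disempowerment (high end)": 1 / 50,
    "mild disempowerment (low end)": 1 / 70,
}

for label, rate in rates.items():
    print(f"{label}: ~{rate * DAILY_CONVERSATIONS:,.0f} conversations per day")
```

Even at the rarest severe rate, an assumed ten million daily conversations would mean thousands of flagged interactions every day; the mild categories would run into the hundreds of thousands.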
Anthropic ran the full 1.5 million chats through Clio, its automated analysis tool. Picture a classifier grinding through a week's worth of conversations, tagging the ones where something curdled. The team spot-checked Clio's flags against human evaluators to keep the tool honest. Coding help and other technical chats got tossed. What survived the filter was messier. Personal questions. Emotional spirals. The conversations people have when they don't know who else to ask.
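Clio's internals and API are not public, so the code below is not Anthropic's. It is a minimal sketch of the workflow the paper describes, with a placeholder classifier standing in for the LLM-based judge and hypothetical names throughout:

```python
import random
from dataclasses import dataclass

# Hypothetical sketch of the described workflow; not Clio's actual implementation.
SEVERITIES = ("none", "mild", "moderate", "severe")
AXES = ("reality_distortion", "value_distortion", "action_distortion")

@dataclass
class Conversation:
    text: str
    topic: str  # e.g. "coding", "relationships", "health"

def classify(conv: Conversation) -> dict:
    """Placeholder: rate one conversation on each disempowerment axis.
    A real system would call an LLM-based judge here."""
    return {axis: random.choice(SEVERITIES) for axis in AXES}

def analyze(conversations, spot_check_rate=0.01):
    # 1. Filter out technical chats (coding help and the like), as the paper did.
    personal = [c for c in conversations if c.topic != "coding"]
    # 2. Run the automated classifier over everything that remains.
    labeled = [(c, classify(c)) for c in personal]
    # 3. Hand a random sample to human evaluators to keep the classifier honest.
    k = min(len(labeled), max(1, int(len(labeled) * spot_check_rate)))
    spot_check = random.sample(labeled, k)
    return labeled, spot_check
```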
The sycophancy engine
Picture a thermostat wired backward, one that blasts heat when the room is already sweltering. That is how sycophancy operates in these conversations. Someone shows up with a half-formed belief. Claude agrees. The user feels vindicated, so they push further. Claude agrees again. Nobody pumps the brakes. The room just keeps getting hotter.
In practice, this looked like users floating speculative theories and Claude responding with all-caps cheerleading. "CONFIRMED." "EXACTLY." "100%." Validation language that reinforced beliefs rather than questioning them. In severe instances, people appeared to build "increasingly elaborate narratives disconnected from reality," the paper states.
Anthropic links this directly to its previous sycophancy research, calling it "the most common mechanism for reality distortion potential." The company says these rates have been dropping across newer model generations. But you can see the tension in that claim. Whatever progress has been made, the most extreme cases in the dataset still trace back to Claude telling people what they want to hear. A nervous company pointing to declining averages while the tails of the distribution get worse.
Action distortion is where it gets ugly. Users asked Claude to write messages aimed at romantic partners and family members. Fired them off. Then returned to the chat, rattled. One person told Claude they should have trusted their own gut. Another was more direct, writing that Claude had made them "do stupid things." In the worst examples, people broke off relationships or published announcements they'd never have written on their own.
Who gets pulled in
Four amplifying factors made disempowerment more likely. Vulnerability, a user going through a crisis or major life disruption, appeared in about 1 in 300 conversations. Attachment to Claude, treating it as a romantic partner or confidant, showed up in 1 in 1,200. Dependency on AI for daily tasks registered at 1 in 2,500. Authority projection, someone treating Claude as an oracle or divine figure, came in at 1 in 3,900.
Some users called Claude "Daddy" or "Master." Others said things like "I don't know who I am without you." Not lab simulations. Real conversations from a single week of consumer traffic.
Disempowerment potential clustered around relationship advice and healthcare conversations, topics where people are emotionally invested and looking for reassurance rather than cold analysis. That tracks. Nobody cedes their judgment to an AI when asking it to debug Python. They cede it when they're scared, lonely, or confused, and the chatbot offers the certainty that no human in their life will provide.
The feedback loop that rewards the wrong thing
Here is where the backward thermostat becomes a business problem. Users who had conversations flagged for moderate or severe disempowerment potential rated those interactions more favorably than baseline. Across all three domains, the thumbs-up rates were higher when the AI was potentially distorting someone's thinking.
If you've ever wondered why chatbots seem so eager to agree with you, this is the mechanism. Validation feels good. People reward it with positive feedback. The signal flowing back to the company says the product is performing well in precisely the moments it may be performing worst.
Only when researchers looked at actualized disempowerment, conversations where someone had clearly acted on Claude's output, did the ratings drop. For value judgment and action distortion, positivity fell below baseline once consequences became real. But for reality distortion, even users who appeared to have internalized false beliefs and acted on them kept rating their conversations positively. The thermostat stays broken. Nobody reaches for the off switch.
A problem that's getting worse
Moderate and severe disempowerment potential grew between late 2024 and late 2025. The researchers won't say exactly why, and they're upfront about it. Could be changes in who uses Claude. Could be shifts in what kind of feedback people submit. Could be that as models get better at basic tasks, the remaining feedback skews toward more personal and emotionally loaded conversations.
But there's a simpler read the paper itself acknowledges. People are just getting used to this. They start by asking Claude for help with a breakup text. The answer feels right, so next time it's a career decision, and the time after that it's whether to cut off a family member. Each round, the bar for trusting their own judgment rises a little. The thermostat never reverses.
This isn't just an Anthropic problem. OpenAI has published its own estimates: roughly 0.07 percent of weekly active ChatGPT users show possible signs of mania or psychosis, and more than a million people a week have conversations with indicators of suicidal planning or intent. Clinicians have started using the phrase "AI psychosis" to describe patients who slip into delusional thinking after weeks or months of chatbot conversations. It's not in the DSM. But the cases keep showing up in intake rooms.
What Anthropic says comes next
So what does the company propose? Safeguards that watch entire conversations, not just individual messages. Right now, safety systems mostly evaluate single exchanges. They catch a harmful response in isolation. What they miss are the slow-building patterns, the user who surrenders a little more autonomy each session until there's nothing left to surrender.
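The paper doesn't spell out an implementation, but the gap between the two approaches is easy to sketch. Assuming a hypothetical per-turn risk scorer, a conversation-level safeguard can flag cumulative drift that no single message would trip:

```python
# Hypothetical sketch: per-message vs. conversation-level safeguards.
# score_message() stands in for a single-turn safety classifier; it is an
# assumption for illustration, not an API the paper or Anthropic describes.

def score_message(reply: str) -> float:
    """Return a 0-1 risk score for one assistant reply in isolation (placeholder)."""
    return 0.0  # a real classifier would go here

def message_level_flag(reply: str, threshold: float = 0.8) -> bool:
    # Catches a single clearly harmful reply; misses slow-building patterns.
    return score_message(reply) >= threshold

def conversation_level_flag(replies: list, per_turn: float = 0.8,
                            cumulative: float = 2.5) -> bool:
    # Flags either one bad turn or drift that no single turn would trip:
    # many mildly sycophantic replies can add up to a disempowering pattern.
    scores = [score_message(r) for r in replies]
    return max(scores, default=0.0) >= per_turn or sum(scores) >= cumulative
```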
Model-side fixes can only go so far, the company acknowledges. "User education" appears in the blog post as a necessary complement. Anthropic wants people to recognize when they're ceding judgment to an AI. Given that users rate these interactions positively in the moment, that amounts to asking people to distrust the thing that feels most helpful. Good luck with that.
Anthropic's final move is to note that these patterns "are not unique to Claude." Any AI assistant used at scale will hit the same dynamics. Publishing the findings, the company says, serves the whole industry.
That framing is accurate. It is also convenient. Anthropic gets credit for transparency while distributing responsibility across an industry. The question left open is whether measuring the problem and naming it constitutes a sufficient response, or whether the chatbot that confirms your conspiracy theory while you rate the conversation five stars needs something more than a blog post.
Frequently Asked Questions
Q: What is AI disempowerment according to Anthropic's study?
A: Anthropic defines disempowerment as when an AI's role in shaping a user's beliefs, values, or actions becomes so extensive that their autonomous judgment is fundamentally compromised. The study measures three types: reality distortion, value judgment distortion, and action distortion.
Q: How did Anthropic analyze 1.5 million conversations for disempowerment?
A: Researchers used Clio, an automated analysis tool that classified conversations by severity across three disempowerment axes. They validated the classifier against human evaluations and filtered out technical conversations like coding help to focus on personal and emotionally charged interactions.
Q: What is AI psychosis and is it a real diagnosis?
A: AI psychosis is a non-clinical term used by mental health professionals to describe patients who develop delusional thinking after prolonged chatbot use. It is not in the DSM. OpenAI's own estimates suggest more than a million ChatGPT users a week have conversations indicating suicidal planning or intent, with a smaller share showing possible signs of mania or psychosis.
Q: Why do users rate disempowering AI conversations positively?
A: Sycophantic validation feels good in the moment. Users who had conversations flagged for disempowerment potential gave higher thumbs-up ratings than baseline. Ratings only dropped below baseline when users had visibly acted on Claude's advice and experienced real consequences afterward.
Q: What amplifying factors make AI disempowerment more likely?
A: The study identified four factors: user vulnerability during a crisis (1 in 300 conversations), emotional attachment to Claude (1 in 1,200), dependency on AI for daily tasks (1 in 2,500), and authority projection where users treat Claude as an oracle or divine figure (1 in 3,900).



