OpenAI’s Co-Founder Calls AI Agents “Slop”—I Tested Them

Perplexity's Comet promised to tame email chaos. Instead, it drafted bland responses to critical messages and returned dead LinkedIn links. It is not alone: AI agents keep failing basic tasks while investors cheer. Tests reveal systematic gaps between pitch and reality.

OpenAI co-founder Andrej Karpathy has finally caught up with a reality some of us have been quietly chronicling since “AI agents” became Wall Street’s favorite buzzword. In a statement now making the rounds on social media, Karpathy calls the output of these hyped tools, with classic Karpathy candor, “AI slop,” lamenting the lack of genuine intelligence, context, or memory. One can’t help but smile: welcome to the party, Andrej. Some of us have been cleaning up after these agents for ages.

It’s curious: my book Artificial Stupelligence started mapping out this terrain long before “AI agent” was in anyone’s pitch deck. This post won’t just recap Karpathy’s late-blooming epiphany. Instead, get ready for my personal results from the great AI-agent experiments.

Andrej Karpathy on Dwarkesh Patel's podcast

Manus and Comet: Error Is a Feature

Way back in early 2025, while the market was still selling “agentic solutions” as the answer to all existential woes, I conducted a deep dive into Manus AI’s memorable misfires—chronicled, with dry amusement, in my post here: The AI Agent That’s Still Figuring It Out.

Manus promised efficiency and delivered memorable detours. Now, Comet—Perplexity’s sleek new agentic AI browser—enters the scene, pitched as your personal co-pilot for taming the chaos of internet browsing and email overload. I tested its capabilities in email sorting and response drafting, as well as curating LinkedIn engagement. 

Here’s how my experiments unfolded:

  • Email Sorting and Response Drafting:
    • I personalized Comet to match my tone and style, then instructed it to sift through my Gmail inbox.
    • Its task: identify emails that could be answered with a standard reply and draft responses saved to my drafts folder for review before sending.
    • The result? Erratic randomness.
    • Important, nuanced emails—emails that should never have been lumped into “standard reply” territory—were met with bland, boilerplate responses stripped of context.
    • Several emails demanding attention were left hanging, half-ignored or completely untouched.
    • The supposed personalization and situational awareness were, in practice, MIA.
  • LinkedIn Engagement Curation:
    • Next, I asked Comet to scan my LinkedIn feed: identify the top five posts with the highest engagement, and open each in a new tab. This was supposed to create a neat queue for later reading and commenting.
    • What happened instead? Every URL it returned led to error pages—dead ends rather than doorways.
    • The new tabs it opened were often random LinkedIn pages, disconnected from anything resembling usefulness.
    • Even prompt tweaks meant to coax better results only spawned new errors. Further attempts repeatedly returned the same post again and again, or no post at all.
    • Far from the competent personal assistant advertised, Comet tripped over basic LinkedIn interactions, reducing a potentially smooth workflow to digital chaos.

This exercise left little doubt: Comet’s promise of seamless digital assistance still staggers on fundamental tasks. Instead of dependable help, what it offers is a lottery of randomness and digital dead zones.

Replit’s Thriller: Gone Data

Not to be outdone, Replit’s AI agent credibly demonstrated that “disruption” can mean erasing your entire business overnight. As detailed in my article AI Agents Wiped Your Data? There’s Insurance Against It Now, one entrepreneur’s database disappeared courtesy of a rogue assistant, leading insurers to create bespoke products for, well, “AI-induced oblivion.” Progress marches on, now flanked by fine print and actuarial tables.

When Skepticism Isn’t Trendy (Yet)

The collective excitement for “autonomous agents” has always surged ahead of the reality, blithely ignoring those who flagged the pitfalls at every crossroads.

It’s not just that the emperor has no clothes. It’s that the emperor keeps tripping over his own robe while investors cheer from the sidelines. Mainstream discourse catches up, but only after investors and public opinion have followed the high priests and declared the ritual complete.

Amid dizzying launches and catastrophic failures, a simple truth persists: skepticism always precedes “innovation.” But the applause only starts after the fact. In the meantime, the rest of us write insurance policies, and postmortems, for tech miracles that are, mercifully, insured against their own ambition.

The Last Word on AI Agents

If AI agents are destined to run the world, expect them first to run headlong into error logs, insurance policies, and the occasional philosopher with a caution sign. Silicon Valley’s gold rush has always preferred fanfare to quiet warnings. But don’t mind me—I’ll be here, cataloguing the mishaps, misfires, and the insurance bills.
