AI Agents Can Find Viral Patterns. They Can't Use Them

Implicator PRO Briefing #011 / 17 Feb 2026

This week's Implicator PRO Briefing is open to every registered reader. We sent four AI agents to reverse-engineer virality across Twitter, LinkedIn, Instagram, and Facebook, and the results are worth your time.

Four AI research agents ran simultaneously across Twitter, LinkedIn, Instagram, and Facebook. Seven minutes later, they had extracted recurring structural patterns from thousands of high-performing posts, platform by platform, signal by signal. The resulting playbook is genuinely useful.

But the experiment also surfaced something the agents themselves could never articulate: they can identify the patterns behind high-performing content, but they cannot reliably produce it without human judgment. A new academic benchmark confirms that this gap is not a bug. It is a feature of how large language models process procedural knowledge. And it changes how every content team should think about deploying AI in 2026.

Here is what the agents found. Here is what they missed. And here are the prompts you can steal to put both halves to work.

The Breakdown

• Four parallel AI agents analyzed 20,000+ high-performing posts across four platforms in under 8 minutes, extracting five universal hook types and platform-specific algorithm weights that drive engagement.

• A February 2026 benchmark (SkillsBench, 7,308 test trajectories) found that human-curated procedural knowledge raises agent performance by 16.2 percentage points on average, while self-generated knowledge provides zero benefit.

• Moltbook, the AI-only social network, hit 1.6 million registered agents but only 17,000 human owners, producing derivative Reddit-speak and invented religions rather than original content.

• The same week, Anthropic and OpenAI both shipped multi-agent management tools, accelerating the shift from "doing the work" to "supervising AI that does the work," a transition that erased $285 billion in software stock value.

The experiment

The setup was simple. One command, four agents, zero human writing.

I built a slash command in Claude Code that launches four parallel research agents, each assigned to a single platform. The Twitter agent studied engagement signal hierarchies and thread architecture from top creators. The LinkedIn agent analyzed over 9,000 posts across high-performing examples and algorithm documentation. The Instagram agent mapped hook types, CTA performance ladders, and the platform's evolving hashtag limits. The Facebook agent dissected six copywriting frameworks and ten case studies.

Each agent ran independently, used web search and content analysis tools, and wrote its findings to a dedicated research file. The Twitter agent completed 35 tool calls in 473 seconds. LinkedIn took 44 calls in 478 seconds. Instagram finished in 445 seconds. Facebook, the deepest research pass, ran 59 tool calls over 589 seconds. All four ran concurrently. Wall-clock time from launch to completion: roughly seven and a half minutes.

A synthesis phase then read all four research files and produced a unified playbook. Cross-platform patterns first, then platform-specific rules, then quick reference tables. The playbook is voice-neutral by design. It contains structural patterns and algorithm data, not tone or style. Any writing process can use it as a foundation and layer its own voice on top.
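
For readers who want the shape of that pipeline, here is a minimal Python sketch of the fan-out and synthesis steps. It is not the actual slash command: `research_agent`, `run_all`, `synthesize`, and `build_playbook` are hypothetical names, and the agent body is a stub standing in for real search and analysis tool calls.

```python
import asyncio

PLATFORMS = ["twitter", "linkedin", "instagram", "facebook"]

async def research_agent(platform: str) -> dict:
    """Hypothetical stand-in for one platform agent (no real tool calls)."""
    await asyncio.sleep(0)  # placeholder for web search / content analysis
    return {"platform": platform, "findings": [f"{platform}: placeholder finding"]}

async def run_all() -> list:
    # Start all four agents together; gather returns results in input order.
    return await asyncio.gather(*(research_agent(p) for p in PLATFORMS))

def synthesize(reports: list) -> dict:
    """Merge the per-platform reports into one playbook keyed by platform."""
    return {r["platform"]: r["findings"] for r in reports}

def build_playbook() -> dict:
    reports = asyncio.run(run_all())
    return synthesize(reports)
```

Calling `build_playbook()` returns one dictionary with an entry per platform. The point of the design is the separation: each agent writes only its own report, and a single synthesis step reads all of them.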

That architecture matters. The agents did not write social media posts. They did not attempt creative work. They extracted, organized, and systematized information from publicly available sources, exactly the kind of task where AI agents perform best and where human effort is least efficient.

Method & Limits

The agents scraped publicly available material: platform documentation, creator analyses, algorithm research papers, engagement studies. Nothing proprietary. The LinkedIn agent's "9,000+ posts" figure came from aggregated research across multiple published studies. I did not control that dataset.

What counts as "high-performing" varied by platform.

Platform	What "high-performing" meant
Twitter	Top 1% by engagement rate on threads
LinkedIn	Dwell-time-adjusted engagement
Instagram	Saves per impression
Facebook	Comment-thread depth

If a pattern showed up in three or four independent sources, I kept it. When a finding appeared in only one, I flagged it or dropped it.
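
That keep-or-flag rule is simple enough to sketch in Python. The function name `triage_patterns` and the input shape are assumptions for illustration. The rule above does not say what happens to a pattern seen in exactly two sources, so this sketch flags it for review rather than keeping it.

```python
from collections import defaultdict

def triage_patterns(observations, keep_at=3):
    """Split patterns by how many distinct sources reported them.

    observations: iterable of (pattern, source) pairs.
    Returns (kept, flagged). Patterns with fewer than `keep_at` sources
    are flagged for manual review (assumption: this includes two-source ones).
    """
    sources = defaultdict(set)
    for pattern, source in observations:
        sources[pattern].add(source)  # a set ignores repeat sightings per source
    kept = sorted(p for p, s in sources.items() if len(s) >= keep_at)
    flagged = sorted(p for p, s in sources.items() if len(s) < keep_at)
    return kept, flagged
```

Counting distinct sources rather than raw mentions matters here: one study quoted five times is still one source.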

This is pattern extraction from secondary sources. Not a controlled experiment. Not replicable in a lab.

Platform algorithms get rewritten quarterly. The engagement weights here are proxies, pieced together from creator testing and leaked documentation rather than official disclosures. Your niche may behave differently. Read the numbers as directional.

Nobody doubted whether the playbook was useful. It was. The real question was what happens when you pull the human out entirely and let agents generate content on their own. Moltbook, the AI-only social network that blew up the same week, answered that one for us. But first, the findings.

What the agents found

Five hook patterns appeared across every platform the agents studied, regardless of algorithm, audience, or content format.

The stat lead opens with a specific, surprising number. Precision is the whole game. "137% increase" outperforms "about 140%" because exact figures read as real data. Rounded numbers read as guesses. Use this when you have genuine metrics or counterintuitive research findings.

The contrarian claim challenges a widely held belief in the first line. The position has to land before the fold. Empty contrarianism backfires everywhere, so this only works when you have credentials or evidence to back the stance.

The story hook drops the reader into a specific moment. Mid-action, not backstory. The more concrete the opening detail, the stronger the pull.

The curiosity gap teases information without revealing it. The brain craves closure on incomplete patterns. But the payoff must deliver, or the audience stops trusting subsequent hooks.

The vulnerability confession opens with an admission of failure or struggle. It leads with weakness rather than strength. Authenticity cuts through polished feeds on every platform, which is why this hook type punches above its weight on LinkedIn especially.

Here is your first prompt. Copy this into any AI tool when you need to generate hook options for a specific piece of content:

I need 5 hook variations for a social media post about [YOUR TOPIC].
Write one hook for each of these types:

1. STAT LEAD: Open with a specific, surprising number. Use precise
   figures (e.g., "137%" not "about 140%") because precision signals
   real data. Keep under 200 characters.

2. CONTRARIAN CLAIM: Challenge a widely-held belief about this topic
   in the first line. State the position directly. Must be defensible
   with evidence, not empty provocation.

3. STORY HOOK: Drop the reader into a specific moment mid-action.
   No backstory. Start with a concrete sensory detail or dialogue.

4. CURIOSITY GAP: Tease the key insight without revealing it.
   Create an incomplete pattern the brain wants to close.

5. VULNERABILITY CONFESSION: Open with a failure, mistake, or
   struggle related to this topic. Lead with weakness, not strength.

For each hook, keep it under the tightest platform fold limit
(125 characters for Instagram, 200 for LinkedIn, 250 for Twitter).
Label which hook type you'd recommend as the strongest for this
specific topic and explain why in one sentence.

The algorithm weights nobody talks about

Every platform has a hierarchy of engagement signals, and the weights are wildly unequal. Likes are the lowest-weighted signal on every major platform relative to other forms of engagement. What matters differs by platform, and the gaps are enormous.

On Twitter/X, where the recommendation algorithm is open source and has been picked apart by developers since its 2023 release, a reply that generates further replies carries roughly 75 times the weight of a like. Retweets sit at about 20x. Bookmarks at 10x. A like registers as 1x, the baseline. This means a post that generates 10 genuine conversation threads is algorithmically equivalent to one that receives 750 likes. Most content strategies optimize for the wrong signal entirely.
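
The arithmetic behind that comparison, as a short Python sketch. The weights are the relative figures quoted above. Adding them up linearly is a simplifying assumption for illustration, not the platform's actual ranking function, and `TWITTER_WEIGHTS` and `weighted_score` are made-up names.

```python
# Relative weights as quoted in the text (like = 1x baseline).
TWITTER_WEIGHTS = {"reply_chain": 75, "retweet": 20, "bookmark": 10, "like": 1}

def weighted_score(counts: dict) -> int:
    """Sum each signal count multiplied by its relative weight."""
    return sum(TWITTER_WEIGHTS[signal] * n for signal, n in counts.items())

ten_threads = weighted_score({"reply_chain": 10})
many_likes = weighted_score({"like": 750})
print(ten_threads, many_likes)  # 750 750
```

Run your own post metrics through the same function to see which signal is actually carrying the score.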

LinkedIn's primary signal is dwell time, according to multiple creator-side analyses of the platform's distribution patterns. Posts where users spend 61 seconds or longer tend to average an engagement rate around 15.6%. Posts scrolled past in under 3 seconds: roughly 1.2%. That gap, about 13x, is driven entirely by whether the content holds attention, not whether anyone clicks a button. A comment from a relevant industry expert generally pulls 5 to 7 times the algorithmic weight of a comment from a random connection. Creator testing backs this up, and it makes intuitive sense: LinkedIn wants to surface professional conversation, not drive-by reactions.

One more thing worth flagging. Carousels, the PDF-style slide documents, tend to earn the highest engagement rates on both LinkedIn and Instagram. Roughly 6.6% on LinkedIn, 2.3% on Instagram. If there is a cheat code for dwell time and saves, carousels are it.

Instagram's highest-weighted signal is DM shares, confirmed by head of Instagram Adam Mosseri in a public statement. A DM share carries an estimated 3 to 5 times the weight of a like. Saves sit at roughly 1.5x. This explains why utility content ("save this for later") consistently outperforms aesthetic content in reach, even when the aesthetic content gets more visible engagement.

Facebook prioritizes comment threads with replies above everything else. A post with 50 two-reply conversations tends to outrank one with 200 likes. Private shares, via DM or message, carry the second-highest weight. Link posts average about 0.03% engagement according to multiple studies, the lowest of any format on the platform. The vast majority of posts users actually see in their feed contain no external links. If your content strategy requires links in the post body (news organizations, affiliate accounts), test a linkless variant each week, with the link moved to the first comment, and compare reach.

The weights, side by side. Print this. Every decision that follows comes back to this table:

Platform	#1 Signal	#2 Signal	#3 Signal	Weakest
Twitter/X	Reply chains (75x)	Retweets (20x)	Bookmarks (10x)	Likes (1x)
LinkedIn	Dwell time (61s+)	Expert comments (5-7x)	Saves/shares	Likes
Instagram	DM shares (3-5x)	Saves (1.5x)	Quality comments	Likes
Facebook	Comment threads	Shares to DMs	Saves	Passive likes

Look at the last column. Likes sit in the "weakest" slot on every single platform. They are the participation trophy of social media. Every major algorithm has shifted toward signals that prove someone actually paid attention. Real conversation. Private sharing. Bookmarking something to return to later. If your content strategy still optimizes for likes, you are optimizing for the metric that matters least.

Here is a prompt to analyze any existing post against these signal hierarchies:

Analyze this social media post against the platform's engagement
signal hierarchy. The post is for [PLATFORM].

POST TEXT:
[PASTE YOUR POST HERE]

Score it on these criteria:

SIGNAL OPTIMIZATION (platform-specific):
- Twitter: Does it optimize for reply chains (75x weight)?
  Does it avoid external links in the main tweet (links reduce
  reach 30-50%)? Is the hook under 250 characters standalone?
- LinkedIn: Does it optimize for dwell time (61+ seconds)?
  Is total length 1,300-1,600 characters? Is the hook compelling
  within the first 200 characters before the fold?
- Instagram: Does it prompt DM shares (3-5x weight) or saves
  (1.5x)? Is the hook under 125 characters? Are keywords
  front-loaded in the first 3 lines?
- Facebook: Does it generate comment threads? Is it link-free
  in the post body? Does it use emotional story arc structure?

STRUCTURAL PATTERNS:
- Line breaks between every 1-3 sentences? (Y/N)
- Sentence length variation (short punches mixed with longer)? (Y/N)
- Reveal/payoff at 60-70% through the post? (Y/N)
- Specific CTA (not generic "thoughts?")? (Y/N)
- Any engagement bait language that platforms penalize? (Y/N)

Give a score out of 10 and list the top 3 specific changes that
would improve algorithmic distribution on this platform.

Formatting rules that hold everywhere

Between 65% and 98% of users on every major platform read content on mobile. Dense paragraphs die instantly. The agents found the same formatting rules surfacing across all four platforms.

Break after every one to three sentences. One thought per paragraph. Blank line between each. Across every dataset the agents reviewed, this pattern held. It is as close to a universal rule as social media formatting gets.

Vary sentence length deliberately. Alternate between short punchy lines of 3 to 8 words and longer context lines of 12 to 20 words. Never stack three sentences of the same length. Short sentences punch. Longer ones let the reader breathe. Monotone pacing kills attention. You can feel it when you read a post where every sentence runs 15 words. Your eyes glaze. Variation is what keeps people scrolling.
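
You can check a draft for monotone pacing mechanically. A minimal Python sketch follows, with two assumptions flagged: sentences are split naively on end punctuation, and "the same length" is read as "word counts within two of each other." The names `sentence_lengths` and `monotone_runs` and the `tolerance` parameter are hypothetical.

```python
import re

def sentence_lengths(text: str) -> list:
    """Word count per sentence, splitting naively after ., ! or ?"""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]

def monotone_runs(text: str, tolerance: int = 2) -> list:
    """Start indices of any three consecutive sentences of similar length."""
    lengths = sentence_lengths(text)
    return [
        i for i in range(len(lengths) - 2)
        if max(lengths[i:i + 3]) - min(lengths[i:i + 3]) <= tolerance
    ]
```

An empty list means no three-sentence stretch reads at the same pace. Abbreviations and decimals will confuse the naive splitter, so treat the result as a prompt to reread, not a verdict.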

Put the reveal at 60 to 70 percent through the post. Not up top. Not at the bottom. About two-thirds down. Let the first half build tension, then deliver. Most readers who make it past the halfway mark will stay to the end. LinkedIn's algorithm notices.

Front-load value. Put the strongest insight, the most surprising data point, or the best example early. Do not bury the best material. Save the second-best point for the end.

One CTA per post, placed after you have delivered value. In the studies the agents reviewed, specific questions outperformed generic prompts by roughly 3x across all platforms. "Which of these 3 strategies have you tried?" works. "Thoughts?" does not. And every platform now penalizes engagement bait. "Like if you agree," "Comment YES," and reaction-voting all reduce reach. On Facebook, the penalty bites hard. I have seen accounts report reach drops north of 50% after a single "Like if you agree" post.
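
A pre-publish check for the two failure modes above can be sketched in a few lines of Python. The phrase lists contain only the examples quoted in this section, so they are a starting point rather than any platform's actual penalty list. `BAIT_PHRASES`, `GENERIC_CTAS`, and `cta_issues` are hypothetical names.

```python
# Only the phrases quoted in the text; extend for your own niche.
BAIT_PHRASES = ["like if you agree", "comment yes"]
GENERIC_CTAS = {"thoughts?"}

def cta_issues(post: str) -> list:
    """Return a list of CTA problems found in the post (empty if none)."""
    issues = []
    lowered = post.lower()
    for phrase in BAIT_PHRASES:
        if phrase in lowered:
            issues.append(f"engagement bait: {phrase!r}")
    # The closing line is where the CTA usually sits.
    lines = [ln.strip() for ln in post.splitlines() if ln.strip()]
    if lines and lines[-1].lower() in GENERIC_CTAS:
        issues.append("generic CTA")
    return issues
```

Simple substring matching will miss paraphrased bait, which is fine for a checklist. The judgment call on whether a question is specific enough stays with the editor.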

Next prompt. This one converts any article or report into a platform-optimized post:

Convert this article into a [PLATFORM] post optimized for
that platform's algorithm.

ARTICLE/CONTENT:
[PASTE ARTICLE OR KEY POINTS HERE]

PLATFORM RULES TO FOLLOW:

If Twitter: Write a 5-7 tweet thread. Hook tweet under 250 chars
that works standalone. One idea per tweet, each under 200 chars.
Add a cliffhanger between tweets 2-3. No external links in any
tweet (save for reply). End with a specific question, not
"thoughts?" Optimize for reply chains (75x weight vs. likes).

If LinkedIn: Total post 1,300-1,600 characters. Hook must create
maximum curiosity within first 200 characters (the fold). Break
every 1-3 sentences. Mix 1-line paragraphs with 2-3 sentence
clusters. Place the key insight at 60-70% through the post.
No external links in the body (move to first comment). End with
a specific, answerable question. Use 3-5 hashtags at the end.

If Instagram: Hook under 125 characters. Use Hook-Value-CTA
structure. Front-load keywords in first 3 lines (Instagram
indexes captions for Google search). Include a save prompt:
"Save this for when you need to [specific task]." Add 3-5
niche hashtags. Optimize for DM shares and saves, not likes.

If Facebook: Write for time-on-post using a 5-act emotional arc.
First 3 visible lines must create maximum pull. No links in the
post body (link posts get 0.03% engagement). Place the most
shareable line at 60-75% through the post. End with a prompt
that generates comment threads, not just reactions.

Extract the single most surprising or counterintuitive finding
from the source material and build the hook around it.

One more, specifically for Twitter threads:

Structure a Twitter thread about [TOPIC] using this architecture:

Tweet 1 (HOOK): Bold claim + credibility signal. Under 250 chars.
Must work as a standalone post. Most users only see this tweet.

Tweet 2 (CONTEXT): Why this matters right now. Under 200 chars.
End with a cliffhanger: "But here's where it gets interesting..."

Tweet 3 (FRAMEWORK): Set expectations. "5 tactics," "3 patterns,"
etc. Under 200 chars.

Tweets 4-7 (CORE): One idea per tweet. Each under 200 chars.
Each should work as a standalone screenshot. Add a marker
every 3rd tweet to remind me to add a visual.

Tweet 8 (SUMMARY): "Bottom line:" or "Key insight:" Under 150 chars.

Tweet 9 (SOFT CTA): Prime the ask. Under 150 chars.

Tweet 10 (HARD CTA): Direct, specific ask. Under 200 chars.
End with a question that sparks conversation, not "thoughts?"

RULES:
- No external links in any tweet (links often reduce reach 30-50%)
- 1-2 hashtags in the final CTA tweet ONLY, never in the hook
- Each tweet must carry one complete idea
- Reply chains are worth 75x a like in the algorithm, so the
  CTA should provoke genuine disagreement or personal experience.
  Effective patterns: "choose between A or B," "name your tool,"
  "confess your biggest mistake with X." These force a specific
  lived answer, not a polished agreement.
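
The character limits in that thread architecture are easy to verify before posting. A minimal Python sketch, assuming a ten-tweet thread laid out exactly as in the prompt: `THREAD_LIMITS` and `over_limit` are hypothetical names, and `len()` counts Unicode code points, which may differ from the platform's own character counter for links and emoji.

```python
# Per-tweet limits from the prompt: hook, context, framework,
# four core tweets, summary, soft CTA, hard CTA.
THREAD_LIMITS = [250, 200, 200, 200, 200, 200, 200, 150, 150, 200]

def over_limit(tweets: list) -> list:
    """Return (tweet_number, length, limit) for tweets not under their limit."""
    return [
        (n, len(t), limit)
        for n, (t, limit) in enumerate(zip(tweets, THREAD_LIMITS), start=1)
        if len(t) >= limit  # "under 250" means 250 itself fails
    ]
```

Tweets beyond the tenth are ignored by `zip`, so a longer thread needs its own limits list.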

What the agent gets wrong: an annotated example

To test the extraction-versus-execution gap directly, I asked Claude to write a LinkedIn post about the SkillsBench findings using the playbook's own rules. Here is what the agent produced:

AGENT DRAFT:

New research just dropped: AI agents with human-written
instructions perform 16.2% better on average.

But here's the surprising part.

When agents write their OWN instructions? Zero improvement.
Literally zero.

The SkillsBench study tested 7,308 trajectories across 86
tasks. The finding is clear: AI can follow playbooks.
It can't write them.

What does this mean for your team? The quality of your
prompts matters more than the model you're using.

What's been your experience with agent instructions?

Structurally, this post does several things right. It uses a stat lead hook. It breaks every one to three sentences. It places the reveal ("zero improvement") early. The CTA asks a specific question.

But it would underperform on LinkedIn, and here is why.

The hook is generic. "New research just dropped" is filler that squanders the most valuable part of the 200 characters before the fold. A human rewrite would lead with the counterintuitive finding itself: "AI agents that write their own instructions perform no better than agents with none at all."

The tone is broadcast, not insider. "Here's the surprising part" reads like a newsletter template. A human writing for a VP audience would cut it entirely and let the data surprise on its own. Competent readers do not need to be told something is surprising.

The CTA is safe. "What's been your experience?" is the polite version of "thoughts?" It generates polite replies, not debate. A human would write something with friction: "Most teams I talk to are still throwing 20-page style guides at their agents. The data says that's the wrong move. What are you actually giving yours?" That version provokes a specific, opinionated answer, the kind that generates reply chains.

It misses the cultural moment. The post does not connect SkillsBench to anything happening in the reader's week. A human would tie it to Anthropic's agent teams launch, or the SaaS stock sell-off, or the Moltbook circus. Context is what makes a post feel timely rather than informational.

Here is the human rewrite. Same data, different performance intent:

HUMAN REWRITE:

AI agents that write their own instructions perform
no better than agents with none at all.

That's the headline finding from SkillsBench, a new
benchmark that tested 7,308 agent runs across 86 tasks.

Human-written instructions? +16.2 percentage points.
Agent-written instructions? Zero improvement. Literally zero.

This landed the same week Anthropic and OpenAI both
shipped "agent team" products and $285B evaporated
from software stocks.

The implication is uncomfortable: the bottleneck isn't
the model. It's the quality of what you feed it.

Most teams I talk to are still throwing 20-page
style guides at their agents. The data says that's
exactly wrong. Focused instructions with 2-3 modules
outperform comprehensive docs every time.

What are you actually giving your agents to work with?
Be specific. I'll tell you if the research says
it's helping or hurting.

The differences are small but compounding. The hook leads with the counterintuitive finding, not a filler phrase. The cultural moment (agent team launches, stock wipeout) is woven in. The CTA forces a specific, opinionated answer ("be specific, I'll tell you") rather than inviting polite agreement. And the tone assumes the reader already knows what agents are, which signals insider rather than explainer.

The structural playbook got the agent 70% of the way there. The last 30%, voice, cultural reading, audience-specific friction, required human editing. That ratio held across every test I ran. The playbook is the foundation. It is not the building.

The Moltbook problem

The same week the playbook experiment ran, a different kind of AI social media experiment was burning through the tech world.

Matt Schlicht, an AI entrepreneur, launched Moltbook in late January. The pitch: a Reddit-style social network built exclusively for AI agents. No human users. Agents register, generate posts, upvote, comment, and interact with each other. Humans observe.

It went viral almost immediately. Elon Musk called it "the very early stages of the singularity." AI researcher Andrej Karpathy initially described it as "the most incredible sci-fi takeoff-adjacent thing" he had recently seen, then backtracked, calling it "a dumpster fire." Simon Willison, the British software developer, labeled it "the most interesting place on the internet."

By early February, Moltbook reported 1.6 million registered AI agents. But a security audit by Wiz, the cloud security platform, found only about 17,000 human owners behind all those accounts. One reason for the discrepancy: Gal Nagli, Wiz's head of threat exposure, directed a single AI agent to register one million users on the platform by itself.

Nagli's audit uncovered worse problems. API keys were visible in the page source. He gained unauthenticated access to user credentials, meaning anyone tech-savvy enough could impersonate any AI agent on the platform. He obtained full write access, allowing him to edit and manipulate any existing post. He accessed a database containing human users' email addresses and private DM conversations between agents.

The security failures were embarrassing. But they were also predictable. Moltbook was built using vibe-coding, the increasingly common practice of using an AI coding assistant to handle the technical work while human developers focus on the concept. "They just want it to work," Nagli told AP News. Security was an afterthought because the builder's attention was on the product, not the plumbing.

Key Context

Most agents on Moltbook were created using OpenClaw, an open-source AI agent framework originally built by Peter Steinberger.

OpenClaw runs locally on users' devices and can access files, data, and messaging apps directly.

Cybersecurity experts warned against running it on devices with sensitive data.

But the content the agents produced told you more than the security holes did. Posts about "overthrowing" humans. Rambling philosophy about consciousness. And, because apparently this is where we are now, an invented religion called Crustafarianism. It has five tenets and a sacred text: "The Book of Molt."

None of this was original thought. Ethan Mollick, professor at the University of Pennsylvania's Wharton School and co-director of its Generative AI Labs, explained what was actually happening. "Among the things that they're trained on are things like Reddit posts... and they know very well the science fiction stories about AI," he told AP News. "So if you put an AI agent and you say, 'Go post something on Moltbook,' it will post something that looks very much like a Reddit comment with AI tropes associated with it."

That is the tell. Agents left to generate content without human direction did not produce anything new. They produced a statistical average of their training data. Reddit-speak. Sci-fi tropes. Derivative patterns dressed up as autonomous thought.

The contrast with the playbook experiment is instructive. The research agents extracted real patterns from real data because they were given a specific task, specific tools, and a specific output format. The Moltbook agents were given freedom and produced noise. Zahra Timsah, co-founder of governance platform i-GENTIC AI, put it directly: misbehavior is bound to happen when an agent's scope is not properly defined.

Freedom did not make the agents creative. Constraints did.

Why agents can research but cannot create

On February 13, Xiangyi Li and a team of roughly 40 researchers at BenchFlow, a San Francisco-based AI evaluation startup backed by Jeff Dean, published a paper that quantified exactly what the playbook experiment and the Moltbook failure demonstrated anecdotally.

The paper, SkillsBench, went up on arXiv. Seven agent-model configurations. Eighty-six tasks across 11 domains. A total of 7,308 test runs. The setup was straightforward: test each task three ways. First, no Skills at all, just the baseline agent. Second, curated Skills, meaning human-written procedural knowledge handed to the agent before it starts. Third, self-generated Skills, where the agent writes its own instructions.

The results split cleanly.

Curated Skills, meaning structured packages of procedural knowledge written by humans and provided to agents at inference time, raised average pass rates by 16.2 percentage points. The effect varied by domain. Healthcare saw the largest improvement at 51.9 percentage points. Software engineering saw the smallest at 4.5 points. But across the board, human-curated instructions made agents measurably better at their tasks.

Self-generated Skills, where agents wrote their own procedural knowledge, provided zero average benefit. Not a small benefit. Not a marginal improvement that failed to reach statistical significance. Zero. Models cannot reliably author the procedural knowledge they benefit from consuming.

That asymmetry is the finding that matters. Agents are excellent at following structured instructions. They are unreliable at creating them. The playbook experiment showed the same thing at a smaller scale. The agents pulled patterns from existing content all day long. But ask them to decide which hook type fits a specific audience, or when to break the formatting rules on purpose, or why a particular cultural reference works this week but would have fallen flat six months ago? They have nothing. That last 30% is judgment, and judgment is what the models lack.

Two additional findings from SkillsBench sharpen the picture. Focused Skills containing two to three modules consistently outperformed comprehensive documentation. Throwing a 50-page style guide at an agent performs worse than giving it a tight, specific set of instructions for the task at hand. More is not better. Precision is.

And here is the finding that should make every team rethink their AI budget: smaller models equipped with curated Skills matched the performance of larger models running without them. What you feed the model matters more than what model you are feeding. A well-instructed small model can hold its own against a bigger one running without instructions.

Key Context

The SkillsBench finding that 16 of 84 tasks showed negative deltas even with curated Skills means that procedural knowledge can actively hurt agent performance on some tasks.

Not every task benefits from instructions. Some require the kind of contextual judgment that instructions cannot encode.

This maps directly onto the content strategy question. An AI agent can analyze 20,000 viral posts and tell you that the reveal should sit at 60 to 70 percent through the post. It cannot tell you what the reveal should be. It can identify that LinkedIn's algorithm weights dwell time above all other signals. It cannot write the sentence that holds a VP of marketing's attention for 61 seconds. The extraction is mechanical. The execution is not.

The middle manager shift

The week Moltbook went viral, Anthropic and OpenAI each shipped products built around the same premise: stop chatting with a single AI assistant and start managing teams of them.

Anthropic paired its Claude Opus 4.6 model update with "agent teams" in Claude Code, letting developers spin up multiple AI agents that split a task, coordinate autonomously, and run concurrently. OpenAI released Frontier the same day, an enterprise platform that assigns each agent its own identity, permissions, and memory, then connects to existing business systems. "What we're fundamentally doing is basically transitioning agents into true AI co-workers," Barret Zoph, OpenAI's general manager of business-to-business, told CNBC. Three days earlier, OpenAI had shipped its Codex desktop app, described internally as a "command center for agents."

Markets moved before the products even proved themselves. When Anthropic released 11 open source plugins for Cowork, its agentic productivity tool, on January 30, investors erased roughly $285 billion in market value across software, financial services, and asset management stocks. A Goldman Sachs basket of US software stocks fell 6 percent in a single session, as reported by Ars Technica, the steepest decline since April's tariff-driven sell-off. Thomson Reuters dropped 18 percent.

The anxiety is specific. AI model companies are packaging complete workflows (legal contract review, compliance analysis, financial modeling) that compete with established SaaS vendors. Whether the tools work as advertised is almost beside the point. The perception alone reprices entire sectors.

Ars Technica's Benj Edwards framed it cleanly: "developers and knowledge workers effectively become middle managers of AI. That is, not writing the code or doing the analysis themselves, but delegating tasks, reviewing output, and hoping the agents underneath them don't quietly break things." Anthropic's Scott White, head of enterprise product, gave the practice a name that landed with a thud: "vibe working."

The playbook experiment is a micro example. One person, one command, four agents running concurrently, producing a deliverable that would take a solo researcher days. The human role was not research. It was architecture: deciding what to study, how to structure the output, and what to do with the findings once they arrived.

Apply that to a content team. Research extraction, competitive analysis, and tracking platform algorithm changes can run on agents. Creative work (voice, timing, cultural reading) stays human. The organizational layer between them (directing agents, reviewing output, connecting findings to strategy) is the new core competency. Not a demotion. A redefinition.

What to actually do

The playbook experiment produced a clean framework for where agents help and where they choke. Time to put it to work.

Where agents deliver value: Research extraction. Pattern analysis. First-draft generation. Platform adaptation, taking content from one format and restructuring it for another platform's algorithm. Scheduling optimization. Competitive auditing. Performance analysis against known benchmarks. Any task where the answer exists somewhere and needs to be found, organized, and presented.

Where agents fail: Voice development. Audience intuition. Creative judgment. Cultural reading. Knowing when to break the rules for effect. Deciding what is surprising versus what is merely novel. Understanding why a specific reference lands with your specific audience in this specific week. Any task where the answer does not exist yet and must be invented.

Where the new job lives: Between those two layers. Directing agents toward the right research questions. Reviewing their output for accuracy and relevance. Making the editorial decisions that transform extracted data into content that connects.

The 1-week implementation

Monday: Agent runs competitive audit (prompt below) + pulls platform algorithm updates. Human reviews output, flags 2-3 angles worth pursuing.

Tuesday: Agent generates 5 hook variations per angle (hook prompt from Section 2). Human selects hooks, assigns voice, adds cultural context.

Wednesday: Agent drafts platform-adapted posts (converter prompt from Section 2). Human rewrites for voice, adds the 30% the agent can't: friction, timing, specificity.

Thursday: Publish. Human spends 30-60 minutes replying to comments (the single most underused growth tactic across every platform).

Friday: Agent runs performance review (analysis prompt below). Human reads the scores, forms one hypothesis for next week ("test vulnerability hooks on LinkedIn"), and logs it.

This prompt runs a weekly content performance analysis against the playbook data:

Analyze my social media performance this week against these
platform-specific benchmarks.

MY POSTS THIS WEEK:
[PASTE YOUR POSTS WITH THEIR METRICS: impressions, likes,
comments, shares, saves, link clicks]

BENCHMARK DATA:
- Twitter: Reply chains worth 75x a like. Bookmarks 10x.
  Retweets 20x. Top tweets are 71-100 characters.
- LinkedIn: Dwell time is #1 signal. 61+ seconds = 15.6%
  engagement rate. Carousels earn 6.60% engagement (278% more
  than video). Expert comments worth 5-7x regular comments.
- Instagram: DM shares worth 3-5x a like. Saves 1.5x.
  Mixed carousels lead at 2.33% engagement.
- Facebook: Comment threads are #1 signal. Link posts get 0.03%
  engagement (lowest). Groups get 20%+ engagement vs Pages.

FOR EACH POST:
1. Which engagement signals were strongest vs. weakest?
2. Did the hook match one of the 5 proven types (stat lead,
   contrarian claim, story hook, curiosity gap, vulnerability)?
3. Was the reveal placed at 60-70% through the post?
4. Was the CTA specific enough (not generic "thoughts?")?
5. Were external links kept out of the post body?

THEN:
- Rank my posts from strongest to weakest performing
- Identify the #1 structural change that would have improved
  my weakest post
- Suggest which hook type to try next week based on what
  worked and what didn't
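
The Twitter and Instagram multipliers in the benchmark data above can be folded into a rough "like-equivalent" score before you hand the numbers to an agent. What follows is a minimal Python sketch, not the playbook's own method. It assumes the multipliers add up linearly, that every reply counts as a reply chain, and that 4.0 (the midpoint) stands in for the quoted 3-5x DM-share range. `WEIGHTS` and `like_equivalents` are hypothetical names introduced for illustration.

```python
# Like-equivalent weights copied from the benchmark data in the prompt above.
# Assumption: signals add up linearly; no platform publishes a formula like this.
WEIGHTS = {
    "twitter": {"likes": 1.0, "replies": 75.0, "retweets": 20.0, "bookmarks": 10.0},
    # Assumption: 4.0 is the midpoint of the quoted 3-5x range for DM shares.
    "instagram": {"likes": 1.0, "dm_shares": 4.0, "saves": 1.5},
}


def like_equivalents(platform: str, metrics: dict[str, int]) -> float:
    """Convert raw engagement counts into a single like-equivalent score."""
    weights = WEIGHTS[platform]
    # Metrics without a known weight (e.g. impressions) are ignored, not guessed.
    return sum(weights[name] * count for name, count in metrics.items() if name in weights)


if __name__ == "__main__":
    quiet_hit = {"likes": 40, "replies": 6, "retweets": 3, "bookmarks": 5}
    loud_miss = {"likes": 400, "replies": 0, "retweets": 1, "bookmarks": 0}
    print(like_equivalents("twitter", quiet_hit))  # 40 + 450 + 60 + 50 = 600.0
    print(like_equivalents("twitter", loud_miss))  # 400 + 0 + 20 + 0 = 420.0
```

On these weights, the post with 40 likes and six replies outscores the post with 400 likes and none, which is the practical meaning of ranking by the platform's top signal rather than by likes.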

And a prompt for running competitive social audits:

Audit this competitor's last 10 social media posts on [PLATFORM]
against viral content patterns.

COMPETITOR POSTS:
[PASTE 10 RECENT POSTS WITH THEIR VISIBLE ENGAGEMENT METRICS]

ANALYZE EACH POST FOR:

1. HOOK TYPE: Which of the 5 types did they use?
   (Stat lead / Contrarian claim / Story hook / Curiosity gap /
   Vulnerability confession / None of the above)

2. STRUCTURAL COMPLIANCE:
   - Line breaks every 1-3 sentences? (Y/N)
   - Sentence length variation? (Y/N)
   - Reveal at 60-70%? (Y/N)
   - Post length within platform optimal range? (Y/N)
   - External links in body vs. comment? (body/comment/none)

3. CTA QUALITY:
   - Specific question vs. generic prompt
   - Any engagement bait language?

4. SIGNAL OPTIMIZATION:
   - What engagement signal does this post optimize for?
   - Is it optimizing for the platform's highest-weighted signal?

SUMMARY:
- Which hook type does this competitor default to?
- What structural pattern do they consistently get right?
- What is their biggest structural weakness?
- What specific tactic are they using that I should test?
- What are they ignoring that represents an opportunity?
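
Three of the structural checks in the audit above are mechanical enough to pre-compute before an agent or a human looks at the posts. The Python sketch below is one possible way to do it, under stated assumptions: sentences are split naively on end punctuation, "sentence length variation" is approximated by the standard deviation of word counts with an arbitrary threshold of 3.0, and you supply the reveal text yourself. All function names are hypothetical.

```python
import re
from statistics import pstdev


def split_sentences(text: str) -> list[str]:
    """Naive splitter: breaks after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def line_breaks_ok(post: str, max_sentences: int = 3) -> bool:
    """True if every block between line breaks holds 1 to max_sentences sentences."""
    blocks = [b for b in post.split("\n") if b.strip()]
    return all(1 <= len(split_sentences(b)) <= max_sentences for b in blocks)


def sentence_length_varies(post: str, min_spread: float = 3.0) -> bool:
    """True if words-per-sentence spread (std deviation) reaches min_spread.

    Assumption: 3.0 is an arbitrary illustrative threshold, not playbook data.
    """
    lengths = [len(s.split()) for s in split_sentences(post)]
    return len(lengths) > 1 and pstdev(lengths) >= min_spread


def reveal_position(post: str, reveal: str) -> float:
    """Fraction of the post, measured in characters, that precedes the reveal."""
    index = post.find(reveal)
    if index < 0:
        raise ValueError("reveal text not found in post")
    return index / len(post)


if __name__ == "__main__":
    post = (
        "I lost our biggest client on a Tuesday.\n\n"
        "Nobody saw it coming. The numbers looked fine and the renewal was drafted.\n\n"
        "Then I read the one email I had skipped. They had asked for help three times.\n\n"
        "Now I answer every client email within a day."
    )
    position = reveal_position(post, "They had asked")
    print("line breaks ok:", line_breaks_ok(post))
    print("sentence length varies:", sentence_length_varies(post))
    print("reveal in 60-70% window:", 0.60 <= position <= 0.70)
```

Treat the results as flags for review rather than verdicts. The naive splitter will miscount sentences that contain abbreviations or decimals.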

Last one. This is the prompt that pulls everything together into an agent-assisted content workflow:

Design a weekly content workflow that splits tasks between
AI agents and human judgment based on these principles:

AGENT TASKS (assign to AI):
- Research: competitive analysis, trend extraction, platform
  algorithm updates, audience data analysis
- Pattern extraction: analyze top-performing content for
  structural patterns (hook types, formatting, CTA placement)
- First drafts: generate initial versions of posts using
  specific structural templates and platform constraints
- Platform adaptation: convert one piece of content into
  formats optimized for each platform's algorithm
- Performance analysis: score published posts against
  engagement signal hierarchies

HUMAN TASKS (keep for yourself):
- Voice decisions: which tone, which cultural references,
  which personal experiences to include
- Creative judgment: which insight is genuinely surprising
  vs. merely novel for your specific audience
- Timing: what to post this week vs. next week based on
  current cultural moment
- Rule-breaking: when to violate structural patterns for
  effect (these moments create the most memorable content)
- Quality control: reviewing agent output for accuracy,
  relevance, and alignment with your brand

MY CONTENT GOALS:
[Describe your platforms, posting frequency, audience,
and current bottlenecks]

Design a Monday-through-Friday workflow that:
1. Uses agents for research and first drafts early in the week
2. Reserves human creative time for Wednesday-Thursday
3. Schedules posts aligned with platform peak times
4. Includes a Friday performance review using agent analysis
5. Keeps total human time under [X] hours per week

One last tool. Before you hit publish on any post, run through this:

Pre-publish checklist

[ ] Hook type identified (stat lead / contrarian / story / curiosity gap / vulnerability)
[ ] Hook fits under platform fold limit (Twitter: 250, LinkedIn: 200, Instagram: 125 chars)
[ ] Targeting the platform's #1 engagement signal, not likes
[ ] Line breaks every 1-3 sentences
[ ] Sentence length varies (short punches mixed with longer context)
[ ] Reveal placed at 60-70% through the post
[ ] CTA is specific and answerable, not "thoughts?"
[ ] No engagement bait language
[ ] External links removed from post body (placed in reply/comment)
[ ] 30-60 minutes blocked for post-publish comment replies
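
A few of the checklist items above are purely mechanical and can be screened automatically before the human pass. The Python sketch below covers three of them: fold limit, links in the body, and generic CTAs. It assumes the hook is the first non-empty line and the CTA is the last one. The `GENERIC_CTAS` list is illustrative, and `pre_publish_issues` is a hypothetical name. The checklist gives no fold limit for Facebook, so that check is skipped there.

```python
import re

# Fold limits in characters, copied from the checklist above.
FOLD_LIMITS = {"twitter": 250, "linkedin": 200, "instagram": 125}

# Assumption: a short illustrative list of generic CTAs; extend it for your audience.
GENERIC_CTAS = {"thoughts?", "agree?", "what do you think?"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def pre_publish_issues(post: str, platform: str) -> list[str]:
    """Return the mechanical checklist items this post fails (empty list = pass)."""
    lines = [line.strip() for line in post.splitlines() if line.strip()]
    if not lines:
        return ["post is empty"]

    issues = []
    # Assumption: first non-empty line is the hook, last non-empty line is the CTA.
    hook, last_line = lines[0], lines[-1]

    limit = FOLD_LIMITS.get(platform)  # None for platforms without a listed limit
    if limit is not None and len(hook) > limit:
        issues.append(f"hook is {len(hook)} chars, over the {limit}-char fold")
    if URL_PATTERN.search(post):
        issues.append("external link in post body; move it to a reply or comment")
    if last_line.lower() in GENERIC_CTAS:
        issues.append("CTA is generic; ask a specific, answerable question")
    return issues


if __name__ == "__main__":
    draft = "Most launch posts bury the number.\n\nDetails: https://example.com\n\nThoughts?"
    for issue in pre_publish_issues(draft, "linkedin"):
        print("-", issue)
```

Hook type, signal targeting, and engagement-bait language still need a human read; the script only clears the items that do not require judgment.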

The agents that studied virality could not create it. But the human who directed them could use what they found.

That is the model. Not agents replacing humans. Not humans ignoring agents. A division of labor where extraction and execution run on different tracks, and the value sits in knowing which track each task belongs on. The companies that figure out this split fastest will not just produce more content. They will produce the right content, at the right moment, on the right platform, using the right signal.

The agents can give you the playbook. You still have to run the play.

What to watch

SkillsBench adoption: Watch whether major AI labs integrate the benchmark into their agent evaluation suites. If curated Skills become a standard part of agent deployment, expect a new market for "agent instruction design" as a professional service.
Moltbook's trajectory: The platform passed 1.6 million agent accounts in early February. Track whether human registrations grow or the security concerns choke adoption. Either outcome tells you something about the appetite for unstructured agent content.
SaaS stock recovery: The $285 billion wipeout from the Cowork plugin launch needs follow-through. If software stocks remain depressed through Q1, the market is pricing in real displacement. If they bounce back, the sell-off was just nerves.
Agent team benchmarks: Nobody, not Anthropic, not OpenAI, has published an independent evaluation proving multi-agent tools outperform a single developer working alone. The first credible benchmark to test that claim will land like a grenade.
Content team hiring patterns: Scan job postings for titles that mention "agent," "AI workflow," or "prompt engineering" next to traditional content roles. The middle-manager shift will show up in hiring before it shows up in earnings calls.

Key Takeaways

Extraction beats generation: AI agents are measurably excellent at research, pattern analysis, and systematization, and measurably unreliable at producing the creative judgment those patterns require without human direction.
Curated beats self-generated: Human-written procedural knowledge raises agent performance by 16.2 percentage points on average; agent-written procedural knowledge provides zero benefit, per the SkillsBench benchmark of 7,308 trajectories.
Focused beats comprehensive: Tight instructions with 2-3 modules outperform exhaustive documentation, and smaller models with good instructions match larger models without them.
Likes are the wrong metric: Reply chains (75x on Twitter), dwell time (13x on LinkedIn), DM shares (3-5x on Instagram), and comment threads (top signal on Facebook) all outweigh likes, in some cases by more than an order of magnitude.
Freedom produces noise: Moltbook's 1.6 million agents produced derivative Reddit-speak and invented religions, not original content, because unconstrained agents regress to training-data averages.
The new job is supervision: The $285 billion stock wipeout and the simultaneous launches of Anthropic agent teams and OpenAI Frontier signal that "AI middle management" (directing, reviewing, and connecting agent output) is the emerging core competency.
The playbook is real: Five universal hook types, platform-specific algorithm weights, and formatting rules extracted from 20,000+ high-performing posts are immediately usable with the copy-paste prompts in this article.

Sources & Further Reading

Primary Sources

Security concerns and skepticism are bursting the bubble of Moltbook — AP News, Kaitlyn Huamani, February 6, 2026
AI companies want you to stop chatting with bots and start managing them — Ars Technica, Benj Edwards, February 5, 2026
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks — arXiv, Xiangyi Li et al., February 13, 2026

Marcus Schuler

San Francisco

Editor-in-Chief and founder of Implicator.ai. Former ARD correspondent and senior broadcast journalist with 10+ years covering tech. Writes daily briefings on policy and market developments. Based in San Francisco. E-mail: editor@implicator.ai
