A GitHub skill called caveman promises 65 to 75 percent token reduction in Claude Code, and a viral LinkedIn post turned it into April's must-install. The repo delivers exactly what it claims against its own benchmark. The problem is the benchmark measures the wrong denominator, and a Reddit user who ran caveman for a week on a $100 subscription saved, in their own words, "roughly $1, maybe $1.50 if I squint."
The skill itself is trivial. A SKILL.md file instructs Claude to rewrite output in telegraphic English. Drop articles and hedges. Cut the transitions. Six intensity levels from lite to ultra, plus wenyan variants for Classical Chinese compression. A SessionStart hook re-injects the instruction on every turn. No model change, no tokenizer, no runtime. Install by symlink. The whole mechanism is a prompt overlay and a persona.
The Breakdown
- Caveman skill reduces Claude Code output tokens 12 to 23 percent, not the 65 to 75 percent promised by viral LinkedIn posts
- Output tokens make up only 0.6 to 2.5 percent of a typical Claude Code session's billed tokens, capping real savings at 1 to 3 percent of the bill even at 5x output pricing
- Quality stays intact: three independent benchmarks show zero correctness regressions at default intensity
- Real cost lever is input-side: prompt caching, tool-return discipline, and memory-file compression beat output tricks
AI-generated summary, reviewed by an editor. More on our AI guidelines.
The benchmark that started it
The repo has tightened its own framing since the LinkedIn wave. The current README now labels the headline 75 percent as output-token savings, separates out a 46 percent input-token figure for caveman-compress, and ships an evals/ note that reframes the honest comparison as caveman versus an already-terse prompt, not caveman versus a chatty assistant. That is the right comparison. The note also concedes the obvious failure mode, in its own words: a skill that replies k to everything would "win" any compression test, so fidelity has to be measured separately.
Independent runs land in the same zone. Kuba Guzik ran a three-arm test: vanilla, a one-line "be concise" directive, and caveman. Across 72 identical prompts, caveman beat vanilla by 23 percent and beat the three-word brevity instruction by just 4 percent. Marco Pillitteri got 12 percent on coding tasks. MayhemCode logged a full Next.js session: 11,200 output tokens down to 8,900. That is 20 percent. The author has acknowledged on Hacker News that real-world savings land "closer to 20-30%". Nothing in the corrected framing changes the underlying problem. Output is the category. Output is small.
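Both headline numbers can be cross-checked with a few lines of arithmetic. A quick sketch; the standalone "be concise" figure is derived from Guzik's two reported deltas, not something he reported directly:

```python
# Guzik's three-arm result: caveman beat vanilla by 23 percent and a
# bare "be concise" directive by 4 percent. The implied reduction from
# the brevity line alone falls out of the ratios.
caveman_vs_vanilla = 0.23   # caveman output = 77% of vanilla's
caveman_vs_concise = 0.04   # caveman output = 96% of the concise arm's

concise_vs_vanilla = 1 - (1 - caveman_vs_vanilla) / (1 - caveman_vs_concise)
print(f"'be concise' alone: {concise_vs_vanilla:.1%} reduction")  # 19.8%

# MayhemCode's logged Next.js session: 11,200 output tokens down to 8,900
print(f"session reduction: {1 - 8_900 / 11_200:.1%}")  # 20.5%
```

The derived 19.8 percent for the bare directive is why "most of the reduction comes from having any brevity instruction" holds up arithmetically.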
The denominator nobody quotes
Here is the math nobody on LinkedIn ran. Output tokens, the only category caveman can compress, constitute 0.6 to 2.5 percent of a typical Claude Code session's billed tokens. The rest is cached system prompt, tool definitions, file reads, and, on thinking-enabled models, invisible reasoning tokens the skill cannot influence at all. One user on r/ClaudeAI pulled their logs. 0.6 percent of total spend was visible assistant output. Everything else was input.
Apply a 20 percent reduction to a 2 percent slice and you remove 0.4 percent of your tokens. Factor in the 5x output-to-input pricing ratio and the bill impact climbs to maybe 3 percent. That is the ceiling on a typical workload. The $100 subscription anecdote on Reddit is not an outlier. It is the expected outcome, and anyone who has spent a month inside Claude Code's token economics already knew it.
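That ceiling falls out of a short calculation. A sketch, treating the token shares and the 5x output-to-input price ratio as the assumptions quoted above; cached-input discounts, which this ignores, push the result toward the 3 percent end:

```python
def bill_savings(output_share: float, reduction: float,
                 price_ratio: float = 5.0) -> float:
    """Fraction of the total bill saved by compressing output alone.

    output_share: output tokens as a fraction of all billed tokens
    reduction:    compression achieved on output tokens
    price_ratio:  output price per token relative to input
    """
    output_cost = output_share * price_ratio
    total_cost = (1 - output_share) + output_cost
    return reduction * output_cost / total_cost

# 20 percent reduction on a 2 percent token slice
print(f"{bill_savings(0.02, 0.20):.2%}")   # 1.85% of the bill
# best case in the article's range: 2.5 percent slice, 23 percent reduction
print(f"{bill_savings(0.025, 0.23):.2%}")  # 2.61%
```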
Where it actually earns its install
Caveman works, narrowly. Three converging benchmarks show quality stays intact at default intensity. No correctness regressions on 72 prompts. No task accuracy loss on another 30. No broken refactors across a multi-hour Next.js session. A 1 to 3 percent bill reduction at zero quality cost is a free win. Install it if you want a terse register. Accept the cost benefit as a rounding error. For a daily six-hour Claude Code user on pay-as-you-go pricing, that saving comes to about ninety cents a month.
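Put in dollar terms, the 1 to 3 percent ceiling looks like this; the bill sizes are illustrative, not measured:

```python
# Monthly saving at the 1-3 percent ceiling, across hypothetical bill sizes.
for monthly_bill in (30, 100, 200):
    low, high = 0.01 * monthly_bill, 0.03 * monthly_bill
    print(f"${monthly_bill}/mo bill -> saves ${low:.2f} to ${high:.2f}")
```

The $100 row lands exactly where the Reddit anecdote did: a dollar, maybe a dollar fifty.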
The sub-skill tucked inside the repo is where the real money lives. caveman-compress rewrites your CLAUDE.md memory file in the same telegraphic style. Memory files get loaded on every turn and sit in the cached-input category that dominates the bill. A 25 percent reduction on a 4,000-token CLAUDE.md compounds across every session, every turn, for the life of the project. Nobody is posting about this sub-skill on LinkedIn, though that's almost beside the point.
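The compounding is straightforward to sketch. Turn and session counts below are illustrative assumptions, and actual cache pricing on the memory file would discount the per-token cost:

```python
# Input tokens saved by compressing a CLAUDE.md memory file that is
# re-sent on every turn. Turn and session counts are assumptions.
memory_tokens = 4_000
compression = 0.25                              # the 25 percent quoted above
saved_per_turn = memory_tokens * compression    # 1,000 input tokens

turns_per_session = 40       # assumption
sessions_per_month = 20      # assumption
monthly_saved = saved_per_turn * turns_per_session * sessions_per_month
print(f"{monthly_saved:,.0f} input tokens/month")  # 800,000
```

Unlike an output trick, this saving lands in the category that dominates the bill.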
The real lever was always input
If you care about cost, the stack rank is unambiguous. Prompt caching matters most, worth 40 to 80 percent on input once it is configured correctly. Tool-return discipline matters next: scope file reads to line ranges, stop re-running Grep on the same paths, drop the habit of dumping entire modules into context. Memory-file compression sits third. Output compression, which is what caveman does, sits last.
Microsoft's LLMLingua-2 hits 2x to 5x compression ratios on input prompts with 1 to 2 percentage points of task-performance loss. That operates on the category where 70 to 95 percent of your tokens live. Caveman operates on the 2 percent slice. The two are not competitors. They address different problems. The discourse treats them as substitutes because compression ratio sounds like compression ratio.
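The mismatch in leverage is visible side by side. A sketch using the article's shares, assuming a 3x LLMLingua-style ratio applies to the whole input slice; in practice tool definitions and the system prompt are not all compressible, so the real gap is smaller:

```python
# Category shares of billed tokens (from the figures above; illustrative).
input_share, output_share = 0.975, 0.025

# Input-side: LLMLingua-2-style 3x compression, i.e. ~67 percent reduction
input_lever = input_share * (1 - 1 / 3)
# Output-side: caveman's ~20 percent reduction
output_lever = output_share * 0.20

print(f"input lever removes  {input_lever:.1%} of tokens")   # 65.0%
print(f"output lever removes {output_lever:.1%} of tokens")  # 0.5%
print(f"ratio: {input_lever / output_lever:.0f}x")           # 130x
```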
The real way to save tokens
Want to know how to really cut your Claude Code bill? Stop coding. Not forever. Just before you type the prompt. Think about the problem. Read the file yourself. Write the function in your head. Decide whether you need Claude at all.
The most token-efficient Claude Code session is the one you never opened. The second most efficient is the one where you already know exactly what you want before turn one. Caveman saves you a coffee a month. Thinking saves you the bill.
Frequently Asked Questions
Does the caveman skill actually reduce Claude Code costs?
Yes, but by 1 to 3 percent of a typical bill, not 65 percent. Independent benchmarks from Kuba Guzik (23 percent), Marco Pillitteri (12 percent), and MayhemCode (20 percent) measured output-token reduction, but output represents only 0.6 to 2.5 percent of a session's billed tokens. A Reddit user reported saving about $1 on a $100 monthly subscription.
Why is the advertised 65 to 75 percent number so different from real-world results?
The repository's benchmark compares caveman against a baseline with no brevity instruction at all. Kuba Guzik's three-arm test showed caveman beats a simple 'be concise' directive by just 4 percent. Most of the advertised reduction comes from having any brevity instruction, not from caveman specifically. The author conceded on Hacker News that 20 to 30 percent is closer to reality.
Does caveman hurt output quality?
No, at default intensity. Three independent tests found 100 percent correctness match across 72 prompts (Guzik), unchanged task accuracy on 30 coding prompts (Pillitteri), and no regressions across a multi-hour Next.js session (MayhemCode). Ultra intensity mode and aggressive forks like caveman-ultra do show occasional instruction-following failures on complex tasks.
What is caveman-compress and why does it matter more than the headline skill?
It is a sub-skill that rewrites CLAUDE.md memory files in the same telegraphic style. Memory files load on every session turn in the cached-input category that dominates the bill. A 25 percent reduction on a 4,000-token CLAUDE.md compounds across every project session. It operates on the right denominator, unlike the main output-compression skill.
What should Claude Code users actually do to cut costs?
Configure prompt caching first, worth 40 to 80 percent on input once active. Discipline your tool returns: use line-range file reads, stop re-running Grep on the same paths. Compress memory files with caveman-compress or Microsoft's LLMLingua-2. Output-side tricks sit last because output is the smallest token category in any realistic Claude Code session.



IMPLICATOR