How Caveman Saves Your Claude Code Tokens

After Anthropic's recent rate limits on Claude Code, every token counts. The caveman skill has been sitting in the plugin marketplace with 50k stars, quietly saving developers 65-75% on output tokens. Time to explain how it actually works.

What is Caveman?

Caveman is a Claude Code skill/plugin that makes the AI communicate like a caveman. Terse, direct, zero fluff. Same technical accuracy. 75% fewer words.

Install:

claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman

Then just say /caveman to activate.

How It Works

The skill instructs Claude to drop:

Articles ("the", "a", "an")
Filler phrases ("Sure!", "Of course!", "I'd be happy to")
Hedging ("might", "could be", "possibly")
Pleasantries and throat-clearing

What stays:

Technical substance
Code and commands
Actionable answers

Example

Normal Claude (69 tokens):

"The reason your React component is re-rendering is likely because you're creating a new object reference on each render cycle. When you pass an inline object as a prop, React's shallow comparison sees it as a different object every time, which triggers a re-render. I'd recommend using useMemo to memoize the object."

Caveman Claude (19 tokens):

"New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo."

Same fix. 75% less word. Brain still big.

The Numbers

From their benchmarks:

Task	Normal	Caveman	Saved
Explain React re-render bug	1180	159	87%
Fix auth middleware token expiry	704	121	83%
Set up PostgreSQL connection pool	2347	380	84%
Debug PostgreSQL race condition	1200	232	81%
Docker multi-stage build	1042	290	72%
Average	1214	294	65%

Range: 22-87% savings. That's real money when you're paying per token.

Why This Helps With Rate Limits

Claude Code's limits hit output tokens hard. When you're in a long coding session, every response adds up. Caveman doesn't just save tokens. It saves you time reading responses.

Also interesting: a March 2026 paper found that constraining large models to brief responses improved accuracy by 26 percentage points on certain benchmarks. Verbose isn't always better. Sometimes less word = more correct.

Intensity Levels

Pick your grunt level:

/caveman lite    # Drop filler, keep grammar. Professional but no fluff
/caveman full   # Default. Drop articles, fragments, full grunt
/caveman ultra  # Maximum compression. Abbreviate everything
/caveman wenyan # 文言文 mode. Classical Chinese compression.

Bonus: Input Compression

Caveman also includes a compress tool for your memory files:

/caveman:compress CLAUDE.md

This rewrites your CLAUDE.md into caveman-speak so Claude reads fewer tokens every session start. Human-readable version stays in CLAUDE.original.md.

File	Original	Compressed	Saved
claude-md-preferences.md	706	285	59.6%
project-notes.md	1145	535	53.3%
Average	898	481	46%

Other Commands

/caveman-commit: Terse commit messages. 50 char subject.
/caveman-review: One-line PR comments: L42: bug: user null. Add guard.

Ecosystem

Caveman is one part of a three-tool system:

caveman: Output compression
cavemem: Cross-agent persistent memory
cavekit: Spec-driven autonomous build loop

They compose: cavekit orchestrates, caveman compresses what the agent says, cavemem compresses what the agent remembers.

Bottom Line

If you're hitting Claude Code rate limits, caveman is a free, one-line install that cuts your token usage by 65-75% while keeping all the technical accuracy. The AI still solves your problems. It just stops writing essays about it.

Worth trying if you're paying for Claude Code Pro or just want faster responses.

Get it: github.com/JuliusBrussee/caveman