How Caveman Saves Your Claude Code Tokens
After Anthropic's recent rate limits on Claude Code, every token counts. The caveman skill has been sitting in the plugin marketplace with 50k stars, quietly saving developers 65-75% on output tokens. Time to explain how it actually works.
What is Caveman?
Caveman is a Claude Code skill/plugin that makes the AI communicate like a caveman. Terse, direct, zero fluff. Same technical accuracy. Roughly 65-75% fewer output tokens.
Install:
claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman
Then just say /caveman to activate.
How It Works
The skill instructs Claude to drop:
- Articles ("the", "a", "an")
- Filler phrases ("Sure!", "Of course!", "I'd be happy to")
- Hedging ("might", "could be", "possibly")
- Pleasantries and throat-clearing
What stays:
- Technical substance
- Code and commands
- Actionable answers
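The drop rules above can be approximated with a plain text filter. This is only an illustration of the effect: the skill itself works by instructing the model, not by post-processing its output. The function name `cavemanize` and the word lists (copied from the examples in the list) are assumptions for this sketch, not part of the plugin.

```python
import re

# Word lists copied from the article's examples. Hypothetical: the real
# skill is a prompt for the model, not a regex filter.
ARTICLES = r"\b(?:the|a|an)\b"
FILLER = ["Sure!", "Of course!", "I'd be happy to"]
HEDGES = r"\b(?:might|could be|possibly)\b"


def cavemanize(text: str) -> str:
    """Strip filler phrases, articles, and hedges, then tidy whitespace."""
    for phrase in FILLER:
        text = text.replace(phrase, "")
    text = re.sub(ARTICLES, "", text, flags=re.IGNORECASE)
    text = re.sub(HEDGES, "", text, flags=re.IGNORECASE)
    # Collapse the gaps left behind by the removed words.
    return re.sub(r"\s+", " ", text).strip()


print(cavemanize("Sure! The bug is possibly in the parser."))
# prints: bug is in parser.
```

A filter like this cannot tell when a hedge carries real information, which is one reason the skill leaves the decision to the model.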
Example
Normal Claude (69 tokens):
"The reason your React component is re-rendering is likely because you're creating a new object reference on each render cycle. When you pass an inline object as a prop, React's shallow comparison sees it as a different object every time, which triggers a re-render. I'd recommend using useMemo to memoize the object."
Caveman Claude (19 tokens):
"New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo."
Same fix. About 72% less token (69 down to 19). Brain still big.
The Numbers
From the project's benchmarks (output tokens per response):
| Task | Normal (tokens) | Caveman (tokens) | Saved |
|---|---|---|---|
| Explain React re-render bug | 1180 | 159 | 87% |
| Fix auth middleware token expiry | 704 | 121 | 83% |
| Set up PostgreSQL connection pool | 2347 | 380 | 84% |
| Debug PostgreSQL race condition | 1200 | 232 | 81% |
| Docker multi-stage build | 1042 | 290 | 72% |
| Average | 1214 | 294 | 65% |
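Range: 22-87% savings. The table shows only a selection of tasks, so the Average row and the low end of the range reflect benchmark tasks not listed here. That's real money when you're paying per token.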
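The Saved column can be recomputed from the token counts as 1 - caveman / normal. The sketch below does that for the rows shown. It also shows that the Average row's 65% is not the ratio of its own token counts (that comes to about 76%), which suggests it is a mean of per-task percentages over the full benchmark set. That reading is an assumption, not something the benchmarks state.

```python
# (task, normal_tokens, caveman_tokens) copied from the rows shown above.
rows = [
    ("Explain React re-render bug", 1180, 159),
    ("Fix auth middleware token expiry", 704, 121),
    ("Set up PostgreSQL connection pool", 2347, 380),
    ("Debug PostgreSQL race condition", 1200, 232),
    ("Docker multi-stage build", 1042, 290),
]


def saved_pct(normal: int, caveman: int) -> int:
    """Percentage of tokens saved, rounded to a whole percent."""
    return round(100 * (1 - caveman / normal))


per_task = {task: saved_pct(n, c) for task, n, c in rows}
for task, pct in per_task.items():
    print(f"{task}: {pct}%")

# Savings implied by the Average row's token counts (1214 vs 294).
ratio_of_averages = saved_pct(1214, 294)
print(f"Ratio of averages: {ratio_of_averages}%")
```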
Why This Helps With Rate Limits
Claude Code's limits hit output tokens hard. When you're in a long coding session, every response adds up. Caveman doesn't just save tokens. It saves you time reading responses.
Also interesting: a March 2026 paper found that constraining large models to brief responses improved accuracy by 26 percentage points on certain benchmarks. Verbose isn't always better. Sometimes less word = more correct.
Intensity Levels
Pick your grunt level:
/caveman lite # Drop filler, keep grammar. Professional but no fluff
/caveman full # Default. Drop articles, fragments, full grunt
/caveman ultra # Maximum compression. Abbreviate everything
/caveman wenyan # 文言文 mode. Classical Chinese compression.
Bonus: Input Compression
Caveman also includes a compress tool for your memory files:
/caveman:compress CLAUDE.md
This rewrites your CLAUDE.md into caveman-speak so Claude reads fewer tokens every session start. Human-readable version stays in CLAUDE.original.md.
| File | Original (tokens) | Compressed (tokens) | Saved |
|---|---|---|---|
| claude-md-preferences.md | 706 | 285 | 59.6% |
| project-notes.md | 1145 | 535 | 53.3% |
| Average | 898 | 481 | 46% |
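Because a memory file is read at every session start, the saving repeats each session. A rough estimate using the Average row above is sketched here. The session count is a hypothetical input, and the calculation assumes the file is loaded exactly once per session.

```python
# Average token counts from the table above.
ORIGINAL_AVG = 898
COMPRESSED_AVG = 481


def input_tokens_saved(sessions: int) -> int:
    """Input tokens saved over `sessions` session starts (assumed one read each)."""
    return (ORIGINAL_AVG - COMPRESSED_AVG) * sessions


saved_pct = round(100 * (1 - COMPRESSED_AVG / ORIGINAL_AVG))

print(input_tokens_saved(1))    # 417
print(input_tokens_saved(100))  # 41700
print(f"{saved_pct}%")          # 46%
```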
Other Commands
- /caveman-commit: Terse commit messages. 50 char subject.
- /caveman-review: One-line PR comments, e.g. "L42: bug: user null. Add guard."
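The 50-character subject limit is easy to check mechanically, for example in a local hook. The helper below is a minimal sketch: the name `subject_ok` is hypothetical, and this is not how /caveman-commit itself enforces the limit.

```python
def subject_ok(message: str, limit: int = 50) -> bool:
    """Return True if the first line of a commit message is 1..limit chars."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    return 0 < len(subject) <= limit


print(subject_ok("fix: guard null user in auth middleware"))  # True
print(subject_ok("x" * 51))                                   # False
```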
Ecosystem
Caveman is one part of a three-tool system:
- caveman: Output compression
- cavemem: Cross-agent persistent memory
- cavekit: Spec-driven autonomous build loop
They compose: cavekit orchestrates, caveman compresses what the agent says, cavemem compresses what the agent remembers.
Bottom Line
If you're hitting Claude Code rate limits, caveman is a free, one-line install that cuts your output tokens by 65-75% on average while keeping the technical accuracy. The AI still solves your problems. It just stops writing essays about it.
Worth trying if you're paying for Claude Code Pro or just want faster responses.
Get it: github.com/JuliusBrussee/caveman