TOKEN OPTIMIZER
89%
Token
Savings.

7 benchmarked techniques for Claude API & Claude Code. From prompt caching to model routing — provably cheaper, measurably faster.

→ Calculate Your Savings Benchmarked on claude-sonnet-4-6
Baseline cost / request
$0.028
5,820 input + 1,140 output tokens
Best optimized cost / request
$0.003
Prompt caching + model routing + output budgeting
CLAUDE.md token reduction
91.9%
3,847 tokens → 312 tokens
Savings Calculator Interactive
Monthly API spend
$500
/ month
$10$2,500$10,000
Estimated monthly volume
~17,767 requests
Based on claude-sonnet-4-6 pricing:
$3.00/M input · $15.00/M output
Assumes 5,820 input + 1,140 output tokens/req
Projected monthly savings by technique
API Benchmark Results claude-sonnet-4-6 · May 2026
API Optimization Techniques 4 Methods
01
71.5%
cost savings
Prompt Caching

Add cache_control to your system prompt. Cache reads cost 0.1× vs a fresh read. At 87.3% hit rate, savings compound with every request.

87.3% cache hit $0.028 → $0.008
02
77.1%
cost savings
Model Routing

Route simple tasks to Haiku ($0.80/M input) instead of Sonnet ($3.00/M). Classify task complexity first; send only hard tasks upstream. 3.75× cheaper on routed calls.

Haiku vs Sonnet $0.028 → $0.006
03
63.2%
cost savings
Multi-Turn Caching

Without caching, multi-turn conversations grow O(n²) — each message re-sends the full history. Cache the conversation checkpoint and break the growth curve.

74.1% cache hit $0.028 → $0.008
04
56.8%
cost savings
Output Budgeting

Output tokens cost 5× more than input. Set tight max_tokens per task type. A classification task doesn't need 2,000 tokens of headroom.

310 vs 1,140 out $0.028 → $0.005
Code Examples — From the Repo Before & After
Claude Code Optimizations CLI · CLAUDE.md · .claudeignore
MD
CLAUDE.md Optimization
91.9%

Strip team info, boilerplate, and FAQs. Keep only rules Claude acts on. Every token in CLAUDE.md gets re-sent on every request — dead weight is pure cost.

3,847
tokens before
312
tokens after
Monthly savings: ~$6.84 at 50K requests/month
🚫
.claudeignore Setup
85.5%

Exclude node_modules, build artifacts, media files, and lock files. Fewer files in context = faster responses and lower token cost per session.

284
files before
41
files after
Context window: 124,000 → 18,000 tokens per session
Combined Result
89.3%
Saved by stacking techniques

Prompt caching + model routing + output budgeting together. Savings stack multiplicatively — each layer cuts what's left.

1
Add cache_control to system prompt → 71% savings on repeat requests, immediate.
2
Classify task complexity, route simple tasks to Haiku → 77% savings on routed volume.
3
Set tight max_tokens per task type → 57% savings on output cost alone (5× multiplier).
4
Audit CLAUDE.md + add .claudeignore → 85–92% savings in every Claude Code session.
Run the audit
Token Optimizer

All benchmark scripts, audit tools, and before/after examples are in the open-source repo.

# run token audit
python claude_code/token_audit.py

# run benchmarks
python benchmark.py

# visualize results
python visualize.py
7
Techniques
89%
Max savings
$0.003
Per request
91.9%
CLAUDE.md cut