Claude Token Optimizer — Interactive Results Dashboard

◆

TOKEN OPTIMIZER

89%
Token
Savings.

7 benchmarked techniques for Claude API & Claude Code. From prompt caching to model routing — provably cheaper, measurably faster.

→ Calculate Your Savings Benchmarked on claude-sonnet-4-6

Baseline cost / request

$0.028

5,820 input + 1,140 output tokens

Best optimized cost / request

$0.003

Prompt caching + model routing + output budgeting

CLAUDE.md token reduction

91.9%

3,847 tokens → 312 tokens

Savings Calculator Interactive

Monthly API spend

$500

/ month

$10$2,500$10,000

Estimated monthly volume

~17,767 requests

Based on claude-sonnet-4-6 pricing:
$3.00/M input · $15.00/M output
Assumes 5,820 input + 1,140 output tokens/req

Projected monthly savings by technique

API Optimization Techniques 4 Methods

71.5%

cost savings

Prompt Caching

Add cache_control to your system prompt. Cache reads cost 0.1× vs a fresh read. At 87.3% hit rate, savings compound with every request.

87.3% cache hit $0.028 → $0.008

77.1%

cost savings

Model Routing

Route simple tasks to Haiku ($0.80/M input) instead of Sonnet ($3.00/M). Classify task complexity first; send only hard tasks upstream. 3.75× cheaper on routed calls.

Haiku vs Sonnet $0.028 → $0.006

63.2%

cost savings

Multi-Turn Caching

Without caching, multi-turn conversations grow O(n²) — each message re-sends the full history. Cache the conversation checkpoint and break the growth curve.

74.1% cache hit $0.028 → $0.008

56.8%

cost savings

Output Budgeting

Output tokens cost 5× more than input. Set tight max_tokens per task type. A classification task doesn't need 2,000 tokens of headroom.

310 vs 1,140 out $0.028 → $0.005

Claude Code Optimizations CLI · CLAUDE.md · .claudeignore

CLAUDE.md Optimization

91.9%

Strip team info, boilerplate, and FAQs. Keep only rules Claude acts on. Every token in CLAUDE.md gets re-sent on every request — dead weight is pure cost.

3,847

tokens before

312

tokens after

Monthly savings: ~$6.84 at 50K requests/month

🚫

.claudeignore Setup

85.5%

Exclude node_modules, build artifacts, media files, and lock files. Fewer files in context = faster responses and lower token cost per session.

284

files before

files after

Context window: 124,000 → 18,000 tokens per session

Combined Result

89.3%

Saved by stacking techniques

Prompt caching + model routing + output budgeting together. Savings stack multiplicatively — each layer cuts what's left.

Add cache_control to system prompt → 71% savings on repeat requests, immediate.

Classify task complexity, route simple tasks to Haiku → 77% savings on routed volume.

Set tight max_tokens per task type → 57% savings on output cost alone (5× multiplier).

Audit CLAUDE.md + add .claudeignore → 85–92% savings in every Claude Code session.

Run the audit

Token Optimizer

All benchmark scripts, audit tools, and before/after examples are in the open-source repo.

      # run token audit

      python claude_code/token_audit.py

      # run benchmarks

      python benchmark.py

      # visualize results

      python visualize.py

Techniques

89%

Max savings

$0.003

Per request

91.9%

CLAUDE.md cut