7 benchmarked techniques for Claude API & Claude Code. From prompt caching to model routing — provably cheaper, measurably faster.
Add cache_control to your system prompt. Cache reads cost 0.1× vs a fresh read. At 87.3% hit rate, savings compound with every request.
Route simple tasks to Haiku ($0.80/M input) instead of Sonnet ($3.00/M). Classify task complexity first; send only hard tasks upstream. 3.75× cheaper on routed calls.
Without caching, multi-turn conversations grow O(n²) — each message re-sends the full history. Cache the conversation checkpoint and break the growth curve.
Output tokens cost 5× more than input. Set tight max_tokens per task type. A classification task doesn't need 2,000 tokens of headroom.
Strip team info, boilerplate, and FAQs. Keep only rules Claude acts on. Every token in CLAUDE.md gets re-sent on every request — dead weight is pure cost.
Exclude node_modules, build artifacts, media files, and lock files. Fewer files in context = faster responses and lower token cost per session.
Prompt caching + model routing + output budgeting together. Savings stack multiplicatively — each layer cuts what's left.
cache_control to system prompt → 71% savings on repeat requests, immediate.All benchmark scripts, audit tools, and before/after examples are in the open-source repo.