Skip to content

Cost at Scale

Budgets, benchmarks, and optimization for production workloads

9 min read

One developer’s CLI usage costs $6/day. Totally fine. Multiply by 50 developers running unattended agents overnight, and your monthly API bill becomes a line item that needs a budget owner and a hard cap.

When a single developer uses Claude CLI, costs are intuitive — you see the bill and adjust. When ten developers run thousands of API calls per day, costs become a system design problem. This chapter covers the budgeting, model selection, and optimization strategies that keep production workloads predictable and affordable.

See What It Costs for per-call pricing and cache economics.

Startup Token Overhead

Every claude session consumes 30,000-40,000 tokens before you type a single word. This “startup tax” includes:

  • System prompt: ~14,000 tokens (tool definitions, safety rules, output formatting)
  • CLAUDE.md files: 1,000-10,000+ tokens (all files in the hierarchy are loaded)
  • MCP tool descriptions: 0-50,000+ tokens (each tool = 200-500 tokens; 20 servers can add 20K+)

At Opus pricing, this means $0.02-0.06 per session just to start. For Pro plan users, startup alone burns 1-3% of your daily quota. This cost is invisible — it does not appear in the result field, only in the usage.cache_read_input_tokens and usage.cache_creation_input_tokens breakdown.

Mitigation strategies:

  • Disable unused MCP servers — each removed server saves 200-500 tokens per call
  • Use --tools "" for pure text tasks — strips tool definitions from the system prompt
  • Keep CLAUDE.md under 10,000 words — split monorepo configs into subdirectory files
  • Batch related calls to maximize cache hits on the startup overhead
Startup Burns 1-3% of Pro Quota

Community reports confirm that a single session startup on Pro plan consumes 1-3% of your daily quota before you ask anything. With 30-40K tokens loaded (system prompt + tools + CLAUDE.md), the invisible startup cost is roughly equivalent to a short conversation. Disable unused MCP servers and keep CLAUDE.md lean to minimize this overhead.

Team Cost Benchmarks

Before setting organizational budgets, you need a baseline. These figures come from real-world Claude Code usage across development teams.

Team Cost Benchmarks

MetricValuePlanning Note
Average per developer/day$6Use this for baseline budgeting
90th percentile daily$12Set alerts at this threshold
Monthly estimate per developer$100-20022 working days at $5-9/day
10-person team monthly$1,000-2,000Budget for $2,500 to cover spikes
50-person team monthly$5,000-10,000Negotiate volume pricing at this tier

These numbers assume a mix of Opus and Sonnet usage with caching enabled. Without caching, multiply by 3-5x. A 100-turn Opus session costs $50-100 uncached versus $10-19 cached — that difference compounds fast across a team.

Tip

Track per-developer spend with —output-format json and aggregate total_cost_usd across calls. Tools like ccusage can automate this across your organization.

Try This

Measure your actual per-task cost. Run the same prompt 3 times in one session and track costs:

for i in 1 2 3; do claude -p “Summarize this repo” —output-format json | jq ‘{run: ‘$i’, cost: .total_cost_usd, cached: .usage.cache_read_input_tokens}’; done

Watch how cache_read_input_tokens increases on subsequent calls. How much cheaper is run 3 compared to run 1?

Optimization Deep Dives

This chapter covers team benchmarks and the cost baseline. For actionable optimization strategies, see the deep-dive chapters:

  • Optimization Strategies — model routing (Sonnet vs Opus), effort levels, two-pass strategies, prompt caching, batch processing, session management, and cache TTL awareness
See This in Action

See real per-review cost analytics ($0.04-0.08 avg) with SQL dashboards tracking cost_usd, turns, and cache_read_tokens across 500+ reviews in Build an MR Reviewer, Part 6: Handle Errors and Track Costs.

Now Do This

Add cost tracking to one automation script: COST=$(claude -p ”…” —output-format json | jq ‘.total_cost_usd’) and log it to a file. After a week, you’ll know your actual per-task costs — the first step to budgeting at scale.

Budget Controls

A developer’s overnight agent runs 200 API calls while they sleep. There’s no org-wide budget cap, so the bill hits $400 before anyone notices. Budget controls aren’t about individual calls — they’re about organizational policy enforcement at scale.

When a single developer uses Claude CLI, costs are intuitive — you see the bill and adjust. When ten developers run thousands of API calls per day, costs become a system design problem. This section covers the budgeting controls and organizational policies that keep production workloads predictable.

Per-Call Budgets

The --max-budget-usd flag sets a spending limit on a single invocation:

Terminal window
# Cap a code review at $2
claude -p "Review this PR for issues" --output-format json --max-budget-usd 2.00
# Cap a quick lookup at $0.10
claude -p "What does this function do?" --output-format json --max-budget-usd 0.10

In scripts that run unattended, always set a budget. Without one, a multi-turn agent loop can burn through dollars before anyone notices.

Gotcha

The budget is checked between turns, not mid-generation. A single turn can exceed the budget by 100x or more. Set budgets at least 10x your expected per-turn cost. A $0.10 minimum is practical for Opus; anything below $0.02 will always trigger a budget error from the system prompt alone.

Budget Overshoot Reality

Experimentally measured overshoot ratios for Opus (2026-03-19):

Budget Overshoot by Level

BudgetActual CostOvershoot RatioVerdict
$0.001$0.039 - $0.04838 - 48xMinimum API floor dominates
$0.01$0.0171.7xReasonable at realistic budgets
$0.10+Near budget~1xBudget controls work as expected

The minimum API call floor for Opus is ~$0.02-0.05 (system prompt + one turn). Any budget below this floor guarantees a massive overshoot ratio because the first turn always completes regardless. For practical budget control, use $0.05+ for single-turn tasks and $0.10+ for multi-turn.

Try This

Test budget enforcement on a single call:

claude -p “Write a 1000-word essay about testing” —max-budget-usd 0.05 —output-format json | jq ‘{subtype, cost: .total_cost_usd}’

Did the call complete or hit the budget limit? Remember: the budget is checked between turns, not mid-generation. Try with —max-budget-usd 0.001 — what happens?

Per-Session Budgets

When using --resume to continue a conversation, the budget tracks cumulative cost across all calls in that session:

Terminal window
# First call — uses part of the $5 budget
claude -p "Analyze the codebase" --output-format json --max-budget-usd 5.00
# Resumed call — shares the same $5 budget
claude -p "Now refactor the auth module" --resume $SESSION_ID --max-budget-usd 5.00

A $5 budget shared across 5 resumed calls averages about $1 per call. Plan your per-session budget around the total work you expect, not individual turns.

Organizational Limits

For teams, per-call budgets are not enough. You need a wrapper that enforces organizational policy:

#!/bin/bash
# team-claude.sh — wrapper that enforces org-wide daily limits
DAILY_LIMIT=15.00
DAILY_LOG="/tmp/claude-cost-$(whoami)-$(date +%Y%m%d).log"
# Sum today's spend
SPENT=$(awk '{sum += $1} END {printf "%.4f", sum}' "$DAILY_LOG" 2>/dev/null || echo "0")
REMAINING=$(echo "$DAILY_LIMIT - $SPENT" | bc)
if (( $(echo "$REMAINING <= 0" | bc -l) )); then
echo "Daily budget exhausted ($DAILY_LIMIT). Resets tomorrow." >&2
exit 1
fi
# Run with remaining budget as cap
RESULT=$(claude -p "$@" --output-format json --max-budget-usd "$REMAINING")
COST=$(echo "$RESULT" | jq -r '.total_cost_usd // 0')
echo "$COST" >> "$DAILY_LOG"
echo "$RESULT"

This pattern gives you a daily per-developer cap while still letting individual calls use --max-budget-usd for finer control underneath.

Cost Monitoring

Aggregating Costs Across Calls

Every JSON response includes total_cost_usd. Aggregate this across pipeline steps for real-time cost tracking:

#!/bin/bash
# ci-cost-tracker.sh — aggregate costs across a CI pipeline
COST_LOG="/tmp/ci-costs-$BUILD_ID.log"
run_claude_tracked() {
local label="$1"; shift
RESULT=$(claude -p "$@" --output-format json)
COST=$(echo "$RESULT" | jq -r '.total_cost_usd // 0')
echo "$label: $COST" >> "$COST_LOG"
echo "$RESULT"
}
# Run multiple steps
run_claude_tracked "lint-review" "Check for lint violations" --max-budget-usd 0.50
run_claude_tracked "security-review" "Review for security issues" --max-budget-usd 1.00
run_claude_tracked "doc-gen" "Generate API docs" --max-budget-usd 0.50
# Report total
TOTAL=$(awk -F': ' '{sum += $2} END {printf "%.4f", sum}' "$COST_LOG")
echo "Total pipeline cost: \$$TOTAL"

Budget Alert Thresholds

Set up tiered alerts based on the team cost benchmarks:

Budget Alert Thresholds

ThresholdActionWho Gets Notified
Developer exceeds $12/daySlack notificationDeveloper + team lead
Team exceeds 80% of monthly budgetWarning emailTeam lead + finance
CI pipeline exceeds $5/runPipeline annotationPR author
Single call exceeds $2Audit log entryAutomated review

Budget controls limit spending. To also reduce spending, see Optimization Strategies for routing tasks to cheaper models and maximizing cache hits.

Now Do This

Set —max-budget-usd 0.50 on your next automation run. Check the subtype field in the response — did it complete as “success” or hit “error_max_budget_usd”? This is the simplest budget control: a per-call cap that prevents runaway spending.