The 96% cache hit rate is the number your CFO needs.
Twelve benchmarks across five scenarios. The finding that kept repeating: Hydrate's structured context injection hits 96-98% prompt cache hit rates. Cached tokens cost a tenth of what fresh ones do at Anthropic's published rates. That gap is not a footnote in the ROI calculation. It is the ROI calculation.
Why cache hit rate matters more than anything else
Anthropic's prompt cache charges differently based on whether a token arrives fresh or is served from cache. For Sonnet 4.6: $3.00 per million fresh input tokens, $0.30 per million cache-read tokens. For Haiku 4.5: $0.80 per million fresh, $0.08 per million cached.
That's a 10x price difference, at both model tiers.
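The blended price a session actually pays falls out of the cache hit rate directly. A minimal sketch of that arithmetic, using the per-million figures quoted above (the blending function is our illustration, not a published Anthropic formula):

```python
# Effective input-token price under Anthropic's cache pricing.
PRICING = {
    "sonnet": {"fresh": 3.00, "cached": 0.30},  # $ per million input tokens
    "haiku":  {"fresh": 0.80, "cached": 0.08},
}

def effective_price(model: str, cache_hit_rate: float) -> float:
    """Blended $/M input tokens at a given cache hit rate."""
    p = PRICING[model]
    return cache_hit_rate * p["cached"] + (1 - cache_hit_rate) * p["fresh"]

# At both tiers, cached tokens cost exactly one tenth of fresh ones.
assert PRICING["sonnet"]["fresh"] / PRICING["sonnet"]["cached"] == 10
assert PRICING["haiku"]["fresh"] / PRICING["haiku"]["cached"] == 10
```

At a 97% hit rate, Sonnet input lands at roughly $0.38/M instead of $3.00/M, which is why the hit rate, not the model choice alone, dominates the bill.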
Hydrate injects the same structured context block before every prompt turn. Because it's structured and stable, Anthropic's cache recognises it and serves subsequent turns at the cache-read rate. Without Hydrate, each new session starts cold. Everything arrives fresh until the cache warms up, which takes several turns and resets completely at session boundaries.
67% — Run 6 · Dick P1 · cold start · 97% — measured across all 8 Hydrate-enabled runs · 10x — $3.00/M fresh vs $0.30/M cached (Sonnet)
The 67% figure isn't a worst case. It's a typical cold-start first session on a real project. Dick had worked on the same codebase in a previous sprint but started fresh, as every new session does without memory. The 30-point gap between 67% and 97% represents tokens shifting from the $3.00/M tier to the $0.30/M tier. At volume, this is the dominant cost driver.
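What the 30-point gap means per million input tokens can be worked out from Sonnet's two prices. Illustrative arithmetic only; real sessions also carry output and cache-write tokens:

```python
# Sonnet input pricing: $3.00/M fresh, $0.30/M cache read.
FRESH, CACHED = 3.00, 0.30

def cost_per_million(hit_rate: float) -> float:
    """Blended $/M input tokens at a given cache hit rate."""
    return hit_rate * CACHED + (1 - hit_rate) * FRESH

cold = cost_per_million(0.67)  # typical cold-start session
warm = cost_per_million(0.97)  # Hydrate-enabled session
print(f"cold ${cold:.3f}/M vs warm ${warm:.3f}/M -> {cold / warm:.1f}x")
```

The cold session pays roughly three times as much per input token, which is the gap the annual projections below are built on.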
Three configurations, three price points, measured quality for each
Enterprise buying conversations usually need a "good, better, best" structure. The benchmark data gives you three named configurations with specific measured numbers.
All-Sonnet, no Hydrate
Your existing setup. No Hydrate installed. Agents share context only via committed documentation. Cold-start cache hit rates average around 70% at scale.
All-Sonnet + Hydrate
Install Hydrate, keep Sonnet everywhere. Same model costs, but 96% cache hit rates immediately. Agents share memory across sessions and team members.
Model-switched + Hydrate
Lead engineers on Sonnet. Implementation sessions on Haiku with Hydrate's warm cache. The combination Hydrate was designed for.
* Annual projections based on 1,000 developers · 4 sessions/day · 220 working days = 880,000 sessions/year. Sonnet: $3.00/M input, $0.30/M cache read, $15.00/M output. Haiku: $0.80/M input, $0.08/M cache read, $4.00/M output. Quality scores from automated grader sessions in each benchmark run.
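The session-volume arithmetic behind those projections is simple enough to check by hand. A sketch, using only the stated assumptions plus, for scale, the per-session costs measured in the onboarding benchmark (applying an onboarding session cost fleet-wide is purely illustrative):

```python
# Stated assumptions: 1,000 developers, 4 sessions/day, 220 working days.
DEVELOPERS = 1_000
SESSIONS_PER_DAY = 4
WORKING_DAYS = 220

sessions_per_year = DEVELOPERS * SESSIONS_PER_DAY * WORKING_DAYS
assert sessions_per_year == 880_000  # matches the stated assumption

# Per-session costs from the onboarding benchmark, applied fleet-wide
# purely for illustration of scale.
for label, per_session in [("Haiku + Hydrate", 0.162), ("All-Sonnet", 0.510)]:
    print(f"{label}: ${per_session * sessions_per_year:,.0f}/year")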
Onboarding: equal quality, 3.15x cheaper. Full stop.
We ran a new developer, Eve, joining a finished three-sprint project. Same task: implement a task comments feature that fits the existing conventions. No prior knowledge of the codebase.
With Hydrate, Eve called hydrate_team_pull unprompted, received the accumulated team architectural decisions, and implemented the feature.
Both scored 7/10. Both correctly applied JWT authentication. Both missed the pagination envelope on the list endpoint. Both left out task-existence validation. The violations were identical.
This matters because it makes the onboarding claim auditable. It's not "Hydrate prevented mistakes". The sprint docs prevented most mistakes, and Hydrate didn't add anything there. What Hydrate did was make Haiku viable. $0.162 vs $0.510, for the same 7/10 output.
At a team of 1,000 developers with 20% annual turnover, onboarding costs alone represent a material saving: 200 new engineers per year, each running multiple onboarding sessions, each session 3.15x cheaper on Haiku with Hydrate. The model substitution compounds on every single first-week session.
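A rough sketch of that turnover arithmetic, using the measured per-session costs ($0.162 Haiku + Hydrate vs $0.510 Sonnet). The sessions-per-onboarding figure is a hypothetical placeholder, not a benchmark number:

```python
# Stated: 1,000-developer team, 20% annual turnover.
TEAM_SIZE = 1_000
TURNOVER = 0.20
SESSIONS_PER_ONBOARDING = 40  # assumption: ~2 weeks at 4 sessions/day

new_hires = int(TEAM_SIZE * TURNOVER)       # 200 engineers/year
sessions = new_hires * SESSIONS_PER_ONBOARDING

# Per-session costs from the onboarding benchmark.
saving = sessions * (0.510 - 0.162)
print(f"{new_hires} hires, {sessions} sessions, ${saving:,.0f} saved")
```

Swap in your own onboarding length; the saving scales linearly with it.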
The compounding effect kicks in at Sprint 2
The most important finding in the multi-sprint simulation: Hydrate's cache doesn't just maintain a constant saving. It compounds. Each sprint adds more context to an already-warm cache, reducing costs faster than the no-Hydrate baseline can match.
| Sprint | Haiku + Hydrate | All-Sonnet, no Hydrate | Saving | Drop from Sprint 1 |
|---|---|---|---|---|
| Sprint 1 | $0.789 | $1.040 | −24% | baseline |
| Sprint 2 | $0.379 | $0.586 | −35% | Hydrate: −52% · No-Hydrate: −44% |
| Sprint 3 | $0.529 | $0.652 | −19% | Hydrate: −33% · No-Hydrate: −37% |
| Total | $1.93 | $2.54 | −24% | — |
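The Saving column can be re-derived from the table's own cost figures, rounded to the nearest point:

```python
# Per-sprint costs from the table: (Haiku + Hydrate, All-Sonnet no Hydrate).
sprints = {
    "Sprint 1": (0.789, 1.040),
    "Sprint 2": (0.379, 0.586),
    "Sprint 3": (0.529, 0.652),
}

for name, (hydrate, baseline) in sprints.items():
    saving = 1 - hydrate / baseline
    print(f"{name}: {saving:.0%} cheaper")
```

The same division on Sprint 1 vs Sprint 2 costs gives the 52% and 44% drops quoted below.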
Sprint 2 is the inflection point. The Hydrate team dropped 52% from Sprint 1; the no-Hydrate team dropped 44%. The previous sprint's decisions are already warm in the cache, so each new sprint adds to what's already there rather than rebuilding from committed docs.
The quality finding compounds too. The no-Hydrate team's code quality scored 6/10 by Sprint 3 vs 8/10 for the Hydrate team, not because features were missing but because of structural amnesia: dead handlers still compiled into the binary, duplicated utility functions, no-op stubs. The grader attributed all of this directly to "no persistent memory of prior sprint decisions".
Incident response: 1.84x cheaper for an identical fix
The firefighter scenario isolates a single variable: does Hydrate change the cost of cold-start discovery on a codebase you've never seen? Answer: yes, by 1.84x.
- With Hydrate: called hydrate_team_pull. Architectural context injected. Went straight to the protected() closure in main.go. One commit. Correct fix.
- Without Hydrate: went first to internal/handlers/ (dead code). Deleted the dead handlers, then found the bug in internal/projects/handler.go. Two commits. Correct fix.
Same quality. Same security outcome. The difference is purely the cost of discovery. Sprint docs describe what your authentication should do. They don't say where it's wired. That gap, between "what" and "where", is the cold-start tax. Hydrate eliminates it.
Zero adoption friction
Every agent in every Hydrate-enabled benchmark run opened with team_pull → recall → save_fact. Without exception. Haiku agents. Sonnet agents. Lead engineers. Implementers. New joiners. On-call engineers who'd never seen the codebase. Six consecutive Haiku sessions across three sprints without a single deviation.
None of them were instructed to do this. Claude Code discovers the MCP tools automatically and the agents choose to use them immediately. The behavioural overhead of rolling out Hydrate across your engineering organisation is zero. You install it; the agents adapt.
What this costs and what it saves
Assumptions: 1,000 developers · 4 sessions/day · 220 working days = 880,000 sessions/year. Published Anthropic pricing as of April 2026. Quality equivalence confirmed across multi-sprint, onboarding, and firefighter scenarios.
Talk to us
These benchmarks are reproducible. The Docker containers and benchmark scripts are open for inspection. If you want to run them against your own codebase and team size, we can help you set it up.