Two Models, One Rule: How Solo Developers Cut AI Costs Without Cutting Quality
Sonnet for new decisions. Haiku for established implementation. Hydrate is the bridge. Here is the benchmark data behind the one rule that changes your AI spend without changing your output.
The bill that surprises people
You start a new project. Everything needs to be figured out: architecture, conventions, data model. You lean on Claude Code. It is worth it; you are making real decisions and the thinking assistance is valuable.
Then you spend six weeks implementing what you designed. Same model, same rate, same cost per session. Except now you are not making decisions. You are executing them. You are writing the third endpoint in a pattern you established on day two. You are debugging against conventions you documented in sprint one. Claude does not need to reason through your architecture with you. It needs to remember it.
The question is: why are you paying frontier model prices for implementation work on an established codebase?
What the benchmarks actually show
I have been running a benchmark programme for Hydrate across twelve runs and five scenario types, including new project sprints, multi-sprint team simulations, new developer onboarding, and P0 incident response.
The finding I did not anticipate when I started: memory does not primarily make agents smarter. It makes cheaper models viable.
The headline result from the firefighter scenario says it plainly. An on-call engineer, Frank, gets paged at 2am. He has never seen the codebase. A security bypass was committed to production. He has ten minutes to understand the architecture, find the bug, fix it, and tag a release.
I ran this twice. Same task, same quality grader. Both scored 8/10. Both fixed the bug correctly. Both builds passed.
Frank with Hydrate cost $0.075. Frank without Hydrate cost $0.138. That is 1.84x more expensive for identical output. The difference: Frank with Hydrate ran on Haiku. Frank without Hydrate needed Sonnet.
The onboarding result was similar. A new developer joining a three-sprint project, implementing a feature from scratch. With Hydrate on Haiku: $0.162. Without Hydrate on Sonnet: $0.510. Same quality score. 3.15x cheaper.
The multi-sprint result goes further. Over three sprints, Haiku with Hydrate cost 24% less than all-Sonnet without Hydrate, and scored higher on code quality. The no-Hydrate codebase had accumulated dead handlers and duplicated utility functions that the automated grader traced directly to cold-start amnesia: each sprint had re-derived shared logic rather than extending a common foundation.
The mechanism: it is all cache
Anthropic's prompt caching prices Sonnet cache reads at $0.30 per million tokens versus $3.00 per million for fresh input, a 10x difference. For Haiku it is $0.08 versus $0.80. Same ratio.
When Hydrate's hook fires before your prompt turn, it injects prior session context. That context is stable; it does not change from turn to turn. So Anthropic's cache serves it at the cheap rate on every subsequent turn in the session.
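To make that concrete, here is a minimal sketch of the general shape of a pre-prompt injection hook. It is illustrative, not Hydrate's actual implementation: the `.hydrate/session-context.md` path is hypothetical, and it assumes a hook runner that adds the script's stdout to the prompt context, the way Claude Code's UserPromptSubmit hook treats hook output.

```python
#!/usr/bin/env python3
"""Illustrative pre-prompt hook: inject stable prior-session context.

A minimal sketch, not Hydrate's implementation. Assumes a hook runner
that treats this script's stdout as context prepended to the prompt
(Claude Code's UserPromptSubmit hook behaves this way). The context
file path is hypothetical.
"""
from pathlib import Path

# Hypothetical location where prior-session summaries were captured.
CONTEXT_FILE = Path(".hydrate/session-context.md")

def main() -> None:
    if CONTEXT_FILE.exists():
        # Emit the stored context verbatim. Because this text is
        # byte-identical on every turn, the prompt cache can serve it
        # at the cheap cached-read rate after the first turn.
        print(CONTEXT_FILE.read_text(), end="")

if __name__ == "__main__":
    main()
```

The property that matters is that the injected text is byte-identical turn over turn; any change would invalidate the cached prefix and push those tokens back to the fresh-input rate.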
Without Hydrate, each session starts cold. Everything arrives fresh. You pay the full rate while the cache warms.
In practice: a session without Hydrate runs at roughly a 67% cache hit rate at the start. A session with Hydrate runs at 97–98%, consistently, across every run in the benchmark programme, from Sprint 1 through Sprint 3 and across new developers who had never seen the codebase.
That 67% to 97% gap is the entire cost story in a single number. A warm cache makes Haiku behave more like Sonnet, because a significant fraction of what Sonnet has over Haiku in practice is context: knowing what decisions were made, what patterns are established, what the codebase looks like. Inject that context reliably and the gap closes.
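The input-token arithmetic is easy to check against the rates quoted above. A back-of-envelope sketch:

```python
# Effective input cost per million tokens as a function of cache hit
# rate, using the published rates quoted above. 0.67 and 0.97 are the
# benchmark's cold-start and Hydrate-warmed hit rates.

RATES = {  # model: (fresh input $/Mtok, cache read $/Mtok)
    "Sonnet": (3.00, 0.30),
    "Haiku": (0.80, 0.08),
}

def effective_cost(model: str, hit_rate: float) -> float:
    fresh, cached = RATES[model]
    return hit_rate * cached + (1.0 - hit_rate) * fresh

for model in RATES:
    cold, warm = effective_cost(model, 0.67), effective_cost(model, 0.97)
    print(f"{model}: ${cold:.3f}/Mtok cold vs ${warm:.3f}/Mtok warm "
          f"({cold / warm:.1f}x)")
```

At those rates, a warm Haiku session pays roughly $0.10 per million input tokens where a cold Sonnet session pays roughly $1.19. Output tokens and turn counts move the totals around, but that is the gap the benchmark numbers are built on.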
The single developer strategy
For a team, this plays out as role allocation: use Sonnet for the lead architect who makes decisions, Haiku for the implementers who execute them. Hydrate carries the architect's decisions to the implementers as warm, cached context.
For a solo developer, you are both. You make decisions and you implement them, often in the same day, sometimes in the same session.
The strategy simplifies to one rule:
Sonnet for new decisions. Haiku for established implementation.
Hydrate is the bridge that makes this work. The Sonnet session where you designed the data model, chose the auth pattern, and figured out the tricky edge case gets captured and injected before every subsequent prompt. By your second session on any project, you are already at a 97%+ cache hit rate. That is what makes Haiku viable: it is not reasoning through cold context, it is executing against warm context.
When to use Sonnet:
- First session on a new project or feature
- Any "how should I approach this?" architectural question
- When you are genuinely uncertain about the right direction
When to use Haiku:
- Any session implementing what you have already decided
- Debugging within established code: in the firefighter benchmark, Frank went straight to the right file because team context was injected, with no exploration overhead
- Picking up work from a previous session: Hydrate has the context, Haiku does not need to rediscover it
The signal is simple. If your opening message is "implement X using our established pattern" or "continue from yesterday," that is a Haiku session. If it is "I need to figure out how to...", that is a Sonnet session.
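If you want to mechanise that signal, it fits in a few lines. A toy sketch: the phrase list and return values are illustrative assumptions, not a shipped feature of Hydrate or Claude Code.

```python
# Toy router for the one rule: Sonnet for new decisions, Haiku for
# established implementation. The phrase list is an illustrative
# assumption, not exhaustive.

DECISION_SIGNALS = (
    "figure out",
    "how should i approach",
    "what architecture",
    "which pattern",
)

def pick_model(opening_message: str) -> str:
    msg = opening_message.lower()
    if any(signal in msg for signal in DECISION_SIGNALS):
        return "sonnet"  # new decision: pay for the reasoning
    return "haiku"       # established work: warm context is enough

assert pick_model("Implement X using our established pattern") == "haiku"
assert pick_model("I need to figure out how to shard the queue") == "sonnet"
```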
What this looks like in practice
Most solo development work, most of the time, is implementation on an established codebase. You designed your architecture once. You chose your patterns once. Now you are building against those decisions, sprint after sprint, feature after feature.
Without Hydrate, every session starts cold regardless. Claude re-reads your codebase, re-discovers your patterns, re-derives your conventions, and you pay fresh-input rates for that rediscovery on every session.
With Hydrate, that context is warm before the first turn. You pay cache rates for context that is already established. And warm context is what Haiku needs to close the quality gap.
The realistic allocation for a solo developer after the first sprint: roughly 80% of sessions are Haiku, 20% are Sonnet. The 20% is when you are making new decisions. The 80% is everything else.
The cost difference at that split: Haiku's token rates are roughly one-quarter of Sonnet's, so an 80/20 split lands at about 40% of all-Sonnet token spend, a 60% reduction before cache savings are even counted (the sketch below walks it through). The quality difference on warm-context implementation tasks: negligible, as the benchmarks show consistently across debugging, feature implementation, and onboarding.
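The arithmetic, under the stated assumption that Haiku's token rates run at roughly one-quarter of Sonnet's:

```python
# Blended token spend at an 80/20 Haiku/Sonnet session split,
# normalised so all-Sonnet = 1.0. Assumes Haiku's token rates are
# roughly a quarter of Sonnet's, per the rates quoted earlier, and
# leaves the additional cache-rate savings out for simplicity.

HAIKU_RELATIVE_COST = 0.25           # Haiku ~ 1/4 of Sonnet's rates
HAIKU_SHARE, SONNET_SHARE = 0.80, 0.20

blended = HAIKU_SHARE * HAIKU_RELATIVE_COST + SONNET_SHARE * 1.0
print(f"Blended spend vs all-Sonnet: {blended:.2f}x")  # 0.40x
print(f"Reduction: {1 - blended:.0%}")                 # 60%
```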
The thing worth understanding
The intuitive argument for a memory layer is that it prevents mistakes: that with more context, the agent makes fewer errors. That is true at the structural level over long projects, where cold-start amnesia accumulates into dead code and duplicated logic.
But the primary value, the one that shows up consistently across every scenario type in the benchmark programme, is different. It is not that memory makes the expensive model perform better. It is that memory makes the cheap model sufficient.
You were always going to use Sonnet for the session where you designed your authentication system. That session warranted the cost. What Hydrate changes is every session after it: the dozens of implementation sessions where you are building against that decision, not making it. Those sessions do not need frontier reasoning. They need frontier context. And context, unlike reasoning, is something you can carry forward cheaply.
That is the practical shape of it: you invest in Sonnet when you are deciding, and Hydrate turns that investment into context that Haiku can execute against. You stop paying twice for the same thinking.
The benchmark data behind this article: 12 runs across apj-v4 through apj-v13, containerised, graded by an automated Sonnet session. Full results in the benchmark report.