The expensive part isn't the memory. It's the session you never cleared.

Claude Code will now tell you what's driving your usage. When I looked at mine, the headline was not the memory tooling I'd been worrying about. It was that two-thirds of a heavy day's spend came from sessions sitting above 150k tokens of context. A long session is expensive even when it's cached. Here is the maths, and the one habit that fixes it.

The line you'll fixate on is the wrong one

The usage breakdown lists your costliest slash commands. Mine put /hydrate-distill, the command that compresses a session into memory before I clear it, at 4% of the day. My first instinct was that 4% is high for a single command, and that I should make it cheaper.

That's optimising the cure. The disease was three lines above it: most of the spend was sessions over 150k context, with a good chunk of the rest from MCP tool results that, once returned, stay resident for the remainder of the session. The distil command is what lets me get rid of all of that. Spending a few percent on the thing that sheds the other ninety-something is not a problem. It's the trade working as intended.

But it only works if you finish the move. Distil, then clear. If you distil and keep going in the same window, you've paid for the compression and you're still carrying the full context. You pay twice. So the interesting question isn't "why does distil cost 4%". It's "why does a long context cost so much in the first place, even with a warm cache".

A cached token is cheaper, not free

Anthropic's prompt cache is the reason memory is cheap, and it's also the reason people misjudge long sessions. The cache discount is real: reading a cached token costs about a tenth of a fresh one. For Opus, $0.50 per million cached against $5.00 fresh. For Sonnet, $0.30 against $3.00. Ten to one.

The trap is the word "free". A cached token isn't. You re-read the entire resident context on every turn, at the cache-read rate. A 150k-token context on Opus is $0.075 per turn just to re-read what's already there, before the model does a single new thing. Fifty turns at that size is $3.75 spent re-reading the same transcript over and over. The cache made each re-read cheap. It didn't make the re-reading go away.
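The per-turn arithmetic above can be sketched in a few lines of Python. This is a rough model, not a billing calculator: the rates are the cache-read prices quoted in this post, the function name `reread_cost` is made up for illustration, and the context is assumed to stay at a fixed size rather than grow turn by turn.

```python
# Cache-read prices in USD per million tokens, as quoted in this post.
CACHE_READ_USD_PER_MTOK = {"opus": 0.50, "sonnet": 0.30}


def reread_cost(context_tokens: int, turns: int = 1, model: str = "opus") -> float:
    """Cost of re-reading a fixed-size resident context at the cache-read rate.

    Simplification: ignores new input, output tokens, and context growth.
    """
    return context_tokens / 1_000_000 * CACHE_READ_USD_PER_MTOK[model] * turns


per_turn = reread_cost(150_000)               # one turn on a 150k context
fifty_turns = reread_cost(150_000, turns=50)  # fifty turns at that size

print(f"per turn:    ${per_turn:.3f}")     # per turn:    $0.075
print(f"fifty turns: ${fifty_turns:.2f}")  # fifty turns: $3.75
```

A real session is usually worse than this, because the context keeps growing while you add turns to it.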

Two things make it worse. First, the cache has a five-minute time-to-live. Step away for a coffee, come back, and the next turn re-writes that whole context at the cache-write rate, $6.25 per million on Opus. Re-warming a 150k context is about $0.94 in one turn. Long sessions tend to have idle gaps, and every gap past five minutes is a fresh write. Second, big MCP results and subagent dumps land in the context and never leave. A single tool call that returns a hundred kilobytes is now part of every re-read for the rest of the session.
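To see what an idle gap does to the bill, here is a small sketch under stated assumptions: the Opus rates and the five-minute TTL are the figures quoted above, the helper `session_context_cost` is hypothetical, and the model is deliberately crude (a turn after a gap longer than the TTL pays the write rate on the whole context, every other turn pays the read rate, and nothing else is counted).

```python
CACHE_TTL_SECONDS = 5 * 60   # five-minute time-to-live, as described above
READ_USD_PER_MTOK = 0.50     # Opus cache-read rate quoted in this post
WRITE_USD_PER_MTOK = 6.25    # Opus cache-write rate quoted in this post


def session_context_cost(context_tokens: int, gaps_seconds: list[int]) -> float:
    """Context cost for a series of turns, given the idle gap before each one.

    Simplification: a gap longer than the TTL re-writes the whole context;
    otherwise the turn re-reads it from cache. Context size is held fixed.
    """
    total = 0.0
    for gap in gaps_seconds:
        expired = gap > CACHE_TTL_SECONDS
        rate = WRITE_USD_PER_MTOK if expired else READ_USD_PER_MTOK
        total += context_tokens / 1_000_000 * rate
    return total


rewarm = session_context_cost(150_000, [600])                 # one turn after a 10-minute break
steady = session_context_cost(150_000, [30] * 10)             # ten quick turns, cache stays warm
one_break = session_context_cost(150_000, [30] * 9 + [600])   # same ten turns, one coffee break

print(f"re-warm:   ${rewarm:.2f}")     # re-warm:   $0.94
print(f"steady:    ${steady:.2f}")     # steady:    $0.75
print(f"one break: ${one_break:.2f}")  # one break: $1.61
```

In this toy model a single coffee break more than doubles the context cost of a ten-turn stretch.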

None of this is a bug. It's just what a long, stable context costs when you keep adding turns to it. The fix is not a better cache. It's a shorter context.

Clearing is only free if you don't lose the thread

Everyone knows the advice: clear the context when you switch tasks. Almost nobody does it, because clearing has a real cost that has nothing to do with tokens. You lose the thread. The decisions you made, the dead ends you already ruled out, the half-finished task and the exact thing you were about to do next. Re-establishing that by hand is slow and error-prone, so people keep the session open instead and pay the context tax rather than the re-orientation tax.

This is the whole reason a memory layer earns its place. The point of distilling before you clear is that clearing stops being lossy. The compression step writes a handover note, the load-bearing decisions into canon, and the in-flight work into goals. The next session reads that back, a few kilobytes of compressed essence, not the 150k transcript that produced it.

The asymmetry is the entire argument. Re-reading a 150k-token context is $0.075 a turn. Re-reading a 5k-token handover is $0.0025 a turn, and it carries the part you actually needed. You're not choosing between continuity and cost. With the distil step in front of the clear, you get the continuity at roughly a thirtieth of the price.
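The same comparison as a short sketch, plus a break-even check. The Opus cache-read rate is the one quoted in this post. The $1.00 distil cost is a purely hypothetical number for illustration (the post only gives it as a share of the day's spend), and `break_even_turns` is an invented helper. It also assumes the fresh session stays near the handover size, which it won't forever.

```python
import math

OPUS_CACHE_READ_USD_PER_MTOK = 0.50  # rate quoted in this post


def reread_usd(tokens: int) -> float:
    """Per-turn cost of re-reading `tokens` of cached context on Opus."""
    return tokens / 1_000_000 * OPUS_CACHE_READ_USD_PER_MTOK


full = reread_usd(150_000)         # the uncleared session
handover = reread_usd(5_000)       # the distilled brief
saving_per_turn = full - handover  # what clearing saves on each later turn


def break_even_turns(distil_cost_usd: float) -> int:
    """Turns needed before the per-turn saving covers the distil spend."""
    return math.ceil(distil_cost_usd / saving_per_turn)


print(f"full:     ${full:.4f} per turn")      # full:     $0.0750 per turn
print(f"handover: ${handover:.4f} per turn")  # handover: $0.0025 per turn
print(f"ratio:    {full / handover:.0f}x")    # ratio:    30x
print(break_even_turns(1.00))                 # 14 (with a hypothetical $1.00 distil)
```

Under those assumptions the distil pays for itself within about fourteen turns of the next session, and only if the clear actually happens.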

What this looks like in practice

One session per task, not one session per day. The work I did the day I looked at that usage breakdown was four genuinely separate things: a code review, a bug fix, a small feature, and a design spec. They ran in one window that grew all day. That should have been four sessions, each distilled and cleared at its boundary, each starting from a warm few-kilobyte brief instead of inheriting the accumulated weight of everything before it.

Be deliberate about the big inert payloads. The most expensive single items in a long session are often not the model's own output. They're a giant tool result or a subagent's full report, dropped into context once and re-read forever. If you don't need it any more, compact it out.

And finish the move. The habit is two steps, not one. Distil writes the memory; clear collects the saving. The command that shows up in your usage breakdown is the first half. The /clear right after it is what turns that spend into a refund.

The honest version of the finding

I build a memory layer, so the temptation is to say memory is what makes you cheap. That's not quite it. The cache is what makes memory cheap. Memory is what makes clearing cheap, and clearing is the lever that actually moves the bill, because it's the only thing that shortens the context you re-read on every turn.

So if you go looking at your own usage breakdown, don't do what I nearly did and trim the 4%. Look at the line above it. The session you never cleared is the expensive one. The few percent you spend distilling is what buys you permission to clear it without losing your place.