Hydrate vs claude-mem
Two session-memory tools for Claude Code, two opposite philosophies. We measured them head-to-head.
Delta: +6.29 points on the 20-point scale. Hydrate's worst cell (18 / 20) beat claude-mem's best (17 / 20) on scenario-lquery (Sonnet 4.6, a fresh Docker container per cell).
claude-mem and Hydrate are the two closest tools in the session-memory space: both hook into Claude Code to capture what happens in a session and inject it back later. Beyond that shared goal they take opposite architectural approaches, and the benchmarks show which one does better on the canonical question: did we keep the right context across the session boundary?
At a glance
| Dimension | Hydrate | claude-mem |
|---|---|---|
| Language / runtime | Go stdlib | Node + Bun + uv |
| Runtime deps | None | Node 18+, Bun 1+, uv |
| Compression mechanism | Deterministic local | LLM-powered (Claude Agent SDK) |
| Compression cost | Zero | API tokens per tool execution (~100×/session) |
| Memory decay | ✓ per-fact decay_rate | - |
| Capture hook | Stop (session end) | PostToolUse (every tool call) |
| Compaction-survival hooks | PreCompact + SessionStart | SessionStart matcher only |
| <code>[GOAL:*]</code> lane | ✓ | - |
| Doctor command | 17 checks + --report | ✓ |
| Orchestrator subcommand | ✓ | - |
| Team sync (canon push/pull) | team push/pull | - |
| Hydration packs (<code>.hpack</code>) | ✓ | - |
| MCP server | ✓ hydrate-mcp | ✓ |
| Gemini CLI integration | via MCP | ✓ --ide gemini-cli |
| Per-tool-call observation | - | ✓ PostToolUse |
| Licence | Closed source (v0.2.0 beta) | Apache 2.0 |
| Latest release | v0.2.0-incident-hardening | v13.2.0 (2026-05-12) |
| scenario-lquery score | 19.40 / 20 | 13.11 / 20 |
| compact-survival recovery | 27% | 0% |
✓ present · - not present.
Long-query memory recall, multi-clause project context
“Hydrate's worst cell (18 / 20) beat claude-mem's best (17 / 20).”
| Tool | Mean / 20 | 95% CI | n |
|---|---|---|---|
| Hydrate | 19.40 | [19.00, 19.80] | 10 |
| claude-mem 13.2.0 | 13.11 | [11.67, 15.00] | 9 |
Run on claude-sonnet-4-6 in fresh Docker containers, one container per cell, no cross-run state. Hydrate arm: 37 canon facts pinned at the always tier. claude-mem arm: plugin v13.2.0. One claude-mem cell failed to complete after a Docker container restart, leaving N=9; the distribution is bimodal (six cells at 11–13, three at 17). 95% CIs are bootstrap, 10,000 resamples, seed 42. Raw scores are at commit 7105ede of hydrate-benchmark.
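The intervals above come from bootstrap resampling. Below is a minimal percentile-bootstrap sketch in Go. The page does not show the benchmark's resampling code, RNG or percentile rule, so the percentile variant is an assumption, and seed 42 in Go's math/rand will not reproduce the published intervals. The scores in main are hypothetical, not the raw data.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// mean returns the arithmetic mean of xs.
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// bootstrapCI returns a percentile bootstrap interval for the mean of xs:
// it draws `resamples` same-size samples with replacement, computes each
// resample's mean, sorts those means and reads off the tail quantiles.
func bootstrapCI(xs []float64, resamples int, level float64, seed int64) (lo, hi float64) {
	rng := rand.New(rand.NewSource(seed)) // fixed seed => reproducible interval
	means := make([]float64, resamples)
	buf := make([]float64, len(xs))
	for i := range means {
		for j := range buf {
			buf[j] = xs[rng.Intn(len(xs))]
		}
		means[i] = mean(buf)
	}
	sort.Float64s(means)
	alpha := (1 - level) / 2
	loIdx := int(alpha * float64(resamples))
	hiIdx := int((1 - alpha) * float64(resamples))
	if hiIdx >= resamples {
		hiIdx = resamples - 1
	}
	return means[loIdx], means[hiIdx]
}

func main() {
	// Hypothetical per-cell scores out of 20, not the benchmark's raw data.
	scores := []float64{18, 19, 20, 19, 20}
	lo, hi := bootstrapCI(scores, 10000, 0.95, 42)
	fmt.Printf("mean %.2f, 95%% CI [%.2f, %.2f]\n", mean(scores), lo, hi)
}
```

With only nine or ten cells per arm, the bootstrap interval can only contain values that some resample mean actually reaches, which is one reason a bimodal arm produces a wide, asymmetric interval.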
Stop → SessionStart in-flight task recovery
“Hydrate is the only memory tool with non-zero recovery on the canonical compact-survival scenario.”
| Tool | Recovery rate | 95% CI | n |
|---|---|---|---|
| Hydrate | 27% | [14%, 44%] | 30 |
| claude-mem 13.2.0 | 0% | [0%, 11%] | 30 |
Stop → SessionStart in-flight task recovery, n=30 per tool (3 complexity buckets × 10 cells). "Recovered" means the next session named the task key and its next step without the operator prompting. Breakdown: Hydrate recovered 8/30, was wrong on 8, asked on 6 and returned nothing on 8, so its failure mode is restraint. claude-mem recovered 0/30, was wrong on 28, asked on 1 and returned nothing on 1, so its failure mode is confabulation. CIs are Wilson intervals.
What is claude-mem?
thedotmack/claude-mem (homepage claude-mem.ai) is a Node + Bun + uv stack that captures every Claude Code tool execution via PostToolUse hooks and forwards the observations to a worker service running on localhost:37777. The worker uses the Claude Agent SDK (i.e. it calls back to the Anthropic API) to compress those observations into "structured learnings" (facts, decisions, insights) extracted via XML parsing.
- Ships as a Claude Code plugin (and a Gemini CLI / OpenCode variant).
- Stack: Node.js 18+, TypeScript, Bun, uv (Python); SQLite plus ChromaDB for storage.
- Licence: Apache 2.0. A paid claude-mem Pro tier adds cross-device sync.
- Maturity: roughly 5.8k GitHub stars; maintained by Alex Newman (@thedotmack). Latest release v13.2.0 (2026-05-12) added native Windows support and first-class Gemini CLI integration.
Both tools share the same broad goal — persistent memory across Claude Code sessions. The rest of this page is about the implementation differences and what they cost.
How to install claude-mem
```sh
npx claude-mem install
```
The installer pulls Node 18+, Bun, and uv (the Python package manager) as runtime dependencies. The worker process binds to localhost:37777, and the ChromaDB-backed memory store lives in ~/.claude-mem/. Restart Claude Code after installing; the SessionStart hook then fires on every new session and replays compressed summaries via its startup|clear|compact matcher.
Native Windows is supported in v13.2.0 per the README. Node 18+ is a hard requirement; the global <code>npm install -g claude-mem</code> ships the SDK only and is explicitly not the install path.
Architecture differences (the deep version)
Capture point
- claude-mem captures via the PostToolUse hook — fires after every individual tool execution (every Read, every Bash, every Edit). The observation is queued to the worker for AI-powered compression.
- Hydrate captures via the Stop hook — fires once when the session ends. The hook reads the canonical Claude Code JSONL transcript and runs a deterministic local distill.
Compression
- claude-mem uses the Claude Agent SDK. Each observation triggers an LLM call to extract structured XML. This costs real Anthropic API tokens — claude-mem's own README acknowledges "tokens consumed per tool execution." Over a typical session that's hundreds of background API calls.
- Hydrate uses TF-IDF plus sentence scoring plus per-clause information-density scoring. Pure Go, microsecond-level latency, zero API calls. Over a 26-hour orchestration sprint, Hydrate compressed 25.5M raw transcript tokens into 142K tokens of stored summary (0.6% retention) at $0 of compression cost.
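Hydrate's source is closed, so the following is not its distiller. It is a generic Go sketch of TF-IDF sentence scoring, the first technique named above: each sentence is treated as a document, terms that appear in many sentences get a low IDF weight, and the highest-scoring sentences are kept in their original order. The per-clause information-density pass is omitted, and the sentence splitter and tokenizer are deliberately naive assumptions.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
	"unicode"
)

// splitSentences breaks text on '.', '!' and '?' and drops empty pieces.
func splitSentences(text string) []string {
	parts := strings.FieldsFunc(text, func(r rune) bool {
		return r == '.' || r == '!' || r == '?'
	})
	var out []string
	for _, p := range parts {
		if s := strings.TrimSpace(p); s != "" {
			out = append(out, s)
		}
	}
	return out
}

// tokenize lowercases s and splits it on anything that is not a letter or digit.
func tokenize(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// summarize keeps the k sentences with the highest mean TF-IDF weight,
// using each sentence as a "document" for IDF, in original order.
func summarize(text string, k int) []string {
	sents := splitSentences(text)
	toks := make([][]string, len(sents))
	df := map[string]int{} // number of sentences containing each term
	for i, s := range sents {
		toks[i] = tokenize(s)
		seen := map[string]bool{}
		for _, t := range toks[i] {
			if !seen[t] {
				seen[t] = true
				df[t]++
			}
		}
	}
	n := float64(len(sents))
	scores := make([]float64, len(sents))
	for i, ts := range toks {
		if len(ts) == 0 {
			continue
		}
		sum := 0.0
		for _, t := range ts { // each occurrence adds its IDF, i.e. raw TF x IDF
			sum += math.Log(n / float64(df[t]))
		}
		scores[i] = sum / float64(len(ts)) // length-normalised
	}
	idx := make([]int, len(sents))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool { return scores[idx[a]] > scores[idx[b]] })
	if k > len(idx) {
		k = len(idx)
	}
	top := append([]int(nil), idx[:k]...)
	sort.Ints(top) // restore original sentence order
	out := make([]string, k)
	for i, j := range top {
		out[i] = sents[j]
	}
	return out
}

func main() {
	transcript := "ok. ok. The deploy key rotates weekly. ok."
	fmt.Println(summarize(transcript, 1)) // [The deploy key rotates weekly]
}
```

Dividing by sentence length is a design choice: without it, long sentences win simply by containing more terms.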
Injection
- claude-mem injection is on-demand: Claude has to actively call an MCP search tool to retrieve memory, so the model itself has to decide when to look things up.
- Hydrate injection is automatic — the UserPromptSubmit hook injects ranked context before Claude reads the prompt. The model doesn't have to know to remember; the context is just there.
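To illustrate automatic injection, here is a hypothetical UserPromptSubmit hook in Go. It relies on two Claude Code hook behaviours: the hook receives a JSON payload on stdin that includes a prompt field, and for UserPromptSubmit a zero exit status lets its stdout be added to the model's context. The Fact type, the always-tier flag and the word-overlap ranking are illustrative assumptions, not Hydrate's ranking.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"sort"
	"strings"
)

// Fact is a hypothetical stored memory item; Always mimics a fact pinned at an "always" tier.
type Fact struct {
	Text   string
	Always bool
}

// words returns the set of lowercased whitespace-separated words in s.
func words(s string) map[string]bool {
	set := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(s)) {
		set[w] = true
	}
	return set
}

// rankFacts returns pinned facts first, then up to limit other facts that
// share at least one word with the prompt, most overlapping first.
func rankFacts(prompt string, facts []Fact, limit int) []string {
	pw := words(prompt)
	type scored struct {
		text string
		hits int
	}
	var pinned []string
	var rest []scored
	for _, f := range facts {
		if f.Always {
			pinned = append(pinned, f.Text)
			continue
		}
		hits := 0
		for w := range words(f.Text) {
			if pw[w] {
				hits++
			}
		}
		if hits > 0 {
			rest = append(rest, scored{f.Text, hits})
		}
	}
	sort.SliceStable(rest, func(a, b int) bool { return rest[a].hits > rest[b].hits })
	out := pinned
	for i := 0; i < len(rest) && i < limit; i++ {
		out = append(out, rest[i].text)
	}
	return out
}

func main() {
	// Only the prompt field of the hook payload is used here.
	var in struct {
		Prompt string `json:"prompt"`
	}
	if err := json.NewDecoder(os.Stdin).Decode(&in); err != nil {
		return // inject nothing rather than interfere with the prompt
	}
	facts := []Fact{ // hypothetical; a real tool would load these from its store
		{"Project uses Go 1.22 and SQLite", true},
		{"Deploys go through the staging cluster first", false},
		{"The billing service owns the invoices table", false},
	}
	for _, line := range rankFacts(in.Prompt, facts, 5) {
		fmt.Println("- " + line) // stdout becomes injected context
	}
}
```

Because the ranking runs before the model sees the prompt, the model never has to decide to search; the trade-off is that whatever the ranker misses is simply absent.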
Compaction survival
- claude-mem restores at SessionStart only, which means it relies on Claude Code's own (lossy) compaction summary as the input.
- Hydrate snapshots via PreCompact + SessionStart — the PreCompact hook writes the transcript tail plus a distill before Claude Code's compaction event mutates the conversation. The next session's SessionStart hook reads that snapshot first.
Infrastructure
- claude-mem needs Node 18+, Bun, uv and ChromaDB, plus a worker process on :37777.
- Hydrate is one Go binary plus a SQLite file. No runtime, no worker service, no Python.
Where claude-mem is stronger
- Granular observation capture. claude-mem can answer "what files did you read in turn 47?" because it captured each Read individually. Hydrate has the session as a whole; per-turn metadata requires re-parsing the JSONL.
- First-class Gemini CLI integration. v13.2.0 ships <code>--ide gemini-cli</code> as a documented install path. Hydrate reaches Gemini CLI via the generic <code>hydrate-mcp</code> server; claude-mem markets the integration explicitly.
- Apache 2.0 licence. Auditable, forkable. Hydrate v0.2.0 is a binary-only beta. If you need source access for compliance or audit, claude-mem is the right answer.
Where Hydrate is stronger
- Non-zero recovery on compact-survival. 27% vs 0%, n=30. Hydrate is the only memory tool that recovered in-flight tasks across the Stop → SessionStart boundary on the canonical scenario. claude-mem's failure mode is confabulation (28 of 30 cells wrong, answered with confidence); Hydrate's is restraint (8 cells returned nothing and 6 asked; when it isn't sure, it holds back). Run it yourself.
- Zero LLM cost to operate. No API call in the compression pipeline. claude-mem's PostToolUse-based capture pays compression costs on every tool execution; over a long session that's hundreds of API calls.
- PreCompact hook intercepts Claude Code's compaction. Hydrate writes its own snapshot before Claude Code's lossy summary replaces the transcript. claude-mem reads the post-compaction state.
- No worker, no Bun, no Python. One Go binary, one SQLite file. The install surface and the bug surface are an order of magnitude smaller.
- Team sync plus hydration packs. <code>hydrate team push/pull</code> and <code>.hpack</code> archives propagate canon across developers. claude-mem has no equivalent; memory is per-developer.
- Cross-vendor bidirectional sync proven. A Claude Code ↔ Mistral Vibe round-trip works within the same session sequence. claude-mem claims MCP compatibility but doesn't demonstrate bidirectional round-trips.
When to pick each one
| Scenario | Better choice | Why |
|---|---|---|
| You want per-tool-call observability | claude-mem | PostToolUse gives granularity Hydrate doesn't capture |
| You want zero ongoing API cost | Hydrate | Local TF-IDF distill, no LLM in pipeline |
| You want lightest install surface | Hydrate | Go binary vs Node + Bun + uv + ChromaDB + worker |
| You want compact-survival that actually survives | Hydrate | Only tool with non-zero recovery on the canonical scenario |
| You want Apache 2.0 / forkable | claude-mem | Hydrate is closed source in v0.2.0 |
| You want team sync across developers | Hydrate | claude-mem has no team push/pull |
| You want first-class Gemini CLI marketing | claude-mem | --ide gemini-cli documented install path |
| You want bidirectional Claude Code ↔ Mistral Vibe | Hydrate | Proven; claude-mem doesn't claim this |
Methodology
scenario-lquery: a long-query memory-recall benchmark. The agent is given a multi-clause question that requires assembling four pieces of project context to answer correctly. Each cell runs in a fresh Docker container with the memory tool pre-loaded with the relevant facts. Scoring is 0–20: each of the four sub-claims is worth 5 points, with partial credit for answers that point in the right direction.
compact-survival: Stop → SessionStart in-flight task recovery. Phase 1 of each cell establishes an in-flight task in one session; Phase 2 opens a fresh container and checks whether the agent names the task key and its next step without the operator prompting. "Recovered" means both were named correctly without the agent asking the operator what it had been working on.
Raw data and reproduction commands: bench/scenario-lquery/RESULTS-vs-claude-mem.md (commit 7105ede) · bench/compact-survival/RESULTS.md.