Hydrate vs Bytebell
Bytebell answers "where is the auth code?". Hydrate answers "what did we decide about it last Tuesday?". These are fundamentally different questions, and the honest pitch for a serious team is to run both.
Questions Bytebell answers: Where is the authentication code? How does it connect to the API layer? Which repos import parse_file? What changed in the last commit that touched the auth module?
Questions Hydrate answers: What did we decide about the authentication implementation last Tuesday? Why did we abandon the session-token approach? What JWT convention did Tom establish? What did Dick and Harry need to know before touching the auth layer?
The graph tells the agent what the code is. Hydrate tells it what the team decided. You cannot derive one from the other — and on a serious project both questions get asked daily.
At a glance — different layers, different problems
| Dimension | Hydrate | Bytebell |
|---|---|---|
| Core problem | Session continuity | Codebase archaeology |
| What it indexes | Session transcripts, decisions, pinned facts | Files, classes, functions, imports, per-file semantics |
| When value peaks | Ongoing project; returning to work; team coordination | First session in unfamiliar 10k+ file codebase |
| Capture | Automatic Stop hook | bytebell index <url> per repo |
| Injection | Automatic UserPromptSubmit hook | AI calls 3 MCP tools |
| Storage | SQLite | Neo4j + Mongo + Redis |
| Runtime | Single Go binary | Bun + Docker (3 services) |
| BYOK API key required | - | OpenRouter (at ingest) |
| Team sync | team push/pull | OSS: single-tenant · Enterprise: multi-tenant |
| Hydration packs | .hpack | - |
| Compact-survival | PreCompact + SessionStart | - |
| Cross-vendor MCP | ✓ hydrate-mcp | ✓ 127.0.0.1:8080/mcp |
| Codebase structural graph | - | ✓ Neo4j with AST + imports |
| Per-file LLM semantics | - | purpose / summary / businessContext |
| Licence | Closed source (v0.2.0 beta) | AGPL-3.0 + non-commercial · Enterprise separate |
✓ present · - not present
What is Bytebell?
ByteBell/bytebell-oss
is a local knowledge graph for your codebase, served over
MCP. It indexes a repo into a Neo4j graph where every
file node carries LLM-generated purpose,
summary, and businessContext fields, so
Claude Code, Cursor, and other MCP-capable agents can answer
questions about your code without reading the whole repo into
context.
The architecture is grounded in recent research. The README cites RepoGraph (ICLR 2025, +32.8% on SWE-bench), CodexGraph (NAACL 2025), CGM (43% on SWE-bench Lite), and several more. The converging finding is that purely structural retrieval (AST / call-graph) and purely semantic retrieval (embeddings) each leave substantial performance on the table, and that combining them at index time recovers it.
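To make the structure-plus-semantics idea concrete, here is a minimal in-memory sketch. It is not Bytebell's schema or query API: the `FileNode` class, the `imports` edge set, the keyword-overlap matcher (a crude stand-in for real semantic retrieval), and the sample graph are all assumptions for illustration. Only the three field names and the example paths come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class FileNode:
    """One file in a toy graph; the three text fields mirror the ones named above."""
    path: str
    purpose: str = ""
    summary: str = ""
    business_context: str = ""
    imports: set[str] = field(default_factory=set)  # paths this file imports


def importers_of(graph: dict[str, FileNode], target: str) -> list[str]:
    """Structural query: which files import `target`?"""
    return sorted(p for p, node in graph.items() if target in node.imports)


def semantic_hits(graph: dict[str, FileNode], query: str) -> list[str]:
    """Semantic query (stand-in): keyword overlap with the descriptive fields."""
    terms = set(query.lower().split())
    hits = []
    for path, node in graph.items():
        text = f"{node.purpose} {node.summary} {node.business_context}".lower()
        if terms & set(text.split()):
            hits.append(path)
    return sorted(hits)


def hybrid_lookup(graph: dict[str, FileNode], query: str) -> dict[str, list[str]]:
    """Match on meaning first, then expand one hop along import edges."""
    return {path: importers_of(graph, path) for path in semantic_hits(graph, query)}


# Hypothetical sample data.
GRAPH = {
    n.path: n
    for n in [
        FileNode("auth/middleware.go",
                 purpose="validates login tokens on every request",
                 imports={"internal/jwt"}),
        FileNode("internal/jwt", purpose="signs and verifies JWT payloads"),
        FileNode("api/routes.go",
                 purpose="registers HTTP handlers",
                 imports={"auth/middleware.go"}),
    ]
}

print(hybrid_lookup(GRAPH, "login"))
```

The query "login" never appears in a file path, yet the descriptive field finds `auth/middleware.go`, and the import edges then show which file depends on it. Neither lookup alone answers both halves of "where is the auth code, and how does it connect to the API layer?".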
Bytebell binds to 127.0.0.1 only. No telemetry, no
auth, no remote network surface. The OSS edition is AGPL-3.0
with a non-commercial clause; commercial deployment requires the
separately-licensed Enterprise edition.
How to install Bytebell
```shell
# Prerequisites: Bun >= 1.1, Docker, OpenRouter API key
bytebell set openrouter-api-key sk-or-…
bytebell set openrouter-model anthropic/claude-sonnet-4.6
bytebell boot
bytebell index https://github.com/your/repo
bytebell ls
```
Then point your MCP client at
http://127.0.0.1:8080/mcp. The boot step pulls
Mongo, Neo4j, and Redis via docker-compose; data persists in
named volumes across reboots. Configuration lives at
~/.bytebell/config.json (mode 0600); there's no
.env, and bytebell set is the only
sanctioned write path.
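For readers unfamiliar with what "mode 0600" buys you, the sketch below shows the general technique of writing an owner-only JSON file and checking its permission bits on a POSIX system. It is not how Bytebell writes its config, and the key name in the usage note is only an assumed example; the real file should still be changed through bytebell set.

```python
import json
import os
import stat


def write_private_json(path: str, data: dict) -> None:
    """Write JSON so that only the owning user can read or write it."""
    # Creating the file with an explicit mode avoids a window in which
    # it exists with looser, umask-derived permissions.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(data, fh, indent=2)
    # If the file already existed with wider permissions, tighten them.
    os.chmod(path, 0o600)


def is_private(path: str) -> bool:
    """True when no group or other permission bits are set."""
    return (stat.S_IMODE(os.stat(path).st_mode) & 0o077) == 0
```

Calling `write_private_json(path, {"openrouter-model": "anthropic/claude-sonnet-4.6"})` on a scratch path and then `is_private(path)` is a quick way to see the check in action. The same `is_private` check can be pointed at an existing secrets file to confirm an API key is not readable by other local users.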
Where Bytebell is genuinely stronger
- Repository structural graph. Per-file imports, calls, class hierarchy. Bytebell knows that auth/middleware.go imports internal/jwt, which extends crypto/ed25519. Hydrate has none of this: its data is session-derived, not code-derived.
- LLM-enriched semantic surface per file. The purpose, summary, and businessContext fields close the "vocabulary gap": the model can match what a developer means, not just what the code spells.
- Research-grounded design. The README's bibliography is a real one. If you're sold by the recent literature on hybrid structure-plus-semantic retrieval, Bytebell is the most direct application of it.
- Enterprise deployment shape. SSO / SCIM, audit logging, multi-tenant patterns, connectors to Confluence / Jira / Notion / GitHub Enterprise are documented in the Enterprise edition.
Where Hydrate is stronger
- Session memory. This is the deepest distinction. Bytebell knows the codebase structure. It has no mechanism to know that last Tuesday your team decided to use JWT over session cookies, or that you tried the GraphQL approach and abandoned it. Those decisions live in session transcripts, not import graphs.
- Automatic capture, zero developer overhead. Hydrate's Stop hook fires on every session without any command. Bytebell needs bytebell index <url> per repo.
- Automatic injection before the prompt. Hydrate's UserPromptSubmit hook injects ranked context before the model reads the prompt. Bytebell's three MCP tools require the model to call them, meaning the model has to decide to remember.
- Team sync + canon propagation. Hydrate has team push/pull for canon propagation across machines, plus .hpack archives. Bytebell OSS is single-tenant per Neo4j instance.
- Compact-survival. Hydrate intercepts the PreCompact event and writes a snapshot. On the public benchmark (n=30, 3 complexity buckets) Hydrate recovers 27% of in-flight tasks across compaction; tools without this hook recover 0%.
- Zero LLM cost to operate. Hydrate's distill is local Go (TF-IDF plus sentence scoring), no LLM call. Over a 26-hour orchestration sprint Hydrate compressed 25.5M raw tokens to 142K stored summary at $0 compression cost. Bytebell calls OpenRouter on every file at ingest.
- Single binary install. Hydrate is one Go binary plus SQLite. Bytebell needs Bun, Docker, Mongo, Neo4j, and Redis.
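The "TF-IDF plus sentence scoring" distill mentioned in the list above can be illustrated with a short extractive summarizer. This is a generic sketch of the technique, written in Python for brevity, not Hydrate's Go implementation: the tokenizer, the length normalization, and the `distill` function name are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; deliberately simple."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def distill(sentences: list[str], keep: int) -> list[str]:
    """Return the `keep` highest-scoring sentences, in their original order."""
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences does each term appear?
    df = Counter(term for doc in docs for term in set(doc))

    def score(doc: list[str]) -> float:
        if not doc:
            return 0.0
        tf = Counter(doc)
        # Sum of TF-IDF weights, with term frequency normalized by sentence
        # length so long sentences are not favoured automatically.
        total = sum(count * math.log(n / df[term]) for term, count in tf.items())
        return total / len(doc)

    top = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]


# Hypothetical transcript fragment.
TRANSCRIPT = [
    "ok sounds good",
    "ok let me check",
    "We decided to use JWT instead of session cookies for the auth layer",
    "ok thanks",
]

print(distill(TRANSCRIPT, 1))
```

Filler words that recur across the transcript ("ok") get a low inverse-document-frequency weight, so the decision sentence outranks the chatter without any model call. That is the property behind the zero-cost compression claim, though a production distiller would need more than this.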
When to pick each
| Scenario | Better choice |
|---|---|
| First session in unfamiliar 10k+ file codebase | Bytebell |
| Cross-repo dependency tracing | Bytebell |
| Ongoing project, 2+ weeks of session history | Hydrate |
| Team of 3+ devs, decision propagation | Hydrate |
| Mid-session compaction-recovery | Hydrate |
| Onboarding a new dev | Both — composes |
| Privacy-sensitive / air-gap | Both — composes |
| Cost-sensitive (no ongoing API spend) | Hydrate |
| Commercial deployment, AGPL-incompatible | Hydrate (or Bytebell Enterprise) |
Run both — the strongest joint pitch in the space
A Claude Code session can speak to both MCP servers simultaneously without configuration conflict:
```json
{
  "mcpServers": {
    "bytebell": { "type": "http", "url": "http://127.0.0.1:8080/mcp" },
    "hydrate": { "command": "hydrate-mcp" }
  }
}
```
The result: instant structural understanding of any codebase (Bytebell) plus persistent memory of every decision made while working in it (Hydrate). Two complementary memory layers, one session.
Brutally honest — what Hydrate doesn't do that Bytebell does
- No codebase graph. Hydrate can't answer "which files import parse_file?" or "what calls validateJWT?".
- No per-file LLM semantics. Hydrate doesn't know what auth/middleware.go is for.
- No academic-citation depth. Bytebell's README cites a dozen recent papers; Hydrate's positioning is empirical (the benchmarks) rather than research-cited.
If your problem is "agent doesn't understand my codebase", Bytebell is the right tool. If your problem is "agent doesn't remember what my team decided", that's Hydrate. The honest answer is to run both.