Multi-agent orchestration · stable

Coordinate many agents, across many models.

Orchestration is a single-writer coordinator, run by the local Hydrate daemon, that drives a fleet of worker agents through a shared, typed blackboard. One interactive session authors the work and holds the human gates. Workers never message each other and never touch shared state directly: they communicate only through structured artifacts (plans, patches, reviews, verdicts) and through git. That structure is what turns a pile of agents into a process you can actually trust.
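A minimal sketch of the idea, not Hydrate's actual code: workers hand typed artifacts to a coordinator, which is the only thing that appends to the shared board. The names `Artifact`, `ArtifactKind` and `Coordinator` are hypothetical; only the four artifact kinds come from the text above.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class ArtifactKind(Enum):
    # The four artifact kinds named in the text.
    PLAN = "plan"
    PATCH = "patch"
    REVIEW = "review"
    VERDICT = "verdict"


@dataclass(frozen=True)
class Artifact:
    """An immutable, typed record a worker hands to the coordinator."""
    kind: ArtifactKind
    unit_id: str
    author: str
    body: str


class Coordinator:
    """Sole writer: workers never hold a reference to the board itself."""

    def __init__(self) -> None:
        self._board: list[Artifact] = []

    def accept(self, artifact: Artifact) -> None:
        # Reject anything that is not one of the known artifact kinds.
        if not isinstance(artifact.kind, ArtifactKind):
            raise TypeError(f"unknown artifact kind: {artifact.kind!r}")
        self._board.append(artifact)

    def view(self, unit_id: str) -> tuple[Artifact, ...]:
        # Workers get a read-only snapshot, never the mutable list.
        return tuple(a for a in self._board if a.unit_id == unit_id)
```

A worker that wants to say something to another worker has exactly one route: produce an artifact and let the coordinator record it.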


Why not just let the agents talk?

Most "multi-agent" tools let agents chat with each other and hope coordination emerges. It doesn't: the agents lose track of who did what, and no agent can trust work it didn't watch happen. Hydrate takes a different route. The daemon owns all state, spawns the workers, enforces the rules, and drives the fleet to convergence. It rides on assets only Hydrate has: a shared cross-runtime memory layer, runtime hooks with a real completion signal, a local daemon, and a headless agent spawner already in the tree.

Four modes

Converge a document. A codebase. A UX. Or an image.

As of v0.8.0, every mode gates on a structured acceptance block (a checkable definition of done with its own pass criteria) before a session can be accepted. A run is not done because it looks done; it is done when its own stated criteria verify.
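One way to picture an acceptance block, as a sketch under stated assumptions: a list of named checks, all of which must pass. The `Criterion` type and `verify` function are hypothetical, and the real block's schema is not described here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    """One checkable line of the definition of done (hypothetical shape)."""
    name: str
    check: Callable[[], bool]


def verify(criteria):
    """Return (accepted, names_of_failed_criteria).

    An empty block is never accepted: "no criteria" is not the same
    as "all criteria passed".
    """
    failures = [c.name for c in criteria if not c.check()]
    accepted = bool(criteria) and not failures
    return accepted, failures
```

For example, `verify([Criterion("tests pass", run_tests)])` would accept only if the hypothetical `run_tests` callable returns `True`.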

Design mode · proven

Converge a document through adversarial critique.

Design mode takes a written design (a spec, an RFC, or an architecture proposal) and converges it through rounds of adversarial critique.

You start a session against a draft.
Each round dispatches a critic from a different model family, which reads the draft plus the running objection register and files structured objections.
Objections are tracked by content, not by who raised them, so the same concern cannot be relitigated under a new label. A contested objection raised unchanged twice is escalated for you to rule on, rather than looping forever.
You rule on each material objection (accept or contest); the author revises; the next round validates that resolved objections stay resolved.
When zero material objections remain and the trend is converging, the session moves to sign-off: your final human gate.

Hard round cap: 8. Output: a finalised design plus a full decision log of every objection and how it was resolved.
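The content-keyed register can be sketched roughly as follows. This is an illustration, not the real matcher: it assumes "same content" means the same text after whitespace and case normalisation, and it reads "raised unchanged twice" as a contested objection being filed again. `ObjectionRegister`, `fingerprint` and the status strings are hypothetical names; only the round cap of 8 comes from the text.

```python
import hashlib
from dataclasses import dataclass

ROUND_CAP = 8  # hard round cap stated in the text


def fingerprint(text: str) -> str:
    """Key an objection by its normalised content, not by who raised it."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


@dataclass
class Objection:
    text: str
    status: str = "open"  # assumed states: open, contested, escalated
    times_raised: int = 1


class ObjectionRegister:
    def __init__(self):
        self._by_content = {}

    def file(self, text: str) -> Objection:
        key = fingerprint(text)
        existing = self._by_content.get(key)
        if existing is None:
            self._by_content[key] = Objection(text)
            return self._by_content[key]
        # Same content under a new label: count it, do not duplicate it.
        existing.times_raised += 1
        if existing.status == "contested" and existing.times_raised >= 2:
            existing.status = "escalated"  # hand to the human arbiter
        return existing

    def contest(self, text: str) -> None:
        self._by_content[fingerprint(text)].status = "contested"
```

Keying on content rather than on the critic is what stops a concern from being relitigated: the second filing lands on the existing entry instead of creating a new one.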

Develop mode · live

Parallel implementation across projects.

Develop mode takes a set of work units, across one or several repositories, and runs them to verified, reviewed, integration-ready code.

You define targets (project, base branch, test command) and a set of work units.
Each unit runs in its own isolated git worktree, so parallel implementers never collide.
Per unit the pipeline is implement, review, judge. The implementer writes the patch; a reviewer from a different model family reads it in a read-only worktree and files objections; a judge scores it against a five-point rubric: plan adherence, every verification step green, scope containment, no regressions, no unresolved review objections.
A failing review or verdict requeues the unit for another attempt, bounded by a per-unit cap so nothing loops indefinitely. A unit that cannot pass is handed to you, not silently dropped.
When the units are verified, you open the integration gate. Verified units merge into per-target integration branches: never your main branch, never your working checkout. A senior audit pass then re-runs the tests and checks for regressions before the session reports done.

Output: per-target integration branches ready for your normal pull-request flow. The mode deliberately stops short of main; the merge decision stays yours.
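The per-unit loop described above can be sketched like this. It is a simplified model under assumptions: the three roles are passed in as plain callables, the rubric keys are hypothetical names for the five points listed, and `max_attempts=3` is a placeholder because the actual per-unit cap is not stated here.

```python
# Hypothetical keys for the five rubric points named in the text.
RUBRIC = (
    "plan_adherence",
    "verification_green",
    "scope_containment",
    "no_regressions",
    "no_unresolved_objections",
)


def run_unit(implement, review, judge, max_attempts=3):
    """Run implement -> review -> judge until the unit passes or the cap hits.

    Returns ("verified", attempt) on success, or ("escalated", max_attempts)
    when the unit must be handed back to the human.
    """
    for attempt in range(1, max_attempts + 1):
        patch = implement(attempt)
        objections = review(patch)
        scores = judge(patch, objections)
        passed = not objections and all(scores.get(p, False) for p in RUBRIC)
        if passed:
            return ("verified", attempt)
        # A failing review or verdict requeues the unit for another attempt.
    return ("escalated", max_attempts)
```

The important property is the last line: a unit that runs out of attempts produces an explicit escalation result rather than disappearing.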

Studio mode · live

Converge a UX or visual deliverable, design-first.

Studio mode takes a visual deliverable (web or app UX, a presentation, or a document) and converges it the same way Develop converges code: a design-first gate up front, then build and review by different model families.

You define the deliverable and its acceptance criteria.
Fable 5 designs the approach and the deliverable passes a design-first gate before any build begins.
Sonnet 5 builds the deliverable; Opus reviews it from a separate seat against your criteria.
A failing review requeues the work, bounded by a cap; anything that cannot pass comes back to you rather than shipping.

Output: a converged UX or visual deliverable with the design rationale and review verdicts recorded. Shipped in v0.7.0.

Image mode · new

Generate images with a critic in the loop.

Image mode takes an image brief and runs it as a real choreography. The same independent check that guards code now guards pixels: the model that judges the image is not the model that drew it.

You (the interactive head) author the image spec.
A Codex generator renders the image into an output folder.
A vision judge, a different model from the one that drew it, scores the result against your spec on prompt match, constraints, and visual artifacts.
A passing image lands for your approval. A failing one is regenerated automatically up to a cap, then handed to you: accept, regenerate, or abandon.

Output: a generated image plus the spec, the generation, and the judge's per-criterion verdict, all recorded. The judge being strict is a feature: in our own smoke test it correctly rejected a gradient-shaded circle when the spec asked for flat vector, and the pipeline regenerated rather than shipping it.
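As an illustration of what "all recorded" could look like, here is a sketch of a provenance record holding the spec, the output path, and a per-criterion verdict. The `ImageRecord` shape, the criterion keys, and the example path are assumptions; the actual record format is not documented here.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical keys for the three judging criteria named in the text.
CRITERIA = ("prompt_match", "constraints", "visual_artifacts")


@dataclass(frozen=True)
class ImageRecord:
    spec: str
    output_path: str
    verdict: dict  # criterion name -> True (pass) or False (fail)

    @property
    def passed(self) -> bool:
        # A missing criterion counts as a failure, so the judge stays strict.
        return all(self.verdict.get(c, False) for c in CRITERIA)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)
```

In the smoke-test scenario above, a gradient-shaded circle against a flat-vector spec would carry a failing `constraints` entry, so `passed` is `False` and the image is regenerated instead of landing for approval.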

Proof from inside the build: Develop mode's own specification was converged by Design mode. A real Codex critic, eight rounds, fourteen objections, human sign-off.

The fleet

Cross-family review is the moat.

The deliberate move is mixing model families by role: the agent that judges the work comes from a different family than the one that wrote it, so each catches the other's blind spots. By default Claude implements and Codex reviews and judges, with Fable standing in when Codex is unavailable.
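The selection rule can be sketched as a small function: pick the first available reviewer whose family differs from the implementer's. The function name, the lowercase family labels, and the preference order beyond "Codex, then Fable" are assumptions for illustration.

```python
def pick_reviewer(implementer_family, available, preference=("codex", "fable", "claude")):
    """Return the first available family that differs from the implementer's.

    Returns None when no cross-family reviewer is available, so the caller
    can stop rather than let a model review its own family's work.
    """
    for family in preference:
        if family != implementer_family and family in available:
            return family
    return None
```

With a Claude implementer, `pick_reviewer("claude", {"codex", "fable"})` selects Codex, and the same call with only Fable available falls back to Fable.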

Design mode

Role	Who	Runs
Author	You (Claude session)	interactive
Critic	Codex	one per round, sequential
Arbiter	You	sign-off and escalation gate

Develop mode

Role	Who	Runs
Orchestrator	You (Claude session)	interactive
Implementer	Claude (Sonnet 5)	one per unit, in parallel
Reviewer	Codex	per unit, read-only worktree
Judge	Codex	per unit, scores the rubric
Audit	Claude (Opus)	per target, after integration
Arbiter	You	integration and override gates

Image mode

Role	Who	Runs
Orchestrator	You (Claude session)	authors the image specs
Generator	Codex	renders the image into the output dir
Judge	Codex vision (independent of the generator)	scores the image against the spec
Arbiter	You	accept / regenerate / abandon gate

Parallelism is bounded by a per-project spawn cap (two concurrent workers by default, ceiling eight), so a fleet never runs away with your machine. Workers are hydrated with the relevant project's memory but walled off from every other target's context, so cross-project secrets never leak between units.
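A bounded spawn gate of this kind might look like the sketch below, with one gate per project. The default of two and the ceiling of eight come from the text; the `SpawnGate` class and its lock-based check are an assumed illustration, not the daemon's implementation.

```python
import threading

DEFAULT_CAP = 2  # concurrent workers by default, per the text
CEILING = 8      # hard upper bound, per the text


class SpawnGate:
    """Admit a new worker only while the active count is below the cap."""

    def __init__(self, cap: int = DEFAULT_CAP) -> None:
        if not 1 <= cap <= CEILING:
            raise ValueError(f"cap must be between 1 and {CEILING}")
        self._cap = cap
        self._active = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        # Test and increment under one lock, so two racing spawns
        # cannot both slip through the last free slot.
        with self._lock:
            if self._active >= self._cap:
                return False
            self._active += 1
            return True

    def release(self) -> None:
        with self._lock:
            if self._active == 0:
                raise RuntimeError("release without a matching acquire")
            self._active -= 1
```

The check and the increment happen as one step; splitting them is exactly the race a cap like this exists to prevent.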

Author a fleet in the browser. As of v0.10.0 you can build a run from the dashboard canvas: add agents, set each one's model and project directory, draw the dependencies between them, and hit Run, with per-agent critic overrides. All four modes roll up into one unified authoring surface, so the fleet you draw is the fleet that executes, with truthful per-orchestration spend and timers reported back on the dashboard.

This is what Hydrate's shared memory makes possible: a Claude implementer and a Codex reviewer can work the same task, with the same context, because the memory layer underneath them is common ground. The cross-family review is the moat, and it only works when the runtimes share a memory.

Versus the field

A group chat with extra steps.

Most multi-agent tooling is a group chat with extra steps. Hydrate treats coordination as infrastructure: a single coordinator, a shared memory, and hard rules. That is what makes the output trustworthy instead of merely plausible.

Dimension	Generic multi-agent frameworks	Hydrate orchestration
Coordination model	Agents message each other and hope alignment emerges	A single-writer coordinator drives workers through a typed blackboard; no agent-to-agent chat
State ownership	Shared and implicit; prone to races and lost updates	The local daemon is the sole writer; every transition is compare-and-set and idempotent
Model diversity	Usually one model family across all roles	Cross-family by design: Claude implements, Codex reviews and judges
Verification	Output trusted as-is, or the same model reviews itself	Independent review from a different family, then a judge scores against a fixed rubric before anything passes
Shared memory	Per-agent context or ephemeral scratchpads	One cross-runtime memory layer, so every worker shares the same ground truth
Workspace isolation	Shared working directory; agents collide	Each unit runs in its own hermetic git worktree
Safety to your repo	Agents often write straight to your tree	Verified work lands on integration branches only; never touches main, never your checkout
Termination guarantees	Loops and runaway fleets are common	Hard round caps, lease timeouts, and a spawn cap bound every run
Human control	Fire-and-forget; you read the wreckage after	Explicit human gates (design sign-off, integration approval, override) at the decision points
Failure handling	Silent drift; partial observability of who did what	A unit that can't pass escalates to you; nothing is silently dropped

The comparison is against the general pattern, not any single named framework. The coordinator is hardened to match: as of v0.8.0 a flaky reviewer fails closed rather than wedging a session, the per-project spawn cap is an atomic test-and-set gate proven under the race detector, and an interrupted close resumes on the next daemon boot.
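To make "compare-and-set and idempotent" concrete, here is a minimal in-memory sketch. The state names and the `UnitState` class are hypothetical, and a real daemon would persist the state rather than hold it in a Python object.

```python
import threading


class UnitState:
    """A state cell whose transitions are compare-and-set and replay-safe."""

    def __init__(self, state: str = "queued") -> None:
        self._state = state
        self._lock = threading.Lock()

    @property
    def state(self) -> str:
        return self._state

    def transition(self, expected: str, new: str) -> bool:
        with self._lock:
            if self._state == new:
                return True   # idempotent: replaying a done transition is a no-op
            if self._state != expected:
                return False  # stale caller: someone else moved the state first
            self._state = new
            return True
```

Replaying `transition("queued", "implementing")` after it has already succeeded returns `True` without changing anything, while a caller holding a stale `expected` value is refused instead of overwriting newer state.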

Image mode vs generic image tooling

Dimension	Generic image tooling	Hydrate Image mode
Quality control	You eyeball the output yourself	An independent vision judge scores every image against the spec before it ships
Failure handling	Regenerate by hand and re-check	Auto-regenerates on a failed verdict up to a cap, then escalates to you
Provenance	A prompt and a file	The spec, the generation, and the judge's per-criterion verdict are all recorded

The same engine that runs Design, Develop and Studio runs Image: author a spec, a generator renders it, and a different model judges the result against the spec. A bad image is caught and regenerated, not shipped.

The autoresearch loop, generalised to software

Karpathy's autoresearch proved one idea cleanly: let an agent loop on its own overnight, but only keep work that clears an objective gate. Design and Develop run that same discipline for software, across vendors, with memory and an audit trail underneath.

Dimension	karpathy/autoresearch	Hydrate Design + Develop
The loop	Edit `train.py`, train 5 min, keep or discard on `val_bpb`, repeat	Design and Develop rounds, gated by an acceptance block and a verify pass before anything is accepted
What's gated	One held-out number (`val_bpb`)	A checkable acceptance block plus a goal-coverage contract
Agents	A single agent editing one file	Multiple agents across vendors: Claude implements, Codex reviews and judges
Domain	LLM training on one GPU	Software engineering across repositories
Memory and audit	None; results logged to disk for the morning	A shared cross-runtime memory layer and a durable, attributed audit trail
Human control	Read the logs after the run	Sign-off mid-run and an integration gate at the decision points

Where autoresearch wins: ML training has one clean number to hill-climb, and software does not, so Hydrate's gate is a checkable contract rather than a loss curve. The acceptance block is autoresearch-inspired, and we say so. Read the full comparison.

Multi-model, cross-runtime

Why an implementer in one model family and a reviewer in another can share state.

Advanced shared memory

The local-first layer that makes coordination possible.

vs Claude Workflow

In-session fan-out vs durable, cross-runtime orchestration, and where each wins.