
The autoresearch loop, generalised to software

Karpathy's autoresearch proved one idea cleanly: let an agent loop autonomously overnight, but only keep work that clears an objective gate. Hydrate's Design and Develop modes run that same discipline for software, across vendors, with memory and an audit trail underneath.

autoresearch is not a memory tool, and not a Hydrate competitor. It is the clearest published example of the pattern Hydrate's orchestration is built on: bounded autonomy under a measurable gate. The interesting question is what that pattern looks like once you take it off a single GPU and into a real engineering team. That is what Hydrate's Design and Develop modes are.

autoresearch vs Hydrate · Design + Develop

Dimension	karpathy/autoresearch	Hydrate · Design + Develop
Shared idea	Autonomous loop with a keep/discard gate	Same pattern: autonomous rounds, gated before anything is accepted
The loop	Edit `train.py` → train 5 min → check metric → keep or discard → repeat	Design / Develop rounds → acceptance block + verify pass → accept or reject
What's gated	One held-out scalar (`val_bpb`)	A checkable acceptance block (definition-of-done) + goal-coverage contract
Agents	Single agent editing one file	Multi-agent, multi-vendor: Claude implements, Codex reviews and judges, Fable as fallback
Domain	LLM training on a single GPU	Software engineering across repositories
Memory	None	Runs on a shared, cross-runtime memory substrate
Audit trail	Logs results to disk for morning review	Durable, attributed audit trail of every round and decision
Human role	Review the logs in the morning	Supervisory sign-off mid-run (Design); integration gate (Develop)
Multi-agent swarms	Noted as a future direction, not implemented	The product: orchestration is the engine, not a roadmap note
Stack	Python + PyTorch + uv, NVIDIA GPU required	Single Go binary + SQLite, no GPU
Licence / maturity	MIT, open source	Commercial · Design proven, Develop live

Facts verified against the repository (June 2026): MIT, tens of thousands of GitHub stars, single-agent, no memory or cross-tool layer, train.py is the only agent-editable file, gated on val_bpb.

What autoresearch is

A self-contained harness for autonomous ML experimentation. Three files: prepare.py (data prep, untouched), train.py (the model and training loop, the only file the agent may edit), and program.md (human-authored research directives). The agent edits train.py, trains for a fixed five-minute budget, checks whether validation bits-per-byte improved, keeps or discards the change, and repeats, logging outcomes for a human to review in the morning. The design is deliberately minimal and points at agent swarms as a future direction.
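The keep/discard loop described above can be sketched in a few lines of Python. This is a toy illustration, not code from the autoresearch repository: `run_loop`, `propose`, and `evaluate` are hypothetical names, the "file" being edited is reduced to a single number, and the metric is a stand-in for validation bits-per-byte (lower is better).

```python
import random
from typing import Callable, List, Tuple


def run_loop(
    propose: Callable[[float], float],
    evaluate: Callable[[float], float],
    start: float,
    rounds: int,
) -> Tuple[float, float, List[str]]:
    """Hill-climb a candidate, keeping a change only if the metric improves."""
    best = start
    best_score = evaluate(best)  # lower is better, like bits-per-byte
    log: List[str] = []
    for i in range(rounds):
        candidate = propose(best)    # stands in for "agent edits the file"
        score = evaluate(candidate)  # stands in for "train, then measure"
        if score < best_score:
            best, best_score = candidate, score
            log.append(f"round {i}: keep ({score:.4f})")
        else:
            # The candidate is dropped; the next round starts from `best`.
            log.append(f"round {i}: discard ({score:.4f})")
    return best, best_score, log


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins: the metric is the distance from an arbitrary target, 3.0.
    best, score, log = run_loop(
        propose=lambda x: x + rng.uniform(-1.0, 1.0),
        evaluate=lambda x: abs(x - 3.0),
        start=0.0,
        rounds=20,
    )
    print("\n".join(log))  # the log a human would review afterwards
```

The point of the sketch is that the gate sits outside the thing being edited: the proposer never decides whether its own change is kept.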

The shared pattern, and what Hydrate adds

Both tools refuse to trust an agent's self-report. An agent may iterate without a human in the inner loop, but work is only accepted when it clears an explicit bar. autoresearch's bar is one number on a held-out split. Hydrate's bar is the acceptance block: a measurable definition-of-done plus a goal-coverage check, enforced by a verify pass in Develop mode before work is integrated.
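To make the contrast with a single scalar concrete, here is a minimal sketch of what a checkable acceptance gate could look like. It is an assumption for illustration only and does not reflect Hydrate's actual acceptance-block format or API: `Check`, `verify`, and the result keys are hypothetical names. The sketch accepts work only when every check passes and every stated goal is covered by at least one check.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Check:
    name: str                # human-readable criterion
    goal: str                # which stated goal this check covers
    run: Callable[[], bool]  # returns True when the criterion is met


def verify(goals: List[str], checks: List[Check]) -> Dict[str, object]:
    """Accept only if all checks pass and no goal is left without a check."""
    failed = [c.name for c in checks if not c.run()]
    covered = {c.goal for c in checks}
    uncovered = [g for g in goals if g not in covered]
    return {
        "accepted": not failed and not uncovered,
        "failed": failed,
        "uncovered": uncovered,
    }


if __name__ == "__main__":
    # Hypothetical goals and checks; real checks would run tests or linters.
    goals = ["tests pass", "docs updated"]
    checks = [Check("unit tests green", "tests pass", lambda: True)]
    print(verify(goals, checks))
```

In this example the work is rejected even though the only check passes, because nothing covers the "docs updated" goal. That is the role a coverage check plays: a passing test suite alone does not prove the goal was met.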

Everything else Hydrate brings is what that pattern needs to survive outside a single GPU. Multiple agents across vendors (Claude implements, Codex reviews and judges, Fable as fallback). A shared cross-runtime memory substrate, so agents inherit context instead of starting cold. A durable audit trail with attribution, and human sign-off at the points that matter. autoresearch has none of these, because optimising one model overnight does not need them. Coordinating an engineering epic does.

Where autoresearch genuinely wins

ML training has one clean scalar metric to hill-climb. Software does not, so Hydrate's gate has to be a structured, checkable contract rather than a loss curve. We took the discipline, not the metric. For the record, Hydrate's acceptance block is, by our own design history, autoresearch-inspired. Borrowing the best idea in the category and saying so is more useful than pretending we invented it.

See Hydrate orchestration

/orchestration walks through Design and Develop in full.

brew install gethydrate/hydrate/hydrate
hydrate setup
