Data handling.
What data flows where, how it's classified, how long it's kept, and which third-party APIs touch it. Covers SOC2 P2 and the GDPR Art. 30 records of processing.
Version 1.0 · Effective: 2026-04-24 · Owner: Seamus Waldron
Overview
Hydrate captures Claude Code session transcripts, extracts facts using an LLM, stores those facts locally, and injects them back into future sessions. For Free and Pro tiers, all data is stored on your machine; on Pro, the post-scrub session narrative is additionally sent to the OpenAI API for fact extraction and embedding. For Enterprise (SiteEngine AI), all data stays on the customer's infrastructure. Sedasoft does not receive or process session content from any tier.
Free and Pro tier: local-only data flow
Developer workstation
│
├── Claude Code (session transcript JSONL)
│ │
│ ↓ Stop hook (claude-capture)
├── hydrate-server (port 49849, localhost only)
│ │
│ ├── Transcript stored: ~/.hydrate/hydrate.db (AES-GCM encrypted)
│ │
│ ├── scrubber.Redact() - secrets stripped before any processing
│ │
│ ├── (Pro) Fact extraction → OpenAI API (gpt-4o-mini)
│ │ Data sent: session narrative (post-scrub)
│ │ Data received: extracted fact strings
│ │
│ ├── (Pro) Embedding → OpenAI API (text-embedding-3-small)
│ │ Data sent: session narrative text (post-scrub)
│ │ Data received: 1536-float vector
│ │
│ └── Facts + vectors stored: ~/.hydrate/hydrate.db
│
│ ↑ UserPromptSubmit hook (claude-context)
└── Context injection → Claude Code additionalContext
    All local reads from SQLite - no network call at inject time
Third parties that touch user data (Free / Pro)
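Before any narrative leaves the machine for a third-party API, the flow above runs `scrubber.Redact()`. The rule set that function applies is not documented here, so the following is only a minimal sketch of what a regex-based redaction pass could look like. The `redact` function, the `SECRET_PATTERNS` list, and the three patterns are all hypothetical illustrations, not Hydrate's actual implementation.

```python
import re

# Hypothetical patterns for illustration only; the real scrubber's
# rule set is not described in this document.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                        # OpenAI-style API keys
    re.compile(r"AKIA[0-9A-Z]{16}"),                           # AWS access key IDs
    re.compile(r"(?i)(password|token|secret)\s*[=:]\s*\S+"),   # key=value style secrets
]


def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every match of a secret pattern with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


if __name__ == "__main__":
    print(redact("export PASSWORD=hunter2 then run deploy"))
```

A pattern list like this only catches secrets with a recognisable shape, which is one reason the post-scrub narrative is still classified as Restricted rather than treated as safe to share.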
Enterprise tier: SiteEngine AI
Developer workstation
│
│ (Enterprise: hooks point at siteengine_ai, not localhost)
↓
siteengine_ai API (customer infrastructure)
│
├── PostgreSQL RAG DB - embeddings, entities, RAPTOR summaries
├── PostgreSQL Conversation DB - sessions, facts, messages
└── Dgraph - knowledge graph
Enterprise data stays entirely within the customer's own infrastructure. No session content transits Sedasoft servers. The siteengine_ai binary runs on customer-owned hardware. For on-premise deployments, the customer is both data controller and processor.
Third parties for Enterprise
Data classification
| Data type | Classification | Location | Exits machine? |
|---|---|---|---|
| Session transcript (raw JSONL) | Restricted | ~/.hydrate/hydrate.db | No |
| Post-scrub narrative (text) | Restricted | Sent to OpenAI (Pro only, with DPA) | Yes (Pro only) |
| Extracted facts | Confidential | ~/.hydrate/hydrate.db | No |
| Embedding vectors | Internal | ~/.hydrate/hydrate.db | No (computed externally, stored locally) |
| Session summaries | Confidential | ~/.hydrate/hydrate.db | No |
| Licence key | Internal | ~/.hydrate/config.yaml, Cloudflare licensing | Yes (licence token only) |
| Usage / token counts | Internal | ~/.hydrate/hydrate.db | No |
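One way to keep a classification table like this honest is to encode it as data and assert on it in a test. The sketch below transcribes the rows above into a hypothetical `DataType` record (the class, field names, and `egress` helper are assumptions made for illustration) and lists which data types leave the machine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataType:
    name: str
    classification: str   # Restricted, Confidential, or Internal
    exits_machine: bool
    note: str = ""


# Rows transcribed from the data classification table.
DATA_TYPES = [
    DataType("Session transcript (raw JSONL)", "Restricted", False),
    DataType("Post-scrub narrative (text)", "Restricted", True, "Pro only"),
    DataType("Extracted facts", "Confidential", False),
    DataType("Embedding vectors", "Internal", False,
             "computed externally, stored locally"),
    DataType("Session summaries", "Confidential", False),
    DataType("Licence key", "Internal", True, "licence token only"),
    DataType("Usage / token counts", "Internal", False),
]


def egress(data_types):
    """Return the names of data types that leave the workstation."""
    return [d.name for d in data_types if d.exits_machine]


if __name__ == "__main__":
    for name in egress(DATA_TYPES):
        print(name)
```

With the table as written, only the post-scrub narrative (Pro) and the licence token appear in the egress list.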
Retention
| Data type | Default retention | User control |
|---|---|---|
| Extracted facts | Ebbinghaus decay curve; facts weaken without reinforcement over ~180 days | hydrate facts forget <id> for individual facts; hydrate delete for all |
| Session transcripts | 90 days (configurable) | hydrate delete --sessions |
| Session summaries | 90 days (same as sessions) | Deleted with sessions |
| Embedding vectors | Lifetime of the associated fact or session | Deleted with parent record |
| Dashboard statistics | 30 days rolling | Cleared on delete |
| OpenAI API data (Pro) | Per OpenAI's Data Processing Addendum | Subject to OpenAI's retention controls |
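The retention table describes fact decay only as an Ebbinghaus curve over roughly 180 days. The classic Ebbinghaus form is R = exp(-t / S), where t is time since the last reinforcement and S is a stability constant. How Hydrate parameterises the curve is not stated, so the sketch below makes two loud assumptions: a fact counts as faded once its strength reaches a hypothetical floor of 0.05, and S is chosen so that this happens at 180 days. The names `strength`, `is_faded`, `FLOOR`, and `STABILITY` are illustrative.

```python
import math

HORIZON_DAYS = 180.0   # from the retention table: "~180 days"
FLOOR = 0.05           # assumed strength at which a fact counts as faded

# Stability S chosen so that exp(-HORIZON_DAYS / S) equals FLOOR.
STABILITY = HORIZON_DAYS / math.log(1.0 / FLOOR)


def strength(days_since_reinforced: float) -> float:
    """Ebbinghaus-style retention R = exp(-t / S), in the range (0, 1]."""
    return math.exp(-days_since_reinforced / STABILITY)


def is_faded(days_since_reinforced: float) -> bool:
    """True once the fact's strength has decayed to the assumed floor."""
    return strength(days_since_reinforced) <= FLOOR


if __name__ == "__main__":
    for days in (0, 30, 90, 180, 365):
        print(days, round(strength(days), 3), is_faded(days))
```

Under this model, reinforcing a fact resets `days_since_reinforced` to zero, which is what lets frequently used facts persist while unused ones weaken.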