Night Gardener
The Night Gardener is HydraMem's autonomous offline learning engine. It analyses stored conversations, infers new knowledge edges, verifies them with a two-level pipeline, and prunes stale or spurious relations — all without human intervention.
Overview
```text
Night Gardener cycle
│
├── Phase 1: Relation Inference
│     LLM reads stored Q&A sessions
│     Proposes candidate edges between entities
│
├── Phase 2: Two-level Verification
│     SR-MKG → topological confidence score (no LLM)
│       ≥ 0.7 → accept    < 0.3 → reject
│       0.3–0.7 → forward to VoG
│     VoG → LLM step-by-step groundedness check
│       GROUNDED / PARTIAL / REJECTED + confidence
│
├── Phase 2.4: Temporal invalidation (no LLM)
│     New functional fact → close the old conflicting edge's valid_to
│
├── Phase 2.5: Consolidation (no LLM)
│     Boost relations of entities reused across ≥ 2 sessions
│     Decay aged one-off isolates; protect reused nodes from pruning
│
├── Phase 3: Pruning
│     Rule-based: remove isolated nodes, low-confidence edges
│     LightGNN: neural spurious-edge scoring (optional)
│
└── Phase 3.5: Prune review (opt-in, no LLM)
      Sample borderline spurious-edge candidates → human labels
      Golden dataset → learned (supervised) edge scorer
```
Running the Night Gardener
On demand (CLI)
```shell
# Run a full cycle on the "default" project
uv run python -c "
from hydramem.garden.gardener import NightGardener
print(NightGardener().run())
"

# Run on a specific project
uv run python -c "
from hydramem.garden.gardener import NightGardener
print(NightGardener().run(project='myproject'))
"
```
Via MCP tool (from your AI client)
Dogfooding script
Scheduled (cron)
Add to your crontab (crontab -e):
```text
# Run HydraMem Night Gardener at 3 AM daily
# cron runs with a minimal PATH; replace "uv" with its absolute path (see `which uv`) if needed
0 3 * * * cd /path/to/hydramem && uv run python -c "from hydramem.garden.gardener import NightGardener; NightGardener().run()" >> ~/.hydramem/garden.log 2>&1
```
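Or with a systemd user timer (~/.config/systemd/user/hydramem-garden.timer):
```ini
[Unit]
Description=HydraMem Night Gardener

[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target
```
and its matching service (~/.config/systemd/user/hydramem-garden.service):
```ini
[Unit]
Description=HydraMem Night Gardener

[Service]
Type=oneshot
WorkingDirectory=/path/to/hydramem
# systemd resolves non-absolute commands against a fixed search path that excludes ~/.local/bin,
# so point ExecStart at your uv binary (adjust to the output of `which uv`)
ExecStart=%h/.local/bin/uv run python -c "from hydramem.garden.gardener import NightGardener; NightGardener().run()"
```
Enable: run systemctl --user daemon-reload, then systemctl --user enable --now hydramem-garden.timer.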
Phase 1: Relation Inference
The Gardener retrieves locally stored sessions captured during real agent interactions. Sessions are grouped by session_id, so one interaction can accumulate multiple evidence entries from HydraMem tools.
Each session contains compact snapshots such as the user's query plus grounded context from priming_context_tool, hydra_search_tool, and expand_context_tool, as well as summaries from trace_path_tool, verify_relation_tool, and check_conflict_tool.
Before inference, Night Gardener can filter and prioritize these entries by repeat_count. Set night_gardener.min_repeat_count above 1 to bias inference toward evidence that has been observed repeatedly within a session.
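The repeat-count filter can be pictured as "drop, then sort". The sketch below is a minimal version that assumes each stored entry carries an integer repeat_count. SessionEntry and filter_entries are hypothetical names, not HydraMem's API. The second return value plays the same role as the session_entries_filtered_repeat_threshold metric reported per run.

```python
from dataclasses import dataclass


@dataclass
class SessionEntry:
    # Hypothetical shape of one stored evidence entry (not HydraMem's real schema).
    session_id: str
    tool: str
    text: str
    repeat_count: int = 1


def filter_entries(entries, min_repeat_count=1):
    """Keep entries seen at least min_repeat_count times, most-repeated first.

    Returns (kept_entries, number_filtered_out).
    """
    kept = [e for e in entries if e.repeat_count >= min_repeat_count]
    kept.sort(key=lambda e: e.repeat_count, reverse=True)
    return kept, len(entries) - len(kept)


entries = [
    SessionEntry("s1", "priming_context_tool", "HydraMem overview", 1),
    SessionEntry("s1", "expand_context_tool", "LanceDB is serverless", 2),
    SessionEntry("s1", "hydra_search_tool", "HydraMem indexes chunks in LanceDB", 3),
]
kept, dropped = filter_entries(entries, min_repeat_count=2)
print([e.repeat_count for e in kept], dropped)  # [3, 2] 1
```

With min_repeat_count left at 1 nothing is dropped, and the sort only changes the order in which evidence reaches the prompt.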
HydraMem does not have access to the client's private chain-of-thought, so that reasoning is not stored. Only the query and the verified context emitted by the MCP server are persisted for offline analysis.
For each session, it sends a structured prompt to the LLM:
```text
You are a knowledge-graph curator. Analyse the following Q&A session and
propose up to 5 NEW relations between entities that are NOT yet explicitly stated.
Session: <text>
Known entities: Entity1, Entity2, …
For each proposed relation output one line:
FROM_ENTITY –[RELATION_TYPE]→ TO_ENTITY | CONFIDENCE: <0.0-1.0>
```
The response is parsed with a regex into Relation dataclass instances.
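Below is a sketch of what that regex parsing step might look like. It assumes the one-line output format from the prompt above and is written to tolerate the ASCII variants "-[" and "->" that models often emit in place of "–[" and "→". The Relation fields and the parse_relations name are hypothetical, and HydraMem's actual dataclass may differ.

```python
import re
from dataclasses import dataclass


@dataclass
class Relation:
    # Hypothetical fields; HydraMem's real Relation dataclass may differ.
    source: str
    relation_type: str
    target: str
    confidence: float


# "A –[type]→ B | CONFIDENCE: 0.8", also accepting "-[" for "–[" and "->" for "→".
LINE_RE = re.compile(
    r"^\s*(?P<src>.+?)\s*[-–]\[(?P<rel>[^\]]+)\]\s*(?:→|->)\s*(?P<dst>.+?)"
    r"\s*\|\s*CONFIDENCE:\s*(?P<conf>[01](?:\.\d+)?)\s*$"
)


def parse_relations(text, max_relations=5):
    relations = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip prose, blank lines and malformed output
        conf = min(max(float(m["conf"]), 0.0), 1.0)  # clamp to [0, 1]
        relations.append(Relation(m["src"], m["rel"].strip(), m["dst"], conf))
    return relations[:max_relations]  # the prompt asks for at most 5


reply = """Here are my proposals:
HydraMem –[stores_vectors_in]→ LanceDB | CONFIDENCE: 0.85
LanceDB -[is_a]-> vector database | CONFIDENCE: 0.9"""
for rel in parse_relations(reply):
    print(rel)
```

Lines that do not match are skipped rather than raising an error, so chatty preambles from the model are harmless.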
LLM routing: controlled by night_gardener.infer_with in config.yml. Defaults to local to avoid API costs.
Phase 2: Two-level Verification
SR-MKG (topological, no LLM)
Each candidate relation is scored based on:

| Factor | Weight |
|---|---|
| Jaccard common-neighbour coefficient | 40% |
| Base confidence from inference step | 40% |
| Named relation type boost | +5% |
| Degree penalty (isolated endpoints) | –30% |

The resulting score decides what happens to the candidate:

| Score | Action |
|---|---|
| ≥ 0.7 | Auto-accept |
| < 0.3 | Auto-reject |
| 0.3–0.7 | Forward to VoG |
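The weights in the first table combine into a single score, and the second table routes on that score. The sketch below is one plausible reading of both tables, built on two assumptions. First, "named relation type" means any label other than a generic related_to. Second, the degree penalty applies when either endpoint has no neighbours. srmkg_score and route are hypothetical names, not HydraMem functions.

```python
def jaccard(a, b):
    """Common-neighbour Jaccard coefficient of two neighbour sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def srmkg_score(src_nbrs, dst_nbrs, base_confidence, relation_type):
    score = 0.4 * jaccard(src_nbrs, dst_nbrs) + 0.4 * base_confidence
    if relation_type and relation_type != "related_to":
        score += 0.05  # named relation type boost (assumed: any non-generic label)
    if not src_nbrs or not dst_nbrs:
        score -= 0.30  # degree penalty (assumed: either endpoint is isolated)
    return min(max(score, 0.0), 1.0)


def route(score, accept=0.7, reject=0.3):
    """Map a score to the action in the thresholds table."""
    if score >= accept:
        return "accept"
    if score < reject:
        return "reject"
    return "vog"


s = srmkg_score({"Python", "Embeddings", "Arrow"}, {"Embeddings", "Arrow"},
                0.9, "stores_vectors_in")
print(route(s))  # vog
```

Under this reading the maximum reachable score is 0.85. A candidate therefore needs both strong shared neighbourhoods and a confident inference step to skip VoG.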
VoG (LLM step-by-step)
VoG sends the candidate relation and both source text fragments to the LLM:
```text
Proposed: "HydraMem" –[stores_vectors_in]→ "LanceDB"
Fragment A: "…HydraMem uses LanceDB to index all chunk embeddings…"
Fragment B: "…LanceDB is an embedded, serverless vector database…"
→ GROUNDED CONFIDENCE: 0.91
```
Result mapping:
- GROUNDED → accepted, confidence from LLM
- PARTIAL → accepted with reduced confidence (×0.6)
- REJECTED → discarded
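The result mapping above fits in a few lines. This sketch parses a verdict line shaped like the example reply ("GROUNDED CONFIDENCE: 0.91") and applies the documented ×0.6 factor for PARTIAL. map_vog_verdict is a hypothetical name. Treating an unparseable reply as REJECTED is an assumption the sketch makes for safety.

```python
import re

VERDICT_RE = re.compile(
    r"\b(GROUNDED|PARTIAL|REJECTED)\b\s*CONFIDENCE:\s*([01](?:\.\d+)?)",
    re.IGNORECASE,
)
PARTIAL_FACTOR = 0.6  # documented reduction for PARTIAL verdicts


def map_vog_verdict(reply):
    """Return the confidence to store, or None if the relation is discarded."""
    m = VERDICT_RE.search(reply)
    if m is None:
        return None  # assumption: an unparseable reply counts as REJECTED
    verdict, conf = m.group(1).upper(), min(float(m.group(2)), 1.0)
    if verdict == "GROUNDED":
        return conf
    if verdict == "PARTIAL":
        return conf * PARTIAL_FACTOR
    return None  # REJECTED


print(map_vog_verdict("→ GROUNDED CONFIDENCE: 0.91"))  # 0.91
```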
LLM routing: controlled by verification.vog_use_local_llm and night_gardener.verify_with.
Cost control: vog_max_candidates (default 30) caps how many borderline relations are sent to VoG per cycle.
Phase 2.4: Temporal invalidation (fact supersession)
This phase is opt-in (night_gardener.temporal_invalidation.enabled). When a newly verified
relation has a functional type (one you list in
temporal_invalidation.functional_types, e.g. located_in), older edges with
the same subject and type but a different object have their validity window
closed (valid_to is stamped) instead of being left as a stale contradiction.
This follows the Zep/Graphiti temporal-knowledge-graph pattern: no LLM call and no deletion.
History is preserved, so as_of queries return the old fact before the change
and the new one after it. The number of closed edges is reported as relations_invalidated in garden-status.
Phase 2.5: Consolidation (retrieval-success re-weighting)
Between verification and pruning, the Gardener re-weights memory according to what gets reused across sessions. This turns it from a "cron + LLM" job into a genuine episodic→semantic consolidator. No LLM is called in this phase.
The reuse signal is derived from telemetry already stored locally
(entity_reuse() over the events table: the entity ids touched by each hydra_search
within window_days). For each entity:
- Boost (reused across ≥ 2 distinct sessions): each of the entity's outgoing relations gains confidence tanh(sessions / 5) · boost_per_session, divided by √degree so that a few hubs cannot run away (popularity-bias guard). Such entities are also protected from pruning, even when otherwise isolated.
- Decay (a single-session isolate older than decay_after_days): confidence drops by up to decay_per_step, and the drop grows with how overdue the entity is.
All adjustments are clamped to [min_confidence, max_confidence], so confidences cannot run away.
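The boost rule can be written directly from the formula above. The decay rule only says the drop grows with how overdue the isolate is and is capped at decay_per_step. The linear ramp below is therefore an assumption, as are the function names and the numbers in the usage line (0.1 is illustrative, not a documented default).

```python
import math


def boost_delta(sessions, degree, boost_per_session):
    """Confidence gain for each outgoing relation of an entity reused in `sessions` sessions."""
    if sessions < 2:
        return 0.0  # only entities reused across >= 2 distinct sessions are boosted
    # tanh saturates the reuse signal; dividing by sqrt(degree) damps hubs
    return math.tanh(sessions / 5) * boost_per_session / math.sqrt(max(degree, 1))


def decay_delta(age_days, decay_after_days, decay_per_step):
    """Confidence loss for a single-session isolate (linear ramp is an assumption)."""
    overdue = age_days - decay_after_days
    if overdue <= 0:
        return 0.0
    return decay_per_step * min(1.0, overdue / decay_after_days)


def clamp(conf, min_confidence, max_confidence):
    return min(max(conf, min_confidence), max_confidence)


print(round(boost_delta(sessions=5, degree=4, boost_per_session=0.1), 4))  # 0.0381
```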
The counters entities_boosted, entities_decayed and prune_protected are
exposed in garden-status for audit. Configure under
night_gardener.consolidation.* (see configuration.md);
set enabled: false to turn it off.
Phase 3: Pruning
Rule-based
- Remove entities with zero relations (isolated nodes)
- Remove relations with confidence < srmkg_threshold_reject
- Remove relations where neither endpoint has any other connections
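The three rules can be sketched over a plain edge list. This version assumes relations are (source, target, confidence) tuples. It also assumes the order shown in the code: low-confidence edges first, then isolated pairs, then orphaned entities. Entities protected by consolidation are honoured. rule_based_prune and its parameters are hypothetical names.

```python
from collections import Counter


def rule_based_prune(entities, relations, reject_threshold=0.3, protected=frozenset()):
    """relations: list of (src, dst, confidence). Returns (kept_entities, kept_relations)."""
    # 1. Drop relations below the reject threshold.
    rels = [r for r in relations if r[2] >= reject_threshold]
    # 2. Drop relations whose endpoints have no other connections (isolated pairs).
    degree = Counter()
    for src, dst, _ in rels:
        degree[src] += 1
        degree[dst] += 1
    rels = [(s, d, c) for s, d, c in rels
            if degree[s] > 1 or degree[d] > 1 or s in protected or d in protected]
    # 3. Drop entities left with zero relations, unless consolidation protected them.
    connected = {s for s, _, _ in rels} | {d for _, d, _ in rels}
    kept_entities = {e for e in entities if e in connected or e in protected}
    return kept_entities, rels


ents = {"A", "B", "C", "D", "E", "F"}
rels = [("A", "B", 0.9), ("B", "C", 0.8), ("D", "E", 0.9), ("A", "C", 0.1)]
print(sorted(rule_based_prune(ents, rels)[0]))  # ['A', 'B', 'C']
```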
LightGNN (optional)
A lightweight Graph Neural Network scores each edge for spuriousness. Edges scoring above the spuriousness threshold (0.65) are added to the prune list.
```text
# Train the LightGNN pruner from your AI client via the MCP tool:
# "Train the LightGNN pruner on my default project"
# → AI calls: train_gnn_tool(project="default")
```
GNN backend auto-detection:
1. torch + torch_geometric → PyG backend
2. torch + dgl → DGL backend
3. Heuristic fallback (always available)
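The detection order above can be expressed with importlib.util.find_spec, which checks whether a package is installed without importing it. This is a sketch only. detect_gnn_backend and the returned labels are hypothetical, and the installed parameter exists so the logic can be exercised without torch present.

```python
import importlib.util


def _installed(name):
    """True if a top-level package is importable, without importing it."""
    return importlib.util.find_spec(name) is not None


def detect_gnn_backend(installed=_installed):
    """Follow the documented priority: PyG, then DGL, then the heuristic fallback."""
    if installed("torch") and installed("torch_geometric"):
        return "pyg"
    if installed("torch") and installed("dgl"):
        return "dgl"
    return "heuristic"


print(detect_gnn_backend())
```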
Phase 3.5: Prune review (human-in-the-loop, opt-in)
When night_gardener.review.enabled is set, the Gardener turns pruning into an
active-learning loop that trains the GNN edge scorer from human-verified
labels. No LLM is involved and nothing leaves the machine.
- Capture. A sample of borderline spurious-edge candidates (spuriousness near the 0.65 threshold, i.e. uncertainty sampling) is queued locally together with its structural features. Nothing is deleted. Tuned by review.sample_rate, review.uncertainty_band and review.max_per_run.
- Label. hydramem review lets you mark each queued edge as prune, keep or skip; --status shows counts and --export PATH writes the golden dataset as JSONL.
- Train. hydramem train-pruner fits a pure-NumPy logistic scorer over the labelled features (it refuses to train, rather than silently fitting, when there are too few samples or only one class) and saves ~/.hydramem/projects/<p>/prune_weights.json; GNNPruner then prefers this learned backend automatically.
- Auto-train (optional). review.auto_train retrains at the end of a cycle once enough labels exist.
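The capture step is classic uncertainty sampling, meaning the edges the scorer is least sure about are the ones worth a human label. This sketch assumes candidates arrive as (edge_id, spuriousness) pairs. It keeps those within uncertainty_band of the 0.65 threshold, orders them most uncertain first, subsamples by sample_rate and caps the result at max_per_run. select_for_review and its default values are hypothetical, not HydraMem's settings.

```python
import random


def select_for_review(candidates, threshold=0.65, uncertainty_band=0.1,
                      sample_rate=1.0, max_per_run=20, rng=None):
    """candidates: list of (edge_id, spuriousness). Returns the edges to queue."""
    rng = rng or random.Random()
    near = [c for c in candidates if abs(c[1] - threshold) <= uncertainty_band]
    near.sort(key=lambda c: abs(c[1] - threshold))  # most uncertain first
    sampled = [c for c in near if rng.random() < sample_rate]  # random() is in [0, 1)
    return sampled[:max_per_run]


cands = [("e1", 0.64), ("e2", 0.90), ("e3", 0.70), ("e4", 0.20), ("e5", 0.66)]
print(sorted(c[0] for c in select_for_review(cands, max_per_run=2)))  # ['e1', 'e5']
```

Confidently spurious (e2) and confidently fine (e4) edges are never queued, so labelling effort goes where it changes the scorer most.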
The counters prune_reviews_queued and pruner_retrained are reported in
garden-status. Design rationale and state-of-the-art background:
docs/internal/future_work/hitl-prune-review.md.
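To make the Train step concrete, here is a pure-NumPy logistic-regression sketch trained by batch gradient descent on mean log-loss. Like the command described above, it refuses too-few or single-class label sets. train_prune_scorer, spuriousness, weights_to_json, the min_samples default and the JSON layout are all hypothetical. The real prune_weights.json format may differ.

```python
import json

import numpy as np


def train_prune_scorer(X, y, min_samples=20, lr=1.0, epochs=2000):
    """Fit logistic weights on labelled structural edge features.

    X: (n, d) feature matrix; y: 1 = prune, 0 = keep.
    Raises ValueError instead of fitting on too few labels or a single class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(y) < min_samples:
        raise ValueError(f"need at least {min_samples} labels, got {len(y)}")
    if np.unique(y).size < 2:
        raise ValueError("all labels are the same class; nothing to learn")
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))  # sigmoid predictions
        w -= lr * Xb.T @ (p - y) / len(y)     # gradient step on mean log-loss
    return w


def spuriousness(w, x):
    """Probability that an edge with features x should be pruned."""
    z = float(np.append(np.asarray(x, dtype=float), 1.0) @ w)
    return 1.0 / (1.0 + np.exp(-z))


def weights_to_json(w):
    # Hypothetical file schema; the real prune_weights.json layout may differ.
    return json.dumps({"weights": [float(v) for v in w]})
```

A usage example with one synthetic feature: X = np.linspace(0, 1, 40).reshape(-1, 1) labelled 1 above 0.5 yields a scorer that rates 0.9 as spurious and 0.1 as not.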
Status and history
Status is persisted to ~/.hydramem/garden_status.json:
```json
{
  "last_run": "2026-05-07T03:00:12+00:00",
  "total_runs": 42,
  "relations_proposed": 317,
  "relations_accepted": 189,
  "relations_rejected": 128,
  "session_entries_filtered_repeat_threshold": 94,
  "nodes_pruned": 14,
  "edges_pruned": 23,
  "is_running": false
}
```
Each run_night_gardener execution also returns per-run filtering metrics such as sessions_considered, sessions_used, session_entries_considered, session_entries_used, and session_entries_filtered_repeat_threshold so you can see how aggressively night_gardener.min_repeat_count is narrowing the evidence set.
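The status file is plain JSON, so quick health checks need only the standard library. This sketch reads the documented path and derives an acceptance rate. load_status and summarize_status are hypothetical helpers, not part of HydraMem.

```python
import json
from pathlib import Path

# Path documented above.
STATUS_PATH = Path.home() / ".hydramem" / "garden_status.json"


def load_status(path=STATUS_PATH):
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_status(status):
    """Derive an acceptance rate from a garden_status.json payload."""
    proposed = status.get("relations_proposed", 0)
    accepted = status.get("relations_accepted", 0)
    return {
        "acceptance_rate": accepted / proposed if proposed else 0.0,
        "is_running": status.get("is_running", False),
    }


example = {"relations_proposed": 317, "relations_accepted": 189, "is_running": False}
print(round(summarize_status(example)["acceptance_rate"], 3))  # 0.596
```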
Configuration reference
```yaml
# config.yml
night_gardener:
  enabled: true
  schedule: "0 3 * * *"   # cron expression
  infer_with: local       # local | openai | anthropic | auto
  verify_with: auto       # local | openai | anthropic | auto

verification:
  srmkg_threshold_accept: 0.7
  srmkg_threshold_reject: 0.3
  vog_max_candidates: 30
  vog_use_local_llm: true  # true → VoG ignores the global provider and uses the local LLM
```
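The thresholds only make sense in one order: the reject threshold must sit below the accept threshold, or there is no VoG band at all. The sketch below validates such a block after it has been loaded into a dict, for example with PyYAML's yaml.safe_load, which is an assumption since the document does not say how config.yml is read. VerificationConfig and check_provider are hypothetical names. Their defaults mirror the example above.

```python
from dataclasses import dataclass

PROVIDERS = {"local", "openai", "anthropic", "auto"}


@dataclass
class VerificationConfig:
    # Defaults copied from the config.yml example; the class itself is hypothetical.
    srmkg_threshold_accept: float = 0.7
    srmkg_threshold_reject: float = 0.3
    vog_max_candidates: int = 30
    vog_use_local_llm: bool = True

    def __post_init__(self):
        if not 0.0 <= self.srmkg_threshold_reject < self.srmkg_threshold_accept <= 1.0:
            raise ValueError("need 0 <= srmkg_threshold_reject < srmkg_threshold_accept <= 1")
        if self.vog_max_candidates < 0:
            raise ValueError("vog_max_candidates must be non-negative")


def check_provider(name):
    """Validate infer_with / verify_with values."""
    if name not in PROVIDERS:
        raise ValueError(f"unknown provider {name!r}; expected one of {sorted(PROVIDERS)}")
    return name


raw = {"srmkg_threshold_accept": 0.7, "srmkg_threshold_reject": 0.3, "vog_max_candidates": 30}
print(VerificationConfig(**raw))
```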
Design rationale
Why run it offline?
Relation inference requires careful, multi-step LLM reasoning. Running it at low-activity periods (3 AM) avoids competing with real-time query latency.
Why not infer at query time?
SR-MKG + VoG adds 200–2000 ms per relation. For an interactive query returning 20 relations, sequential verification would add roughly 4–40 s, which is intolerable.
Why separate LLM routing?
Bulk inference (Phase 1) generates many candidate relations that will mostly be rejected. Running this with a local model (free) and reserving the external API for only the borderline cases (VoG) reduces cost by 70–90%.