Architecture¶
HydraMem is built around four layers that work together to provide accurate, low-hallucination context to AI agents.
System diagram¶
┌──────────────────────────────────────────────────────────────────────┐
│ Layer 1: AI Clients │
│ OpenCode · Claude Desktop · Cursor · VS Code Copilot · Custom │
│ Invokes Agent Skills (.github/skills/hydramem-*) │
└─────────────────────────────┬────────────────────────────────────────┘
│ HTTP / MCP (Model Context Protocol)
┌─────────────────────────────▼────────────────────────────────────────┐
│ Layer 2: MCP Server (hydramem/server.py) │
│ FastMCP · 18 tools · multi-provider LLM · telemetry logging │
└──────────────┬──────────────────────────┬────────────────────────────┘
│ │
┌──────────────▼──────────────┐ ┌────────▼───────────────────────────┐
│ Layer 3a: Retrieval │ │ Layer 3b: Autonomous Learning │
│ hydramem/search.py │ │ hydramem/garden/gardener.py │
│ · priming_context (fast) │ │ · Phase 1: Relation Inference │
│ · hydra_search (full) │ │ · Phase 2: SR-MKG + VoG verify │
│ · expand_context │ │ · Phase 3: Graph Pruning │
│ · trace_path │ │ hydramem/gnn_prune.py (LightGNN) │
│ │ │ │
│ hydramem/verification/ │ │ │
│ · SR-MKG scoring │ │ │
│ · VoG LLM verification │ │ │
└──────────────┬──────────────┘ └────────┬───────────────────────────┘
│ │
┌──────────────▼──────────────────────────▼───────────────────────────┐
│ Layer 4: Storage (hydramem/storage/) │
│ ┌──────────────────────────────┐ ┌────────────────────────────┐ │
│ │ LadybugDB / Kuzu │ │ LanceDB │ │
│ │ Graph: entities, relations, │ │ Vector index: embeddings, │ │
│ │ chunks, sessions │ │ ANN search (HNSW) │ │
│ │ Cypher queries │ │ In-memory fallback │ │
│ └──────────────────────────────┘ └────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Telemetry: ~/.hydramem/metrics.db (SQLite) │ │
│ └──────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
Layer 1 – AI Clients¶
HydraMem integrates with any MCP-compatible AI client. The five bundled Agent Skills (.github/skills/) provide structured workflows for the most common operations:
| Skill | Triggered when agent needs to… |
|---|---|
hydramem-query |
Answer a factual question from the knowledge base |
hydramem-reason |
Multi-hop causal reasoning over the graph |
hydramem-ingest |
Add new documents to the knowledge base |
hydramem-link |
Curate relations manually |
hydramem-garden |
Run the Night Gardener maintenance cycle |
Skills are YAML-frontmatter Markdown files that describe tool sequences to the AI client. They work with any client that supports MCP + agent skills (OpenCode, Cursor, Claude Desktop with plugins).
Layer 2 – MCP Server¶
hydramem/server.py runs a FastMCP HTTP server exposing 18 tools.
Key design decisions:
- Every tool logs telemetry atomically before returning — no silent failures.
- Project namespacing — all storage is keyed by project to support multiple independent knowledge bases on one instance.
- Session IDs — each server boot gets a UUID; tools accept an optional session_id override for grouping agent interactions.
- Multi-provider LLM — resolved at startup from config.yml; each tool inherits the global preset unless overridden.
Layer 3a – Retrieval Pipeline¶
priming_context (fast path)¶
Used when the agent needs quick context before a conversational turn.
Query → embed (nomic-embed-text-v1.5)
→ LanceDB ANN top-k chunks
→ entity extraction from query (regex + heuristics)
→ graph neighbour expansion (1 hop)
→ deduplicate + rank
→ return context string + chunk list
Average latency: < 100 ms on a modern laptop (no LLM call required).
hydra_search (full path)¶
Used for deep research questions where accuracy matters more than speed.
Query → embed
→ LanceDB ANN top-k
→ entity expansion (BFS / PPR / hybrid)
→ candidate relation retrieval
→ SR-MKG scoring (topological, no LLM; optionally calibrated)
→ borderline relations → VoG (LLM step-by-step)
→ ranked, verified context
The graph-walk strategy is selectable per request via the traversal
parameter (bfs | ppr | hybrid). ppr runs Personalized PageRank
seeded at query entities (HippoRAG-style); hybrid fuses the vector,
BFS and PPR rankings via Reciprocal Rank Fusion. See
configuration.md#search-traversal.
trace_path¶
Shortest-path query between two entity IDs using graph BFS/Dijkstra. Returns the chain of relations connecting them.
Layer 3b – Two-Level Verification¶
SR-MKG (Scalable Relation Mining with Knowledge Graphs)¶
A fast, LLM-free topological scorer. Computes a confidence score for a relation based on: - Jaccard coefficient of common graph neighbours - Degree penalty for isolated/orphan nodes - Named-relation type boost
Score ≥ 0.7 → auto-accept
Score < 0.3 → auto-reject
0.3 – 0.7 → forwarded to VoG
The four component weights can be replaced per-project by a learned
logistic calibration (hydramem calibrate-srmkg); when a weights file
exists at ~/.hydramem/projects/<p>/srmkg_weights.json the scorer loads
it transparently. See
verification.md#per-project-calibration.
VoG (Verification of Groundedness)¶
An LLM step-by-step check. Given the proposed relation and the two source text fragments:
Proposed: "HydraMem" –[uses]→ "LanceDB"
Fragment A: "HydraMem stores embeddings in LanceDB…"
Fragment B: "LanceDB provides serverless vector search…"
→ GROUNDED CONFIDENCE: 0.94
VoG is only called for borderline relations (vog_max_candidates cap prevents runaway API costs).
Layer 3c – Night Gardener¶
See night-gardener.md for full details.
Three phases: 1. Inference — LLM analyses stored Q&A sessions and proposes new graph edges. 2. Verification — every candidate passes SR-MKG + VoG. 3. Pruning — isolated nodes and spurious edges are removed (rule-based + optional LightGNN).
LightGNN Pruning¶
A lightweight Graph Neural Network that learns to distinguish genuine knowledge edges from co-occurrence noise.
| Backend | Condition |
|---|---|
| PyTorch Geometric | pip install torch torch_geometric |
| DGL | pip install torch dgl |
| Heuristic (default) | No PyTorch — uses betweenness centrality + degree thresholds |
The heuristic approximates GNN results acceptably for most corpora.
When the PyG backend is available, node features default to Laplacian
Positional Encodings (hydramem/garden/spectral.py) concatenated with
normalised degree, instead of the previous random low-rank features. LPE
gives the GNN a real spectral signal at near-zero compute cost. Toggle
via gnn.use_laplacian_pe (default on) and gnn.lpe_k (default 32).
Layer 4 – Storage¶
LadybugDB / Kuzu (graph)¶
The graph store (hydramem/storage/factory.py) wraps LadybugDB (a fork of Kuzu). Schema:
Entity { id, name, type, project }
Relation { from_id, to_id, relation_type, confidence, source_doc_id, project }
Chunk { id, text, source, project, embedding_id }
Session { id, session_id, project, created_at, updated_at, text, entries[] }
Fallback: if LadybugDB is unavailable, the store switches to a NetworkX in-memory graph with JSON persistence.
LanceDB (vectors)¶
Stores chunk embeddings as a LanceDB table with HNSW indexing. Queries return the top-k nearest chunks by cosine similarity.
The embedding model (nomic-ai/nomic-embed-text-v1.5, truncated to 512-d) runs 100 % locally on CPU via fastembed or sentence-transformers.
Telemetry (SQLite)¶
~/.hydramem/metrics.db stores a events table with per-tool-call metrics. See telemetry.md.
Module Map¶
The codebase is organised around SOLID principles and a strict dependency hierarchy — arrows show allowed import direction (lower layers never import from higher ones).
hydramem/core/ ← zero dependencies (types, config, logging, tokens)
hydramem/llm/ ← depends on core/
hydramem/storage/ ← depends on core/
hydramem/ingest/ ← depends on core/, llm/, storage/
hydramem/verification/ ← depends on core/, llm/
hydramem/garden/ ← depends on core/, llm/, storage/, verification/
hydramem/search.py ← depends on core/, ingest/, storage/, verification/
hydramem/server.py ← depends on all of the above
Sub-package detail¶
hydramem/core/ — Domain primitives (SRP)¶
| Module | Responsibility |
|---|---|
types.py |
Pure dataclasses: Chunk, Entity, Relation |
config.py |
Config class + YAML/env resolution |
logging.py |
get_logger() factory |
tokens.py |
count_tokens() via tiktoken |
hydramem/llm/ — LLM provider abstraction (OCP + DIP)¶
| Module | Responsibility |
|---|---|
base.py |
LLMProvider Protocol — the DIP boundary |
ollama.py |
OllamaProvider — local inference |
openai.py |
OpenAIProvider — OpenAI API |
anthropic.py |
AnthropicProvider — Anthropic Claude API |
factory.py |
create_provider(), call_llm() — registry + singleton |
Adding a new LLM backend: create one file + add one entry to the registry. Zero other files change.
hydramem/storage/ — Repository pattern (OCP + DIP + ISP)¶
| Module | Responsibility |
|---|---|
base.py |
GraphRepository + VectorRepository Protocols |
graph/networkx_repo.py |
NetworkX in-memory graph (always available) |
graph/ladybug_repo.py |
LadybugDB / Kuzu persistent graph |
vector/lancedb_repo.py |
LanceDB persistent vector index |
vector/memory_repo.py |
In-memory cosine-similarity fallback |
factory.py |
KnowledgeStore facade + create_store() / get_store() |
KnowledgeStore composes one GraphRepository and one VectorRepository. Callers depend only on KnowledgeStore — never on concrete backends (DIP + ISP).
hydramem/ingest/ — Ingestion pipeline (SRP)¶
| Module | Responsibility |
|---|---|
chunker.py |
MarkdownChunker — split text into token-sized pieces |
embedder.py |
EmbeddingService — generate dense vectors |
extractor.py |
EntityExtractor — heuristic named-entity recognition |
pipeline.py |
IngestionPipeline — orchestrate the above (only coordination, no logic) |
hydramem/verification/ — Two-level verification (OCP + LSP)¶
| Module | Responsibility |
|---|---|
base.py |
VerificationStep Protocol + VerificationResult dataclass |
srmkg.py |
SRMKGScorer — topological confidence, no LLM |
vog.py |
VoGVerifier — LLM groundedness check, injected LLMProvider |
pipeline.py |
VerificationPipeline — SR-MKG → VoG with VoG cap |
New verification stages (e.g. neural) implement VerificationStep and slot in without touching callers (OCP).
hydramem/garden/ — Night Gardener (SRP per phase)¶
| Module | Responsibility |
|---|---|
repository.py |
SessionRepository, StatusRepository — JSON persistence only |
inferrer.py |
RelationInferrer — Phase 1: propose candidates via LLM |
pruner.py |
KnowledgePruner — Phase 3: remove stale/isolated elements |
gardener.py |
NightGardener — orchestrate phases 1→2→3, inject all deps |
Data flow: end-to-end query¶
User: "How does the Night Gardener prune stale edges?"
│
▼
[AI client] invokes hydramem-reason skill
│
▼
[MCP] → hydra_search_tool(query=..., project="default")
│
▼
[search.py]
├── embed query → [0.12, -0.34, …] (384 dims, CPU)
├── LanceDB ANN → top-5 chunks
├── extract entities: ["Night Gardener", "LightGNN"]
├── graph expand → 2-hop neighbours → +3 chunks
├── SR-MKG score each relation
└── VoG on 2 borderline relations → GROUNDED (0.89), GROUNDED (0.76)
│
▼
[telemetry] log: tokens_injected=2140, tokens_baseline=18400, saved=88%
│
▼
[AI client] receives verified context → generates grounded answer