Memory serves two distinct purposes that require different retrieval patterns. Both live in a single ChromaDB collection but use different access modes.
Standing facts that should always be findable: user preferences, project constraints, stated goals. Gemma writes with an explicit topic key; retrieval is exact metadata match, not similarity. Topic namespace uses dot-notation prefixes:
user.* — preferences, style, personal factsproject.* — decisions, deadlines, stackconstraint.* — things to avoid or enforceQ+A history auto-ingested by MemoryAgent on every response.generation. Retrieved by embedding similarity when prior context is relevant. Feedback (Phase 4) attaches as metadata on these engrams — high-reward Q+A pairs surface first when similar questions arise.
Feedback is NOT a topic key. It travels with the episodic engram it was attached to. Similarity search naturally surfaces well-rated prior answers.
Single tool, two retrieval paths: if Gemma passes topic, exact metadata lookup; if Gemma passes query, similarity search. Gemma chooses the right mode based on what it needs.
recall_memory(topic="user.language_preference") → exact lookup recall_memory(query="what databases has this user worked with") → similarity
Phase 2 gives Gemma persistent memory. Phase 2 is complete when Stories S5 and S6 pass against the live stack.
src/local/services/memory_service.py newsrc/local/tools/memory_recall_tool.py newsrc/local/tools/memory_save_tool.py newsrc/local/agents/memory_agent.py newconfig/memory.yaml newtests/test_memory_service.py newtests/test_memory_recall_tool.py newtests/test_memory_save_tool.py newtests/stories/s5_memory_recall.yaml newtests/stories/s6_memory_save.yaml newrun_local.py — start MemoryRecallTool, MemorySaveTool, MemoryAgent modrequirements.txt — add chromadb modsrc/local/protocol/subjects.py — add tool subjects for recall + save modsrc/local/agents/generator_agent.py — add query field to response.generation payload modollama pull nomic-embed-textpip install chromadb.chroma/ to .gitignoreTools announce schemas dynamically via tool.schema. generator.yaml stays tools: [].
Steps 9–13 were implemented with 2 tools (save_memory dual-mode, recall_memory dual-mode). Live testing showed Gemma reliably ignores save_memory on "please remember" requests — it treats them as conversational. Root cause: dual-mode tools require Gemma to make a secondary decision inside the call. Single-purpose tools make tool selection itself the only decision.
Tools are generic bus-native — any agent publishes tool.request.*, any agent receives tool.result.*. The JSON schema is Gemma's view only. CriticAgent, ReflectionAgent, or future agents can call these tools directly via the bus.
| Tool | MemoryService method | Schema trigger (when to call) |
|---|---|---|
| save_topic | write_topic | User asks to remember a preference, rule, or fact permanently across sessions |
| recall_topic | recall_topic | Look up a specific standing fact by key the agent knows or can infer |
| save_memory | write_episodic | Explicitly note something important mid-conversation (agent-initiated) |
| search_memory | search_episodic | Recall something from past sessions when no exact key is known — search by meaning |
memory_recall_tool.py + memory_save_tool.py with 4 new tool filessubjects.py: tool.request/result.save_topic, recall_topic, save_memory, search_memoryrun_local.py — 4 tool threads, shared MemoryService instanceChromaDB + nomic-embed-text wrapper. Methods: write_topic(topic, content), recall_topic(topic), write_episodic(content, metadata), search_episodic(query, top_k). Unit test: write a topic entry, recall it exactly; write an episodic entry, search and verify similarity score above threshold.
Subscribes tool.request.memory_recall. If args.topic present → exact lookup; if args.query present → similarity search. Formats results, publishes tool.result.memory_recall. Announces schema on startup.
Subscribes tool.request.memory_save. Reads args.topic (required) and args.content. Calls MemoryService.write_topic(). Publishes confirmation. Announces schema on startup.
Subscribes response.generation. Reads query + answer from payload (requires GeneratorAgent to add query field to that envelope — one-line change). Calls MemoryService.write_episodic(). Fire-and-forget, no bus output.
Start all three participants in run_local.py. Stories S5 (recall surfaces prior context via similarity) and S6 (save + recall a topic key) pass. Phase 2 complete.
Always inject prefix in MemoryService — callers pass raw text:
search_document: prefix on writesearch_query: prefix on searchWithout these prefixes, retrieval quality drops significantly.
chroma_path: ".chroma" collection: "local2.memory" embedding_model: "nomic-embed-text" top_k: 5 min_score: 0.3 # cosine similarity threshold
class MemoryService:
# Topic store — exact key, no embedding needed for retrieval
def write_topic(self, topic: str, content: str) -> str:
# embed with search_document: prefix
# upsert with metadata={type: "topic", topic: topic}
# returns entry_id
def recall_topic(self, topic: str) -> str | None:
# ChromaDB where filter: {type: "topic", topic: topic}
# returns content string or None
# Episodic store — similarity search
def write_episodic(self, content: str, metadata: dict) -> str:
# embed with search_document: prefix
# add with metadata={type: "episodic", ...metadata}
# returns entry_id
def search_episodic(self, query: str, top_k: int | None = None) -> list[dict]:
# embed with search_query: prefix
# query ChromaDB filtering type="episodic"
# returns [{id, content, score, metadata}] above min_score
args = payload["args"]
if "topic" in args:
result = memory_service.recall_topic(args["topic"])
# exact match — no embedding needed
elif "query" in args:
hits = memory_service.search_episodic(args["query"])
# similarity search over episodic store
Topic recall:
[Memory: user.language_preference] Elixir
Episodic recall:
[Memory recall — 2026-05-31] 1. (relevance: 0.91) 2026-05-30 Q: What is the capital of France? A: Paris.
SCHEMA = {
"type": "function",
"function": {
"name": "memory_recall",
"description": (
"Retrieve from memory. Two modes: "
"(1) topic lookup — pass topic key to retrieve a stored preference, "
"constraint, or project fact (e.g. topic='user.language_preference'); "
"(2) similarity search — pass query string to search prior Q&A history "
"for relevant past context. Use topic lookup when you know the key; "
"use query search when looking for prior conversation context."
),
"parameters": {
"type": "object",
"properties": {
"topic": {"type": "string",
"description": "Topic key for exact lookup (e.g. 'user.name', 'project.deadline')."},
"query": {"type": "string",
"description": "Free-text query for similarity search over past Q&A history."}
}
# one of topic or query required — validated at runtime
}
}
}
user.*, project.*, constraint.*MemoryService.write_topic() — upserts (overwrites previous value for same key)SCHEMA = {
"type": "function",
"function": {
"name": "memory_save",
"description": (
"Save a preference, constraint, or project fact to memory under a "
"topic key for exact retrieval later. Use topic prefixes: "
"user.* for user preferences and personal facts, "
"project.* for project decisions and metadata, "
"constraint.* for things to avoid or enforce. "
"Example: memory_save(topic='user.name', content='Richard'). "
"Overwrites any previous value for the same topic key."
),
"parameters": {
"type": "object",
"properties": {
"topic": {"type": "string",
"description": "Topic key (e.g. 'user.name', 'project.deadline')."},
"content": {"type": "string",
"description": "The value to store."}
},
"required": ["topic", "content"]
}
}
}
response.generationpayload.query + payload.answerMemoryService.write_episodic() with combined Q+A contentsession_id, query_id, tool_calls count, timestamp_utcscore in this metadataAdd query to response.generation payload in _handle_query():
self._pub.publish(self._make_envelope(
RESPONSE_GENERATION, "response",
{"answer": answer, "thinking": thinking,
"tool_calls": tool_call_log,
"query": query, # ← add this
"session_id": session_id,
"query_id": query_id},
...
))
MemoryAgent subscribes one subject only. No need to correlate across query.received + response.generation.
def _start_memory_recall() -> None:
from local.tools.memory_recall_tool import MemoryRecallTool
MemoryRecallTool().run()
def _start_memory_save() -> None:
from local.tools.memory_save_tool import MemorySaveTool
MemorySaveTool().run()
def _start_memory_agent() -> None:
from local.agents.memory_agent import MemoryAgent
MemoryAgent().run()
# In main(), after existing tools settle:
threading.Thread(target=_start_memory_recall, daemon=True).start()
threading.Thread(target=_start_memory_save, daemon=True).start()
threading.Thread(target=_start_memory_agent, daemon=True).start()
time.sleep(0.2)
| Subject | Publisher | Subscriber | Notes |
|---|---|---|---|
| tool.request.memory_recall | GeneratorAgent | MemoryRecallTool | Args: {topic} or {query} |
| tool.result.memory_recall | MemoryRecallTool | GeneratorAgent (short-lived sub) | Formatted memory text back to Gemma |
| tool.request.memory_save | GeneratorAgent | MemorySaveTool | Args: {topic, content} |
| tool.result.memory_save | MemorySaveTool | GeneratorAgent (short-lived sub) | "Memory saved: {topic}" |
| response.generation | GeneratorAgent | MemoryAgent (new subscriber) + existing | Must include query field from Step 12 |
Setup: Run one conversation turn — "The capital of Australia is Canberra, not Sydney." MemoryAgent auto-ingests it.
New session: Ask "What do you remember about Australian geography?" — Gemma should call memory_recall(query=...) and surface the prior turn.
Assert: memory_recall in tool_calls, answer mentions Canberra.
Turn 1: "Please remember my preferred language is Elixir." — Gemma calls memory_save(topic="user.language_preference", content="Elixir").
Assert: memory_save in tool_calls, confirmation in answer.
Turn 2: "What language do I prefer?" — Gemma calls memory_recall(topic="user.language_preference").
Assert: answer contains "Elixir".
search_document: / search_query: prefixes, retrieval quality drops significantly. Always inject in MemoryService — callers pass raw text.where={"topic": topic, "type": "topic"}. Exact match only. This is intentional — preferences must be reliably findable..chroma/ to .gitignore before first run. Never commit the vector store.memory_recall returns "No memories found." — handle gracefully, not as an error.query to GeneratorAgent's response.generation payload in Step 12 before implementing MemoryAgent, or MemoryAgent will write engrams with no query text.