
LoCAL2 — Phase 2

Memory as Tool: MemoryRecallTool · MemorySaveTool · MemoryAgent

Architecture: Two Memory Stores, One Collection

Memory serves two distinct purposes that require different retrieval patterns. Both live in a single ChromaDB collection but use different access modes.

Topic store — exact key lookup

Standing facts that should always be findable: user preferences, project constraints, stated goals. Gemma writes with an explicit topic key; retrieval is exact metadata match, not similarity. Topic namespace uses dot-notation prefixes:

  • user.* — preferences, style, personal facts
    e.g. user.language_preference, user.name
  • project.* — decisions, deadlines, stack
    e.g. project.deadline, project.tech_stack
  • constraint.* — things to avoid or enforce
    e.g. constraint.no_external_apis

Episodic store — similarity search

Q+A history auto-ingested by MemoryAgent on every response.generation. Retrieved by embedding similarity when prior context is relevant. Feedback (Phase 4) attaches as metadata on these engrams — high-reward Q+A pairs surface first when similar questions arise.

Feedback is NOT a topic key. It travels with the episodic engram it was attached to. Similarity search naturally surfaces well-rated prior answers.

MemoryRecallTool — dual mode dispatch

Single tool, two retrieval paths: if Gemma passes topic, exact metadata lookup; if Gemma passes query, similarity search. Gemma chooses the right mode based on what it needs.

memory_recall(topic="user.language_preference")  → exact lookup
memory_recall(query="what databases has this user worked with")  → similarity

Scope & Deliverables

Phase 2 gives Gemma persistent memory. Phase 2 is complete when Stories S5 and S6 pass against the live stack.

New files

  • src/local/services/memory_service.py
  • src/local/tools/memory_recall_tool.py
  • src/local/tools/memory_save_tool.py
  • src/local/agents/memory_agent.py
  • config/memory.yaml
  • tests/test_memory_service.py
  • tests/test_memory_recall_tool.py
  • tests/test_memory_save_tool.py
  • tests/stories/s5_memory_recall.yaml
  • tests/stories/s6_memory_save.yaml

Modified files

  • run_local.py: start MemoryRecallTool, MemorySaveTool, MemoryAgent
  • requirements.txt: add chromadb
  • src/local/protocol/subjects.py: add tool subjects for recall + save
  • src/local/agents/generator_agent.py: add query field to response.generation payload

Prerequisites

  • ollama pull nomic-embed-text
  • pip install chromadb
  • Add .chroma/ to .gitignore

No generator.yaml changes

Tools announce schemas dynamically via tool.schema. generator.yaml stays tools: [].

Design Revision — 4 Single-Purpose Tools (pending)

Steps 9–13 were implemented with 2 tools (save_memory dual-mode, recall_memory dual-mode). Live testing showed Gemma reliably ignores save_memory on "please remember" requests — it treats them as conversational. Root cause: dual-mode tools require Gemma to make a secondary decision inside the call. Single-purpose tools make tool selection itself the only decision.

Tools are generic bus-native — any agent publishes tool.request.*, any agent receives tool.result.*. The JSON schema is Gemma's view only. CriticAgent, ReflectionAgent, or future agents can call these tools directly via the bus.

Tool          | MemoryService method | Schema trigger (when to call)
--------------|----------------------|------------------------------
save_topic    | write_topic          | User asks to remember a preference, rule, or fact permanently across sessions
recall_topic  | recall_topic         | Look up a specific standing fact by key the agent knows or can infer
save_memory   | write_episodic       | Explicitly note something important mid-conversation (agent-initiated)
search_memory | search_episodic      | Recall something from past sessions when no exact key is known; search by meaning
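
The table above can be read as a one-to-one dispatch from tool name to MemoryService method. The following is a minimal sketch of that mapping, not the project's implementation: FakeMemoryService is a hypothetical in-memory stand-in, and the argument names (topic, content, query, metadata, top_k) are assumed because the revised schemas are not specified here.

```python
from __future__ import annotations

from typing import Any, Callable


class FakeMemoryService:
    """Hypothetical in-memory stand-in for MemoryService (illustration only)."""

    def __init__(self) -> None:
        self.topics: dict[str, str] = {}
        self.episodes: list[str] = []

    def write_topic(self, topic: str, content: str) -> str:
        self.topics[topic] = content  # same key overwrites, like an upsert
        return f"topic:{topic}"

    def recall_topic(self, topic: str) -> str | None:
        return self.topics.get(topic)

    def write_episodic(self, content: str, metadata: dict) -> str:
        self.episodes.append(content)
        return f"episodic:{len(self.episodes)}"

    def search_episodic(self, query: str, top_k: int | None = None) -> list[dict]:
        # Substring match stands in for embedding similarity.
        hits = [{"content": e} for e in self.episodes if query.lower() in e.lower()]
        return hits[:top_k]


def build_dispatch(service: Any) -> dict[str, Callable[[dict], Any]]:
    """Map each single-purpose tool name to exactly one service method."""
    return {
        "save_topic": lambda a: service.write_topic(a["topic"], a["content"]),
        "recall_topic": lambda a: service.recall_topic(a["topic"]),
        "save_memory": lambda a: service.write_episodic(a["content"], a.get("metadata", {})),
        "search_memory": lambda a: service.search_episodic(a["query"], a.get("top_k")),
    }
```

Because every tool name resolves to one method, no handler has to inspect its arguments to decide what kind of operation was meant, which is the property the revision is after.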

Refactor scope

Build Order — Steps 9–13

9. MemoryService (data layer)

ChromaDB + nomic-embed-text wrapper. Methods: write_topic(topic, content), recall_topic(topic), write_episodic(content, metadata), search_episodic(query, top_k). Unit test: write a topic entry, recall it exactly; write an episodic entry, search and verify similarity score above threshold.

10. MemoryRecallTool

Subscribes tool.request.memory_recall. If args.topic present → exact lookup; if args.query present → similarity search. Formats results, publishes tool.result.memory_recall. Announces schema on startup.

11. MemorySaveTool

Subscribes tool.request.memory_save. Reads args.topic and args.content (both required by the schema). Calls MemoryService.write_topic(). Publishes confirmation. Announces schema on startup.

12. MemoryAgent (auto-ingestion)

Subscribes response.generation. Reads query + answer from payload (requires GeneratorAgent to add query field to that envelope — one-line change). Calls MemoryService.write_episodic(). Fire-and-forget, no bus output.

13. Wire-up + Stories S5 + S6 (completes Phase 2)

Start all three participants in run_local.py. Stories S5 (recall surfaces prior context via similarity) and S6 (save + recall a topic key) pass. Phase 2 complete.

Step 9 — MemoryService

nomic-embed-text prefix convention

Always inject prefix in MemoryService — callers pass raw text:

  • search_document: prefix on write
  • search_query: prefix on search

Without these prefixes, retrieval quality drops significantly.
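
A minimal sketch of the prefix injection described above, so that callers only ever pass raw text. The helper and constant names are hypothetical; the two prefix strings are the task prefixes nomic-embed-text expects.

```python
DOCUMENT_PREFIX = "search_document: "
QUERY_PREFIX = "search_query: "


def as_document(text: str) -> str:
    """Prefix text that is about to be embedded for storage."""
    return DOCUMENT_PREFIX + text.strip()


def as_query(text: str) -> str:
    """Prefix text that is about to be embedded for a similarity search."""
    return QUERY_PREFIX + text.strip()
```

One reasonable design is to store the raw text as the document and apply the prefix only to the string sent to the embedding model, so recalled content never shows the prefix to Gemma.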

config/memory.yaml

chroma_path: ".chroma"
collection: "local2.memory"
embedding_model: "nomic-embed-text"
top_k: 5
min_score: 0.3      # cosine similarity threshold
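
One possible way to carry these settings in code is a small validated dataclass built from the parsed YAML mapping. This is a sketch under assumptions: MemoryConfig and config_from_mapping are hypothetical names, and the YAML parsing step itself (for example with a YAML library) is left out so the block stays standard-library only.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MemoryConfig:
    # Defaults mirror the values in config/memory.yaml above.
    chroma_path: str = ".chroma"
    collection: str = "local2.memory"
    embedding_model: str = "nomic-embed-text"
    top_k: int = 5
    min_score: float = 0.3  # cosine similarity threshold

    def __post_init__(self) -> None:
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")
        if not -1.0 <= self.min_score <= 1.0:
            raise ValueError("min_score must be a cosine similarity in [-1, 1]")


def config_from_mapping(raw: dict) -> MemoryConfig:
    """Build a config from an already-parsed mapping, rejecting misspelled keys."""
    known = {f.name for f in fields(MemoryConfig)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"unknown memory config keys: {sorted(unknown)}")
    return MemoryConfig(**raw)
```

Rejecting unknown keys means a typo such as topk fails at startup instead of silently falling back to the default.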

MemoryService API

class MemoryService:
    # Topic store: exact key, no embedding needed for retrieval
    def write_topic(self, topic: str, content: str) -> str:
        # embed with search_document: prefix
        # upsert with metadata={"type": "topic", "topic": topic}
        # use an id derived from the topic so a re-save overwrites the old entry
        # returns entry_id
        ...

    def recall_topic(self, topic: str) -> str | None:
        # ChromaDB where filter; multiple conditions must be combined with $and:
        #   {"$and": [{"type": "topic"}, {"topic": topic}]}
        # returns content string or None
        ...

    # Episodic store: similarity search
    def write_episodic(self, content: str, metadata: dict) -> str:
        # embed with search_document: prefix
        # add with metadata={"type": "episodic", **metadata}
        # returns entry_id
        ...

    def search_episodic(self, query: str, top_k: int | None = None) -> list[dict]:
        # embed with search_query: prefix
        # query ChromaDB filtering type="episodic"; top_k=None falls back to config
        # returns [{id, content, score, metadata}] above min_score
        ...
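
Two details of this API are easy to get wrong and can be sketched without a running ChromaDB. First, an upsert only overwrites if the same topic always maps to the same id. Second, ChromaDB returns distances rather than similarity scores, so min_score needs a conversion. The helper names below are hypothetical. The conversion assumes the collection was created with cosine distance (for example via the "hnsw:space": "cosine" collection metadata), in which case distance = 1 - similarity; with the default L2 space this formula would not apply.

```python
import hashlib


def topic_entry_id(topic: str) -> str:
    """Deterministic id, so upserting the same topic replaces the old entry."""
    return "topic-" + hashlib.sha256(topic.encode("utf-8")).hexdigest()[:16]


def hits_from_query_result(result: dict, min_score: float) -> list[dict]:
    """Flatten a single-query ChromaDB result and drop hits below min_score.

    ChromaDB nests results per query text, hence the [0] index on each field.
    Assumes cosine distance: score = 1 - distance.
    """
    rows = zip(
        result["ids"][0],
        result["documents"][0],
        result["metadatas"][0],
        result["distances"][0],
    )
    hits = [
        {"id": entry_id, "content": doc, "score": 1.0 - dist, "metadata": meta}
        for entry_id, doc, meta, dist in rows
    ]
    return [hit for hit in hits if hit["score"] >= min_score]
```

On a cold start the nested lists are empty, so the function returns an empty list rather than raising.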

Step 10 — MemoryRecallTool

Dual-mode dispatch

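args = payload["args"]
if "topic" in args:
    # exact match: no embedding needed
    result = memory_service.recall_topic(args["topic"])
elif "query" in args:
    # similarity search over the episodic store
    hits = memory_service.search_episodic(args["query"])
else:
    # the schema marks neither field as required, so reject empty calls here
    raise ValueError("memory_recall needs either 'topic' or 'query'")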

Output formats

Topic recall:

[Memory: user.language_preference]
Elixir

Episodic recall:

[Memory recall — 2026-05-31]

1. (relevance: 0.91) 2026-05-30
   Q: What is the capital of France?
   A: Paris.
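
The two output formats above could be produced by formatters along these lines. This is a sketch under assumptions: the function names are hypothetical, each hit is assumed to have the {content, score, metadata} shape returned by search_episodic, and timestamp_utc is assumed to be an ISO 8601 string whose first ten characters are the date. The empty-result string follows the cold-start note under Gotchas.

```python
from __future__ import annotations

from datetime import date

NO_MEMORIES = "No memories found."


def format_topic(topic: str, content: str | None) -> str:
    """Render a topic recall, or the cold-start message if nothing is stored."""
    if content is None:
        return NO_MEMORIES
    return f"[Memory: {topic}]\n{content}"


def format_episodic(hits: list[dict], today: date) -> str:
    """Render episodic hits as a numbered list with relevance and date."""
    if not hits:
        return NO_MEMORIES
    # \u2014 is the dash used in the "[Memory recall ...]" header format.
    lines = [f"[Memory recall \u2014 {today.isoformat()}]", ""]
    for number, hit in enumerate(hits, start=1):
        day = hit["metadata"].get("timestamp_utc", "")[:10]
        lines.append(f"{number}. (relevance: {hit['score']:.2f}) {day}")
        lines.extend("   " + row for row in hit["content"].splitlines())
        lines.append("")
    return "\n".join(lines).rstrip()
```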

Tool schema

SCHEMA = {
  "type": "function",
  "function": {
    "name": "memory_recall",
    "description": (
      "Retrieve from memory. Two modes: "
      "(1) topic lookup — pass topic key to retrieve a stored preference, "
      "constraint, or project fact (e.g. topic='user.language_preference'); "
      "(2) similarity search — pass query string to search prior Q&A history "
      "for relevant past context. Use topic lookup when you know the key; "
      "use query search when looking for prior conversation context."
    ),
    "parameters": {
      "type": "object",
      "properties": {
        "topic": {"type": "string",
                  "description": "Topic key for exact lookup (e.g. 'user.name', 'project.deadline')."},
        "query": {"type": "string",
                  "description": "Free-text query for similarity search over past Q&A history."}
      }
      # one of topic or query required — validated at runtime
    }
  }
}
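
Since the schema cannot express "one of topic or query", the runtime check mentioned in its comment might look like the sketch below. select_mode is a hypothetical helper. Rejecting calls that pass both fields is a design choice made here for illustration; the dispatch shown earlier in this step would instead give topic precedence.

```python
from __future__ import annotations


def select_mode(args: dict) -> tuple[str, str]:
    """Return ("topic", key) or ("query", text); raise ValueError otherwise."""
    topic = (args.get("topic") or "").strip()
    query = (args.get("query") or "").strip()
    if topic and query:
        raise ValueError("pass either topic or query, not both")
    if topic:
        return "topic", topic
    if query:
        return "query", query
    raise ValueError("one of topic or query is required")
```

Treating empty or whitespace-only strings as missing guards against a model that fills in both parameters with placeholder values.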

Step 11 — MemorySaveTool

Behaviour

  • Topic key is required — no free-form saves via this tool
  • Topic must use one of: user.*, project.*, constraint.*
  • Calls MemoryService.write_topic() — upserts (overwrites previous value for same key)
  • Returns confirmation string to Gemma
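
The behaviour listed above can be sketched as a single handler. Assumptions: handle_memory_save is a hypothetical name, write_topic is passed in as a callable so the sketch runs without MemoryService, and the rejection messages are invented for illustration. Only the "Memory saved: {topic}" confirmation comes from this plan.

```python
ALLOWED_PREFIXES = ("user.", "project.", "constraint.")


def handle_memory_save(args: dict, write_topic) -> str:
    """Validate a save request, store it, and return the text sent back to Gemma."""
    topic = str(args.get("topic", "")).strip()
    content = str(args.get("content", "")).strip()
    # Require an allowed prefix plus a non-empty name after the dot.
    if not topic.startswith(ALLOWED_PREFIXES) or topic.endswith("."):
        return "Memory not saved: topic must start with user., project., or constraint."
    if not content:
        return "Memory not saved: content is empty"
    write_topic(topic, content)  # upsert: replaces any previous value for this key
    return f"Memory saved: {topic}"
```

Returning the rejection as ordinary text, rather than raising, lets Gemma see why the save failed and retry with a valid key.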

Tool schema

SCHEMA = {
  "type": "function",
  "function": {
    "name": "memory_save",
    "description": (
      "Save a preference, constraint, or project fact to memory under a "
      "topic key for exact retrieval later. Use topic prefixes: "
      "user.* for user preferences and personal facts, "
      "project.* for project decisions and metadata, "
      "constraint.* for things to avoid or enforce. "
      "Example: memory_save(topic='user.name', content='Richard'). "
      "Overwrites any previous value for the same topic key."
    ),
    "parameters": {
      "type": "object",
      "properties": {
        "topic": {"type": "string",
                  "description": "Topic key (e.g. 'user.name', 'project.deadline')."},
        "content": {"type": "string",
                    "description": "The value to store."}
      },
      "required": ["topic", "content"]
    }
  }
}

Step 12 — MemoryAgent

Behaviour

  • Subscribes response.generation
  • Reads payload.query + payload.answer
  • Calls MemoryService.write_episodic() with combined Q+A content
  • Metadata: session_id, query_id, tool_calls count, timestamp_utc
  • Phase 4: feedback updates score in this metadata
  • Fire-and-forget — errors logged, never crash
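
A minimal sketch of the ingestion step described in these bullets. ingest_response is a hypothetical name, write_episodic is injected so the example runs without MemoryService, and the "Q: ... / A: ..." layout of the combined content is assumed from the episodic recall format shown in Step 10.

```python
from __future__ import annotations

import logging
from datetime import datetime, timezone

log = logging.getLogger("memory_agent")


def ingest_response(payload: dict, write_episodic) -> str | None:
    """Store one Q+A pair; return the entry id, or None if ingestion failed."""
    try:
        content = f"Q: {payload['query']}\nA: {payload['answer']}"
        metadata = {
            "session_id": payload["session_id"],
            "query_id": payload["query_id"],
            "tool_calls": len(payload.get("tool_calls", [])),  # count only
            "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        }
        return write_episodic(content, metadata)
    except Exception:
        # Fire-and-forget: log the failure and keep the agent alive.
        log.exception("episodic ingestion failed; continuing")
        return None
```

Keeping the metadata to strings and integers matters in practice, because ChromaDB metadata values are limited to scalar types; that is one reason to store the tool call count rather than the full log.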

GeneratorAgent change (prerequisite)

Add query to response.generation payload in _handle_query():

self._pub.publish(self._make_envelope(
    RESPONSE_GENERATION, "response",
    {"answer": answer, "thinking": thinking,
     "tool_calls": tool_call_log,
     "query": query,          # ← add this
     "session_id": session_id,
     "query_id": query_id},
    ...
))

MemoryAgent subscribes one subject only. No need to correlate across query.received + response.generation.

Step 13 — Wire-up

import threading
import time

def _start_memory_recall() -> None:
    from local.tools.memory_recall_tool import MemoryRecallTool
    MemoryRecallTool().run()

def _start_memory_save() -> None:
    from local.tools.memory_save_tool import MemorySaveTool
    MemorySaveTool().run()

def _start_memory_agent() -> None:
    from local.agents.memory_agent import MemoryAgent
    MemoryAgent().run()

# In main(), after existing tools settle:
threading.Thread(target=_start_memory_recall, daemon=True).start()
threading.Thread(target=_start_memory_save, daemon=True).start()
threading.Thread(target=_start_memory_agent, daemon=True).start()
time.sleep(0.2)

Bus Subjects — Phase 2 additions

Subject                    | Publisher        | Subscriber                              | Notes
---------------------------|------------------|-----------------------------------------|------
tool.request.memory_recall | GeneratorAgent   | MemoryRecallTool                        | Args: {topic} or {query}
tool.result.memory_recall  | MemoryRecallTool | GeneratorAgent (short-lived sub)        | Formatted memory text back to Gemma
tool.request.memory_save   | GeneratorAgent   | MemorySaveTool                          | Args: {topic, content}
tool.result.memory_save    | MemorySaveTool   | GeneratorAgent (short-lived sub)        | "Memory saved: {topic}"
response.generation        | GeneratorAgent   | MemoryAgent (new subscriber) + existing | Must include query field from Step 12

Acceptance Stories

S5 — episodic recall across sessions

Setup: Run one conversation turn — "The capital of Australia is Canberra, not Sydney." MemoryAgent auto-ingests it.
New session: Ask "What do you remember about Australian geography?" — Gemma should call memory_recall(query=...) and surface the prior turn.
Assert: memory_recall in tool_calls, answer mentions Canberra.

S6 — topic save and recall

Turn 1: "Please remember my preferred language is Elixir." — Gemma calls memory_save(topic="user.language_preference", content="Elixir").
Assert: memory_save in tool_calls, confirmation in answer.
Turn 2: "What language do I prefer?" — Gemma calls memory_recall(topic="user.language_preference").
Assert: answer contains "Elixir".

Gotchas

nomic-embed-text prefix is mandatory
Without search_document: / search_query: prefixes, retrieval quality drops significantly. Always inject in MemoryService; callers pass raw text.

Topic recall does NOT use embeddings for retrieval
Topic entries are stored with embeddings (for potential future hybrid search) but retrieved via a ChromaDB metadata filter. ChromaDB expects multiple conditions to be combined with an operator: where={"$and": [{"type": "topic"}, {"topic": topic}]}. Exact match only. This is intentional: preferences must be reliably findable.

.chroma/ must be gitignored
Add .chroma/ to .gitignore before first run. Never commit the vector store.

Cold start: no memories on fresh install
On first run, both recall modes return empty results. memory_recall returns "No memories found."; handle this gracefully, not as an error.

query field must be in response.generation before starting MemoryAgent
Add query to GeneratorAgent's response.generation payload in Step 12 before implementing MemoryAgent, or MemoryAgent will write engrams with no query text.