Skill Memory Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Add a first-class procedural memory lane where skills are event-sourced, projected into the graph, and routed through Memory Checkout without mutating existing factual memory.

Architecture: Eventloom remains the source of truth. New skill.* events project Skill and SkillVersion entities plus lifecycle edges into the existing graph, and Memory Checkout exposes applicable skill guidance in a distinct diagnostics/prompt lane. The first release is deterministic and read-only at checkout time: checkout never amends a skill on its own, and any amendment must arrive as an explicit event.

Tech Stack: Python 3.11+, Pydantic/Eventloom models, existing extractor registry in src/zaxy/extract.py, Neo4j projection through GraphStore, Memory Checkout in src/zaxy/core.py, MCP tools in src/zaxy/mcp_server.py, pytest/ruff/mypy.

---

File Structure

Task 1: Skill Event Extractors

Files: src/zaxy/extract.py, tests/test_extract.py

Add tests to tests/test_extract.py, near the other typed extractor tests:

def test_extract_skill_proposed_projects_skill_and_version(event_factory):
    event = event_factory(
        type="skill.proposed",
        actor="agent",
        payload={
            "skill_id": "python-test-first",
            "name": "Python test-first implementation",
            "version": "1",
            "summary": "Write the failing pytest before implementation.",
            "procedure": ["Write focused failing test", "Run pytest", "Implement minimum code"],
            "applicability": ["Python feature work", "bug fixes"],
            "citations": ["eventloom://zaxy-default/events/12#abc"],
        },
    )

    result = extract(event)

    assert {entity.entity_type for entity in result.entities} >= {"skill", "skill_version"}
    skill = next(entity for entity in result.entities if entity.entity_type == "skill")
    version = next(entity for entity in result.entities if entity.entity_type == "skill_version")
    assert skill.name == "skill:python-test-first"
    assert skill.summary == "Python test-first implementation"
    assert version.name == "skill:python-test-first:v1"
    assert version.properties["procedure"] == [
        "Write focused failing test",
        "Run pytest",
        "Implement minimum code",
    ]
    assert any(edge.relation_type == "has_version" for edge in result.edges)

Add a second test:

def test_extract_skill_outcome_records_application_metrics(event_factory):
    event = event_factory(
        type="skill.outcome_recorded",
        actor="agent",
        payload={
            "skill_id": "python-test-first",
            "version": "1",
            "task": "fix retrieval scoring",
            "success_score": 0.95,
            "feedback": "used",
            "evidence": ["pytest tests/test_query.py::test_scoring -q"],
        },
    )

    result = extract(event)

    assert any(entity.entity_type == "skill_outcome" for entity in result.entities)
    assert any(edge.relation_type == "recorded_outcome" for edge in result.edges)

Run:

pytest tests/test_extract.py -q --no-cov -k skill_

Expected: both tests fail because the extractors are not registered and the generic fallback only creates an event entity.

Add helpers in src/zaxy/extract.py:

def _skill_id(payload: dict[str, Any], event_seq: int) -> str:
    return _required_text(payload.get("skill_id"), field="skill_id", event_seq=event_seq)


def _skill_version(payload: dict[str, Any]) -> str:
    value = payload.get("version")
    return str(value).strip() if value is not None and str(value).strip() else "1"


def _skill_entity(skill_id: str, event: Event, *, summary: str | None = None) -> ExtractedEntity:
    return ExtractedEntity(
        name=f"skill:{skill_id}",
        entity_type="skill",
        observed_at=event.timestamp,
        summary=summary or skill_id,
        properties={"skill_id": skill_id},
    )

Register skill.proposed:

@register("skill.proposed")
def _extract_skill_proposed(event: Event) -> ExtractionResult:
    skill_id = _skill_id(event.payload, event.seq)
    version = _skill_version(event.payload)
    skill = _skill_entity(skill_id, event, summary=_optional_text_value(event.payload.get("name")))
    version_entity = ExtractedEntity(
        name=f"skill:{skill_id}:v{version}",
        entity_type="skill_version",
        observed_at=event.timestamp,
        summary=_optional_text_value(event.payload.get("summary")),
        properties={
            "skill_id": skill_id,
            "version": version,
            "procedure": _string_list(event.payload.get("procedure")),
            "applicability": _string_list(event.payload.get("applicability")),
            "citations": _string_list(event.payload.get("citations")),
            "status": "proposed",
        },
    )
    return ExtractionResult(
        entities=[skill, version_entity],
        edges=[
            ExtractedEdge(
                source=skill.name,
                target=version_entity.name,
                relation_type="has_version",
                valid_from=event.timestamp,
            )
        ],
        source_event_seq=event.seq,
    )

Register skill.outcome_recorded similarly, with a skill_outcome entity and recorded_outcome edge.
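
The plan does not spell out the outcome extractor, so the following is only a sketch of one possible shape. The dataclasses are simplified stand-ins for the real models in src/zaxy/extract.py, and the outcome entity name and the edge direction (skill version to outcome) are assumptions rather than project conventions. In the real module the function would be registered with @register("skill.outcome_recorded").

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


# Stand-in types: simplified placeholders, not the real zaxy models.
@dataclass
class Event:
    seq: int
    type: str
    timestamp: str
    payload: dict[str, Any]


@dataclass
class ExtractedEntity:
    name: str
    entity_type: str
    observed_at: str
    summary: str | None = None
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class ExtractedEdge:
    source: str
    target: str
    relation_type: str
    valid_from: str


@dataclass
class ExtractionResult:
    entities: list[ExtractedEntity]
    edges: list[ExtractedEdge]
    source_event_seq: int


def extract_skill_outcome(event: Event) -> ExtractionResult:
    skill_id = str(event.payload["skill_id"]).strip()
    # Mirror the _skill_version fallback: missing or blank means "1".
    raw = event.payload.get("version")
    version = str(raw).strip() if raw is not None and str(raw).strip() else "1"
    # Assumed naming scheme: one outcome entity per event, keyed by sequence number.
    outcome = ExtractedEntity(
        name=f"skill_outcome:{skill_id}:v{version}:{event.seq}",
        entity_type="skill_outcome",
        observed_at=event.timestamp,
        summary=event.payload.get("task"),
        properties={
            "skill_id": skill_id,
            "version": version,
            "success_score": event.payload.get("success_score"),
            "feedback": event.payload.get("feedback"),
            "evidence": list(event.payload.get("evidence") or []),
        },
    )
    # Assumed edge direction: skill version -> outcome.
    edge = ExtractedEdge(
        source=f"skill:{skill_id}:v{version}",
        target=outcome.name,
        relation_type="recorded_outcome",
        valid_from=event.timestamp,
    )
    return ExtractionResult(entities=[outcome], edges=[edge], source_event_seq=event.seq)


event = Event(
    seq=7,
    type="skill.outcome_recorded",
    timestamp="2026-05-17T00:00:00Z",
    payload={
        "skill_id": "python-test-first",
        "version": "1",
        "task": "fix retrieval scoring",
        "success_score": 0.95,
        "feedback": "used",
        "evidence": ["pytest tests/test_query.py::test_scoring -q"],
    },
)
result = extract_skill_outcome(event)
print(result.entities[0].name)  # skill_outcome:python-test-first:v1:7
```

Keying the outcome name by event sequence keeps repeated outcomes for the same skill version as separate nodes instead of overwriting one another; whether that matches the graph's upsert semantics would need checking against GraphStore.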

Run:

pytest tests/test_extract.py -q --no-cov -k skill_

Expected: skill extractor tests pass.

git add src/zaxy/extract.py tests/test_extract.py
git commit -m "feat: project skill memory events"

Task 2: Memory Checkout Skill Lane

Files: src/zaxy/core.py, tests/test_core.py

Add to tests/test_core.py:

def test_memory_checkout_surfaces_applicable_skills() -> None:
    assembly = ContextAssembly(
        session_id="agent-1",
        prompt="# Context",
        contexts=[
            Context(
                content="Skill Python test-first implementation applies to Python feature work.",
                source="graph",
                score=0.95,
                valid_from="2026-05-17T00:00:00Z",
                valid_to=None,
                metadata={
                    "entity_name": "skill:python-test-first:v1",
                    "entity_type": "skill_version",
                    "citation": "eventloom://agent-1/events/4#abcd",
                    "skill_id": "python-test-first",
                    "procedure": ["Write failing test", "Run pytest", "Implement minimum code"],
                    "applicability": ["Python feature work"],
                    "status": "validated",
                },
            )
        ],
        replay_event_count=0,
        context_counts={"graph": 1, "verbatim": 0, "packet_memory": 0, "replay": 0},
    )

    checkout = build_memory_checkout(query="implement a Python feature", assembly=assembly)

    assert checkout.diagnostics["skills"]["count"] == 1
    assert checkout.diagnostics["skills"]["items"][0]["skill_id"] == "python-test-first"
    assert "## Applicable Skills" in checkout.prompt
    assert "Write failing test" in checkout.prompt

Run:

pytest tests/test_core.py -q --no-cov -k applicable_skills

Expected: FAIL with missing skills diagnostics or missing prompt section.

Add a helper in src/zaxy/core.py:

def _checkout_skills(contexts: list[Context], query: str, *, limit: int = 3) -> list[dict[str, Any]]:
    """Return up to `limit` skill versions whose text overlaps the query, best score first."""
    query_tokens = _checkout_tokens(query)
    skills: list[dict[str, Any]] = []
    for context in contexts:
        metadata = context.metadata or {}
        # Only skill_version contexts belong in the skill lane.
        if metadata.get("entity_type") != "skill_version":
            continue
        applicability = metadata.get("applicability")
        applicability_text = " ".join(applicability) if isinstance(applicability, list) else ""
        # Applicability check: the query must share at least one token with the skill text.
        overlap = len(query_tokens & _checkout_tokens(f"{context.content} {applicability_text}"))
        if overlap <= 0:
            continue
        procedure = metadata.get("procedure")
        skills.append(
            {
                "skill_id": metadata.get("skill_id") or metadata.get("entity_name"),
                "name": metadata.get("entity_name"),
                "score": context.score,
                "citation": _context_citation(context),
                "procedure": procedure if isinstance(procedure, list) else [],
                "applicability": applicability if isinstance(applicability, list) else [],
                "status": metadata.get("status"),
            }
        )
    # Rank by retrieval score (not by overlap size) and keep the top `limit`.
    return sorted(skills, key=lambda row: float(row["score"]), reverse=True)[:limit]
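
The applicability gate in the helper above depends on how _checkout_tokens splits text, which this plan does not show. As a rough illustration only, the sketch below uses a hypothetical tokenizer (lowercase alphanumeric words) and the query and context from the test to show which tokens would overlap.

```python
import re


def checkout_tokens(text: str) -> set[str]:
    # Hypothetical tokenizer: lowercase alphanumeric words.
    # The real _checkout_tokens in src/zaxy/core.py may behave differently.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


query = "implement a Python feature"
content = "Skill Python test-first implementation applies to Python feature work."
applicability = ["Python feature work"]

overlap = checkout_tokens(query) & checkout_tokens(f"{content} {' '.join(applicability)}")
print(sorted(overlap))  # ['feature', 'python']
```

Under this assumed tokenizer, "implement" does not match "implementation" because the comparison is on whole tokens, so the skill qualifies only through "python" and "feature".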

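Call _checkout_skills from build_memory_checkout with the assembly's contexts and the query. When it returns at least one skill, merge the result into the diagnostics with diagnostics = {**diagnostics, "skills": {"count": len(skills), "items": skills}}. Then extend format_memory_checkout_prompt to render one line per skill under an Applicable Skills heading: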

## Applicable Skills
- skill_id=python-test-first status=validated citation=eventloom://... procedure: 1. Write failing test 2. Run pytest 3. Implement minimum code
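
One way the rendering step could look is sketched below. render_applicable_skills is a hypothetical standalone function, not the existing format_memory_checkout_prompt; it only shows how the dictionaries produced by _checkout_skills might map onto the line format above.

```python
from typing import Any


def render_applicable_skills(skills: list[dict[str, Any]]) -> str:
    # Hypothetical helper: returns an empty string when there is nothing to show,
    # so the section is omitted for checkouts without applicable skills.
    if not skills:
        return ""
    lines = ["## Applicable Skills"]
    for skill in skills:
        line = (
            f"- skill_id={skill.get('skill_id')} "
            f"status={skill.get('status')} "
            f"citation={skill.get('citation')}"
        )
        # Number the procedure steps inline: "1. first 2. second ...".
        steps = " ".join(
            f"{index}. {step}"
            for index, step in enumerate(skill.get("procedure") or [], start=1)
        )
        if steps:
            line += f" procedure: {steps}"
        lines.append(line)
    return "\n".join(lines)


section = render_applicable_skills(
    [
        {
            "skill_id": "python-test-first",
            "status": "validated",
            "citation": "eventloom://agent-1/events/4#abcd",
            "procedure": ["Write failing test", "Run pytest", "Implement minimum code"],
        }
    ]
)
print(section)
```

Returning an empty string for an empty list keeps prompts for checkouts without skills byte-for-byte unchanged, which matters for the no-regression check on existing checkout tests.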

Run:

pytest tests/test_core.py -q --no-cov -k applicable_skills

Expected: PASS.

Run:

pytest tests/test_core.py -q --no-cov -k checkout_memory

Expected: existing checkout behavior still passes.

git add src/zaxy/core.py tests/test_core.py
git commit -m "feat: route skill memory through checkout"

Task 3: MCP Skill Helper

Files: src/zaxy/mcp_server.py, tests/test_mcp.py

Add to tests/test_mcp.py:

async def test_memory_skill_event_appends_and_projects(mcp_server):
    graph = AsyncMock()
    mcp_server.graph = graph

    response = await mcp_server.handle_memory_skill(
        {
            "action": "proposed",
            "skill_id": "python-test-first",
            "name": "Python test-first implementation",
            "procedure": ["Write failing test", "Run pytest"],
            "session_id": "agent-1",
        }
    )

    payload = json.loads(response[0].text)
    assert payload["event_type"] == "skill.proposed"
    graph.upsert_extraction.assert_awaited_once()

Run:

pytest tests/test_mcp.py -q --no-cov -k memory_skill

Expected: FAIL because handle_memory_skill does not exist.

Add tool metadata near memory_feedback:

Tool(
    name="memory_skill",
    description="Append a skill lifecycle event and project it into memory.",
    inputSchema={
        "type": "object",
        "properties": {
            "action": {"type": "string", "enum": ["proposed", "validated", "applied", "outcome_recorded", "revised", "deprecated", "contradicted"]},
            "skill_id": {"type": "string"},
            "name": {"type": "string"},
            "procedure": {"type": "array", "items": {"type": "string"}},
            "session_id": {"type": "string"},
        },
        "required": ["action", "skill_id"],
    },
)

Add handler:

async def handle_memory_skill(self, arguments: dict[str, Any]) -> list[TextContent]:
    action = _required_text(arguments.get("action"), "action")
    # Keep this set in sync with the "action" enum in the tool schema.
    if action not in {"proposed", "validated", "applied", "outcome_recorded", "revised", "deprecated", "contradicted"}:
        raise ValueError(f"unsupported skill action: {action}")
    session_id = self._session_id_from_arguments(arguments, default=self._default_session_id)
    # Everything except the routing fields becomes the event payload.
    payload = validate_payload({key: value for key, value in arguments.items() if key not in {"action", "session_id"}})
    eventlog = self.session_manager.get(session_id).eventlog
    event_type = f"skill.{action}"
    # Append to the event log first, then project the same event into the graph.
    event = eventlog.append(event_type, actor="zaxy", payload=payload, thread=session_id)
    await self.graph.upsert_extraction(extract(event), session_id=session_id)
    return [TextContent(type="text", text=json.dumps({"seq": event.seq, "hash": event.hash, "event_type": event_type}))]

Route it in the central MCP call dispatcher.
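
The dispatcher itself is not shown in this plan, so the sketch below is a guess at its general shape rather than a description of src/zaxy/mcp_server.py. SketchServer, call_tool, and the handler table are hypothetical names; the stub handler stands in for the real handle_memory_skill.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable

Handler = Callable[[dict[str, Any]], Awaitable[str]]


class SketchServer:
    """Hypothetical stand-in for the MCP server; only the routing is of interest."""

    def __init__(self) -> None:
        # Map tool names to bound handler coroutines.
        self._handlers: dict[str, Handler] = {
            "memory_skill": self.handle_memory_skill,
        }

    async def handle_memory_skill(self, arguments: dict[str, Any]) -> str:
        # Placeholder body: the real handler appends an event and projects it.
        return json.dumps({"event_type": f"skill.{arguments['action']}"})

    async def call_tool(self, name: str, arguments: dict[str, Any]) -> str:
        handler = self._handlers.get(name)
        if handler is None:
            raise ValueError(f"unknown tool: {name}")
        return await handler(arguments)


response = asyncio.run(
    SketchServer().call_tool(
        "memory_skill", {"action": "proposed", "skill_id": "python-test-first"}
    )
)
print(response)  # {"event_type": "skill.proposed"}
```

If the existing dispatcher uses an if/elif chain instead of a table, the equivalent change is a single new branch for "memory_skill".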

Run:

pytest tests/test_mcp.py -q --no-cov -k memory_skill

Expected: PASS.

git add src/zaxy/mcp_server.py tests/test_mcp.py
git commit -m "feat: add memory skill mcp tool"

Task 4: Documentation And Guardrails

Files: docs/agent-events.md, docs/mcp.md, docs/benchmarks.md

Add to docs/agent-events.md:

## Skill Memory

Use `skill.proposed` when a reusable procedure is identified. Use
`skill.validated` only after explicit evaluation. Use `skill.applied` and
`skill.outcome_recorded` to connect a procedure to a task and outcome. Use
`skill.revised`, `skill.deprecated`, and `skill.contradicted` to preserve
version history without rewriting earlier events.
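
As an aside to the docs text above (not part of docs/agent-events.md), the sketch below shows how a skill version's status could be derived by replaying its events in order instead of editing earlier ones. The mapping from event type to status is an assumption: only the "proposed" status appears in the extractor in this plan, and skill.revised is left out because it presumably introduces a new version.

```python
from __future__ import annotations

# Assumed mapping from status-changing event types to a status label.
STATUS_BY_EVENT = {
    "skill.proposed": "proposed",
    "skill.validated": "validated",
    "skill.deprecated": "deprecated",
    "skill.contradicted": "contradicted",
}


def current_status(events: list[dict[str, str]], skill_id: str, version: str) -> str | None:
    """Fold the event history for one skill version into its latest status."""
    status = None
    for event in events:  # events are assumed to be in append order
        if event["skill_id"] != skill_id or event["version"] != version:
            continue
        # skill.applied and skill.outcome_recorded leave the status unchanged.
        status = STATUS_BY_EVENT.get(event["type"], status)
    return status


history = [
    {"type": "skill.proposed", "skill_id": "python-test-first", "version": "1"},
    {"type": "skill.applied", "skill_id": "python-test-first", "version": "1"},
    {"type": "skill.outcome_recorded", "skill_id": "python-test-first", "version": "1"},
    {"type": "skill.validated", "skill_id": "python-test-first", "version": "1"},
]
print(current_status(history, "python-test-first", "1"))      # validated
print(current_status(history[:3], "python-test-first", "1"))  # proposed
```

Because the status is recomputed from the log, replaying a prefix of the history yields the status as of that point, which is the property the event-sourced design relies on.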

Add to docs/mcp.md:

`memory_skill(action, skill_id, ...)` appends skill lifecycle events. Memory
Checkout may return an `Applicable Skills` section when graph retrieval finds
skill versions relevant to the current task; each entry reports the version's
`status`, such as `proposed` or `validated`.

Add to docs/benchmarks.md:

Skill Memory changes must pass the full 500-question no-regression guardrail:
mean score >= 0.626, Answer@5 >= 0.608, R@5 >= 0.956, citation coverage = 1.000.

Run:

pytest tests/test_docs_site.py tests/test_packaging.py -q --no-cov

Expected: PASS.

Run:

python -m zaxy benchmark-compare reports/benchmarks/longmemeval-500-hash/live-benchmark.json \
  --backend zaxy-checkout \
  --min-mean-score 0.626 \
  --min-answer-recall-at-5 0.608 \
  --min-recall-at-5 0.956 \
  --min-citation-coverage 1.0 \
  --max-p95-ms 15000 \
  --max-p99-ms 23000

Expected: PASS.

git add docs/agent-events.md docs/mcp.md docs/benchmarks.md
git commit -m "docs: document skill memory lifecycle"

Task 5: Final Verification

Files:

Run:

pytest tests/test_extract.py tests/test_core.py tests/test_mcp.py tests/test_docs_site.py tests/test_packaging.py -q --no-cov

Expected: PASS.

Run:

ruff check src/zaxy/extract.py src/zaxy/core.py src/zaxy/mcp_server.py tests/test_extract.py tests/test_core.py tests/test_mcp.py
mypy src/zaxy/extract.py src/zaxy/core.py src/zaxy/mcp_server.py

Expected: ruff clean and mypy success.

Run:

python -m zaxy benchmark-compare reports/benchmarks/longmemeval-500-hash/live-benchmark.json \
  --backend zaxy-checkout \
  --min-mean-score 0.626 \
  --min-answer-recall-at-5 0.608 \
  --min-recall-at-5 0.956 \
  --min-citation-coverage 1.0 \
  --max-p95-ms 15000 \
  --max-p99-ms 23000

Expected: PASS.

If verification required doc fixes, commit them:

git add .
git commit -m "chore: finalize skill memory rollout"

Expected: no uncommitted changes remain after the final commit.

Self-Review