Memory Scoring, Embeddings, and Situation Reports
Pi's memory system doesn't just store facts — it forgets them. Like biological memory, entries that aren't reinforced fade over time, while frequently referenced knowledge becomes stronger. This means the context injected into every conversation is always relevant, concise, and current — without any manual curation.
This page explains the internal mechanics: how scores are calculated, how vector embeddings power semantic recall, how topic files organize knowledge, and how the situation report compresses it all into a token budget for every system prompt.
Note: For hands-on usage — adding memories, searching, dreaming, and capacity management — see Working with Project Memory. This page covers the engine underneath.
Architecture Overview
The memory system is a four-layer pipeline. Each layer has a distinct job:
| Layer | Module | Responsibility |
|---|---|---|
| 1. Scored Memory | `memory-scoring.ts` | Stability formula, decay, evidence counting, lifecycle states |
| 2. Topic Tree | `memory-tree.ts` | Markdown topic files, the source of truth for all entries |
| 3. Vector Embeddings | `memory-embeddings.ts` | Semantic search via in-process ONNX model (bge-small-en-v1.5) |
| 4. Situation Report | `situation-report.ts` | Token-budgeted, priority-ordered context block for system prompts |
These layers feed into an auto-injection pipeline (rules.ts) that runs on every turn — inserting the right memories at the right time without any user action.
Data Flow: From Entry to System Prompt
- A memory is added (via the `memory_add` tool, preference auto-extraction, or dreaming)
- The entry is written to the appropriate topic file under `.pi/memory/topics/`
- A stability score is computed and stored in `memory-scores.json`
- The entry is embedded as a 384-dim vector and stored in `embeddings.json`
- On each turn, `before_agent_start` fires:
  - The situation report selects scored entries within a token budget
  - A vector search finds contextually relevant memories for the current message
  - Past session summaries are keyword-searched for additional context
  - All three blocks are prepended to the system prompt
Layer 1: Stability-Based Scoring
Every memory entry has a stability score that decays exponentially over time. The score determines whether an entry stays active, gets demoted, or is dropped entirely.
The Stability Formula
stability = cue_weight × exp(-Δt / half_life) × ln(1 + evidence_count)
| Component | What It Measures | Effect |
|---|---|---|
| `cue_weight` | How the memory was produced | Explicit (user-stated) memories start stronger |
| `exp(-Δt / half_life)` | Time since last reinforcement | Exponential decay: old, unreinforced memories fade |
| `ln(1 + evidence_count)` | How many times it's been reinforced | Logarithmic boost: the first few reinforcements matter most |
Two special cases override the formula:
- Pinned entries → score is always `9999` (never decay)
- Forgotten entries → score is always `0` (always dropped)
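The formula and its two overrides can be sketched as a single function. This is a minimal illustration, not the actual `calculateStability()` implementation: the name `stabilityScore`, the `UserState` type, and the choice to pass the cue weight and half-life as plain numbers are all assumptions made for the example.

```typescript
// Hypothetical type for illustration; the real module may model user state differently.
type UserState = "none" | "pinned" | "forgotten";

const MS_PER_DAY = 24 * 60 * 60 * 1000;

/**
 * stability = cue_weight × exp(-Δt / half_life) × ln(1 + evidence_count)
 * Pinned and forgotten entries bypass the formula entirely.
 */
function stabilityScore(
  cueWeight: number,
  evidenceCount: number,
  lastReinforcedAtMs: number,
  nowMs: number,
  halfLifeDays: number,
  userState: UserState = "none",
): number {
  if (userState === "pinned") return 9999;
  if (userState === "forgotten") return 0;
  // Δt in days since the last reinforcement (clamped so clock skew cannot boost a score).
  const elapsedDays = Math.max(0, nowMs - lastReinforcedAtMs) / MS_PER_DAY;
  const decay = Math.exp(-elapsedDays / halfLifeDays);
  return cueWeight * decay * Math.log(1 + evidenceCount);
}

const nowMs = Date.UTC(2025, 0, 31);
const thirtyDaysAgo = nowMs - 30 * MS_PER_DAY;
// Cue weight 1.0, evidence count 3, last reinforced 30 days ago, half_life of 30 days:
// 1.0 × exp(-1) × ln(4) ≈ 0.51
const example = stabilityScore(1.0, 3, thirtyDaysAgo, nowMs, 30);
```

Taken literally, the decay term reaches 1/e (about 0.37) rather than 0.5 after one `half_life` has elapsed; the sketch follows the formula as written.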
Cue Weights
Not all evidence is equal. The source of the memory affects its initial strength:
| Cue Type | Weight | When Used |
|---|---|---|
| `explicit` | 1.0 | User explicitly said "remember this", or added via `memory_add` |
| `structural` | 0.9 | Derived from project structure or configuration |
| `behavioral` | 0.7 | Observed from user behavior patterns (auto-extracted) |
| `recurrence` | 0.6 | Detected from recurring patterns across sessions |
Decay Half-Lives
Each memory category decays at a different rate, reflecting how long that type of knowledge typically stays relevant:
| Category | Half-Life | Rationale |
|---|---|---|
| `preference` | 90 days | Personal preferences change slowly |
| `lesson` | 60 days | Learned knowledge stays relevant for weeks |
| `pattern` | 30 days | Code patterns can shift with refactors |
| `decision` | 30 days | Architectural decisions may be revisited |
| `done` | 14 days | Completed work becomes irrelevant quickly |
| `mistake` | 14 days | Mistakes are worth remembering briefly, then usually resolved |
Tip: Calling `memory_reinforce` on an entry resets its decay clock and bumps the evidence count. This is the primary mechanism for keeping important memories alive. The orchestrator is instructed to do this automatically whenever it notices a memory is relevant to the current task.
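To see why reinforcement matters, the sketch below applies both effects (clock reset and evidence bump) to a stale record and compares the time and evidence terms before and after. The `ScoreRecord` shape and both function names are hypothetical; the stored format in `memory-scores.json` is not shown on this page.

```typescript
// Hypothetical record shape; the field names are assumptions for illustration.
interface ScoreRecord {
  evidenceCount: number;
  lastReinforcedAtMs: number;
}

const DAY_MS = 86400000;

/** Returns a new record with the decay clock reset and the evidence count bumped by one. */
function reinforceRecord(record: ScoreRecord, nowMs: number): ScoreRecord {
  return { evidenceCount: record.evidenceCount + 1, lastReinforcedAtMs: nowMs };
}

/** The decay and evidence terms of the stability formula (cue weight left out). */
function decayTimesEvidence(record: ScoreRecord, nowMs: number, halfLifeDays: number): number {
  const elapsedDays = (nowMs - record.lastReinforcedAtMs) / DAY_MS;
  return Math.exp(-elapsedDays / halfLifeDays) * Math.log(1 + record.evidenceCount);
}

const today = Date.UTC(2025, 0, 15);
// A "mistake" entry (14-day half-life) last touched 14 days ago with two pieces of evidence.
const stale: ScoreRecord = { evidenceCount: 2, lastReinforcedAtMs: today - 14 * DAY_MS };
const fresh = reinforceRecord(stale, today);

const scoreBefore = decayTimesEvidence(stale, today, 14); // exp(-1) × ln(3)
const scoreAfter = decayTimesEvidence(fresh, today, 14); // exp(0) × ln(4)
```

Most of the gain comes from resetting the clock; the extra evidence adds a smaller, logarithmic boost on top.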
Lifecycle States
Based on its stability score, every entry is assigned a lifecycle state:
| State | Score Threshold | Meaning |
|---|---|---|
| `active` | ≥ 1.5 | Included in the situation report |
| `provisional` | ≥ 0.7 | Overflow: included if budget allows |
| `candidate` | ≥ 0.4 | At risk of being dropped |
| `dropped` | < 0.4 | Not injected, may be archived |
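The threshold table maps directly onto a chain of comparisons. The function below is a sketch under that reading; it is named `lifecycleFor` to make clear it is not the real `lifecycleFromScore()`, which also takes a `userState` argument.

```typescript
type LifecycleState = "active" | "provisional" | "candidate" | "dropped";

/** Maps a stability score to a lifecycle state using the thresholds in the table above. */
function lifecycleFor(score: number): LifecycleState {
  if (score >= 1.5) return "active";
  if (score >= 0.7) return "provisional";
  if (score >= 0.4) return "candidate";
  return "dropped";
}
```

Because pinned entries score 9999 and forgotten entries score 0, they land in `active` and `dropped` respectively without any special handling in this sketch.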
The lifecycle state is recalculated during rebuild cycles, which happen:
- On every `session_start`
- Every 30 minutes (cheap: no LLM call, just a rescore of existing entries)
- After dreaming completes
Budget Caps
To prevent any single category from dominating the context window, per-category budget caps limit how many entries can be active:
| Category | Max Active Entries |
|---|---|
| `preference` | 8 |
| `lesson` | 8 |
| `pattern` | 6 |
| `decision` | 4 |
| `done` | 4 |
| `mistake` | 4 |
Entries that exceed their category budget are demoted to provisional. A cross-category overflow pool of 6 slots holds the highest-scoring provisional entries. Together, the 34 per-category slots and the 6 overflow slots give a total cap of 40 entries across all categories.
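One way to picture the caps is a single pass over entries sorted by score. The sketch below is a simplification under stated assumptions: it only considers entries that already qualify as active by score, it fills the overflow pool from demoted entries alone, and the names `Scored`, `MAX_ACTIVE`, and `applyBudgets` are invented for the example.

```typescript
type Category = "preference" | "lesson" | "pattern" | "decision" | "done" | "mistake";

interface Scored {
  text: string;
  category: Category;
  score: number;
}

// Per-category caps from the table above.
const MAX_ACTIVE: Record<Category, number> = {
  preference: 8, lesson: 8, pattern: 6, decision: 4, done: 4, mistake: 4,
};
const OVERFLOW_SLOTS = 6;

/** Keeps the top entries per category; the best of the rest share the overflow pool. */
function applyBudgets(entries: Scored[]): { active: Scored[]; overflow: Scored[] } {
  const sorted = entries.slice().sort((a, b) => b.score - a.score);
  const used: Record<Category, number> = {
    preference: 0, lesson: 0, pattern: 0, decision: 0, done: 0, mistake: 0,
  };
  const active: Scored[] = [];
  const demoted: Scored[] = [];
  for (const entry of sorted) {
    if (used[entry.category] < MAX_ACTIVE[entry.category]) {
      used[entry.category] += 1;
      active.push(entry);
    } else {
      demoted.push(entry); // over the category cap, so treated as provisional
    }
  }
  // demoted is already in descending score order, so the first six are the strongest.
  return { active, overflow: demoted.slice(0, OVERFLOW_SLOTS) };
}
```

With twelve `decision` entries, for example, this keeps the top four active, places the next six in the overflow pool, and leaves the last two out.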
Layer 2: Topic Tree Organization
All memory entries live in Markdown files under .pi/memory/topics/. These files are the source of truth — the scoring layer reads from them, not the other way around.
.pi/memory/
├── memory-scores.json      # Stability scores (auto-managed)
├── embeddings.json         # Vector embeddings (auto-managed)
└── topics/
    ├── preferences.md      # [preference] entries
    ├── lessons.md          # [lesson] entries
    ├── patterns.md         # [pattern] entries
    ├── decisions.md        # [decision] entries
    ├── completions.md      # [done] entries
    └── mistakes.md         # [mistake] entries
Each topic file uses a simple format:
# Preferences
- [preference] Always use --admin for gate-blocked PRs *(pinned)*
- [preference] User prefers concise responses
- [preference] Use conventional commits for commit messages
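Since the topic files are the source of truth, anything that rescores entries has to read this format back. A minimal parser for the layout shown above might look like the following; the `TopicEntry` shape and `parseTopicFile` name are hypothetical, and the real reader in `memory-tree.ts` may accept more variations.

```typescript
interface TopicEntry {
  category: string;
  text: string;
  pinned: boolean;
}

// Matches "- [category] text" with an optional trailing " *(pinned)*" marker.
const ENTRY_LINE = /^- \[([a-z]+)\] (.+?)( \*\(pinned\)\*)?$/;

/** Extracts entries from a topic file, skipping headings and blank lines. */
function parseTopicFile(markdown: string): TopicEntry[] {
  const parsed: TopicEntry[] = [];
  for (const line of markdown.split("\n")) {
    const match = ENTRY_LINE.exec(line.trim());
    if (match) {
      parsed.push({ category: match[1], text: match[2], pinned: match[3] !== undefined });
    }
  }
  return parsed;
}

const sampleTopic = [
  "# Preferences",
  "- [preference] Always use --admin for gate-blocked PRs *(pinned)*",
  "- [preference] User prefers concise responses",
].join("\n");
const parsedEntries = parseTopicFile(sampleTopic);
```

Here `parsedEntries` holds two entries, the first with `pinned: true` and the marker stripped from its text.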
Size Limits
Each topic file is capped at ~12,000 characters (roughly 3,000 tokens). When a topic file would exceed this limit, memory_add refuses the write and instructs the LLM to consolidate or remove entries first.
Topic Hotness
Each topic has a hotness score — the sum of stability scores for all entries in the topic. Hotter topics are prioritized in the situation report. Topics that go cold (no reinforcement for 2× the category's half-life) are automatically archived (deleted), unless they contain pinned entries.
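Both rules in this paragraph are simple aggregates over a topic's entries. The sketch below assumes a hypothetical `TopicEntryScore` shape and treats "no reinforcement for 2× the half-life" as "every entry was last reinforced before that cutoff".

```typescript
// Hypothetical per-entry view; field names are assumptions for illustration.
interface TopicEntryScore {
  score: number;
  lastReinforcedAtMs: number;
  pinned: boolean;
}

const ONE_DAY_MS = 86400000;

/** Hotness is the sum of the stability scores of all entries in the topic. */
function topicHotness(entries: TopicEntryScore[]): number {
  return entries.reduce((sum, entry) => sum + entry.score, 0);
}

/** A topic is cold when nothing was reinforced within 2× the half-life and nothing is pinned. */
function isCold(entries: TopicEntryScore[], nowMs: number, halfLifeDays: number): boolean {
  if (entries.some((entry) => entry.pinned)) return false;
  const cutoffMs = nowMs - 2 * halfLifeDays * ONE_DAY_MS;
  return entries.every((entry) => entry.lastReinforcedAtMs < cutoffMs);
}
```

For a `done` topic (14-day half-life), a single entry reinforced 10 days ago is enough to keep the whole topic from being archived.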
Category-to-Topic Mapping
| Category | Topic File |
|---|---|
| `preference` | `preferences.md` |
| `lesson` | `lessons.md` |
| `pattern` | `patterns.md` |
| `decision` | `decisions.md` |
| `done` | `completions.md` |
| `mistake` | `mistakes.md` |
Layer 3: Vector Embeddings
Vector embeddings enable semantic search — finding memories that are conceptually related to a query even when they don't share any keywords. This powers both the memory_search tool and the automatic contextual injection on every turn.
Model Details
| Property | Value |
|---|---|
| Model | Xenova/bge-small-en-v1.5 |
| Dimensions | 384 |
| Runtime | ONNX via @huggingface/transformers (in-process, no Python) |
| Download size | ~50 MB (first run only) |
| Init time | ~2.7 seconds (first call per session) |
| Search latency | ~2.5 ms per query |
| API keys | None — runs entirely locally |
How It Works
- Embed on write: When `memory_add` creates an entry, it's immediately embedded and the vector is stored in `.pi/memory/embeddings.json`
- Embed on first search: If any existing entries lack embeddings (e.g., after upgrading from a pre-embedding version), the first `memory_search` call batch-embeds all missing entries
- Search: The query is embedded, then cosine similarity is computed against all stored vectors
- Hybrid results: Vector search results are merged with keyword search results, deduplicated, and ranked by a combined score (similarity × 100 + stability score)
Embedding Key Format
Each embedding is keyed by a truncated SHA-256 hash of [category] text — this prevents cross-category collisions where the same text appears in different categories.
Graceful Fallback
If the @huggingface/transformers package is unavailable or the model fails to load, the system falls back to keyword-only search. No errors are thrown — callers always get results.
Note: Embeddings are stored per-project in `.pi/memory/embeddings.json`. The model is loaded once per process and cached for the session lifetime.
Layer 4: Situation Reports
The situation report is the final output of the memory system — a structured, priority-ordered Markdown block injected into the system prompt on every turn. It replaces a raw dump of all memories with a curated, token-budgeted summary.
Token Budget
The default budget is 1,700 tokens. The actual budget is dynamically adjusted based on the size of the system prompt (rules, skills, context files):
available_memory_budget = min(1700, 8000 - system_prompt_tokens)
If the system prompt already consumes the entire 8,000-token budget, memory injection is skipped entirely.
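As a worked version of the budget formula, the sketch below takes the system prompt size in tokens and clamps the result at zero so that an oversized prompt yields no memory budget. The function name `memoryBudgetFor` is hypothetical; the real `estimateMemoryBudget()` takes a `systemPromptLength` argument whose unit is not specified here.

```typescript
const DEFAULT_MEMORY_BUDGET = 1700;
const TOTAL_PROMPT_BUDGET = 8000;

/** available_memory_budget = min(1700, 8000 - system_prompt_tokens), never below zero. */
function memoryBudgetFor(systemPromptTokens: number, totalBudget: number = TOTAL_PROMPT_BUDGET): number {
  return Math.max(0, Math.min(DEFAULT_MEMORY_BUDGET, totalBudget - systemPromptTokens));
}

const roomy = memoryBudgetFor(3000); // 1700: the default cap applies
const tight = memoryBudgetFor(7000); // 1000: only the remainder is available
const none = memoryBudgetFor(8000); // 0: memory injection would be skipped
```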
Section Priority
The report is built section-by-section in priority order. Higher-priority sections are always included; lower-priority sections are truncated or dropped when the budget runs out:
| Priority | Section | Token Budget | Filter |
|---|---|---|---|
| — | Pinned | Unlimited | All pinned entries (always first) |
| 1 | Active Preferences | 400 | preference entries |
| 2 | Active Lessons | 400 | lesson entries |
| 3 | Vetoes & Mistakes | 200 | mistake entries |
| 4 | Patterns | 200 | pattern entries |
| 5 | Recent Decisions | 200 | decision entries from last 7 days |
| 6 | Recent Completions | 200 | done entries from last 3 days |
Within each section, entries are sorted by stability score (highest first). If a section exceeds its token budget, lower-scored entries are omitted with a count indicator (e.g., "... 3 more (lower priority, omitted for context budget)").
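The per-section truncation can be sketched as a greedy fill. Two things here are assumptions rather than documented behavior: the token estimate of roughly four characters per token (consistent with the ~12,000 characters ≈ 3,000 tokens ratio given for topic files) and the `## ` heading used for the section title. `ReportEntry` and `buildSection` are hypothetical names.

```typescript
interface ReportEntry {
  text: string;
  score: number;
}

// Assumption: about 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

/** Adds entries in descending score order until the section budget is spent. */
function buildSection(title: string, entries: ReportEntry[], tokenBudget: number): string {
  const sorted = entries.slice().sort((a, b) => b.score - a.score);
  const lines: string[] = ["## " + title];
  let usedTokens = 0;
  let included = 0;
  for (const entry of sorted) {
    const line = "- " + entry.text;
    const cost = estimateTokens(line);
    if (usedTokens + cost > tokenBudget) break; // everything after this scores lower
    lines.push(line);
    usedTokens += cost;
    included += 1;
  }
  const omitted = sorted.length - included;
  if (omitted > 0) {
    lines.push("... " + omitted + " more (lower priority, omitted for context budget)");
  }
  return lines.join("\n");
}
```

Stopping at the first entry that does not fit, rather than searching for smaller entries further down, keeps the output strictly ordered by score.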
Capacity Signal
The report header displays current usage as a percentage:
# Project Memory [72% — 1,224/1,700 tokens]
When usage exceeds 80%, a consolidation warning is injected:
> ⚠️ Memory above 80% capacity. Before adding new entries, consolidate or
> remove existing ones using memory_remove.
This warning is visible to the LLM, which then knows to consolidate before adding more entries.
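A small sketch of how the header and the 80% check could be derived from the same two numbers. The function name and return shape are hypothetical, and rounding the percentage to the nearest integer is an assumption based on the example header.

```typescript
/** Inserts thousands separators, e.g. 1224 -> "1,224". */
function withCommas(n: number): string {
  return String(n).replace(/\B(?=(\d{3})+(?!\d))/g, ",");
}

/** Builds the report header and decides whether the consolidation warning applies. */
function capacitySignal(usedTokens: number, budgetTokens: number): { header: string; warn: boolean } {
  const ratio = usedTokens / budgetTokens;
  const percent = Math.round(ratio * 100);
  // "\u2014" is the dash character that appears in the example header above.
  const header =
    "# Project Memory [" + percent + "% \u2014 " +
    withCommas(usedTokens) + "/" + withCommas(budgetTokens) + " tokens]";
  return { header, warn: ratio > 0.8 };
}

const comfortable = capacitySignal(1224, 1700); // 72%, no warning
const nearlyFull = capacitySignal(1500, 1700); // 88%, warning
```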
Ground Truth Instruction
Every situation report includes a Ground Truth instruction:
Ground Truth: Memories above and contextually relevant memories injected below are authoritative. Use them directly — do not re-discover or re-verify information already in your context window.
This prevents the LLM from wasting tokens re-verifying facts it already has in context.
The Auto-Injection Pipeline
The rules.ts module orchestrates all memory injection through lifecycle hooks. Here's what happens on each turn:
On session_start
- Run `rebuildAndOrganize()` to rescore all entries from topic files
- Bootstrap vector embeddings: embed any entries missing from the store
- Reset dream state and start timers (rebuild every 30 min, dream every 3 hours)
On before_agent_start (every turn)
This is the main injection point. The system prompt is constructed in this order:
- Situation report: scored, token-budgeted memory summary
- Contextual memory recall: vector search against the user's current message (top 5, similarity > 0.65)
- Session history recall: keyword search against past conversation summaries (top 3)
- Original system prompt: the agent's base instructions
- Orchestrator rules: all `rules/*.md` files (orchestrator only)
- Async agent status: what background agents are currently running
Note: The social closer gate skips vector and session search for trivial messages ("ok", "thanks", "yes", emoji-only). This avoids wasting computation on messages that don't need contextual recall.
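The gate and the injection order can be illustrated together. Everything below is a simplified sketch: the closer list contains only the examples named in the note, "emoji-only" is approximated as "contains no ASCII letters or digits" (which would also catch short messages in non-Latin scripts), the recall steps are synchronous callbacks rather than real searches, and the orchestrator rules and agent status blocks are left out. All names are hypothetical.

```typescript
const SOCIAL_CLOSERS = new Set(["ok", "thanks", "yes"]);

/** Returns true for trivial messages that do not need contextual recall. */
function isSocialCloser(message: string): boolean {
  const normalized = message.trim().toLowerCase().replace(/[.!?]+$/, "");
  if (SOCIAL_CLOSERS.has(normalized)) return true;
  // Heuristic stand-in for "emoji-only": nothing alphanumeric in the message.
  return normalized.length > 0 && !/[a-z0-9]/.test(normalized);
}

interface PromptBlocks {
  situationReport: string;
  contextualRecall: () => string; // stands in for the vector search
  sessionRecall: () => string; // stands in for the session keyword search
  basePrompt: string;
}

/** Assembles the blocks in injection order, skipping both searches for social closers. */
function assembleSystemPrompt(userMessage: string, blocks: PromptBlocks): string {
  const parts: string[] = [blocks.situationReport];
  if (!isSocialCloser(userMessage)) {
    parts.push(blocks.contextualRecall(), blocks.sessionRecall());
  }
  parts.push(blocks.basePrompt);
  return parts.filter((part) => part.length > 0).join("\n\n");
}
```

Passing the searches as callbacks means that for a message like "thanks!" they are never invoked at all, which is the point of the gate.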
On turn_end
After each response, two things happen:
- Retrieval telemetry: logs whether injected memories were actually referenced in the LLM's response (written to `.pi/data/memory-telemetry.jsonl`)
- File-change memory reminder: if the LLM modified files during the turn, a vector search runs against the modified file paths. Relevant memories (similarity > 0.70) are surfaced as a follow-up reminder
On input (Preference Auto-Extraction)
The preference extractor (preference-extractor.ts) listens to every user message and pattern-matches for preference signals:
| Pattern | Example |
|---|---|
| "I prefer..." | "I prefer tabs over spaces" |
| "Always use..." | "Always use conventional commits" |
| "Never use..." | "Never use sudo in containers" |
| "From now on..." | "From now on, run tests before committing" |
| "My timezone..." | "My timezone is UTC+2" |
Detected preferences are automatically added to preferences.md with cue: explicit scoring. If the preference already exists, it's reinforced instead. A 1-hour cooldown prevents the same preference from being re-extracted repeatedly.
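A rough sketch of the pattern matching and cooldown described above. The regular expressions are illustrative guesses based on the table's example phrases, not the patterns used in `preference-extractor.ts`, and the sentence splitting, function names, and in-memory cooldown map are assumptions.

```typescript
// Illustrative patterns derived from the table above; the real extractor may differ.
const PREFERENCE_PATTERNS: RegExp[] = [
  /\bI prefer\b.+/i,
  /\balways use\b.+/i,
  /\bnever use\b.+/i,
  /\bfrom now on\b.+/i,
  /\bmy timezone\b.+/i,
];

/** Returns the sentences in a message that look like preference statements. */
function extractPreferenceSignals(text: string): string[] {
  const found: string[] = [];
  for (const raw of text.split(/[.!?]\s+|\n+/)) {
    const sentence = raw.trim().replace(/[.!?]+$/, "");
    if (sentence.length > 0 && PREFERENCE_PATTERNS.some((p) => p.test(sentence))) {
      found.push(sentence);
    }
  }
  return found;
}

const COOLDOWN_MS = 60 * 60 * 1000; // 1-hour cooldown
const lastExtractedAt = new Map<string, number>();

/** Returns false if the same preference was already extracted within the cooldown window. */
function passesCooldown(preference: string, nowMs: number): boolean {
  const key = preference.toLowerCase();
  const last = lastExtractedAt.get(key);
  if (last !== undefined && nowMs - last < COOLDOWN_MS) return false;
  lastExtractedAt.set(key, nowMs);
  return true;
}

const signals = extractPreferenceSignals(
  "Hi there. I prefer tabs over spaces. Can you fix the bug?",
);
```

In this example `signals` contains only the middle sentence; the greeting and the question match none of the patterns.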
On session_shutdown
- Accumulated user messages are indexed into the session search store (`.pi/data/session-search.json`, max 500 entries)
- If auto-dreaming is enabled, a background dream is fired (detached process that survives session exit)
Retrieval Telemetry
Every auto-injection is logged to .pi/data/memory-telemetry.jsonl with:
- Injection events: what memories were injected, and the truncated prompt that triggered them
- Usage events: whether the LLM actually referenced injected memories in its response, with a usage rate (`usedCount / injectedCount`)
- Session injection events: when past session summaries are injected
The file is capped at 500 KB (older entries trimmed to the last 200 lines).
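The usage rate and the size cap can be sketched as two pure functions. How the real system decides that a memory was "referenced" is not described here, so the case-insensitive substring check below is purely a placeholder; string length is also used as an approximation of file size in bytes. The event shape and function names are hypothetical.

```typescript
// Hypothetical event shape; only usedCount / injectedCount comes from the text above.
interface UsageEvent {
  type: "usage";
  injectedCount: number;
  usedCount: number;
  usageRate: number;
}

/** Builds a usage event. Placeholder heuristic: a memory counts as used if it appears verbatim. */
function usageEvent(injected: string[], response: string): UsageEvent {
  const haystack = response.toLowerCase();
  const usedCount = injected.filter((memory) => haystack.indexOf(memory.toLowerCase()) !== -1).length;
  const usageRate = injected.length === 0 ? 0 : usedCount / injected.length;
  return { type: "usage", injectedCount: injected.length, usedCount, usageRate };
}

const MAX_TELEMETRY_CHARS = 500 * 1024; // approximates the 500 KB cap
const KEEP_LINES = 200;

/** Once the log exceeds the cap, keeps only the most recent 200 lines. */
function trimTelemetry(contents: string): string {
  if (contents.length <= MAX_TELEMETRY_CHARS) return contents;
  const lines = contents.split("\n").filter((line) => line.length > 0);
  return lines.slice(-KEEP_LINES).join("\n") + "\n";
}

// One JSONL line per event:
const jsonlLine = JSON.stringify(usageEvent(["use tabs", "run tests first"], "I will use tabs here"));
```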
Tip: This telemetry helps you understand which memories are being used and which are being ignored — useful for tuning memory quality over time.
Storage File Reference
| File | Format | Purpose |
|---|---|---|
| `.pi/memory/topics/*.md` | Markdown | Source of truth for all memory entries |
| `.pi/memory/memory-scores.json` | JSON | Stability scores, evidence counts, lifecycle states |
| `.pi/memory/embeddings.json` | JSON | Vector embeddings (384-dim float arrays keyed by SHA-256 hash) |
| `.pi/data/session-search.json` | JSON | Past conversation summaries for keyword search |
| `.pi/data/memory-telemetry.jsonl` | JSONL | Retrieval telemetry logs |
| `.pi/memory/.dream-watermark` | Plaintext | Timestamp of last dream, which prevents reprocessing |
Extension Points
If you're modifying the memory system or building on top of it, here are the key functions and hooks:
Scoring Functions (memory-scoring.ts)
| Function | Signature | Purpose |
|---|---|---|
| `calculateStability()` | `(cue, evidenceCount, lastReinforcedAt, nowMs, category, userState) → number` | Core scoring formula |
| `lifecycleFromScore()` | `(score, userState) → LifecycleState` | Map score to lifecycle state |
| `rebuild()` | `(cwd, entries) → RebuildResult` | Full rebuild cycle: score all entries, apply budgets |
| `reinforce()` | `(cwd, entryLine) → boolean` | Bump evidence count for an entry |
| `getActiveEntries()` | `(cwd) → {hash, entry}[]` | Get all active/provisional entries, sorted by score |
| `entryHash()` | `(text) → string` | FNV-1a hash for entry keys |
| `extractPreferences()` | `(text) → string[]` | Pattern-match preference statements from text |
Embedding Functions (memory-embeddings.ts)
| Function | Signature | Purpose |
|---|---|---|
| `initEmbeddings()` | `() → Promise<boolean>` | Initialize the ONNX model (lazy, cached) |
| `embedEntry()` | `(cwd, text, category?) → Promise<void>` | Embed and store a single entry |
| `removeEmbedding()` | `(cwd, text, category?) → void` | Remove an entry's embedding |
| `vectorSearch()` | `(cwd, query, entries, topK?) → Promise<results[]>` | Semantic search by cosine similarity |
| `embedMissing()` | `(cwd, entries) → Promise<number>` | Batch-embed entries without existing embeddings |
Situation Report Functions (situation-report.ts)
| Function | Signature | Purpose |
|---|---|---|
| `buildSituationReport()` | `(cwd, tokenBudget?) → string` | Build the token-budgeted Markdown report |
| `rebuildAndOrganize()` | `(cwd) → void` | Rescore all entries from topic files |
| `estimateMemoryBudget()` | `(systemPromptLength, totalBudget?) → number` | Dynamic budget based on system prompt size |
Lifecycle Hooks
| Hook | When | What the memory system does |
|---|---|---|
| `session_start` | Session begins | Rebuild scores, bootstrap embeddings, start timers |
| `before_agent_start` | Every turn | Inject situation report + contextual memories + session history |
| `turn_end` | After each response | Log retrieval telemetry, file-change memory reminders |
| `input` | User types a message | Auto-extract preferences |
| `session_compact` | Context compaction | Index compacted summary for session search |
| `session_shutdown` | Session ends | Index accumulated messages, trigger background dream |
Related Pages
- Working with Project Memory: hands-on guide to using the memory system (adding, searching, dreaming, and managing capacity)
- Extension Architecture and Lifecycle Hooks: how hooks like `before_agent_start` and `turn_end` work in the extension system
- Running Background Agents and Scheduled Tasks: how dreaming runs as an async background agent
- Configuration and Environment Variables Reference: `PI_DREAM_INTERVAL_HOURS` and other memory-related settings
- Orchestrator Rules Reference: the `35-memory.md` rule that governs LLM memory behavior