What is MemoryLens?
MemoryLens is an open-source observability and debugging tool designed specifically for AI agent memory systems. It addresses a critical gap in modern AI application development: the memory pipeline is opaque. When an agent "forgets" something a user said, retrieves the wrong memories, or silently drops a fact during compression, there is currently no standard way to see what happened.
MemoryLens instruments the four fundamental memory operations — write, read, compress, and update — and records them as structured traces. These traces expose the full context of every memory interaction: what was stored, what was retrieved, what similarity scores drove retrieval decisions, what content was lost during summarization, and how much each operation cost in tokens and dollars.
AI agents built with frameworks like LangChain, Mem0, LlamaIndex, Letta, and Zep rely on memory backends that operate as black boxes. Failures — silent drops, threshold misses, lossy compression — are invisible until they degrade user experience. MemoryLens makes the invisible visible.
What Problem It Solves
- Silent memory drops: writes that fail or are silently discarded due to capacity policies leave no trace without explicit instrumentation
- Retrieval threshold mysteries: a memory that scored 0.68 when the threshold is 0.70 is filtered out invisibly — the Retrieval Debugger surfaces exactly this
- Lossy compression: conversation summarization can drop critical facts; the Compression Auditor identifies which sentences were lost and quantifies semantic loss
- Cost blindness: token and dollar costs per memory operation are typically unmeasured; Cost Attribution tracks them at span granularity
- No cross-framework standard: each memory backend has its own logging approach; MemoryLens provides a unified trace schema across all supported frameworks
Design Philosophy
- Zero runtime overhead: the BatchSpanProcessor exports spans asynchronously in a background thread, keeping instrumentation overhead below 2ms p99
- OpenTelemetry-native: spans export via OTLP, making MemoryLens compatible with every OTel-compatible observability backend (Grafana, Honeycomb, Jaeger, Datadog, etc.)
- Offline analysis: compression auditing and cost enrichment are post-hoc commands, never executed in the hot path
- Opt-in extras: framework integrations, the web UI, audit capabilities, and cost tools are all optional extras — the core package remains lean
- PII safety by default: content capture can be disabled globally via MEMORYLENS_CAPTURE_CONTENT=false
System Architecture
MemoryLens is organized as a single Python package (memorylens) with a layered internal structure. Each layer has a clearly defined responsibility and communicates with adjacent layers through well-typed interfaces.
The main layers, from instrumentation to output:
- Core SDK (memorylens._core): tracer, span schema, decorators, context propagation, span processors
- Exporters (memorylens._exporters): SQLite (WAL mode, indexed), any OTLP-compatible backend, and JSONL for portable export
- Analysis (memorylens._audit, memorylens._cost): LocalScorer / OpenAIScorer with the compression_audits table; PricingModel with update_span_attributes()
- Integrations (memorylens.integrations): LangChain, Mem0, LlamaIndex / Letta / Zep behind a shared Instrumentor protocol
- Web UI (memorylens._ui): Trace List, Detail, Retrieval Debugger, Compression Audit views
- CLI (memorylens.cli): audit / cost commands with Rich tables + JSON output
Data Flow: Instrumentation to Visualization
A span moves through the pipeline as follows: a decorator or framework instrumentor opens a span, state accumulates in a _MutableSpan and is finalized into a frozen MemorySpan dataclass, the BatchSpanProcessor exports it from a background thread to SQLite / JSONL locally or to an OTLP remote backend, offline audit and cost commands enrich the stored spans, and the CLI and web UI visualize the result.
Key Architectural Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Distribution | Core + optional extras | Keeps core lean; integrations, UI, audit, cost are all opt-in |
| Python version | 3.10+ | Modern type syntax, match/case, covers the vast majority of agent developers |
| Async strategy | Sync API, async export in background thread | Small API surface, <2ms p99 overhead via BatchSpanProcessor |
| Local storage | SQLite (primary) + JSONL export | Structured queries for CLI + portable file export |
| Instrumentation | Decorators (core) + auto-instrumentation (integrations) | Manual for custom backends; "3 lines of code" for supported frameworks |
| CLI style | Subcommand-based (Typer) | Scriptable, composable, rich table output |
| Web UI tech | FastAPI + htmx + Jinja2 | Python-native, no Node.js/build step required |
| OTel integration | Full OTLP export (gRPC + HTTP) | 1:1 span mapping; works with every OTel backend |
| Analysis timing | Offline (CLI commands) | Zero runtime overhead; audits are re-runnable |
| Build system | uv + hatchling + pyproject.toml | Fast, modern toolchain; signals a contemporary project |
Core SDK — memorylens._core
The core layer is the heart of MemoryLens. It defines the trace schema, manages the tracer lifecycle, implements context propagation, and runs the span processor pipeline. Everything else in the system builds on top of these primitives.
Trace Schema and Data Model
MemorySpan
MemorySpan is a frozen, slotted dataclass — immutable once finalized. Every memory operation produces exactly one MemorySpan. The schema is the single source of truth for all downstream consumers: exporters, the CLI, the web UI, and the analysis tools.
# src/memorylens/_core/span.py
from dataclasses import dataclass, field
from typing import Any

from .schema import MemoryOperation, SpanStatus

@dataclass(frozen=True, slots=True)
class MemorySpan:
"""A single traced memory operation."""
# Identity
span_id: str # unique hex UUID
trace_id: str # groups spans in one logical operation
parent_span_id: str | None # for nested operations
# Classification
operation: MemoryOperation # WRITE, READ, COMPRESS, UPDATE
status: SpanStatus # OK, ERROR, DROPPED
# Timing (epoch nanoseconds)
start_time: float
end_time: float
duration_ms: float # computed: (end - start) / 1_000_000
# Context (inherited from MemoryContext)
agent_id: str | None
session_id: str | None
user_id: str | None
# Memory content (redactable via MEMORYLENS_CAPTURE_CONTENT=false)
input_content: str | None
output_content: str | None
# Operation-specific attributes (free-form dict)
attributes: dict[str, Any] = field(default_factory=dict)
    def to_dict(self) -> dict[str, Any]:
        """Serialize to plain dict. Enum values become their string values."""
        ...
MemoryOperation Enum
The four fundamental operations of any memory system, modeled as a str enum so values serialize naturally:
| Enum Value | String Value | Meaning |
|---|---|---|
| MemoryOperation.WRITE | memory.write | Storing content into memory (includes explicit deletes with status=DROPPED) |
| MemoryOperation.READ | memory.read | Retrieving memories, typically by semantic search or key lookup |
| MemoryOperation.COMPRESS | memory.compress | Summarizing or condensing memory content via an LLM |
| MemoryOperation.UPDATE | memory.update | Modifying an existing memory entry (merge, replace, or append) |
SpanStatus Enum
| Enum Value | String Value | When Used |
|---|---|---|
| SpanStatus.OK | ok | Operation completed successfully |
| SpanStatus.ERROR | error | Operation raised an exception; error.type and error.message are set in attributes |
| SpanStatus.DROPPED | dropped | Content was explicitly discarded; drop_reason attribute explains why |
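A minimal sketch of how these enums could be declared; the member names and string values come directly from the tables above, and the module location follows the schema.py file listed in the project structure:

# src/memorylens/_core/schema.py (sketch)
from enum import Enum

class MemoryOperation(str, Enum):
    """The four fundamental memory operations."""
    WRITE = "memory.write"
    READ = "memory.read"
    COMPRESS = "memory.compress"
    UPDATE = "memory.update"

class SpanStatus(str, Enum):
    """Terminal status of a traced operation."""
    OK = "ok"
    ERROR = "error"
    DROPPED = "dropped"

Because both inherit from str, MemoryOperation.WRITE.value serializes naturally as "memory.write" in JSON and SQL.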
Operation-Specific Attributes
The attributes dict carries operation-type-specific fields. These map 1:1 to OTLP span attributes with a memorylens. prefix:
| Operation | Key Attributes |
|---|---|
| WRITE | memory_key, backend, drop_reason, drop_policy, embedding_model, vector_dim |
| READ | query, results_count, scores (list of floats), threshold, backend, top_k |
| COMPRESS | pre_content, post_content, compression_ratio, semantic_loss_score, model_used |
| UPDATE | memory_key, previous_version, new_version, update_type (merge/replace/append) |
TracerProvider and Tracer
The TracerProvider is a singleton that holds global configuration: the processor pipeline, the sampler, and the service name. All instrumentation flows through it.
# src/memorylens/_core/tracer.py
class TracerProvider:
"""Singleton that manages tracers, processors, and sampling."""
_instance: TracerProvider | None = None
def __init__(self) -> None:
self.processors: list[SpanProcessor] = []
self.sampler = Sampler(rate=1.0)
self.service_name: str = "memorylens"
def add_processor(self, processor: SpanProcessor) -> None: ...
def get_tracer(self, name: str) -> Tracer: ...
def shutdown(self) -> None: ...
@classmethod
def get(cls) -> TracerProvider: ... # singleton accessor
@classmethod
def reset(cls) -> None: ... # for testing — shuts down and clears instance
The Tracer is created per-module (by name) via provider.get_tracer(name). It provides a context manager for creating spans:
class Tracer:
@contextmanager
def start_span(
self,
operation: MemoryOperation,
parent_span_id: str | None = None,
attributes: dict[str, Any] | None = None,
) -> Generator[_MutableSpan, None, None]:
# 1. Check sampler — if not sampled, yield a no-op span
# 2. Read current MemoryContext for agent/session/user IDs
# 3. Create _MutableSpan with new trace_id and span_id (UUID hex)
# 4. Call on_start() on all processors
# 5. yield span — caller can set attributes, content, status
# 6. On exception: set status=ERROR, capture error.type/error.message
# 7. On exit: finalize() → frozen MemorySpan, call on_end() on all processors
The internal _MutableSpan builder accumulates state during the span lifecycle. It exposes set_attribute(), set_status(), and set_content() methods. finalize() stamps end_time, computes duration_ms, and returns an immutable MemorySpan.
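A compact sketch of what that builder could look like; it leans on the MemorySpan schema and enums above, and the exact field handling (context, parent IDs) is an assumption:

# Sketch of the mutable builder; context and parent handling omitted for brevity.
import time
from typing import Any

class _MutableSpan:
    def __init__(self, operation: MemoryOperation, trace_id: str, span_id: str) -> None:
        self.operation = operation
        self.trace_id = trace_id
        self.span_id = span_id
        self.status = SpanStatus.OK
        self.start_time = float(time.time_ns())
        self.attributes: dict[str, Any] = {}
        self.input_content: str | None = None
        self.output_content: str | None = None

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value

    def set_status(self, status: SpanStatus) -> None:
        self.status = status

    def set_content(self, input_content: str | None = None, output_content: str | None = None) -> None:
        if input_content is not None:
            self.input_content = input_content
        if output_content is not None:
            self.output_content = output_content

    def finalize(self) -> MemorySpan:
        # Stamp end_time, compute duration_ms, freeze into the immutable schema.
        end = float(time.time_ns())
        return MemorySpan(
            span_id=self.span_id, trace_id=self.trace_id, parent_span_id=None,
            operation=self.operation, status=self.status,
            start_time=self.start_time, end_time=end,
            duration_ms=(end - self.start_time) / 1_000_000,
            agent_id=None, session_id=None, user_id=None,
            input_content=self.input_content, output_content=self.output_content,
            attributes=self.attributes,
        )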
Decorators
The four instrument_* decorators are the primary instrumentation API. They wrap existing functions without requiring any modification to the function body.
# src/memorylens/_core/decorators.py
@instrument_write(backend="mem0", capture_content=True)
def store_memory(user_id: str, content: str) -> bool: ...
@instrument_read(backend="mem0", top_k=5)
def search_memories(query: str) -> list[Memory]: ...
@instrument_compress(model="gpt-4o-mini")
def summarize(memories: list[str]) -> str: ...
@instrument_update(backend="mem0")
def update_memory(memory_id: str, new_content: str) -> bool: ...
All four decorators are implemented via a shared _make_decorator(operation, **kwargs) factory. The wrapper function:
- Gets the TracerProvider singleton and acquires a tracer for the wrapped function's module
- Opens a span with the correct MemoryOperation and any static keyword arguments as initial attributes
- Conditionally captures repr(args/kwargs) as input_content (if capture_content=True)
- Calls the original function
- Conditionally captures repr(result) as output_content
- Returns the result — exceptions propagate naturally after setting status=ERROR
Content capture resolution order: explicit decorator kwarg → MEMORYLENS_CAPTURE_CONTENT env var → default True.
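A sketch of what such a shared factory could look like; the helper names and env handling are assumptions based on the steps above, not the verbatim implementation:

# Sketch of the shared decorator factory described above.
import functools
import os
from typing import Any, Callable

def _make_decorator(operation: MemoryOperation, **static_attrs: Any) -> Callable:
    capture = static_attrs.pop("capture_content", None)
    if capture is None:
        # Resolution order: explicit kwarg -> MEMORYLENS_CAPTURE_CONTENT -> default True
        capture = os.getenv("MEMORYLENS_CAPTURE_CONTENT", "true").lower() != "false"

    def decorator(func: Callable) -> Callable:
        tracer = TracerProvider.get().get_tracer(func.__module__)

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            with tracer.start_span(operation, attributes=dict(static_attrs)) as span:
                if capture:
                    span.set_content(input_content=repr((args, kwargs)))
                result = func(*args, **kwargs)  # exceptions propagate; the tracer records status=ERROR
                if capture:
                    span.set_content(output_content=repr(result))
                return result

        return wrapper

    return decorator

def instrument_write(**kwargs: Any) -> Callable:
    # The public decorators are thin wrappers over the factory
    return _make_decorator(MemoryOperation.WRITE, **kwargs)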
Context Propagation
The MemoryContext is a frozen dataclass that acts as a Python context manager. It uses ContextVar (from the standard library contextvars module) to propagate agent/session/user metadata through the call stack without modifying function signatures. It is fully async-safe because ContextVar isolates state per asyncio Task.
# src/memorylens/_core/context.py
@dataclass(frozen=True, slots=True)
class MemoryContext:
agent_id: str | None = None
session_id: str | None = None
user_id: str | None = None
def __enter__(self) -> Self:
# sets the ContextVar token
object.__setattr__(self, "_token", _current_context.set(self))
return self
def __exit__(self, *exc) -> None:
# resets to the previous context (safe nesting)
_current_context.reset(self._token)
# Usage:
with memorylens.context(agent_id="support-bot", session_id="sess-123"):
store_memory("user prefers vegetarian meals")
# → span will have agent_id="support-bot", session_id="sess-123"
The public memorylens.context() function wraps MemoryContext constructor. The Tracer.start_span() calls get_current_context() to read the active context at span creation time.
SpanProcessor Pipeline
Span processors sit between the tracer and the exporters. They receive spans via on_start() (before execution) and on_end() (after execution). Multiple processors can be registered; they are called in registration order.
class SpanProcessor(Protocol):
def on_start(self, span: MemorySpan) -> None: ...
def on_end(self, span: MemorySpan) -> None: ...
def shutdown(self) -> None: ...
def force_flush(self, timeout_ms: int = 30000) -> bool: ...
SimpleSpanProcessor
Synchronous: calls exporter.export([span]) immediately on on_end(). Used for debugging, testing, and scenarios where latency is not a concern.
BatchSpanProcessor
The production processor. Appends spans to a bounded deque (max 2048) under a lock. A daemon thread wakes every 5 seconds (or when the queue reaches 512 spans, or on force_flush()) and exports in batches. This keeps the hot path non-blocking.
class BatchSpanProcessor:
def __init__(
self,
exporter: SpanExporter,
max_batch_size: int = 512, # trigger flush when queue hits this
schedule_delay_ms: int = 5000, # background flush interval
max_queue_size: int = 2048, # drops oldest spans if exceeded
) -> None: ...
On shutdown(), the processor signals the background thread, waits up to 10 seconds, flushes any remaining spans, then shuts down the exporter.
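A rough sketch of the queue-and-worker mechanics described above, matching the constructor signature shown; the real implementation may differ in detail (e.g., batching granularity on flush):

# Sketch of the background export loop.
import threading
from collections import deque

class BatchSpanProcessor:
    def __init__(self, exporter, max_batch_size: int = 512,
                 schedule_delay_ms: int = 5000, max_queue_size: int = 2048) -> None:
        self._exporter = exporter
        self._queue: deque = deque(maxlen=max_queue_size)   # oldest spans drop when full
        self._lock = threading.Lock()
        self._wake = threading.Event()
        self._batch_size = max_batch_size
        self._delay_s = schedule_delay_ms / 1000
        self._stopping = False
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def on_start(self, span) -> None:
        pass                                                  # nothing to do before the operation runs

    def on_end(self, span) -> None:
        with self._lock:
            self._queue.append(span)
            if len(self._queue) >= self._batch_size:
                self._wake.set()                              # flush early at the batch threshold

    def force_flush(self, timeout_ms: int = 30000) -> bool:
        self._flush()
        return True

    def shutdown(self) -> None:
        self._stopping = True
        self._wake.set()
        self._worker.join(timeout=10)                         # wait up to 10 seconds
        self._flush()                                         # drain anything left
        self._exporter.shutdown()

    def _run(self) -> None:
        while not self._stopping:
            self._wake.wait(timeout=self._delay_s)            # every 5 s, or sooner when woken
            self._wake.clear()
            self._flush()

    def _flush(self) -> None:
        with self._lock:
            spans = list(self._queue)
            self._queue.clear()
        if spans:
            self._exporter.export(spans)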
Sampler
The Sampler is a simple rate-based sampler. It generates a random float on each call to should_sample() and returns True if it is below the configured rate. Rate is set via memorylens.init(sample_rate=0.1) or MEMORYLENS_SAMPLE_RATE env var. Default is 1.0 (sample everything). When a span is not sampled, the tracer yields a no-op _MutableSpan with empty IDs and skips all processor calls.
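A minimal sketch of such a rate-based sampler; the class shape is an assumption based on the description above:

# Sketch of the rate-based sampler.
import random

class Sampler:
    def __init__(self, rate: float = 1.0) -> None:
        self.rate = rate

    def should_sample(self) -> bool:
        # rate=1.0 samples everything; rate=0.1 keeps roughly 10% of spans
        return random.random() < self.rate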
Exporters — memorylens._exporters
Exporters are the output stage of the pipeline. They receive batches of MemorySpan objects from the processor and persist them to a backend. Three exporters ship with MemoryLens core.
SpanExporter Protocol
class SpanExporter(Protocol):
def export(self, spans: list[MemorySpan]) -> ExportResult: ...
def shutdown(self) -> None: ...
class ExportResult(Enum):
SUCCESS = "success"
FAILURE = "failure"
SQLite Exporter
The primary local storage backend. Writes spans to a SQLite database at ~/.memorylens/traces.db (configurable). Designed for developer use: zero server setup, persistent across sessions, queryable with standard SQL.
Database Schema
-- spans table (created on first use)
CREATE TABLE IF NOT EXISTS spans (
span_id TEXT PRIMARY KEY,
trace_id TEXT NOT NULL,
parent_span_id TEXT,
operation TEXT NOT NULL, -- "memory.write" etc.
status TEXT NOT NULL, -- "ok" / "error" / "dropped"
start_time REAL NOT NULL, -- epoch nanoseconds
end_time REAL NOT NULL,
duration_ms REAL NOT NULL,
agent_id TEXT,
session_id TEXT,
user_id TEXT,
input_content TEXT,
output_content TEXT,
attributes TEXT NOT NULL DEFAULT '{}' -- JSON blob
)
-- Indexes for common query patterns
CREATE INDEX IF NOT EXISTS idx_spans_trace_id ON spans (trace_id)
CREATE INDEX IF NOT EXISTS idx_spans_session_id ON spans (session_id)
CREATE INDEX IF NOT EXISTS idx_spans_operation ON spans (operation)
CREATE INDEX IF NOT EXISTS idx_spans_start_time ON spans (start_time)
The connection is opened in WAL mode (PRAGMA journal_mode=WAL) for concurrent read safety. check_same_thread=False is set because the BatchSpanProcessor writes from a background thread.
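A small sketch of how the connection could be opened under those constraints (the helper name is hypothetical):

# Sketch of the connection setup described above.
import sqlite3
from pathlib import Path

def _open_connection(db_path: str) -> sqlite3.Connection:
    Path(db_path).expanduser().parent.mkdir(parents=True, exist_ok=True)
    # check_same_thread=False: the BatchSpanProcessor writes from its background thread
    conn = sqlite3.connect(db_path, check_same_thread=False)
    conn.execute("PRAGMA journal_mode=WAL")   # concurrent readers while the SDK writes
    return conn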
Query Methods
| Method | Parameters | Returns | Used By |
|---|---|---|---|
| query() | trace_id, operation, status, agent_id, session_id, limit | list[dict] | CLI commands |
| query_extended() | Same as above + q (full-text), offset | tuple[list[dict], int] (rows + total count) | Web UI (pagination + search) |
| save_audit() | audit: CompressionAudit | None | Compression Auditor |
| get_audit() | span_id: str | dict or None | Compression UI view |
| list_audits() | limit, offset | tuple[list[dict], int] | CLI audit list |
| update_span_attributes() | span_id, new_attrs: dict | None | Cost enricher (merges cost_usd into attributes) |
Compression Audits Table
Created lazily on the first save_audit() call (not at initialization):
CREATE TABLE IF NOT EXISTS compression_audits (
span_id TEXT PRIMARY KEY,
semantic_loss_score REAL NOT NULL,
compression_ratio REAL NOT NULL,
pre_sentence_count INTEGER NOT NULL,
post_sentence_count INTEGER NOT NULL,
sentences TEXT NOT NULL, -- JSON array of SentenceAnalysis dicts
scorer_backend TEXT NOT NULL, -- "local" or "openai"
created_at REAL NOT NULL -- epoch timestamp
)
OTLP Exporter
Translates MemorySpan objects to OpenTelemetry spans using a _ReadableSpanAdapter that presents the MemoryLens schema as an OTel-compatible interface. This allows using the upstream OTLPSpanExporter from the OpenTelemetry SDK directly.
_ReadableSpanAdapter
The adapter presents MemorySpan fields as OTel span properties:
| OTel Property | Source | Notes |
|---|---|---|
| name | span.operation.value | e.g., "memory.write" |
| context.trace_id | int(span.trace_id[:32], 16) | 128-bit int from hex string |
| context.span_id | int(span.span_id[:16], 16) | 64-bit int from hex string |
| start_time | int(span.start_time) | nanoseconds |
| end_time | int(span.end_time) | nanoseconds |
| attributes | All span fields prefixed with memorylens. | Nested dicts serialized as JSON strings |
| status | StatusCode.ERROR if error, else StatusCode.OK | |
| kind | SpanKind.INTERNAL | |
| resource | Resource.create({"service.name": "memorylens"}) | |
The OTLP exporter is configured via standard OTel environment variables: OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS, and OTEL_EXPORTER_OTLP_PROTOCOL. The default endpoint is http://localhost:4317.
JSONL Exporter
The simplest exporter — writes one JSON object per line to stdout or a file path. Stateless. Used by memorylens traces export and as a portable format for piping to other tools. Each line is the output of span.to_dict().
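A sketch of an exporter along those lines; the constructor argument is an assumption:

# Sketch of a JSONL exporter matching the description above.
import json
import sys

class JSONLExporter:
    def __init__(self, path: str | None = None) -> None:
        self._file = open(path, "a") if path else sys.stdout

    def export(self, spans: list[MemorySpan]) -> ExportResult:
        for span in spans:
            self._file.write(json.dumps(span.to_dict()) + "\n")   # one JSON object per line
        self._file.flush()
        return ExportResult.SUCCESS

    def shutdown(self) -> None:
        if self._file is not sys.stdout:
            self._file.close()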
Exporter Registry and Factory
# src/memorylens/_exporters/__init__.py
def create_exporter(name: str, **kwargs) -> SpanExporter:
match name:
case "sqlite": return SQLiteExporter(**kwargs)
case "otlp": return OTLPExporter(**kwargs)
case "jsonl": return JSONLExporter(**kwargs)
case _: raise ValueError(f"Unknown exporter: {name}")
Framework Integrations — memorylens.integrations
Auto-instrumentation patches framework memory classes at runtime so that zero code changes are required in user applications. Each integration follows an identical Instrumentor protocol.
Instrumentor Protocol
class Instrumentor(Protocol):
def instrument(self, **kwargs) -> None: ...
def uninstrument(self) -> None: ...
Each instrumentor stores the original method references so uninstrument() can restore the original behavior. The integration registry maps names to instrumentor classes:
# memorylens.init(instrument=["langchain", "mem0"]) resolves via:
def create_instrumentor(name: str) -> Instrumentor:
return _REGISTRY[name]() # raises helpful error if framework not installed
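To illustrate the store-original / patch / restore pattern, here is a schematic instrumentor; ExampleMemory and ExampleInstrumentor are stand-ins, not a real integration:

# Schematic instrumentor showing the patch/restore pattern.
from memorylens import instrument_write

class ExampleMemory:                        # stand-in for a framework memory class
    def add(self, content: str) -> bool:
        return True

class ExampleInstrumentor:
    def __init__(self) -> None:
        self._original_add = None

    def instrument(self, **kwargs) -> None:
        self._original_add = ExampleMemory.add                              # keep a reference
        ExampleMemory.add = instrument_write(backend="example")(ExampleMemory.add)

    def uninstrument(self) -> None:
        if self._original_add is not None:
            ExampleMemory.add = self._original_add                          # restore original behavior
            self._original_add = None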
Per-framework method coverage:

LangChain (BaseMemory)
| Method | Operation |
|---|---|
| save_context() | WRITE |
| load_memory_variables() | READ |
| ConversationSummaryMemory summarize | COMPRESS |

Mem0 (Memory)
| Method | Operation |
|---|---|
| add() | WRITE |
| search() | READ (captures scores) |
| update() | UPDATE |
| delete() | WRITE status=DROPPED |

LlamaIndex (ChatMemoryBuffer)
| Method | Operation |
|---|---|
| put(message) | WRITE |
| put_messages(messages) | WRITE |
| get(input) | READ |
| get_all() | READ |
| reset() | WRITE drop_reason="reset" |

Letta (client memory blocks)
| Method | Operation |
|---|---|
| agents.blocks.retrieve() | READ |
| agents.blocks.update() | UPDATE |
| agents.blocks.delete() | WRITE drop_reason="explicit_delete" |
| agents.blocks.list() | READ |

Zep (memory client)
| Method | Operation |
|---|---|
| memory.add(session_id, messages) | WRITE |
| memory.get(session_id) | READ |
| memory.search(session_id, payload) | READ (captures scores) |
| memory.delete(session_id) | WRITE drop_reason="session_delete" |
Auto-Instrumentation Shorthand
# Initialize all integrations in one call:
memorylens.init(instrument=["langchain", "mem0", "llamaindex"])
# Or initialize individual instrumentors explicitly:
from memorylens.integrations.langchain import LangChainInstrumentor
LangChainInstrumentor().instrument()
# To stop instrumentation:
from memorylens.integrations.mem0 import Mem0Instrumentor
inst = Mem0Instrumentor()
inst.instrument()
# ... later ...
inst.uninstrument()
If a framework is not installed, a clear error is raised: "LangChain not found. Install with: pip install memorylens[langchain]".
Web Dashboard — memorylens._ui
The MemoryLens web dashboard is a local developer tool that provides a browser-based interface for trace inspection. It launches with a single CLI command and reads from the same SQLite database that the SDK writes to. It is available as an optional extra: pip install memorylens[ui].
Technology Stack
- FastAPI — async Python web framework; app factory pattern for testability
- Jinja2 — server-side HTML templating; templates live in _ui/templates/
- htmx — loaded via CDN; enables AJAX interactions without JavaScript code
- TailwindCSS — loaded via CDN; utility-first styling, dark theme support
- uvicorn — ASGI server; binds to 127.0.0.1 only (local-only access)
App Factory
# src/memorylens/_ui/server.py
def create_app(db_path: str = _DEFAULT_DB, ingest: bool = False) -> FastAPI:
app = FastAPI(title="MemoryLens", docs_url=None, redoc_url=None)
# Mount static files at /static
# Configure Jinja2 templates
# Create SQLiteExporter(db_path) — shared via app.state
# Register: create_trace_routes(app)
# Register: create_compression_routes(app)
if ingest:
# Register: create_ingest_routes(app) — OTLP HTTP receiver
return app
def run(db_path, port=8000, ingest=False) -> None:
app = create_app(db_path, ingest)
uvicorn.run(app, host="127.0.0.1", port=port, log_level="warning")
Content Negotiation
All filterable endpoints serve both full HTML pages and htmx partials from the same URL. htmx automatically sends an HX-Request: true header on all requests it triggers. The routes check for this header (or use separate /api/ paths for partials) and return the appropriate template.
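A sketch of the header check in a route; the template context here is illustrative, and the real routes read from the shared SQLiteExporter on app.state:

# Sketch of content negotiation between a full page and an htmx partial.
from fastapi import FastAPI, Request
from fastapi.templating import Jinja2Templates

app = FastAPI()
templates = Jinja2Templates(directory="src/memorylens/_ui/templates")

@app.get("/traces")
def traces_list(request: Request):
    # htmx adds HX-Request: true to every request it issues
    is_htmx = request.headers.get("HX-Request") == "true"
    name = "partials/trace_table.html" if is_htmx else "traces_list.html"
    return templates.TemplateResponse(name, {"request": request, "rows": []})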
API Endpoints
| Method | Path | Template / Returns | Purpose |
|---|---|---|---|
| GET | / | Redirect to /traces | Root redirect |
| GET | /traces | traces_list.html | Full trace list page |
| GET | /api/traces | partials/trace_table.html | htmx-filtered table rows |
| GET | /traces/{trace_id} | traces_detail.html | Trace detail with timeline |
| GET | /traces/{trace_id}/retrieval | retrieval_debug.html | Retrieval debugger (READ spans only) |
| GET | /traces/{trace_id}/compression | compression_audit.html | Compression audit (COMPRESS spans only) |
| POST | /api/traces/{trace_id}/audit | Redirect | Run compression audit, redirect back |
| POST | /v1/traces | JSON {} | OTLP HTTP/JSON ingest (optional) |
Trace List View (/traces)
The main landing page. A filterable, paginated table of all memory operation traces.
- Filter bar: search input (debounced 300ms), operation dropdown, status dropdown, agent ID input
- Table columns: Trace ID, Operation (color-coded badge), Status (dot indicator), Duration, Agent, Session, Content Preview, Time (relative)
- Error rows receive a subtle red background tint
- Pagination: "Showing 1–50 of 247" with Prev/Next using htmx offset parameter
- htmx live tail: when --ingest is active, hx-trigger="every 2s" polls for new traces
Trace Detail View (/traces/{trace_id})
Deep-dive into a single span. Two-column layout:
- Left column: span timeline bar (colored by operation type), input/output content blocks (monospace), error block (red-tinted, only for error/dropped spans)
- Right column: attributes panel split into standard fields and custom attributes; action buttons including "Export JSON" and context-specific debug links ("Debug Retrieval" for READ spans, "Debug Compression" for COMPRESS spans)
Retrieval Debugger (/traces/{trace_id}/retrieval)
The flagship debugging view for READ operations. Visualizes exactly why a retrieval returned or missed specific memories:
- Query section: search query text + parameters panel (backend, top_k, threshold, results_count)
- Score visualization: horizontal bars for every candidate memory scaled by similarity score (0.0–1.0). Green bars with "RETURNED" badge above threshold; red bars with "FILTERED" badge below. A dashed yellow vertical threshold line runs through all bars
- Near-miss callout: amber box that appears when candidates scored within 0.10 of the threshold, with suggested actions
- Data source: all data comes from existing READ span attributes (scores, threshold, top_k, results_count) — no new SDK changes required
OTLP Ingest Endpoint (POST /v1/traces)
When started with --ingest, the dashboard doubles as a lightweight OTLP HTTP/JSON collector. This allows agent applications to point their OTEL_EXPORTER_OTLP_ENDPOINT directly at the dashboard:
# Agent configuration for live ingest:
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:8000
export OTEL_EXPORTER_OTLP_PROTOCOL=http/json
The ingest handler parses ExportTraceServiceRequest JSON, iterates resourceSpans[].scopeSpans[].spans[], skips spans without a memorylens.operation attribute, maps OTel attributes back to MemorySpan fields, and writes to SQLite. The UI then reflects new traces within the live poll interval.
HTTP/JSON only (no gRPC) — gRPC would require protobuf compilation on the server side, adding unnecessary complexity for a local dev tool.
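A simplified sketch of the parsing loop, assuming the standard OTLP/JSON field names; attribute unflattening back into MemorySpan fields and the SQLite write are omitted:

# Simplified extraction of MemoryLens spans from an OTLP/JSON payload.
def extract_memorylens_spans(payload: dict) -> list[dict]:
    spans = []
    for resource_spans in payload.get("resourceSpans", []):
        for scope_spans in resource_spans.get("scopeSpans", []):
            for otel_span in scope_spans.get("spans", []):
                # OTLP/JSON attributes look like {"key": ..., "value": {"stringValue": ...}}
                attrs = {
                    kv["key"]: next(iter(kv.get("value", {}).values()), None)
                    for kv in otel_span.get("attributes", [])
                }
                if "memorylens.operation" not in attrs:
                    continue   # skip spans not produced by MemoryLens
                spans.append({"name": otel_span.get("name"), "attributes": attrs})
    return spans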
File Structure
├── __init__.py
├── server.py # FastAPI app factory + uvicorn launcher
├── api/
│ ├── traces.py # page routes + filtered API endpoints
│ ├── compression.py # compression audit routes
│ └── ingest.py # OTLP HTTP/JSON receiver
├── templates/
│ ├── base.html # layout: nav, Tailwind CDN, htmx script
│ ├── traces_list.html # trace list page
│ ├── traces_detail.html # single trace timeline view
│ ├── retrieval_debug.html
│ ├── compression_audit.html
│ └── partials/
│ ├── trace_table.html # htmx partial: filterable rows
│ ├── span_timeline.html
│ ├── score_chart.html
│ └── sentence_diff.html # compression audit partial
└── static/
└── app.css # custom styles (on top of Tailwind)
Compression Auditor — memorylens._audit
The Compression Auditor performs offline semantic analysis of COMPRESS spans, determining what information was preserved and what was lost during memory summarization. The SDK already captures input_content (pre-compression) and output_content (post-compression) on COMPRESS spans — this feature uses those fields to compute sentence-level semantic similarity.
Analysis runs as a post-hoc CLI command or UI action, never in the instrumentation hot path. This maintains the <2ms p99 overhead guarantee.
Data Model
@dataclass(frozen=True)
class SentenceAnalysis:
text: str # the original pre-compression sentence
best_match_score: float # max cosine similarity to any post-content sentence
status: str # "preserved" (>= 0.7) or "lost" (< 0.7)
@dataclass(frozen=True)
class CompressionAudit:
span_id: str
semantic_loss_score: float # 1.0 - mean(best_match_scores). 0=no loss, 1=total loss
compression_ratio: float # len(post_content) / len(pre_content)
pre_sentence_count: int
post_sentence_count: int
sentences: list[SentenceAnalysis]
scorer_backend: str # "local" or "openai"
Scorer Backends
The ScorerBackend protocol requires a single method: embed(texts: list[str]) -> list[list[float]]. Three implementations are provided:
| Scorer | Dependency | Model | API Key | Use Case |
|---|---|---|---|---|
| LocalScorer | sentence-transformers | all-MiniLM-L6-v2 (~80MB) | None | Default; works fully offline |
| OpenAIScorer | openai (user-installed) | text-embedding-3-small | OPENAI_API_KEY | Higher quality embeddings, cloud-based |
| MockScorer | None (stdlib only) | MD5 hash-based deterministic embeddings | None | Fast, deterministic testing |
# Factory function for creating scorers:
def create_scorer(name: str) -> ScorerBackend:
match name:
case "local": return LocalScorer()
case "openai": return OpenAIScorer()
case "mock": return MockScorer()
Compression Analysis Algorithm
The CompressionAnalyzer.analyze(span_id, pre_content, post_content) method executes the following steps:
- Sentence splitting: split both pre_content and post_content into sentences using a regex-based splitter (splits on .!? followed by whitespace or end-of-string, with abbreviation handling)
- Batch embedding: concatenate all pre and post sentences into a single list and embed in one batch call — minimizing API calls for the OpenAI backend
- Similarity matrix: for each pre-sentence, compute cosine similarity against every post-sentence using dot(a, b) / (||a|| × ||b||)
- Best match scoring: take the maximum similarity score across all post-sentences as the best_match_score for each pre-sentence
- Classification: score ≥ 0.7 → "preserved"; score < 0.7 → "lost"
- Loss score: semantic_loss_score = 1.0 − mean(best_match_scores) — clamped to [0, 1]
- Compression ratio: len(post_content) / len(pre_content) (character-level)
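A compact sketch of the best-match and loss-score steps above in plain Python, given precomputed embeddings; the function names are illustrative:

# Sketch of the similarity and loss computation described above.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def score_sentences(pre_vecs: list[list[float]], post_vecs: list[list[float]]) -> tuple[list[float], float]:
    # Best match per pre-sentence, then loss = 1 - mean(best matches), clamped to [0, 1].
    best = [max((cosine(p, q) for q in post_vecs), default=0.0) for p in pre_vecs]
    loss = 1.0 - (sum(best) / len(best)) if best else 0.0
    return best, min(max(loss, 0.0), 1.0)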
Loss Score Classification
| Score Range | Classification | Visual |
|---|---|---|
| 0.0 – 0.29 | Low loss | Green, checkmark ✓ |
| 0.30 – 0.60 | Moderate loss | Amber, warning ⚠ |
| 0.61 – 1.0 | High loss | Red, X ✗ |
Compression Audit UI View (/traces/{trace_id}/compression)
- Header: breadcrumb navigation, span metadata
- Summary card: loss score with color coding, compression ratio, preserved/lost sentence counts
- Sentence diff: list of all pre-compression sentences. Preserved sentences show a green checkmark and a score bar; lost sentences show a red X. Score bars use the same visual style as the Retrieval Debugger. Preserved sentences additionally show the best-matching post-content sentence in subtle text below
- Post-compression content: full compressed text in a code box
- "Run Audit" state: if no audit exists, a button fires POST /api/traces/{trace_id}/audit?scorer=local, which runs the analysis and redirects back. The scorer parameter selects the backend (default: mock in the web UI; local or openai via CLI)
Cost Attribution — memorylens._cost
Cost attribution tracks token counts and dollar costs per memory operation. Like the Compression Auditor, it is implemented as offline enrichment — token data is captured as span attributes during instrumentation, and a CLI command computes costs using a configurable pricing model and writes cost_usd back into the span's attributes via update_span_attributes().
Cost attribution uses only Python stdlib (json, pathlib). There are no new optional extras for this feature.
Capturing Token Data
Token data is added as span attributes during instrumentation, either via decorator kwargs or manually:
# Via decorator:
@instrument_write(backend="mem0", tokens_in=150, tokens_out=0, model="gpt-4o-mini")
def store(...): ...
# Via manual span creation:
with tracer.start_span(MemoryOperation.COMPRESS) as span:
span.set_attribute("tokens_in", 1200)
span.set_attribute("tokens_out", 300)
span.set_attribute("model", "gpt-4o-mini")
Default Pricing Table
| Model | Input ($/token) | Output ($/token) |
|---|---|---|
| gpt-4o | $0.0000025 | $0.00001 |
| gpt-4o-mini | $0.00000015 | $0.0000006 |
| gpt-4-turbo | $0.00001 | $0.00003 |
| claude-3-opus | $0.000015 | $0.000075 |
| claude-3-sonnet | $0.000003 | $0.000015 |
| claude-3-haiku | $0.00000025 | $0.00000125 |
| text-embedding-3-small | $0.00000002 | $0.0 |
| text-embedding-3-large | $0.00000013 | $0.0 |
User Pricing Override
Users override or extend pricing via ~/.memorylens/pricing.json:
{
"my-fine-tuned-model": {"input": 0.00001, "output": 0.00003}
}
The load_pricing() function merges user entries over defaults. User entries take precedence.
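A sketch of that merge behavior; DEFAULT_PRICING is the table shipped in pricing.py, and the path handling here is illustrative:

# Sketch of load_pricing(): user entries win over defaults.
import json
from pathlib import Path

def load_pricing(user_path: Path = Path.home() / ".memorylens" / "pricing.json") -> dict:
    pricing = dict(DEFAULT_PRICING)                            # defaults shipped with the package
    if user_path.exists():
        pricing.update(json.loads(user_path.read_text()))      # user overrides take precedence
    return pricing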
Cost Enricher
class CostEnricher:
def enrich_span(self, attrs: dict[str, Any]) -> dict | None:
# Returns None if no token data present
# Returns {"cost_usd": 0.0, "_cost_warning": "..."} for unknown models
# Returns {"cost_usd": round(cost, 10)} on success
cost = tokens_in * pricing[model]["input"] + tokens_out * pricing[model]["output"]
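Using the default pricing table above, a span with tokens_in=1200, tokens_out=300, and model="gpt-4o-mini" works out as follows:

# Worked example with the default gpt-4o-mini rates:
cost = 1200 * 0.00000015 + 300 * 0.0000006   # 0.00018 + 0.00018
print(round(cost, 10))                        # 0.00036 → stored as {"cost_usd": 0.00036}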
update_span_attributes Mechanism
Cost enrichment writes cost_usd directly into the span's attributes JSON field in SQLite, merging without overwriting other attributes:
# SQLiteExporter method:
def update_span_attributes(self, span_id: str, new_attrs: dict) -> None:
current = json.loads(row["attributes"])
current.update(new_attrs) # merge
# UPDATE spans SET attributes = ? WHERE span_id = ?
UI Integration
No new pages are added. Cost data surfaces in existing views:
- Trace list: a "Cost" column showing $0.0012, or "-" if no cost data, read from span.attributes.cost_usd
- Trace detail header: cost shown after duration — 12ms · $0.0012 — only when cost_usd is present
CLI Reference — memorylens
The CLI is built with Typer and Rich. All commands read from the local SQLite database by default. Most commands support --json for machine-readable output.
memorylens init
# Creates ~/.memorylens/ directory
# Output: "Initialized MemoryLens at /Users/you/.memorylens"
memorylens ui
$ memorylens ui --port 8080
$ memorylens ui --db-path ./my-traces.db
$ memorylens ui --ingest # also accept OTLP HTTP at /v1/traces
Requires memorylens[ui]. Launches web dashboard at http://127.0.0.1:8000.
memorylens traces
$ memorylens traces list
$ memorylens traces list --operation memory.write
$ memorylens traces list --status error
$ memorylens traces list --last 1h
$ memorylens traces list --agent-id support-bot
$ memorylens traces list --session-id sess-123
# Inspect a single trace
$ memorylens traces show <trace-id>
# Live tail (streaming)
$ memorylens traces tail
$ memorylens traces tail --operation memory.read
$ memorylens traces tail --min-duration 100ms
# Export as JSONL
$ memorylens traces export
$ memorylens traces export --last 24h -o traces.jsonl
memorylens stats
$ memorylens stats --last 7d
$ memorylens stats --group-by operation
Shows aggregate statistics: total spans, error rate, p50/p95/p99 duration, breakdown by operation and status.
memorylens config
$ memorylens config set <key> <value>
memorylens audit
$ memorylens audit compress
$ memorylens audit compress --trace-id abc123
$ memorylens audit compress --scorer openai
$ memorylens audit compress --force # re-audit already-audited spans
# Show detailed audit for a specific span
$ memorylens audit show <span_id>
# List all audit results
$ memorylens audit list
$ memorylens audit list --min-loss 0.3
memorylens cost
$ memorylens cost enrich
$ memorylens cost enrich --trace-id abc123
$ memorylens cost enrich --force # recalculate all
# Cost report
$ memorylens cost report
$ memorylens cost report --group-by agent_id
$ memorylens cost report --group-by session_id
$ memorylens cost report --group-by operation
# Manage pricing
$ memorylens cost pricing
$ memorylens cost pricing --set gpt-4o-mini.input=0.0000002
Cost Report Output Format
Cost Report (grouped by operation)
OPERATION SPANS TOKENS IN TOKENS OUT TOTAL COST
memory.write 45 12,300 0 $0.0018
memory.read 38 8,400 2,100 $0.0031
memory.compress 12 6,800 1,200 $0.0015
Total: 95 spans, 27,500 tokens in, 3,300 tokens out, $0.0064
Audit Output Format
Analyzing 12 COMPRESS spans...
[████████████████████████████████] 12/12
Results:
SPAN ID LOSS SCORE RATIO PRESERVED LOST STATUS
a1b2c3d4 0.12 0.35 8/9 1/9 ✓ low loss
e5f6g7h8 0.45 0.28 4/7 3/7 ⚠ moderate loss
i9j0k1l2 0.78 0.15 2/8 6/8 ✗ high loss
Summary: 12 spans audited. 2 with moderate loss, 1 with high loss.
Public API Reference
MemoryLens exposes exactly 8 public symbols from the top-level memorylens package. Everything else is considered internal (prefixed with _) and may change without notice.
from memorylens import (
init, shutdown,
instrument_write, instrument_read, instrument_compress, instrument_update,
context, get_tracer,
)
init()
Initialize MemoryLens. Call once at application startup. Configures the TracerProvider, builds exporters, and optionally triggers auto-instrumentation.
With no arguments, it defaults to SQLite storage at ~/.memorylens/traces.db. Environment variables override all kwargs.
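A minimal startup/shutdown pair, using only the sample_rate and instrument keywords shown elsewhere in this document; no other init() keywords are assumed here:

import memorylens

memorylens.init(
    sample_rate=1.0,                    # or MEMORYLENS_SAMPLE_RATE
    instrument=["langchain", "mem0"],   # optional auto-instrumentation
)
# ... application runs ...
memorylens.shutdown()                   # flush pending spans at exit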
shutdown()
Flushes all pending spans in the processor queue and shuts down all registered processors and exporters. Call at application exit to ensure no spans are lost.
instrument_write()
Decorator that traces memory write operations. Creates a MemoryOperation.WRITE span around the decorated function.
@instrument_write(backend="mem0", capture_content=True)
def store_user_preference(user_id: str, content: str) -> bool: ...
instrument_read()
Decorator that traces memory read/retrieval operations. Creates a MemoryOperation.READ span.
@instrument_read(backend="mem0", top_k=5, threshold=0.7)
def search_memories(query: str) -> list[Memory]: ...
instrument_compress()
Decorator that traces memory compression/summarization operations. Creates a MemoryOperation.COMPRESS span. The input_content and output_content of COMPRESS spans are used by the Compression Auditor.
@instrument_compress(model="gpt-4o-mini")
def summarize_conversation(messages: list[str]) -> str: ...
instrument_update()
Decorator that traces memory update operations (merge, replace, or append to an existing memory entry). Creates a MemoryOperation.UPDATE span.
context()
Creates a MemoryContext context manager that attaches agent/session/user metadata to all spans created within the with block. Uses ContextVar — safe to nest and async-safe.
with memorylens.context(agent_id="bot", session_id="sess-1", user_id="u-42"):
store_memory(content)
results = search_memories(query)
# All spans above have agent_id, session_id, user_id set
get_tracer()
Escape hatch for manual span creation when the decorators are not sufficient. Returns a Tracer instance from the global TracerProvider.
tracer = memorylens.get_tracer("my.module")
with tracer.start_span(MemoryOperation.WRITE) as span:
span.set_attribute("memory_key", "user_prefs")
span.set_content(input_content=content)
result = do_write(content)
span.set_content(output_content=str(result))
Environment Variable Overrides
| Variable | Effect | Example |
|---|---|---|
| MEMORYLENS_EXPORTER | Override exporter selection; takes precedence over init() kwargs | sqlite, otlp, jsonl |
| MEMORYLENS_CAPTURE_CONTENT | PII safety toggle — disable content capture in production | false |
| MEMORYLENS_SAMPLE_RATE | Sampling rate; reduces storage and overhead | 0.1 (10%) |
| OTEL_EXPORTER_OTLP_ENDPOINT | OTLP collector URL | http://localhost:4317 |
| OTEL_EXPORTER_OTLP_HEADERS | Auth headers for OTLP collector | x-api-key=abc123 |
| OTEL_EXPORTER_OTLP_PROTOCOL | OTLP protocol selection | http/json |
| OTEL_SERVICE_NAME | Service name in OTLP traces | my-agent |
Configuration
pyproject.toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[project]
name = "memorylens"
version = "0.1.0"
description = "Observability and debugging for AI agent memory systems"
requires-python = ">=3.10"
license = "Apache-2.0"
# Core dependencies (always installed)
dependencies = [
"opentelemetry-api>=1.20",
"opentelemetry-sdk>=1.20",
"opentelemetry-exporter-otlp-proto-grpc>=1.20",
"opentelemetry-exporter-otlp-proto-http>=1.20",
"typer>=0.9",
"rich>=13.0",
]
# Optional extras
[project.optional-dependencies]
langchain = ["langchain-core>=0.1"]
mem0 = ["mem0ai>=0.1"]
llamaindex = ["llama-index-core>=0.10"]
letta = ["letta-client>=0.1"]
zep = ["zep-python>=2.0"]
ui = ["fastapi>=0.110", "uvicorn[standard]>=0.29", "jinja2>=3.1"]
audit = ["sentence-transformers>=2.0", "numpy>=1.24"]
dev = ["pytest>=8.0", "pytest-asyncio>=0.23", "ruff>=0.4",
"mypy>=1.10", "httpx>=0.27", "numpy>=1.24"]
[project.scripts]
memorylens = "memorylens.cli.main:app"
# Tooling configuration
[tool.hatch.build.targets.wheel]
packages = ["src/memorylens"]
[tool.pytest.ini_options]
testpaths = ["tests"]
pythonpath = ["src"]
[tool.ruff]
target-version = "py310"
line-length = 100
[tool.mypy]
python_version = "3.10"
strict = true
Optional Extras Summary
| Extra | Install Command | Enables | Key Dependencies |
|---|---|---|---|
| [langchain] | pip install memorylens[langchain] | LangChain auto-instrumentation | langchain-core ≥ 0.1 |
| [mem0] | pip install memorylens[mem0] | Mem0 auto-instrumentation | mem0ai ≥ 0.1 |
| [llamaindex] | pip install memorylens[llamaindex] | LlamaIndex auto-instrumentation | llama-index-core ≥ 0.10 |
| [letta] | pip install memorylens[letta] | Letta auto-instrumentation | letta-client ≥ 0.1 |
| [zep] | pip install memorylens[zep] | Zep auto-instrumentation | zep-python ≥ 2.0 |
| [ui] | pip install memorylens[ui] | Web dashboard (memorylens ui) | fastapi, uvicorn, jinja2 |
| [audit] | pip install memorylens[audit] | Compression Auditor (LocalScorer) | sentence-transformers, numpy |
Multiple extras can be combined: pip install memorylens[langchain,mem0,ui,audit]
Local Configuration Files
MemoryLens stores local configuration in ~/.memorylens/:
| File | Purpose |
|---|---|
| ~/.memorylens/traces.db | SQLite trace database (default location) |
| ~/.memorylens/pricing.json | User-defined pricing overrides for cost attribution |
Testing
MemoryLens has a comprehensive test suite with 174 tests across all layers of the system. Tests are organized to mirror the package structure.
Test Structure
├── conftest.py # shared fixtures (in-memory SQLite, mock exporters)
├── test_core/
│ ├── test_tracer.py # TracerProvider, Tracer, _MutableSpan, sampling
│ ├── test_decorators.py # all 4 instrument_* decorators, capture_content
│ ├── test_span.py # MemorySpan, to_dict(), enum serialization
│ ├── test_context.py # MemoryContext, nesting, async isolation
│ └── test_processor.py # BatchSpanProcessor, SimpleSpanProcessor, flush
├── test_exporters/
│ ├── test_otlp.py # ReadableSpanAdapter, OTLPExporter
│ ├── test_sqlite.py # CRUD, query, query_extended, audit methods
│ └── test_jsonl.py # JSONL output format
├── test_integrations/
│ ├── test_langchain.py # LangChain instrumentor with mock BaseMemory
│ ├── test_mem0.py # Mem0 instrumentor with mock Memory class
│ ├── test_llamaindex.py # LlamaIndex with mock ChatMemoryBuffer
│ ├── test_letta.py # Letta with mock client blocks
│ └── test_zep.py # Zep with mock memory client
├── test_ui/
│ ├── test_api_traces.py # trace list/detail/retrieval endpoints
│ ├── test_api_ingest.py # OTLP ingest endpoint
│ ├── test_api_compression.py # compression audit endpoints
│ ├── test_templates.py # Jinja2 template rendering
│ └── test_query_extended.py # SQLiteExporter extension
├── test_audit/
│ ├── test_splitter.py # sentence splitting edge cases
│ ├── test_analyzer.py # CompressionAnalyzer with MockScorer
│ ├── test_scorer.py # scorer backends (MockScorer fast, LocalScorer slow)
│ └── test_storage.py # save/get/list audits
├── test_cost/
│ ├── test_pricing.py # pricing load, merge, save
│ ├── test_enricher.py # cost computation, skip logic
│ └── test_storage.py # update_span_attributes
└── test_cli/
├── test_commands.py # traces, stats, config commands
├── test_audit_commands.py
└── test_cost_commands.py
Testing Patterns
Framework Integration Mocks
Integration tests use lightweight fake framework classes rather than installing the full framework. The instrumentor's _get_*_class() function is monkeypatched to return the mock class:
# Example: testing LangChain integration without installing langchain
class FakeBaseMemory:
def save_context(self, inputs, outputs): ...
def load_memory_variables(self, inputs): return {}
def test_langchain_write_span(monkeypatch):
monkeypatch.setattr("memorylens.integrations.langchain.instrumentor._get_base_memory",
lambda: FakeBaseMemory)
inst = LangChainInstrumentor()
inst.instrument()
# verify spans emitted with correct operation type
In-Memory SQLite
Database tests use SQLite in-memory connections (db_path=":memory:") for isolation and speed. Each test gets a fresh database via pytest fixtures.
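A sketch of such a fixture; the fixture name and the db_path keyword are assumptions, while the module path follows the project file layout:

# Sketch of an in-memory database fixture for conftest.py.
import pytest
from memorylens._exporters.sqlite import SQLiteExporter

@pytest.fixture
def exporter() -> SQLiteExporter:
    # ":memory:" gives each test its own throwaway database
    return SQLiteExporter(db_path=":memory:")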
Slow Test Marker
Integration tests that load the real LocalScorer (requires downloading the ~80MB all-MiniLM-L6-v2 model) are marked with @pytest.mark.slow and excluded from the default test run:
# Run fast tests only (default CI):
uv run pytest tests/ -v
# Run slow tests only (model integration):
uv run pytest tests/ -v -m slow
MockScorer for Audit Tests
The MockScorer uses MD5 hashing to produce deterministic, fast embeddings. Texts that share more words get more similar vectors, making it useful for verifying the overall analysis pipeline without requiring a real embedding model.
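A sketch in the same spirit (Md5SketchScorer is a hypothetical stand-in, not the shipped MockScorer): texts that share words bump the same vector dimensions, so their cosine similarity rises.

# Deterministic, dependency-free embeddings for testing the analysis pipeline.
import hashlib

class Md5SketchScorer:
    def embed(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * 16
            for word in text.lower().split():
                digest = hashlib.md5(word.encode()).digest()
                vec[digest[0] % 16] += 1.0      # shared words increment the same dimension
            vectors.append(vec)
        return vectors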
UI Tests with httpx
FastAPI endpoint tests use httpx.AsyncClient with an in-process test client. htmx partial responses are tested by including the HX-Request: true header. No browser automation is used — the server-driven htmx approach means API response testing fully covers all interactive behaviors.
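A sketch of such a test, assuming the create_app factory shown earlier; the ASGITransport wiring is standard httpx usage:

# Sketch of an htmx partial test against the in-process app.
import httpx
import pytest
from memorylens._ui.server import create_app

@pytest.mark.asyncio
async def test_traces_partial_returns_rows(tmp_path):
    app = create_app(db_path=str(tmp_path / "traces.db"))
    transport = httpx.ASGITransport(app=app)
    async with httpx.AsyncClient(transport=transport, base_url="http://test") as client:
        resp = await client.get("/api/traces", headers={"HX-Request": "true"})
    assert resp.status_code == 200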
Running Tests
# Install dev dependencies
uv sync --extra dev
# Run all fast tests
uv run pytest tests/ -v
# Run with coverage
uv run pytest tests/ --cov=memorylens --cov-report=term-missing
# Lint
uv run ruff check src/ tests/
# Type check
uv run mypy src/memorylens/
Technology Stack
- Python 3.10+: match/case, dataclasses with slots, and contextvars
- OpenTelemetry: opentelemetry-api, opentelemetry-sdk, OTLP exporters for gRPC and HTTP
- SQLite: stdlib sqlite3; WAL mode, 4 indexes, lazy migration
- sentence-transformers: LocalScorer (all-MiniLM-L6-v2) for the Compression Auditor; optional [audit] extra
- OpenAI embeddings: text-embedding-3-small; user-installed
- mypy strict: warn_return_any, warn_unused_configs
- Packaging: src/ layout
Project Statistics
Source File Breakdown
| Module | Description | Key Files |
|---|---|---|
| memorylens | Public API package | __init__.py |
| memorylens._core | Tracer, span, decorators, context, processors, sampler | 7 files |
| memorylens._exporters | SQLite, OTLP, JSONL, base protocol, registry | 5 files |
| memorylens.integrations | 5 framework instrumentors + registry | 11 files |
| memorylens._ui | FastAPI server, 3 route modules, templates, static | 7 Python files + templates |
| memorylens._audit | Compression analyzer, scorer backends, splitter | 4 files |
| memorylens._cost | Pricing model, cost enricher | 3 files |
| memorylens.cli | CLI entry point + 5 command modules + formatters | 8 files |
Phase Delivery Timeline
| Phase | Date | Scope | Status |
|---|---|---|---|
| Phase 1 | 2026-04-07 | Core SDK: decorators, context, processors, SQLite/OTLP/JSONL exporters, LangChain + Mem0 integrations, CLI | Complete |
| Phase 2a | 2026-04-07 | Web Dashboard: FastAPI server, Trace List, Trace Detail, Retrieval Debugger, OTLP ingest endpoint | Complete |
| Phase 2b | 2026-04-08 | Compression Auditor: CompressionAnalyzer, LocalScorer/OpenAIScorer/MockScorer, CLI audit commands, UI view | Complete |
| Phase 2c | 2026-04-08 | Cost Attribution: pricing model, CostEnricher, update_span_attributes, CLI cost commands, UI integration | Complete |
| Phase 2d | 2026-04-08 | Additional Integrations: LlamaIndex, Letta, and Zep auto-instrumentation | Complete |
Complete Project File Structure
├── pyproject.toml # build, deps, extras, tools
├── uv.lock
├── LICENSE # Apache 2.0
├── README.md
├── src/
│ └── memorylens/
│ ├── __init__.py # 8 public symbols, init(), context()
│ ├── _core/
│ │ ├── tracer.py # TracerProvider, Tracer, _MutableSpan
│ │ ├── span.py # MemorySpan frozen dataclass
│ │ ├── decorators.py # instrument_write/read/compress/update
│ │ ├── schema.py # MemoryOperation, SpanStatus enums
│ │ ├── context.py # MemoryContext, ContextVar
│ │ ├── processor.py # SpanProcessor, SimpleSpanProcessor, BatchSpanProcessor
│ │ └── sampler.py # Sampler (rate-based)
│ ├── _exporters/
│ │ ├── base.py # SpanExporter protocol, ExportResult
│ │ ├── sqlite.py # SQLiteExporter with all query methods
│ │ ├── otlp.py # OTLPExporter, _ReadableSpanAdapter
│ │ ├── jsonl.py # JSONLExporter
│ │ └── __init__.py # create_exporter() factory
│ ├── integrations/
│ │ ├── __init__.py # Instrumentor protocol, registry, create_instrumentor()
│ │ ├── langchain/ # BaseMemory patcher
│ │ ├── mem0/ # Memory class patcher
│ │ ├── llamaindex/ # ChatMemoryBuffer patcher
│ │ ├── letta/ # Letta client blocks patcher
│ │ └── zep/ # Zep memory client patcher
│ ├── _audit/
│ │ ├── analyzer.py # CompressionAnalyzer, CompressionAudit, SentenceAnalysis
│ │ ├── scorer.py # ScorerBackend, MockScorer, LocalScorer, OpenAIScorer
│ │ └── splitter.py # split_sentences()
│ ├── _cost/
│ │ ├── pricing.py # DEFAULT_PRICING, load_pricing(), save_user_pricing()
│ │ └── enricher.py # CostEnricher
│ ├── _ui/ # FastAPI server, routes, templates
│ └── cli/
│ ├── main.py # Typer app, init, ui commands
│ ├── formatters.py # rich table + JSON output helpers
│ └── commands/
│ ├── traces.py # list, show, tail, export
│ ├── stats.py # summary statistics
│ ├── config.py # config show/set
│ ├── audit.py # audit compress/show/list
│ └── cost.py # cost enrich/report/pricing
└── tests/ # 174 tests, 39 test files