Technical Documentation

MemoryLens

Observability and debugging for AI agent memory systems — complete technical reference covering SDK, exporters, integrations, web dashboard, compression auditor, and cost attribution.

v0.1.0 Alpha
Python 3.10+
Apache 2.0
174 Tests
Phases 1 & 2 (2a–2d)
April 2026
Table of Contents
01 Executive Summary
02 Architecture Overview
03 Core SDK
04 Exporters
05 Framework Integrations
06 Web Dashboard
07 Compression Auditor
08 Cost Attribution
09 CLI Reference
10 Public API Reference
11 Configuration
12 Testing
13 Technology Stack
14 Project Statistics

What is MemoryLens?

MemoryLens is an open-source observability and debugging tool designed specifically for AI agent memory systems. It addresses a critical gap in modern AI application development: the memory pipeline is opaque. When an agent "forgets" something a user said, retrieves the wrong memories, or silently drops a fact during compression, there is currently no standard way to see what happened.

MemoryLens instruments the four fundamental memory operations — write, read, compress, and update — and records them as structured traces. These traces expose the full context of every memory interaction: what was stored, what was retrieved, what similarity scores drove retrieval decisions, what content was lost during summarization, and how much each operation cost in tokens and dollars.

Core Problem Statement

AI agents built with frameworks like LangChain, Mem0, LlamaIndex, Letta, and Zep rely on memory backends that operate as black boxes. Failures — silent drops, threshold misses, lossy compression — are invisible until they degrade user experience. MemoryLens makes the invisible visible.

System Architecture

MemoryLens is organized as a single Python package (memorylens) with a layered internal structure. Each layer has a clearly defined responsibility and communicates with adjacent layers through well-typed interfaces.

MemoryLens — Layered Architecture

Public API
  memorylens.init(), memorylens.context(), @instrument_write, @instrument_read, @instrument_compress, @instrument_update, memorylens.get_tracer(), memorylens.shutdown()

Core
  TracerProvider (singleton), Tracer + _MutableSpan, MemorySpan (frozen dataclass), BatchSpanProcessor, SimpleSpanProcessor, MemoryContext (ContextVar), Sampler

Exporters
  SQLiteExporter (~/.memorylens/traces.db, WAL mode, indexed), OTLPExporter (gRPC to any OTel-compatible backend), JSONLExporter (stdout or file, portable export)

Analysis
  Compression Auditor: CompressionAnalyzer, LocalScorer / OpenAIScorer, compression_audits table
  Cost Enricher: CostEnricher, PricingModel, update_span_attributes()

Integrations
  LangChain / Mem0 / LlamaIndex / Letta / Zep, all via the Instrumentor protocol

Interface
  Web Dashboard (FastAPI + Jinja2 + htmx): Trace List, Trace Detail, Retrieval Debugger, Compression Audit
  CLI (Typer): traces / stats / config / audit / cost commands, Rich tables + JSON output

Data Flow: Instrumentation to Visualization

1. User code: a decorator or instrumentor wraps the memory call
2. Tracer: creates a _MutableSpan for the operation
3. MemorySpan: the span is finalized into a frozen dataclass
4. SpanProcessor: batch queue, flushed by a background thread
5. Exporter: SQLite / OTLP / JSONL
6. Storage: traces.db or a remote backend
7. CLI / UI: inspect, debug, audit, cost

Key Architectural Decisions

Decision | Choice | Rationale
Distribution | Core + optional extras | Keeps core lean; integrations, UI, audit, cost are all opt-in
Python version | 3.10+ | Modern type syntax, match/case, covers the vast majority of agent developers
Async strategy | Sync API, async export in background thread | Small API surface; <2ms p99 overhead via BatchSpanProcessor
Local storage | SQLite (primary) + JSONL export | Structured queries for CLI + portable file export
Instrumentation | Decorators (core) + auto-instrumentation (integrations) | Manual for custom backends; "3 lines of code" for supported frameworks
CLI style | Subcommand-based (Typer) | Scriptable, composable, rich table output
Web UI tech | FastAPI + htmx + Jinja2 | Python-native, no Node.js or build step required
OTel integration | Full OTLP export (gRPC + HTTP) | 1:1 span mapping; works with every OTel backend
Analysis timing | Offline (CLI commands) | Zero runtime overhead; audits are re-runnable
Build system | uv + hatchling + pyproject.toml | Fast, modern, signals a contemporary project

Core SDK — memorylens._core

The core layer is the heart of MemoryLens. It defines the trace schema, manages the tracer lifecycle, implements context propagation, and runs the span processor pipeline. Everything else in the system builds on top of these primitives.

Trace Schema and Data Model

MemorySpan

MemorySpan is a frozen, slotted dataclass — immutable once finalized. Every memory operation produces exactly one MemorySpan. The schema is the single source of truth for all downstream consumers: exporters, the CLI, the web UI, and the analysis tools.

# src/memorylens/_core/span.py
@dataclass(frozen=True, slots=True)
class MemorySpan:
    """A single traced memory operation."""

    # Identity
    span_id: str                # unique hex UUID
    trace_id: str               # groups spans in one logical operation
    parent_span_id: str | None  # for nested operations

    # Classification
    operation: MemoryOperation  # WRITE, READ, COMPRESS, UPDATE
    status: SpanStatus          # OK, ERROR, DROPPED

    # Timing (epoch nanoseconds)
    start_time: float
    end_time: float
    duration_ms: float          # computed: (end - start) / 1_000_000

    # Context (inherited from MemoryContext)
    agent_id: str | None
    session_id: str | None
    user_id: str | None

    # Memory content (redactable via MEMORYLENS_CAPTURE_CONTENT=false)
    input_content: str | None
    output_content: str | None

    # Operation-specific attributes (free-form dict)
    attributes: dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> dict[str, Any]:
        """Serialize to plain dict. Enum values become their string values."""

MemoryOperation Enum

The four fundamental operations of any memory system, modeled as a str enum so values serialize naturally:

Enum Value | String Value | Meaning
MemoryOperation.WRITE | memory.write | Storing content into memory (includes explicit deletes with status=DROPPED)
MemoryOperation.READ | memory.read | Retrieving memories, typically by semantic search or key lookup
MemoryOperation.COMPRESS | memory.compress | Summarizing or condensing memory content via an LLM
MemoryOperation.UPDATE | memory.update | Modifying an existing memory entry (merge, replace, or append)

SpanStatus Enum

Enum Value | String Value | When Used
SpanStatus.OK | ok | Operation completed successfully
SpanStatus.ERROR | error | Operation raised an exception; error.type and error.message are set in attributes
SpanStatus.DROPPED | dropped | Content was explicitly discarded; drop_reason attribute explains why
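
For reference, a minimal sketch of the two enums, consistent with the string values above (the actual definitions live in src/memorylens/_core/schema.py per the file structure):

# Minimal sketch consistent with the string values listed above.
from enum import Enum

class MemoryOperation(str, Enum):
    WRITE = "memory.write"
    READ = "memory.read"
    COMPRESS = "memory.compress"
    UPDATE = "memory.update"

class SpanStatus(str, Enum):
    OK = "ok"
    ERROR = "error"
    DROPPED = "dropped"

# str inheritance means .value serializes naturally:
assert MemoryOperation.WRITE.value == "memory.write"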

Operation-Specific Attributes

The attributes dict carries operation-type-specific fields. These map 1:1 to OTLP span attributes with a memorylens. prefix:

Operation | Key Attributes
WRITE | memory_key, backend, drop_reason, drop_policy, embedding_model, vector_dim
READ | query, results_count, scores (list of floats), threshold, backend, top_k
COMPRESS | pre_content, post_content, compression_ratio, semantic_loss_score, model_used
UPDATE | memory_key, previous_version, new_version, update_type (merge/replace/append)

TracerProvider and Tracer

The TracerProvider is a singleton that holds global configuration: the processor pipeline, the sampler, and the service name. All instrumentation flows through it.

# src/memorylens/_core/tracer.py
class TracerProvider:
    """Singleton that manages tracers, processors, and sampling."""
    _instance: TracerProvider | None = None

    def __init__(self) -> None:
        self.processors: list[SpanProcessor] = []
        self.sampler = Sampler(rate=1.0)
        self.service_name: str = "memorylens"

    def add_processor(self, processor: SpanProcessor) -> None: ...
    def get_tracer(self, name: str) -> Tracer: ...
    def shutdown(self) -> None: ...

    @classmethod
    def get(cls) -> TracerProvider: ...  # singleton accessor

    @classmethod
    def reset(cls) -> None: ...  # for testing — shuts down and clears instance

The Tracer is created per-module (by name) via provider.get_tracer(name). It provides a context manager for creating spans:

class Tracer:
    @contextmanager
    def start_span(
        self,
        operation: MemoryOperation,
        parent_span_id: str | None = None,
        attributes: dict[str, Any] | None = None,
    ) -> Generator[_MutableSpan, None, None]:
        # 1. Check sampler — if not sampled, yield a no-op span
        # 2. Read current MemoryContext for agent/session/user IDs
        # 3. Create _MutableSpan with new trace_id and span_id (UUID hex)
        # 4. Call on_start() on all processors
        # 5. yield span — caller can set attributes, content, status
        # 6. On exception: set status=ERROR, capture error.type/error.message
        # 7. On exit: finalize() → frozen MemorySpan, call on_end() on all processors

The internal _MutableSpan builder accumulates state during the span lifecycle. It exposes set_attribute(), set_status(), and set_content() methods. finalize() stamps end_time, computes duration_ms, and returns an immutable MemorySpan.
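
A simplified sketch of the builder's shape, assuming the fields and methods named above (everything beyond those names is illustrative, not the actual implementation):

# Illustrative sketch of the span builder; MemorySpan, MemoryOperation and
# SpanStatus are the types defined earlier in this document.
import time
import uuid
from typing import Any

class _MutableSpan:
    def __init__(self, operation: MemoryOperation, context=None) -> None:
        self.operation = operation
        self.trace_id = uuid.uuid4().hex
        self.span_id = uuid.uuid4().hex
        self.status = SpanStatus.OK
        self.start_time = time.time_ns()
        self.attributes: dict[str, Any] = {}
        self.input_content: str | None = None
        self.output_content: str | None = None
        self._context = context  # MemoryContext captured at span creation

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value

    def set_status(self, status: SpanStatus) -> None:
        self.status = status

    def set_content(self, input_content=None, output_content=None) -> None:
        if input_content is not None:
            self.input_content = input_content
        if output_content is not None:
            self.output_content = output_content

    def finalize(self) -> MemorySpan:
        end = time.time_ns()
        ctx = self._context
        return MemorySpan(
            span_id=self.span_id, trace_id=self.trace_id, parent_span_id=None,
            operation=self.operation, status=self.status,
            start_time=self.start_time, end_time=end,
            duration_ms=(end - self.start_time) / 1_000_000,
            agent_id=getattr(ctx, "agent_id", None),
            session_id=getattr(ctx, "session_id", None),
            user_id=getattr(ctx, "user_id", None),
            input_content=self.input_content, output_content=self.output_content,
            attributes=self.attributes,
        )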

Decorators

The four instrument_* decorators are the primary instrumentation API. They wrap existing functions without requiring any modification to the function body.

# src/memorylens/_core/decorators.py

@instrument_write(backend="mem0", capture_content=True)
def store_memory(user_id: str, content: str) -> bool: ...

@instrument_read(backend="mem0", top_k=5)
def search_memories(query: str) -> list[Memory]: ...

@instrument_compress(model="gpt-4o-mini")
def summarize(memories: list[str]) -> str: ...

@instrument_update(backend="mem0")
def update_memory(memory_id: str, new_content: str) -> bool: ...

All four decorators are implemented via a shared _make_decorator(operation, **kwargs) factory. The wrapper function:

  1. Gets the TracerProvider singleton and acquires a tracer for the wrapped function's module
  2. Opens a span with the correct MemoryOperation and any static keyword arguments as initial attributes
  3. Conditionally captures repr(args/kwargs) as input_content (if capture_content=True)
  4. Calls the original function
  5. Conditionally captures repr(result) as output_content
  6. Returns the result — exceptions propagate naturally after setting status=ERROR

Content capture resolution order: explicit decorator kwarg → MEMORYLENS_CAPTURE_CONTENT env var → default True.
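
A condensed sketch of such a factory, following the six steps above; the _should_capture helper stands in for the content-capture resolution and is an assumption of this example:

# Condensed sketch of the shared factory; helper names are assumptions.
import functools
import os
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])

def _should_capture(flag: bool | None) -> bool:
    # resolution order from above: decorator kwarg, then env var, then default True
    if flag is not None:
        return flag
    return os.environ.get("MEMORYLENS_CAPTURE_CONTENT", "true").lower() != "false"

def _make_decorator(operation: MemoryOperation, capture_content: bool | None = None,
                    **static_attrs: Any) -> Callable[[F], F]:
    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any):
            tracer = TracerProvider.get().get_tracer(func.__module__)                   # step 1
            with tracer.start_span(operation, attributes=dict(static_attrs)) as span:  # step 2
                if _should_capture(capture_content):                                   # step 3
                    span.set_content(input_content=repr((args, kwargs)))
                result = func(*args, **kwargs)                                          # step 4
                if _should_capture(capture_content):                                    # step 5
                    span.set_content(output_content=repr(result))
                return result                                                           # step 6
        return wrapper  # type: ignore[return-value]
    return decorator

# The public decorators then just bind the operation:
instrument_write = functools.partial(_make_decorator, MemoryOperation.WRITE)
instrument_read = functools.partial(_make_decorator, MemoryOperation.READ)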

Context Propagation

The MemoryContext is a frozen dataclass that acts as a Python context manager. It uses ContextVar (from the standard library contextvars module) to propagate agent/session/user metadata through the call stack without modifying function signatures. It is fully async-safe because ContextVar isolates state per asyncio Task.

# src/memorylens/_core/context.py
@dataclass(frozen=True, slots=True)
class MemoryContext:
    agent_id: str | None = None
    session_id: str | None = None
    user_id: str | None = None

    def __enter__(self) -> Self:
        # sets the ContextVar token
        object.__setattr__(self, "_token", _current_context.set(self))
        return self

    def __exit__(self, *exc) -> None:
        # resets to the previous context (safe nesting)
        _current_context.reset(self._token)

# Usage:
with memorylens.context(agent_id="support-bot", session_id="sess-123"):
    store_memory("user prefers vegetarian meals")
    # → span will have agent_id="support-bot", session_id="sess-123"

The public memorylens.context() function wraps the MemoryContext constructor. Tracer.start_span() calls get_current_context() to read the active context at span creation time.

SpanProcessor Pipeline

Span processors sit between the tracer and the exporters. They receive spans via on_start() (before execution) and on_end() (after execution). Multiple processors can be registered; they are called in registration order.

class SpanProcessor(Protocol):
    def on_start(self, span: MemorySpan) -> None: ...
    def on_end(self, span: MemorySpan) -> None: ...
    def shutdown(self) -> None: ...
    def force_flush(self, timeout_ms: int = 30000) -> bool: ...

SimpleSpanProcessor

Synchronous: calls exporter.export([span]) immediately on on_end(). Used for debugging, testing, and scenarios where latency is not a concern.
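
A minimal sketch of that behavior:

# Minimal sketch: export each span synchronously as it ends.
class SimpleSpanProcessor:
    def __init__(self, exporter) -> None:
        self._exporter = exporter

    def on_start(self, span) -> None:
        pass  # nothing to do before the operation runs

    def on_end(self, span) -> None:
        self._exporter.export([span])  # blocks the caller until export returns

    def force_flush(self, timeout_ms: int = 30000) -> bool:
        return True  # nothing is buffered

    def shutdown(self) -> None:
        self._exporter.shutdown()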

BatchSpanProcessor

The production processor. Appends spans to a bounded deque (max 2048) under a lock. A daemon thread wakes every 5 seconds (or when the queue reaches 512 spans, or on force_flush()) and exports in batches. This keeps the hot path non-blocking.

class BatchSpanProcessor:
    def __init__(
        self,
        exporter: SpanExporter,
        max_batch_size: int = 512,       # trigger flush when queue hits this
        schedule_delay_ms: int = 5000,   # background flush interval
        max_queue_size: int = 2048,      # drops oldest spans if exceeded
    ) -> None: ...

On shutdown(), the processor signals the background thread, waits up to 10 seconds, flushes any remaining spans, then shuts down the exporter.
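
A simplified sketch of the batching behavior described above (locking and shutdown details are abbreviated; this is not the actual implementation):

# Sketch of the batching loop; shutdown and flush handling are omitted.
import threading
from collections import deque

class BatchSpanProcessor:
    def __init__(self, exporter, max_batch_size: int = 512,
                 schedule_delay_ms: int = 5000, max_queue_size: int = 2048) -> None:
        self._exporter = exporter
        self._queue: deque = deque(maxlen=max_queue_size)  # oldest spans drop when full
        self._lock = threading.Lock()
        self._wake = threading.Event()
        self._max_batch = max_batch_size
        self._delay_s = schedule_delay_ms / 1000
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def on_start(self, span) -> None:
        pass

    def on_end(self, span) -> None:
        with self._lock:
            self._queue.append(span)                 # hot path: append and return
            if len(self._queue) >= self._max_batch:
                self._wake.set()                     # early flush once a full batch is queued

    def _run(self) -> None:
        while True:
            self._wake.wait(timeout=self._delay_s)   # wake every 5 s or on demand
            self._wake.clear()
            with self._lock:
                batch = list(self._queue)
                self._queue.clear()
            if batch:
                self._exporter.export(batch)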

Sampler

The Sampler is a simple rate-based sampler. It generates a random float on each call to should_sample() and returns True if it is below the configured rate. Rate is set via memorylens.init(sample_rate=0.1) or MEMORYLENS_SAMPLE_RATE env var. Default is 1.0 (sample everything). When a span is not sampled, the tracer yields a no-op _MutableSpan with empty IDs and skips all processor calls.
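
A sketch of the sampling check, assuming the rate semantics described above:

# Sketch of rate-based sampling: sample when a uniform draw falls below the rate.
import random

class Sampler:
    def __init__(self, rate: float = 1.0) -> None:
        self.rate = rate

    def should_sample(self) -> bool:
        if self.rate >= 1.0:
            return True           # default: sample everything
        return random.random() < self.rate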

Exporters — memorylens._exporters

Exporters are the output stage of the pipeline. They receive batches of MemorySpan objects from the processor and persist them to a backend. Three exporters ship with MemoryLens core.

SpanExporter Protocol

class SpanExporter(Protocol):
    def export(self, spans: list[MemorySpan]) -> ExportResult: ...
    def shutdown(self) -> None: ...

class ExportResult(Enum):
    SUCCESS = "success"
    FAILURE = "failure"

SQLite Exporter

The primary local storage backend. Writes spans to a SQLite database at ~/.memorylens/traces.db (configurable). Designed for developer use: zero server setup, persistent across sessions, queryable with standard SQL.

Database Schema

-- spans table (created on first use)
CREATE TABLE IF NOT EXISTS spans (
    span_id        TEXT PRIMARY KEY,
    trace_id       TEXT NOT NULL,
    parent_span_id TEXT,
    operation      TEXT NOT NULL,        -- "memory.write" etc.
    status         TEXT NOT NULL,        -- "ok" / "error" / "dropped"
    start_time     REAL NOT NULL,        -- epoch nanoseconds
    end_time       REAL NOT NULL,
    duration_ms    REAL NOT NULL,
    agent_id       TEXT,
    session_id     TEXT,
    user_id        TEXT,
    input_content  TEXT,
    output_content TEXT,
    attributes     TEXT NOT NULL DEFAULT '{}'  -- JSON blob
)

-- Indexes for common query patterns
CREATE INDEX IF NOT EXISTS idx_spans_trace_id   ON spans (trace_id)
CREATE INDEX IF NOT EXISTS idx_spans_session_id ON spans (session_id)
CREATE INDEX IF NOT EXISTS idx_spans_operation  ON spans (operation)
CREATE INDEX IF NOT EXISTS idx_spans_start_time ON spans (start_time)

The connection is opened in WAL mode (PRAGMA journal_mode=WAL) for concurrent read safety. check_same_thread=False is set because the BatchSpanProcessor writes from a background thread.
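
A sketch of the write path this schema and pragma configuration imply (class and attribute names here are illustrative):

# Illustrative write path: one transaction per exported batch.
import json
import sqlite3

class _SQLiteWriter:
    def __init__(self, db_path: str) -> None:
        self._conn = sqlite3.connect(db_path, check_same_thread=False)  # background thread writes
        self._conn.execute("PRAGMA journal_mode=WAL")                   # keep concurrent reads safe

    def export(self, spans) -> "ExportResult":
        rows = [
            (s.span_id, s.trace_id, s.parent_span_id, s.operation.value, s.status.value,
             s.start_time, s.end_time, s.duration_ms, s.agent_id, s.session_id, s.user_id,
             s.input_content, s.output_content, json.dumps(s.attributes))
            for s in spans
        ]
        with self._conn:  # commit the whole batch at once
            self._conn.executemany(
                "INSERT OR REPLACE INTO spans VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?)", rows
            )
        return ExportResult.SUCCESS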

Query Methods

Method | Parameters | Returns | Used By
query() | trace_id, operation, status, agent_id, session_id, limit | list[dict] | CLI commands
query_extended() | Same as above + q (full-text), offset | tuple[list[dict], int] (rows + total count) | Web UI (pagination + search)
save_audit() | audit: CompressionAudit | None | Compression Auditor
get_audit() | span_id: str | dict or None | Compression UI view
list_audits() | limit, offset | tuple[list[dict], int] | CLI audit list
update_span_attributes() | span_id, new_attrs: dict | None | Cost enricher (merges cost_usd into attributes)

Compression Audits Table

Created lazily on the first save_audit() call (not at initialization):

CREATE TABLE IF NOT EXISTS compression_audits (
    span_id             TEXT PRIMARY KEY,
    semantic_loss_score REAL NOT NULL,
    compression_ratio   REAL NOT NULL,
    pre_sentence_count  INTEGER NOT NULL,
    post_sentence_count INTEGER NOT NULL,
    sentences           TEXT NOT NULL,    -- JSON array of SentenceAnalysis dicts
    scorer_backend      TEXT NOT NULL,    -- "local" or "openai"
    created_at          REAL NOT NULL     -- epoch timestamp
)

OTLP Exporter

Translates MemorySpan objects to OpenTelemetry spans using a _ReadableSpanAdapter that presents the MemoryLens schema as an OTel-compatible interface. This allows using the upstream OTLPSpanExporter from the OpenTelemetry SDK directly.

_ReadableSpanAdapter

The adapter presents MemorySpan fields as OTel span properties:

OTel Property | Source | Notes
name | span.operation.value | e.g., "memory.write"
context.trace_id | int(span.trace_id[:32], 16) | 128-bit int from hex string
context.span_id | int(span.span_id[:16], 16) | 64-bit int from hex string
start_time | int(span.start_time) | nanoseconds
end_time | int(span.end_time) | nanoseconds
attributes | All span fields prefixed with memorylens. | Nested dicts serialized as JSON strings
status | StatusCode.ERROR if error, else StatusCode.OK
kind | SpanKind.INTERNAL
resource | Resource.create({"service.name": "memorylens"})

The OTLP exporter is configured via standard OTel environment variables: OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS, and OTEL_EXPORTER_OTLP_PROTOCOL. The default endpoint is http://localhost:4317.

JSONL Exporter

The simplest exporter — writes one JSON object per line to stdout or a file path. Stateless. Used by memorylens traces export and as a portable format for piping to other tools. Each line is the output of span.to_dict().
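
A sketch of what that amounts to (names are illustrative):

# Illustrative JSONL write path: one span.to_dict() per line.
import json
import sys
from typing import TextIO

class _JSONLWriter:
    def __init__(self, path: str | None = None) -> None:
        self._out: TextIO = open(path, "a") if path else sys.stdout

    def export(self, spans) -> "ExportResult":
        for span in spans:
            self._out.write(json.dumps(span.to_dict()) + "\n")
        self._out.flush()
        return ExportResult.SUCCESS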

Exporter Registry and Factory

# src/memorylens/_exporters/__init__.py
def create_exporter(name: str, **kwargs) -> SpanExporter:
    match name:
        case "sqlite": return SQLiteExporter(**kwargs)
        case "otlp":   return OTLPExporter(**kwargs)
        case "jsonl":  return JSONLExporter(**kwargs)
        case _: raise ValueError(f"Unknown exporter: {name}")

Framework Integrations — memorylens.integrations

Auto-instrumentation patches framework memory classes at runtime so that zero code changes are required in user applications. Each integration follows an identical Instrumentor protocol.

Instrumentor Protocol

class Instrumentor(Protocol):
    def instrument(self, **kwargs) -> None: ...
    def uninstrument(self) -> None: ...

Each instrumentor stores the original method references so uninstrument() can restore the original behavior. The integration registry maps names to instrumentor classes:

# memorylens.init(instrument=["langchain", "mem0"]) resolves via:
def create_instrumentor(name: str) -> Instrumentor:
    return _REGISTRY[name]()  # raises helpful error if framework not installed
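
Under the hood, each instrumentor follows the same patch-and-restore pattern: keep a reference to the framework's original method, replace it with a wrapper that opens a span, and put the original back in uninstrument(). A simplified sketch for the Mem0 add() method (the wrapper body is illustrative, not the actual implementation):

# Illustrative patching sketch: wrap Memory.add() in a WRITE span and keep the
# original method so uninstrument() can restore it.
import functools

class Mem0Instrumentor:
    def __init__(self) -> None:
        self._original_add = None

    def instrument(self, **kwargs) -> None:
        from mem0 import Memory                      # import deferred until instrument()
        original = Memory.add
        self._original_add = original

        @functools.wraps(original)
        def patched_add(mem_self, *args, **kw):
            tracer = TracerProvider.get().get_tracer("memorylens.integrations.mem0")
            with tracer.start_span(MemoryOperation.WRITE, attributes={"backend": "mem0"}):
                return original(mem_self, *args, **kw)

        Memory.add = patched_add

    def uninstrument(self) -> None:
        if self._original_add is not None:
            from mem0 import Memory
            Memory.add = self._original_add
            self._original_add = None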
LangChain (memorylens[langchain])
Patches: BaseMemory (langchain-core)

Method | Operation
save_context() | WRITE
load_memory_variables() | READ
ConversationSummaryMemory summarize | COMPRESS

Mem0 (memorylens[mem0])
Patches: Memory class (mem0ai)

Method | Operation
add() | WRITE
search() | READ (captures scores)
update() | UPDATE
delete() | WRITE, status=DROPPED

LlamaIndex (memorylens[llamaindex])
Patches: ChatMemoryBuffer (llama-index-core)

Method | Operation
put(message) | WRITE
put_messages(messages) | WRITE
get(input) | READ
get_all() | READ
reset() | WRITE, drop_reason="reset"

Letta (memorylens[letta])
Patches: Letta client agents.blocks (letta-client)

Method | Operation
agents.blocks.retrieve() | READ
agents.blocks.update() | UPDATE
agents.blocks.delete() | WRITE, drop_reason="explicit_delete"
agents.blocks.list() | READ

Zep (memorylens[zep])
Patches: Zep client memory methods (zep-python)

Method | Operation
memory.add(session_id, messages) | WRITE
memory.get(session_id) | READ
memory.search(session_id, payload) | READ (captures scores)
memory.delete(session_id) | WRITE, drop_reason="session_delete"

Auto-Instrumentation Shorthand

# Initialize all integrations in one call:
memorylens.init(instrument=["langchain", "mem0", "llamaindex"])

# Or initialize individual instrumentors explicitly:
from memorylens.integrations.langchain import LangChainInstrumentor
LangChainInstrumentor().instrument()

# To stop instrumentation:
from memorylens.integrations.mem0 import Mem0Instrumentor
inst = Mem0Instrumentor()
inst.instrument()
# ... later ...
inst.uninstrument()

If a framework is not installed, a clear error is raised: "LangChain not found. Install with: pip install memorylens[langchain]".

Web Dashboard — memorylens._ui

The MemoryLens web dashboard is a local developer tool that provides a browser-based interface for trace inspection. It launches with a single CLI command and reads from the same SQLite database that the SDK writes to. It is available as an optional extra: pip install memorylens[ui].

Technology Stack

The dashboard is built on FastAPI, Jinja2 templates, and htmx, with TailwindCSS for styling; htmx and TailwindCSS are loaded via CDN, and the app is served by uvicorn bound to 127.0.0.1. No Node.js or build step is required.

App Factory

# src/memorylens/_ui/server.py
def create_app(db_path: str = _DEFAULT_DB, ingest: bool = False) -> FastAPI:
    app = FastAPI(title="MemoryLens", docs_url=None, redoc_url=None)
    # Mount static files at /static
    # Configure Jinja2 templates
    # Create SQLiteExporter(db_path) — shared via app.state
    # Register: create_trace_routes(app)
    # Register: create_compression_routes(app)
    if ingest:
        create_ingest_routes(app)  # Register OTLP HTTP receiver (optional)
    return app

def run(db_path, port=8000, ingest=False) -> None:
    app = create_app(db_path, ingest)
    uvicorn.run(app, host="127.0.0.1", port=port, log_level="warning")

Content Negotiation

All filterable endpoints serve both full HTML pages and htmx partials from the same URL. htmx automatically sends an HX-Request: true header on all requests it triggers. The routes check for this header (or use separate /api/ paths for partials) and return the appropriate template.
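
A minimal sketch of that header check in a FastAPI route (the query is stubbed out; template names follow the endpoint table below):

# Sketch of content negotiation between full page and htmx partial.
from fastapi import FastAPI, Request
from fastapi.responses import HTMLResponse
from fastapi.templating import Jinja2Templates

app = FastAPI()
templates = Jinja2Templates(directory="templates")

@app.get("/traces", response_class=HTMLResponse)
async def list_traces(request: Request):
    is_partial = request.headers.get("HX-Request") == "true"     # header set by htmx
    rows = []  # in MemoryLens this would come from SQLiteExporter.query_extended()
    template = "partials/trace_table.html" if is_partial else "traces_list.html"
    return templates.TemplateResponse(template, {"request": request, "rows": rows})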

API Endpoints

Method | Path | Template / Returns | Purpose
GET | / | Redirect to /traces | Root redirect
GET | /traces | traces_list.html | Full trace list page
GET | /api/traces | partials/trace_table.html | htmx-filtered table rows
GET | /traces/{trace_id} | traces_detail.html | Trace detail with timeline
GET | /traces/{trace_id}/retrieval | retrieval_debug.html | Retrieval debugger (READ spans only)
GET | /traces/{trace_id}/compression | compression_audit.html | Compression audit (COMPRESS spans only)
POST | /api/traces/{trace_id}/audit | Redirect | Run compression audit, redirect back
POST | /v1/traces | JSON {} | OTLP HTTP/JSON ingest (optional)

Trace List View (/traces)

The main landing page. A filterable, paginated table of all memory operation traces.

Trace Detail View (/traces/{trace_id})

A deep dive into a single span, presented in a two-column layout.

Retrieval Debugger (/traces/{trace_id}/retrieval)

The flagship debugging view for READ operations. It visualizes exactly why a retrieval returned or missed specific memories.

OTLP Ingest Endpoint (POST /v1/traces)

When started with --ingest, the dashboard doubles as a lightweight OTLP HTTP/JSON collector. This allows agent applications to point their OTEL_EXPORTER_OTLP_ENDPOINT directly at the dashboard:

# Agent configuration for live ingest:
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:8000
export OTEL_EXPORTER_OTLP_PROTOCOL=http/json

The ingest handler parses ExportTraceServiceRequest JSON, iterates resourceSpans[].scopeSpans[].spans[], skips spans without a memorylens.operation attribute, maps OTel attributes back to MemorySpan fields, and writes to SQLite. The UI then reflects new traces within the live poll interval.
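
A condensed sketch of that walk over the OTLP JSON payload (only string-valued attributes are handled here; the real mapping is broader):

# Sketch of the ingest walk: yield only spans carrying a memorylens.operation attribute.
def _iter_memorylens_spans(payload: dict):
    for resource_span in payload.get("resourceSpans", []):
        for scope_span in resource_span.get("scopeSpans", []):
            for otel_span in scope_span.get("spans", []):
                attrs = {
                    a["key"]: a.get("value", {}).get("stringValue")
                    for a in otel_span.get("attributes", [])
                }
                if "memorylens.operation" not in attrs:
                    continue                      # not a MemoryLens span, skip
                yield otel_span, attrs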

HTTP/JSON only (no gRPC) — gRPC would require protobuf compilation on the server side, adding unnecessary complexity for a local dev tool.

File Structure

src/memorylens/_ui/
├── __init__.py
├── server.py # FastAPI app factory + uvicorn launcher
├── api/
│   ├── traces.py # page routes + filtered API endpoints
│   ├── compression.py # compression audit routes
│   └── ingest.py # OTLP HTTP/JSON receiver
├── templates/
│   ├── base.html # layout: nav, Tailwind CDN, htmx script
│   ├── traces_list.html # trace list page
│   ├── traces_detail.html # single trace timeline view
│   ├── retrieval_debug.html
│   ├── compression_audit.html
│   └── partials/
│       ├── trace_table.html # htmx partial: filterable rows
│       ├── span_timeline.html
│       ├── score_chart.html
│       └── sentence_diff.html # compression audit partial
└── static/
    └── app.css # custom styles (on top of Tailwind)

Compression Auditor — memorylens._audit

The Compression Auditor performs offline semantic analysis of COMPRESS spans, determining what information was preserved and what was lost during memory summarization. The SDK already captures input_content (pre-compression) and output_content (post-compression) on COMPRESS spans — this feature uses those fields to compute sentence-level semantic similarity.

Design Principle: Zero Runtime Overhead

Analysis runs as a post-hoc CLI command or UI action, never in the instrumentation hot path. This maintains the <2ms p99 overhead guarantee.

Data Model

@dataclass(frozen=True)
class SentenceAnalysis:
    text: str                # the original pre-compression sentence
    best_match_score: float  # max cosine similarity to any post-content sentence
    status: str              # "preserved" (>= 0.7) or "lost" (< 0.7)

@dataclass(frozen=True)
class CompressionAudit:
    span_id: str
    semantic_loss_score: float   # 1.0 - mean(best_match_scores). 0=no loss, 1=total loss
    compression_ratio: float     # len(post_content) / len(pre_content)
    pre_sentence_count: int
    post_sentence_count: int
    sentences: list[SentenceAnalysis]
    scorer_backend: str         # "local" or "openai"

Scorer Backends

The ScorerBackend protocol requires a single method: embed(texts: list[str]) -> list[list[float]]. Three implementations are provided:

ScorerDependencyModelAPI KeyUse Case
LocalScorer sentence-transformers all-MiniLM-L6-v2 (~80MB) None Default; works fully offline
OpenAIScorer openai (user-installed) text-embedding-3-small OPENAI_API_KEY Higher quality embeddings, cloud-based
MockScorer None (stdlib only) MD5 hash-based deterministic embeddings None Fast, deterministic testing
# Factory function for creating scorers:
def create_scorer(name: str) -> ScorerBackend:
    match name:
        case "local":  return LocalScorer()
        case "openai": return OpenAIScorer()
        case "mock":   return MockScorer()

Compression Analysis Algorithm

The CompressionAnalyzer.analyze(span_id, pre_content, post_content) method executes the following steps (a short similarity sketch follows the list):

  1. Sentence splitting: split both pre_content and post_content into sentences using a regex-based splitter (splits on .!? followed by whitespace or end-of-string, with abbreviation handling)
  2. Batch embedding: concatenate all pre and post sentences into a single list and embed in one batch call — minimizing API calls for the OpenAI backend
  3. Similarity matrix: for each pre-sentence, compute cosine similarity against every post-sentence using the formula: dot(a, b) / (||a|| × ||b||)
  4. Best match scoring: take the maximum similarity score across all post-sentences as the best_match_score for each pre-sentence
  5. Classification: score ≥ 0.7 → "preserved"; score < 0.7 → "lost"
  6. Loss score: semantic_loss_score = 1.0 − mean(best_match_scores) — clamped to [0, 1]
  7. Compression ratio: len(post_content) / len(pre_content) (character-level)
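
A dependency-free sketch of steps 3 through 5 (the [audit] extra uses NumPy and real embeddings; plain lists keep this example self-contained):

# Sketch of cosine similarity and best-match classification.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def score_pre_sentences(pre_vecs, post_vecs, threshold: float = 0.7):
    results = []
    for pre in pre_vecs:
        best = max((cosine(pre, post) for post in post_vecs), default=0.0)
        results.append(("preserved" if best >= threshold else "lost", best))
    return results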

Loss Score Classification

Score Range | Classification | Visual
0.0 – 0.29 | Low loss | Green, checkmark ✓
0.30 – 0.60 | Moderate loss | Amber, warning ⚠
0.61 – 1.0 | High loss | Red, X ✗

Compression Audit UI View (/traces/{trace_id}/compression)

Shows the stored audit for a COMPRESS span: the semantic loss score, the compression ratio, and the per-sentence preserved/lost breakdown (the sentence_diff.html partial). The POST /api/traces/{trace_id}/audit endpoint runs an audit for the trace and redirects back to this view.

Cost Attribution — memorylens._cost

Cost attribution tracks token counts and dollar costs per memory operation. Like the Compression Auditor, it is implemented as offline enrichment — token data is captured as span attributes during instrumentation, and a CLI command computes costs using a configurable pricing model and writes cost_usd back into the span's attributes via update_span_attributes().

No New Dependencies

Cost attribution uses only Python stdlib (json, pathlib). There are no new optional extras for this feature.

Capturing Token Data

Token data is added as span attributes during instrumentation, either via decorator kwargs or manually:

# Via decorator:
@instrument_write(backend="mem0", tokens_in=150, tokens_out=0, model="gpt-4o-mini")
def store(...): ...

# Via manual span creation:
with tracer.start_span(MemoryOperation.COMPRESS) as span:
    span.set_attribute("tokens_in", 1200)
    span.set_attribute("tokens_out", 300)
    span.set_attribute("model", "gpt-4o-mini")

Default Pricing Table

Model | Input ($/token) | Output ($/token)
gpt-4o | $0.0000025 | $0.00001
gpt-4o-mini | $0.00000015 | $0.0000006
gpt-4-turbo | $0.00001 | $0.00003
claude-3-opus | $0.000015 | $0.000075
claude-3-sonnet | $0.000003 | $0.000015
claude-3-haiku | $0.00000025 | $0.00000125
text-embedding-3-small | $0.00000002 | $0.0
text-embedding-3-large | $0.00000013 | $0.0

User Pricing Override

Users override or extend pricing via ~/.memorylens/pricing.json:

{
    "my-fine-tuned-model": {"input": 0.00001, "output": 0.00003}
}

The load_pricing() function merges user entries over defaults. User entries take precedence.
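
A sketch of that merge, assuming the default path and the DEFAULT_PRICING table named in the file structure (the real signature may differ):

# Sketch of the pricing merge: user entries override the built-in defaults.
import json
from pathlib import Path

def load_pricing(user_path: Path = Path.home() / ".memorylens" / "pricing.json") -> dict:
    pricing = dict(DEFAULT_PRICING)          # built-in table (see above)
    if user_path.exists():
        user_pricing = json.loads(user_path.read_text())
        pricing.update(user_pricing)         # user entries take precedence
    return pricing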

Cost Enricher

class CostEnricher:
    def enrich_span(self, attrs: dict[str, Any]) -> dict | None:
        tokens_in = attrs.get("tokens_in", 0)
        tokens_out = attrs.get("tokens_out", 0)
        model = attrs.get("model")
        if not model or not (tokens_in or tokens_out):
            return None  # no token data present, nothing to enrich
        if model not in self.pricing:  # self.pricing: merged table from load_pricing()
            return {"cost_usd": 0.0, "_cost_warning": f"no pricing entry for {model!r}"}
        cost = tokens_in * self.pricing[model]["input"] + tokens_out * self.pricing[model]["output"]
        return {"cost_usd": round(cost, 10)}

update_span_attributes Mechanism

Cost enrichment writes cost_usd directly into the span's attributes JSON field in SQLite, merging without overwriting other attributes:

# SQLiteExporter method:
def update_span_attributes(self, span_id: str, new_attrs: dict) -> None:
    current = json.loads(row["attributes"])
    current.update(new_attrs)  # merge
    # UPDATE spans SET attributes = ? WHERE span_id = ?

UI Integration

No new pages are added. Cost data surfaces in the existing views.

CLI Reference — memorylens

The CLI is built with Typer and Rich. All commands read from the local SQLite database by default. Most commands support --json for machine-readable output.

memorylens init

$ memorylens init
# Creates ~/.memorylens/ directory
# Output: "Initialized MemoryLens at /Users/you/.memorylens"

memorylens ui

$ memorylens ui
$ memorylens ui --port 8080
$ memorylens ui --db-path ./my-traces.db
$ memorylens ui --ingest # also accept OTLP HTTP at /v1/traces

Requires memorylens[ui]. Launches web dashboard at http://127.0.0.1:8000.

memorylens traces

# List traces
$ memorylens traces list
$ memorylens traces list --operation memory.write
$ memorylens traces list --status error
$ memorylens traces list --last 1h
$ memorylens traces list --agent-id support-bot
$ memorylens traces list --session-id sess-123

# Inspect a single trace
$ memorylens traces show <trace-id>

# Live tail (streaming)
$ memorylens traces tail
$ memorylens traces tail --operation memory.read
$ memorylens traces tail --min-duration 100ms

# Export as JSONL
$ memorylens traces export
$ memorylens traces export --last 24h -o traces.jsonl

memorylens stats

$ memorylens stats
$ memorylens stats --last 7d
$ memorylens stats --group-by operation

Shows aggregate statistics: total spans, error rate, p50/p95/p99 duration, breakdown by operation and status.

memorylens config

$ memorylens config show
$ memorylens config set <key> <value>

memorylens audit

# Audit all unaudited COMPRESS spans
$ memorylens audit compress
$ memorylens audit compress --trace-id abc123
$ memorylens audit compress --scorer openai
$ memorylens audit compress --force # re-audit already-audited spans

# Show detailed audit for a specific span
$ memorylens audit show <span_id>

# List all audit results
$ memorylens audit list
$ memorylens audit list --min-loss 0.3

memorylens cost

# Enrich spans with cost_usd
$ memorylens cost enrich
$ memorylens cost enrich --trace-id abc123
$ memorylens cost enrich --force # recalculate all

# Cost report
$ memorylens cost report
$ memorylens cost report --group-by agent_id
$ memorylens cost report --group-by session_id
$ memorylens cost report --group-by operation

# Manage pricing
$ memorylens cost pricing
$ memorylens cost pricing --set gpt-4o-mini.input=0.0000002

Cost Report Output Format

Cost Report (grouped by operation)

OPERATION         SPANS   TOKENS IN   TOKENS OUT   TOTAL COST
memory.write        45      12,300        0          $0.0018
memory.read         38       8,400      2,100        $0.0031
memory.compress     12       6,800      1,200        $0.0015

Total: 95 spans, 27,500 tokens in, 3,300 tokens out, $0.0064

Audit Output Format

Analyzing 12 COMPRESS spans...
[████████████████████████████████] 12/12

Results:
SPAN ID      LOSS SCORE   RATIO   PRESERVED   LOST    STATUS
a1b2c3d4     0.12         0.35    8/9         1/9     ✓ low loss
e5f6g7h8     0.45         0.28    4/7         3/7     ⚠ moderate loss
i9j0k1l2     0.78         0.15    2/8         6/8     ✗ high loss

Summary: 12 spans audited. 2 with moderate loss, 1 with high loss.

Public API Reference

MemoryLens exposes exactly 8 public symbols from the top-level memorylens package. Everything else is considered internal (prefixed with _) and may change without notice.

from memorylens import (
    init, shutdown,
    instrument_write, instrument_read, instrument_compress, instrument_update,
    context, get_tracer,
)

init() function

Initialize MemoryLens. Call once at application startup. Configures the TracerProvider, builds exporters, and optionally triggers auto-instrumentation.

def init(
    service_name: str | None = None,
    exporter: str | None = None,          # "sqlite" | "otlp" | "jsonl"
    exporters: list[str] | None = None,   # multiple exporters
    otlp_endpoint: str | None = None,
    instrument: list[str] | None = None,  # ["langchain", "mem0", ...]
    capture_content: bool | None = None,
    sample_rate: float | None = None,     # 0.0–1.0
    db_path: str | None = None,           # override SQLite path
) -> None

With no arguments, defaults to SQLite storage at ~/.memorylens/traces.db. Environment variables override all kwargs.

shutdown() function

Flush all pending spans in the processor queue and shut down all registered processors and exporters. Call at application exit to ensure no spans are lost.

def shutdown() -> None

instrument_write decorator

Decorator that traces memory write operations. Creates a MemoryOperation.WRITE span around the decorated function.

def instrument_write(
    backend: str | None = None,
    capture_content: bool | None = None,
    **kwargs: Any,  # any key=value becomes a span attribute
) -> Callable[[F], F]

@instrument_write(backend="mem0", capture_content=True)
def store_user_preference(user_id: str, content: str) -> bool: ...

instrument_read decorator

Decorator that traces memory read/retrieval operations. Creates a MemoryOperation.READ span.

def instrument_read(
    backend: str | None = None,
    capture_content: bool | None = None,
    **kwargs: Any,  # top_k, threshold, etc.
) -> Callable[[F], F]

@instrument_read(backend="mem0", top_k=5, threshold=0.7)
def search_memories(query: str) -> list[Memory]: ...

instrument_compress decorator

Decorator that traces memory compression/summarization operations. Creates a MemoryOperation.COMPRESS span. The input_content and output_content of COMPRESS spans are used by the Compression Auditor.

def instrument_compress(
    model: str | None = None,
    capture_content: bool | None = None,
    **kwargs: Any,
) -> Callable[[F], F]

@instrument_compress(model="gpt-4o-mini")
def summarize_conversation(messages: list[str]) -> str: ...

instrument_update decorator

Decorator that traces memory update operations (merge, replace, or append to an existing memory entry). Creates a MemoryOperation.UPDATE span.

def instrument_update(
    backend: str | None = None,
    capture_content: bool | None = None,
    **kwargs: Any,  # update_type, memory_key, etc.
) -> Callable[[F], F]

context() context manager factory

Creates a MemoryContext context manager that attaches agent/session/user metadata to all spans created within the with block. Uses ContextVar — safe to nest and async-safe.

def context(
    agent_id: str | None = None,
    session_id: str | None = None,
    user_id: str | None = None,
) -> MemoryContext

with memorylens.context(agent_id="bot", session_id="sess-1", user_id="u-42"):
    store_memory(content)
    results = search_memories(query)
# All spans above have agent_id, session_id, user_id set

get_tracer() function

Escape hatch for manual span creation when the decorators are not sufficient. Returns a Tracer instance from the global TracerProvider.

def get_tracer(name: str) -> Tracer

tracer = memorylens.get_tracer("my.module")
with tracer.start_span(MemoryOperation.WRITE) as span:
    span.set_attribute("memory_key", "user_prefs")
    span.set_content(input_content=content)
    result = do_write(content)
    span.set_content(output_content=str(result))

Environment Variable Overrides

Variable | Effect | Example
MEMORYLENS_EXPORTER | Override exporter selection; takes precedence over init() kwargs | sqlite, otlp, jsonl
MEMORYLENS_CAPTURE_CONTENT | PII safety toggle: disable content capture in production | false
MEMORYLENS_SAMPLE_RATE | Sampling rate; reduces storage and overhead | 0.1 (10%)
OTEL_EXPORTER_OTLP_ENDPOINT | OTLP collector URL | http://localhost:4317
OTEL_EXPORTER_OTLP_HEADERS | Auth headers for OTLP collector | x-api-key=abc123
OTEL_EXPORTER_OTLP_PROTOCOL | OTLP protocol selection | http/json
OTEL_SERVICE_NAME | Service name in OTLP traces | my-agent

Configuration

pyproject.toml

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "memorylens"
version = "0.1.0"
description = "Observability and debugging for AI agent memory systems"
requires-python = ">=3.10"
license = "Apache-2.0"

# Core dependencies (always installed)
dependencies = [
    "opentelemetry-api>=1.20",
    "opentelemetry-sdk>=1.20",
    "opentelemetry-exporter-otlp-proto-grpc>=1.20",
    "opentelemetry-exporter-otlp-proto-http>=1.20",
    "typer>=0.9",
    "rich>=13.0",
]

# Optional extras
[project.optional-dependencies]
langchain  = ["langchain-core>=0.1"]
mem0       = ["mem0ai>=0.1"]
llamaindex = ["llama-index-core>=0.10"]
letta      = ["letta-client>=0.1"]
zep        = ["zep-python>=2.0"]
ui         = ["fastapi>=0.110", "uvicorn[standard]>=0.29", "jinja2>=3.1"]
audit      = ["sentence-transformers>=2.0", "numpy>=1.24"]
dev        = ["pytest>=8.0", "pytest-asyncio>=0.23", "ruff>=0.4",
              "mypy>=1.10", "httpx>=0.27", "numpy>=1.24"]

[project.scripts]
memorylens = "memorylens.cli.main:app"

# Tooling configuration
[tool.hatch.build.targets.wheel]
packages = ["src/memorylens"]

[tool.pytest.ini_options]
testpaths = ["tests"]
pythonpath = ["src"]

[tool.ruff]
target-version = "py310"
line-length = 100

[tool.mypy]
python_version = "3.10"
strict = true

Optional Extras Summary

Extra | Install Command | Enables | Key Dependencies
[langchain] | pip install memorylens[langchain] | LangChain auto-instrumentation | langchain-core ≥ 0.1
[mem0] | pip install memorylens[mem0] | Mem0 auto-instrumentation | mem0ai ≥ 0.1
[llamaindex] | pip install memorylens[llamaindex] | LlamaIndex auto-instrumentation | llama-index-core ≥ 0.10
[letta] | pip install memorylens[letta] | Letta auto-instrumentation | letta-client ≥ 0.1
[zep] | pip install memorylens[zep] | Zep auto-instrumentation | zep-python ≥ 2.0
[ui] | pip install memorylens[ui] | Web dashboard (memorylens ui) | fastapi, uvicorn, jinja2
[audit] | pip install memorylens[audit] | Compression Auditor (LocalScorer) | sentence-transformers, numpy

Multiple extras can be combined: pip install memorylens[langchain,mem0,ui,audit]

Local Configuration Files

MemoryLens stores local configuration in ~/.memorylens/:

File | Purpose
~/.memorylens/traces.db | SQLite trace database (default location)
~/.memorylens/pricing.json | User-defined pricing overrides for cost attribution

Testing

MemoryLens has a comprehensive test suite with 174 tests across all layers of the system. Tests are organized to mirror the package structure.

Test Structure

tests/
├── conftest.py # shared fixtures (in-memory SQLite, mock exporters)
├── test_core/
│   ├── test_tracer.py # TracerProvider, Tracer, _MutableSpan, sampling
│   ├── test_decorators.py # all 4 instrument_* decorators, capture_content
│   ├── test_span.py # MemorySpan, to_dict(), enum serialization
│   ├── test_context.py # MemoryContext, nesting, async isolation
│   └── test_processor.py # BatchSpanProcessor, SimpleSpanProcessor, flush
├── test_exporters/
│   ├── test_otlp.py # ReadableSpanAdapter, OTLPExporter
│   ├── test_sqlite.py # CRUD, query, query_extended, audit methods
│   └── test_jsonl.py # JSONL output format
├── test_integrations/
│   ├── test_langchain.py # LangChain instrumentor with mock BaseMemory
│   ├── test_mem0.py # Mem0 instrumentor with mock Memory class
│   ├── test_llamaindex.py # LlamaIndex with mock ChatMemoryBuffer
│   ├── test_letta.py # Letta with mock client blocks
│   └── test_zep.py # Zep with mock memory client
├── test_ui/
│   ├── test_api_traces.py # trace list/detail/retrieval endpoints
│   ├── test_api_ingest.py # OTLP ingest endpoint
│   ├── test_api_compression.py # compression audit endpoints
│   ├── test_templates.py # Jinja2 template rendering
│   └── test_query_extended.py # SQLiteExporter extension
├── test_audit/
│   ├── test_splitter.py # sentence splitting edge cases
│   ├── test_analyzer.py # CompressionAnalyzer with MockScorer
│   ├── test_scorer.py # scorer backends (MockScorer fast, LocalScorer slow)
│   └── test_storage.py # save/get/list audits
├── test_cost/
│   ├── test_pricing.py # pricing load, merge, save
│   ├── test_enricher.py # cost computation, skip logic
│   └── test_storage.py # update_span_attributes
└── test_cli/
    ├── test_commands.py # traces, stats, config commands
    ├── test_audit_commands.py
    └── test_cost_commands.py

Testing Patterns

Framework Integration Mocks

Integration tests use lightweight fake framework classes rather than installing the full framework. The instrumentor's _get_*_class() function is monkeypatched to return the mock class:

# Example: testing LangChain integration without installing langchain
class FakeBaseMemory:
    def save_context(self, inputs, outputs): ...
    def load_memory_variables(self, inputs): return {}

def test_langchain_write_span(monkeypatch):
    monkeypatch.setattr("memorylens.integrations.langchain.instrumentor._get_base_memory",
                        lambda: FakeBaseMemory)
    inst = LangChainInstrumentor()
    inst.instrument()
    # verify spans emitted with correct operation type

In-Memory SQLite

Database tests use SQLite in-memory connections (db_path=":memory:") for isolation and speed. Each test gets a fresh database via pytest fixtures.

Slow Test Marker

Integration tests that load the real LocalScorer (requires downloading the ~80MB all-MiniLM-L6-v2 model) are marked with @pytest.mark.slow and excluded from the default test run:

# Run fast tests only (default CI):
uv run pytest tests/ -v

# Run slow tests only (model integration):
uv run pytest tests/ -v -m slow

MockScorer for Audit Tests

The MockScorer uses MD5 hashing to produce deterministic, fast embeddings. Texts that share more words get more similar vectors, making it useful for verifying the overall analysis pipeline without requiring a real embedding model.
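
A sketch of the idea behind such a scorer (the exact hashing scheme in MemoryLens may differ):

# Sketch of a deterministic hash-based embedder: each word is hashed into one of
# a fixed number of buckets, so texts sharing words share vector mass.
import hashlib
import math

def mock_embed(text: str, dim: int = 32) -> list[float]:
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.md5(word.encode()).hexdigest()
        vec[int(digest, 16) % dim] += 1.0          # deterministic bucket per word
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]                 # unit length for cosine comparison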

UI Tests with httpx

FastAPI endpoint tests use httpx.AsyncClient with an in-process test client. htmx partial responses are tested by including the HX-Request: true header. No browser automation is used — the server-driven htmx approach means API response testing fully covers all interactive behaviors.

Running Tests

# Install dev dependencies
uv sync --extra dev

# Run all fast tests
uv run pytest tests/ -v

# Run with coverage
uv run pytest tests/ --cov=memorylens --cov-report=term-missing

# Lint
uv run ruff check src/ tests/

# Type check
uv run mypy src/memorylens/

Technology Stack

Python 3.10+
Core language. Uses modern type syntax, match/case, dataclasses with slots, and contextvars
OpenTelemetry
opentelemetry-api, opentelemetry-sdk, OTLP exporters for gRPC and HTTP
SQLite
Primary local storage via Python stdlib sqlite3. WAL mode, 4 indexes, lazy migration
Typer
CLI framework. Subcommand groups (traces, stats, config, audit, cost). Typer ≥ 0.9
Rich
Terminal output formatting. Tables, progress bars, syntax-highlighted output. Rich ≥ 13.0
FastAPI
Async web framework for the dashboard. App factory pattern, static files, Jinja2 integration. FastAPI ≥ 0.110
uvicorn
ASGI server. Binds to localhost only. Standard extras for performance. ≥ 0.29
Jinja2
Server-side HTML templating. Templates with partials for htmx responses. ≥ 3.1
htmx
Loaded via CDN. Enables dynamic UI updates (filtering, pagination, live tail) without writing JavaScript
TailwindCSS
Loaded via CDN. Utility-first styling with dark mode support. No build step required
sentence-transformers
Local embedding model (all-MiniLM-L6-v2) for the Compression Auditor. Optional [audit] extra
NumPy
Vector operations for sentence embeddings (via sentence-transformers). Optional [audit] extra
pytest
Test runner. pytest-asyncio for async FastAPI tests, httpx for HTTP client testing. 174 tests
Ruff
Linter and formatter. Targets Python 3.10, 100-char line length. Selects E, F, I, N, W, UP rules
mypy
Static type checker. Strict mode, warn_return_any, warn_unused_configs
hatchling
Build backend for packaging and wheel building. src/ layout
uv
Fast Python package manager and project tool. Used for dependency management and running tools
OpenAI (optional)
Optional embedding backend for Compression Auditor using text-embedding-3-small. User-installed

Project Statistics

174 tests
47 source files
8 public API symbols
5 framework integrations
~3K lines of source
~2.7K lines of tests
7 optional extras
4 memory operations

Source File Breakdown

Module | Description | Key Files
memorylens | Public API package | __init__.py
memorylens._core | Tracer, span, decorators, context, processors, sampler | 7 files
memorylens._exporters | SQLite, OTLP, JSONL, base protocol, registry | 5 files
memorylens.integrations | 5 framework instrumentors + registry | 11 files
memorylens._ui | FastAPI server, 3 route modules, templates, static | 7 Python files + templates
memorylens._audit | Compression analyzer, scorer backends, splitter | 4 files
memorylens._cost | Pricing model, cost enricher | 3 files
memorylens.cli | CLI entry point + 5 command modules + formatters | 8 files

Phase Delivery Timeline

Phase | Date | Scope | Status
Phase 1 | 2026-04-07 | Core SDK: decorators, context, processors, SQLite/OTLP/JSONL exporters, LangChain + Mem0 integrations, CLI | Complete
Phase 2a | 2026-04-07 | Web Dashboard: FastAPI server, Trace List, Trace Detail, Retrieval Debugger, OTLP ingest endpoint | Complete
Phase 2b | 2026-04-08 | Compression Auditor: CompressionAnalyzer, LocalScorer/OpenAIScorer/MockScorer, CLI audit commands, UI view | Complete
Phase 2c | 2026-04-08 | Cost Attribution: pricing model, CostEnricher, update_span_attributes, CLI cost commands, UI integration | Complete
Phase 2d | 2026-04-08 | Additional Integrations: LlamaIndex, Letta, and Zep auto-instrumentation | Complete

Complete Project File Structure

MemoryLens/
├── pyproject.toml # build, deps, extras, tools
├── uv.lock
├── LICENSE # Apache 2.0
├── README.md
├── src/
│   └── memorylens/
│       ├── __init__.py # 8 public symbols, init(), context()
│       ├── _core/
│       │   ├── tracer.py # TracerProvider, Tracer, _MutableSpan
│       │   ├── span.py # MemorySpan frozen dataclass
│       │   ├── decorators.py # instrument_write/read/compress/update
│       │   ├── schema.py # MemoryOperation, SpanStatus enums
│       │   ├── context.py # MemoryContext, ContextVar
│       │   ├── processor.py # SpanProcessor, SimpleSpanProcessor, BatchSpanProcessor
│       │   └── sampler.py # Sampler (rate-based)
│       ├── _exporters/
│       │   ├── base.py # SpanExporter protocol, ExportResult
│       │   ├── sqlite.py # SQLiteExporter with all query methods
│       │   ├── otlp.py # OTLPExporter, _ReadableSpanAdapter
│       │   ├── jsonl.py # JSONLExporter
│       │   └── __init__.py # create_exporter() factory
│       ├── integrations/
│       │   ├── __init__.py # Instrumentor protocol, registry, create_instrumentor()
│       │   ├── langchain/ # BaseMemory patcher
│       │   ├── mem0/ # Memory class patcher
│       │   ├── llamaindex/ # ChatMemoryBuffer patcher
│       │   ├── letta/ # Letta client blocks patcher
│       │   └── zep/ # Zep memory client patcher
│       ├── _audit/
│       │   ├── analyzer.py # CompressionAnalyzer, CompressionAudit, SentenceAnalysis
│       │   ├── scorer.py # ScorerBackend, MockScorer, LocalScorer, OpenAIScorer
│       │   └── splitter.py # split_sentences()
│       ├── _cost/
│       │   ├── pricing.py # DEFAULT_PRICING, load_pricing(), save_user_pricing()
│       │   └── enricher.py # CostEnricher
│       ├── _ui/ # FastAPI server, routes, templates
│       └── cli/
│           ├── main.py # Typer app, init, ui commands
│           ├── formatters.py # rich table + JSON output helpers
│           └── commands/
│               ├── traces.py # list, show, tail, export
│               ├── stats.py # summary statistics
│               ├── config.py # config show/set
│               ├── audit.py # audit compress/show/list
│               └── cost.py # cost enrich/report/pricing
└── tests/ # 174 tests, 39 test files