Research Software | Open Source | MIT
llmdebug records failure-time evidence as a local, inspectable artifact:
exception context, prioritized stack frames, and summarized local state that can be
reviewed from the CLI, notebooks, and MCP clients.
Problem
Tracebacks alone rarely preserve enough runtime state for reliable diagnosis.
Solution
Structured snapshots preserve the failing context before it disappears.
Context. LLM-assisted debugging often degrades when the model only sees a traceback and partial surrounding code.
Method. llmdebug captures a structured snapshot at the exception boundary, prioritizes the crash frame, and summarizes local state for downstream inspection.
Output. The resulting artifact is available through CLI, notebook, and MCP interfaces, with output-scope and redaction controls for different workflows.
Scope. The project improves evidence availability; it does not certify diagnoses, patches, or root-cause correctness.
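To make the Method concrete, here is a minimal sketch of capturing a snapshot at the exception boundary using only the standard library. It is not llmdebug's implementation: the `capture_snapshot` and `summarize` helpers, the 80-character limit, and the `frames`, `function`, and `locals` field names are assumptions chosen for illustration.

```python
import traceback
from typing import Any


def summarize(value: Any, limit: int = 80) -> str:
    """Return a bounded repr so large objects do not bloat the snapshot."""
    text = repr(value)
    return text if len(text) <= limit else text[: limit - 3] + "..."


def capture_snapshot(exc: BaseException) -> dict:
    """Build a snapshot dict from an exception that carries a traceback."""
    frames = []
    # walk_tb yields (frame, lineno) pairs from the outermost call to the crash site.
    for frame, lineno in traceback.walk_tb(exc.__traceback__):
        frames.append({
            "file": frame.f_code.co_filename,
            "function": frame.f_code.co_name,
            "line": lineno,
            "locals": {name: summarize(val) for name, val in frame.f_locals.items()},
        })
    frames.reverse()  # put the crash frame first, since it is usually the most useful
    return {
        "exception": {"type": type(exc).__name__, "message": str(exc)},
        "crash_frame": frames[0] if frames else None,
        "frames": frames,
    }


def transform(values):
    # Hypothetical failing function used only to trigger an exception.
    scale = 0
    return values[0] / scale


try:
    transform([1, 2, 3])
except ZeroDivisionError as err:
    snapshot = capture_snapshot(err)

print(snapshot["crash_frame"]["function"], snapshot["crash_frame"]["locals"])
```

Storing summarized strings rather than live objects keeps the snapshot serializable and avoids holding references to frames after the handler exits.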
Shipped capabilities, described in the same evidence-first vocabulary used throughout the README and protocol docs.
After installation, pytest failures emit a snapshot artifact without per-test wiring.
Machine-readable snapshots preserve exception context, frames, locals, and selected environment metadata.
The same evidence model can be inspected from terminal and notebook surfaces or integrated into production hook and MCP workflows.
Snapshot diffing supports run-to-run comparison during regression triage and debugging review.
A pattern-based engine ranks common failure mechanisms to support triage, not to prove root cause.
Redaction profiles and rate limiting help keep snapshot capture usable in production settings.
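The run-to-run comparison mentioned above can be illustrated with a small sketch that diffs two snapshot dictionaries. This is an assumption about how such a comparison might work, not the algorithm behind `llmdebug diff`; the `flatten` and `diff_snapshots` helpers and the second run's line number are hypothetical.

```python
def flatten(snapshot: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} becomes {"a.b": 1}."""
    flat = {}
    for key, value in snapshot.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_snapshots(old: dict, new: dict) -> dict:
    """Report keys that were added, removed, or changed between two snapshots."""
    a, b = flatten(old), flatten(new)
    return {
        "added": {k: b[k] for k in b.keys() - a.keys()},
        "removed": {k: a[k] for k in a.keys() - b.keys()},
        "changed": {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]},
    }


# Two illustrative snapshots; the second crash line (52) is made up for the example.
run_1 = {
    "exception": {"type": "ValueError", "message": "shape mismatch"},
    "crash_frame": {"file_rel": "pipeline.py", "line": 47},
}
run_2 = {
    "exception": {"type": "ValueError", "message": "shape mismatch"},
    "crash_frame": {"file_rel": "pipeline.py", "line": 52},
}

delta = diff_snapshots(run_1, run_2)
print(delta["changed"])
```

Flattening to dotted keys keeps the comparison simple and makes each reported difference point at one specific field.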
From failing execution to reviewable evidence in three steps.
Step 1
def test_transform():
    result = transform(data)
    assert result.shape == (100, 5)
Step 2
{
  "exception": {
    "type": "ValueError",
    "message": "shape mismatch"
  },
  "crash_frame": {
    "file_rel": "pipeline.py",
    "line": 47
  }
}
Step 3
$ llmdebug show
$ llmdebug show --detail context
# after a second failing run:
$ llmdebug diff '#2' '#1'
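Because the Step 2 artifact is plain JSON, it can also be read without the CLI. The sketch below assumes only the fields shown in Step 2; the `describe` and `load_snapshot` helpers are hypothetical, and the demo writes a stand-in file to a temporary directory rather than reading a real snapshot.

```python
import json
import tempfile
from pathlib import Path


def load_snapshot(path: Path) -> dict:
    """Read a snapshot JSON file into a dict."""
    return json.loads(path.read_text(encoding="utf-8"))


def describe(snapshot: dict) -> str:
    """Render a one-line summary, tolerating missing fields."""
    exc = snapshot.get("exception", {})
    frame = snapshot.get("crash_frame", {})
    return (
        f"{exc.get('type', '?')}: {exc.get('message', '')} "
        f"at {frame.get('file_rel', '?')}:{frame.get('line', '?')}"
    )


# Stand-in for a real artifact, using the fields from Step 2.
sample = {
    "exception": {"type": "ValueError", "message": "shape mismatch"},
    "crash_frame": {"file_rel": "pipeline.py", "line": 47},
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "latest.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    summary = describe(load_snapshot(path))

print(summary)  # ValueError: shape mismatch at pipeline.py:47
```

Using `.get` with defaults keeps the reader usable if a snapshot omits a field, which matters when the exact schema is not guaranteed.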
Current-release capabilities and their primary documentation entrypoints.
| Capability | Status | Documentation |
|---|---|---|
| Pytest failures produce snapshots by default | Available | README: Quick Start |
| CLI inspection (show, list, frames, diff, git-context) | Available | README: CLI |
| Detail levels (crash, full, context) for evidence size control | Available | CLI Reference: Detail Levels |
| Production hooks with rate limiting and redaction controls | Available | Configuration: Production Hooks |
| MCP server with evidence tools and RCA state tools | Available | README: MCP Server |
Install the package, trigger a failure, then inspect the emitted artifact.
$ pip install 'llmdebug[cli]'
$ pytest
$ llmdebug show
Start with the maintained docs for package usage, contribution, and local/Docker evals instead of hunting through the repo tree.
Index
Canonical docs map, grouped into package usage, contribution, ops, and research notes.
Overview
Product overview, installation, quick start, and the top-level docs map.
Workflow
Local setup, quality gates, and PR expectations for contributors.
Reference
Capture entry points, environment variables, output formats, and API surface.
Reference
Command reference for show, list, frames, diff, export, and git-context.
Reference
MCP tool contracts, parameters, envelopes, and error handling.
Design
Layer model, capture pipeline, RCA workflow, and design decisions.
Workflow
Testing strategy, snapshot-first debugging workflow, and targeted validation guidance.
Support
Common setup, locking, plugin, MCP, and environment recovery steps.
Evals
Canonical start for running, validating, and interpreting evaluation workflows.
Ops
Local model endpoints, Docker wrapper runs, and official SWE-Bench scoring.
Research Notes
Dated findings, experiment plans, and roadmap documents remain under docs/, but they are supporting material rather than the primary package or operations documentation surface.
The evidence model stays the same; only the access surface changes.
# zero additional setup after installation
$ pip install 'llmdebug[cli]'
$ pytest
$ llmdebug
# failure artifact:
# .llmdebug/latest.json
from llmdebug import debug_snapshot

@debug_snapshot()
def run_job(payload: list[int]) -> list[int]:
    return [transform(x) for x in payload]

from llmdebug import snapshot_section

with snapshot_section("feature_pipeline"):
    features = build_features(raw_data)
%load_ext llmdebug
%llmdebug
%llmdebug list
%llmdebug diff
import llmdebug
llmdebug.install_hooks()
# captures:
# sys.excepthook
# threading.excepthook
# sys.unraisablehook
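For readers unfamiliar with exception hooks, the following sketch shows the general pattern of wrapping a hook and rate-limiting what it records. It does not show what `llmdebug.install_hooks()` does internally; the `RateLimiter` class, the `make_hook` helper, and the limits used are assumptions for illustration.

```python
import time


class RateLimiter:
    """Allow at most max_events within any sliding window of `window` seconds."""

    def __init__(self, max_events: int, window: float, clock=time.monotonic):
        self.max_events = max_events
        self.window = window
        self.clock = clock
        self.stamps = []

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps that have fallen out of the window.
        self.stamps = [t for t in self.stamps if now - t < self.window]
        if len(self.stamps) >= self.max_events:
            return False
        self.stamps.append(now)
        return True


def make_hook(previous_hook, limiter, sink):
    """Return a hook with the sys.excepthook signature that records, then delegates."""

    def hook(exc_type, exc, tb):
        if limiter.allow():
            sink.append({"type": exc_type.__name__, "message": str(exc)})
        previous_hook(exc_type, exc, tb)  # preserve whatever ran before

    return hook


captured = []
limiter = RateLimiter(max_events=2, window=60.0)
# A no-op stands in for the previous hook so the demo prints no tracebacks.
hook = make_hook(lambda *args: None, limiter, captured)

for i in range(5):
    hook(ValueError, ValueError(f"failure {i}"), None)

print(len(captured))  # 2
```

In a real process the returned function would be assigned to `sys.excepthook`, with the original hook passed as `previous_hook` so default traceback printing is preserved.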
{
  "mcpServers": {
    "llmdebug": {
      "command": "llmdebug-mcp"
    }
  }
}
Defaults are designed to balance diagnostic value, local usability, and operational caution.
Redaction policies can mask common secret-like strings before snapshots are written to disk.
Exception hooks apply rate limits to reduce runaway snapshot writes during repeated failures.
Snapshots are stored locally by default, which supports offline and air-gapped debugging workflows.
Snapshots improve evidence availability for diagnosis. Root-cause judgment remains yours.
Use the snapshot as high-signal evidence, not as exhaustive ground truth.
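Masking secret-like strings before a snapshot reaches disk, as described above, might look like the sketch below. The patterns, the `[REDACTED]` mask, and the `redact` helper are illustrative assumptions and do not reflect llmdebug's actual redaction profiles.

```python
import re
from typing import Any

MASK = "[REDACTED]"

# key=value or key: value pairs whose key looks sensitive
KEY_VALUE = re.compile(r"(?i)\b(password|secret|token|api[_-]?key)(\s*[=:]\s*)\S+")
# HTTP bearer credentials
BEARER = re.compile(r"(?i)\bbearer\s+\S+")
# dictionary keys that suggest the whole value is sensitive
SENSITIVE_KEY = re.compile(r"(?i)password|secret|token|api[_-]?key")


def redact_text(text: str) -> str:
    """Mask secret-like substrings while keeping the surrounding text readable."""
    text = KEY_VALUE.sub(lambda m: m.group(1) + m.group(2) + MASK, text)
    return BEARER.sub("Bearer " + MASK, text)


def redact(value: Any) -> Any:
    """Recursively redact dicts, lists, and strings; leave other values untouched."""
    if isinstance(value, dict):
        return {
            key: MASK if isinstance(key, str) and SENSITIVE_KEY.search(key) else redact(val)
            for key, val in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return redact_text(value)
    return value


# Made-up local state resembling what a snapshot might contain.
sample = {
    "api_key": "sk-12345",
    "url": "https://example.test/?token=abc123",
    "headers": ["Authorization: Bearer abc.def"],
    "retries": 3,
}

clean = redact(sample)
print(clean)
```

Pattern-based masking is best-effort: it reduces accidental exposure of common secret shapes but cannot guarantee that every sensitive value is caught.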
evals/ and should be interpreted separately from this overview page.
@software{schuler2026llmdebug,
  author = {Schuler, Nicolas},
  title = {llmdebug: Structured Debug Snapshots for LLM-Assisted Debugging},
  year = {2026},
  url = {https://github.com/NicolasSchuler/llmdebug},
  license = {MIT}
}