Metadata-Version: 2.4
Name: agentdebugx
Version: 0.1.0
Summary: Portable error analysis, tracing, and recovery framework for agentic AI systems. Import as `agentdebug`.
License: MIT
License-File: LICENSE
Keywords: llm,agents,debugging,observability,failure-analysis,agent-debugging,agentic-ai,tracing,evaluation
Author: ULab @ UIUC
Author-email: ulab@illinois.edu
Requires-Python: >=3.9,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Debuggers
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: System :: Monitoring
Classifier: Typing :: Typed
Provides-Extra: all
Provides-Extra: langgraph
Provides-Extra: otel
Provides-Extra: ui
Requires-Dist: httpx (>=0.24,<1.0)
Requires-Dist: pydantic (>=1.10,<3.0)
Project-URL: Documentation, https://github.com/ulab-uiuc/AgentDebugX/tree/main/docs
Project-URL: Homepage, https://github.com/ulab-uiuc/AgentDebugX
Project-URL: Repository, https://github.com/ulab-uiuc/AgentDebugX
Description-Content-Type: text/markdown

# AgentDebugX

AgentDebugX is an open-source framework for tracing, diagnosing, and recovering
from failures in agentic AI systems. The distribution is published as
`agentdebugx` and imported as `agentdebug`, so existing agent stacks can add
structured debugging with a small import:

```python
from agentdebug import AgentDebug, EventType, SQLiteTraceStore

debugger = AgentDebug(store=SQLiteTraceStore(".agentdebug/errors.sqlite"))

with debugger.trace(goal="Book a refundable flight", framework="my-agent") as trace:
    trace.record(EventType.PLAN, agent_name="planner", output="Search flights first")
    trace.record(
        EventType.TOOL_RESULT,
        agent_name="browser",
        error="Timeout while loading checkout page",
        step_index=3,
    )
    report = trace.analyze()

print(report.summary)
```

## Vision

Most observability tools show what an agent did. AgentDebugX aims to explain why
it failed, which agent or module caused the failure, when the first critical
error occurred, and what should change next.

The long-term scope is:

- A portable trajectory IR for single-agent, multi-agent, tool-use, UI, and
  multimodal runs.
- An agentic error database for storing traces, diagnostic reports, taxonomy
  labels, artifacts, and recovery outcomes.
- Automated failure localization across step, module, agent, and run levels.
- Automated taxonomy generation that clusters new failures and proposes new
  labels when existing taxonomies are insufficient.
- Recovery suggestions that can be used by humans, CI workflows, or self-healing
  agents.
- UI and notebook experiences for timeline debugging, attribution, replay, and
  dataset curation.

## Current Status

This repository now contains the first framework skeleton:

- `agentdebug.models`: normalized event, artifact, trajectory, taxonomy, and
  diagnostic report schemas.
- `agentdebug.recorder`: high-level `AgentDebug` and `TraceSession` APIs.
- `agentdebug.analyzers`: deterministic baseline analyzer for immediate local
  feedback.
- `agentdebug.taxonomy`: seed taxonomy inspired by AgentDebug, MAST, Who&When,
  AgentRx, AgentSight, and multimodal extensions.
- `agentdebug.storage`: JSONL and SQLite trace stores.
- `agentdebug.instrumentation`: lightweight function/tool tracing helper.
- `agentdebug.cli`: initial `agentdebug analyze` command.
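
To make the idea behind a function/tool tracing helper and a JSONL store concrete, here is a minimal stdlib-only sketch. It is not the `agentdebug.instrumentation` or `agentdebug.storage` API; `traced`, `EVENTS`, `to_jsonl`, and the event fields are hypothetical names chosen for illustration.

```python
import functools
import json
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-memory event log; not an AgentDebugX type.
EVENTS: List[Dict[str, Any]] = []


def traced(tool_name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Record one event per call, including the error message if the call raises."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            event: Dict[str, Any] = {
                "tool": tool_name,
                "step_index": len(EVENTS) + 1,
                "error": None,
            }
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                # Keep the failure on the event, then let the caller see it too.
                event["error"] = f"{type(exc).__name__}: {exc}"
                raise
            finally:
                event["duration_s"] = time.perf_counter() - start
                EVENTS.append(event)

        return wrapper

    return decorator


def to_jsonl(events: List[Dict[str, Any]]) -> str:
    """Serialize events as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(event, sort_keys=True) for event in events)


@traced("search")
def search(query: str) -> str:
    if not query:
        raise ValueError("missing parameter query")
    return f"results for {query}"
```

Calling `search("agents")` and then `search("")` would leave two events in `EVENTS`, the second carrying the error string, and `to_jsonl(EVENTS)` would produce two lines suitable for appending to a log file.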

Many advanced modules are intentionally not implemented yet. The design docs
spell out the proposed path before we lock into heavy abstractions.

## Install

```bash
# From PyPI (distribution name: agentdebugx; import as `agentdebug`)
pip install agentdebugx

# With the optional local dashboard
pip install 'agentdebugx[ui]'

# With LangChain/LangGraph callback adapter
pip install 'agentdebugx[langgraph]'

# With OpenTelemetry GenAI export shim
pip install 'agentdebugx[otel]'

# Everything
pip install 'agentdebugx[all]'
```

From source:

```bash
pip install -e .  # or: poetry install
```
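
To check from Python which extras an installed copy declares, the standard library's `importlib.metadata` can read the distribution's `Provides-Extra` fields. The sketch below assumes only the distribution name `agentdebugx`; the helper name `list_extras` is hypothetical.

```python
from importlib import metadata
from typing import List


def list_extras(dist_name: str = "agentdebugx") -> List[str]:
    """Return the extras declared by an installed distribution, or [] if it is absent."""
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return []
    # get_all returns None when the field is missing entirely.
    return sorted(meta.get_all("Provides-Extra") or [])


if __name__ == "__main__":
    print(list_extras())
```

An empty list means either that the distribution is not installed in the current environment or that it declares no extras.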

## Quick Start

```python
from agentdebug import AgentDebug, EventType
from agentdebug.models import model_to_json

debugger = AgentDebug()
trajectory = debugger.start_trace(
    goal="Find a paper and summarize the method",
    framework="custom-react-agent",
)

debugger.record_event(
    trajectory,
    EventType.LLM_CALL,
    agent_name="researcher",
    module="planning",
    step_index=1,
    input="Find the latest AgentDebug paper",
)
debugger.record_event(
    trajectory,
    EventType.TOOL_RESULT,
    agent_name="search",
    module="action",
    step_index=2,
    error="JSON schema validation failed: missing parameter query",
)
debugger.finish_trace(trajectory, success=False)

report = debugger.analyze(trajectory)
print(model_to_json(report, indent=2))
```
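
As a rough mental model of what a deterministic pass over such a trajectory might do, the sketch below picks the earliest step that carries an error. It does not use or reproduce AgentDebugX's analyzer; the `Event` dataclass and `first_error` function are hypothetical stand-ins written against the standard library only.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Event:
    """Hypothetical stand-in for a recorded event; not agentdebug's event model."""

    step_index: int
    agent_name: str
    module: str
    error: Optional[str] = None


def first_error(events: Iterable[Event]) -> Optional[Event]:
    """Return the earliest event (by step_index) that carries an error, if any."""
    failing = [event for event in events if event.error]
    return min(failing, key=lambda event: event.step_index) if failing else None


# Mirrors the two events recorded in the Quick Start above.
events = [
    Event(1, "researcher", "planning"),
    Event(2, "search", "action", error="JSON schema validation failed: missing parameter query"),
]
culprit = first_error(events)
if culprit is not None:
    print(f"step {culprit.step_index}: {culprit.agent_name}/{culprit.module}: {culprit.error}")
```

A real analyzer would likely weigh more than the first error string (for example, whether a later step recovered), so treat this only as an illustration of step-level localization.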

CLI:

```bash
# Run the rule-based analyzer (works offline, no LLM)
agentdebug analyze examples/sample_trace.json --out .agentdebug/report.json

# List traces in a store
agentdebug list --store-sqlite .agentdebug/errors.sqlite

# Run the LLM judge (requires AGENTDEBUG_LLM_BASE_URL + AGENTDEBUG_LLM_API_KEY)
agentdebug judge examples/sample_trace.json --attribute

# Launch the local dashboard at http://127.0.0.1:7777
agentdebug serve --store-sqlite .agentdebug/errors.sqlite

# Diagnose which adapters / integrations are available
agentdebug doctor
```
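
Because `agentdebug judge` needs both LLM environment variables, a script that wraps it may want to fail early when one is unset. The following is a small sketch of such a preflight check; the variable names come from the comment above, while `missing_llm_env` is a hypothetical helper, not part of the package.

```python
import os
from typing import List, Mapping

REQUIRED_VARS = ("AGENTDEBUG_LLM_BASE_URL", "AGENTDEBUG_LLM_API_KEY")


def missing_llm_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_llm_env()
    if missing:
        raise SystemExit("Set these before running the judge: " + ", ".join(missing))
    print("LLM environment looks configured.")
```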

An end-to-end demo (judge, attribution, and Reflexion recovery, all run live
against an LLM) is in `examples/llm_judge_demo.py`.

## Benchmark

`scripts/eval_v0_1.py` runs the rule analyzer, the LLM judge, and the
All-at-Once attributor across six realistic failing traces (action/format,
planning/loop, verification, system/tool error, multi-agent handoff loss, and
memory retrieval). Latest results:
[docs/benchmarks/v0_1_smoke.md](docs/benchmarks/v0_1_smoke.md).

## Documentation

Narrative (paper-style):

- [Research Survey](docs/RESEARCH_SURVEY.md)
- [Open-Source Development Plan](docs/OPEN_SOURCE_DEVELOPMENT_PLAN.md)
- [Seed Error Taxonomy](docs/ERROR_TAXONOMY.md)

Engineering spec (numbered docs 00–18):

- [docs/README.md](docs/README.md) — full doc index
- [docs/00_overview.md](docs/00_overview.md) — vision + north-star UX
- [docs/02_architecture.md](docs/02_architecture.md) — 7-layer architecture
- [docs/18_comparison_codex_vs_design.md](docs/18_comparison_codex_vs_design.md) — Codex scaffold ↔ design-spec reconciliation that drove v0.1

## Development

```bash
pytest
mypy --strict ./
ruff check .
```

## License

[MIT License](LICENSE) — see the `LICENSE` file. Original template copyright
Haofei Yu, 2024; additions copyright ULab @ UIUC and contributors, 2026.

