Metadata-Version: 2.4
Name: agentchaos-core
Version: 0.2.0
Summary: Chaos Engineering & Failure Diagnosis for AI Agents
Project-URL: Homepage, https://agentchaos.dev
Project-URL: Repository, https://github.com/jeffery0929/agentchaos
Project-URL: Documentation, https://agentchaos.dev
Project-URL: Issues, https://github.com/jeffery0929/agentchaos/issues
Author: AgentChaos contributors
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: agent,chaos-engineering,evaluation,llm,opentelemetry,reliability
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.11
Requires-Dist: jsonschema>=4.20
Requires-Dist: opentelemetry-api>=1.30
Requires-Dist: opentelemetry-sdk>=1.30
Requires-Dist: opentelemetry-semantic-conventions>=0.50b0
Requires-Dist: pydantic>=2.5
Requires-Dist: pytest>=8.0
Provides-Extra: all
Requires-Dist: anthropic>=0.30; extra == 'all'
Requires-Dist: langgraph>=0.2; extra == 'all'
Requires-Dist: mike>=2.1; extra == 'all'
Requires-Dist: mkdocs-material>=9.5; extra == 'all'
Requires-Dist: mkdocstrings[python]>=0.25; extra == 'all'
Requires-Dist: openai>=1.40; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.30; extra == 'anthropic'
Provides-Extra: dev
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pre-commit>=3.7; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest-xdist>=3.5; extra == 'dev'
Requires-Dist: ruff>=0.6; extra == 'dev'
Requires-Dist: types-jsonschema; extra == 'dev'
Requires-Dist: types-pyyaml; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mike>=2.1; extra == 'docs'
Requires-Dist: mkdocs-material>=9.5; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.25; extra == 'docs'
Provides-Extra: langgraph
Requires-Dist: langgraph>=0.2; extra == 'langgraph'
Provides-Extra: openai
Requires-Dist: openai>=1.40; extra == 'openai'
Description-Content-Type: text/markdown

# AgentChaos

**Chaos testing and failure diagnosis for AI agents.**

AgentChaos `0.2.0` is a Python toolkit for running agent tests repeatedly,
injecting realistic failures, collecting trace-like spans, detecting common
failure modes, ingesting framework-shaped traces, and exporting reliability reports.

The v0.1 track is intentionally small: pytest integration, two injectors, two
detectors, three metrics, JSON reports, and a CLI summary command. The v0.2 track
focuses on framework-neutral adapter boundaries, runtime ingestion prototypes,
trace-based semantic detectors, and release hygiene.

[![License: Apache 2.0](https://img.shields.io/badge/license-Apache_2.0-blue)](LICENSE)

## Status

Implemented today:

- `ChaosTracer` for agent, tool, and chat spans
- `ChaosRunner` for repeated callable execution
- `execute_chaos_test()` as the framework-neutral execution service
- Injectors: `ToolTimeout`, `ArgSchemaMutation`
- Detectors: `LoopDetector`, `ArgSchemaViolationDetector`, `ToolInvocationMismatchDetector`
- Metrics: `step_success_rate_at_k`, `run_variance`, `recovery_rate`
- JSON report exporter
- Pytest plugin: `@chaos`, `--chaos`, `--chaos-report`, `chaos_tracer`
- CLI: `agentchaos summarize <report.json>`
- No-API-key local demo
- v0.2 development: minimal LangGraph-like adapter prototype
- v0.2 development: LangGraph runtime `stream`/`astream_events` ingestion adapter with parent reconstruction
- v0.2 development: OpenAI Agents-like event model skeleton, without OpenAI SDK imports
- v0.2 development: CrewAI-like event model skeleton, without CrewAI SDK imports
- v0.2 development: MCP-like event model skeleton, without MCP SDK imports

Not implemented yet:

- HTML report
- CrewAI, OpenAI Agents, or MCP production runtime adapters
- Production-ready framework adapters beyond the current LangGraph runtime ingestion and skeleton prototypes
- Benchmark integrations such as tau-bench
- Production sampling or hosted dashboard

## v0.2 Roadmap

- Keep adapter prototypes framework-neutral by mapping runtime-like events into `TraceSpan` without importing framework SDKs.
- Expand trace-based semantic detectors around tool-use reliability while keeping detectors dependent only on internal spans.
- Harden release hygiene with public preflight checks, package verification commands, and stable JSON/CLI behavior.
- Defer production runtime adapters until the adapter boundary and skeleton tests are stable.
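
As a rough illustration of the first roadmap item, the sketch below maps plain dictionaries shaped like runtime events into span records using only the standard library. `SketchSpan`, its fields, and the event keys (`run_id`, `parent_run_id`, `kind`, `name`) are hypothetical stand-ins; the real `TraceSpan` model and adapter API may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Iterable


@dataclass(frozen=True)
class SketchSpan:
    """Hypothetical stand-in for an internal span; not the real TraceSpan."""

    span_id: str
    parent_id: str | None
    kind: str  # e.g. "agent", "tool", "chat"
    name: str


def events_to_spans(events: Iterable[dict[str, Any]]) -> list[SketchSpan]:
    """Map framework-shaped event dicts to spans without importing any SDK."""
    events = list(events)
    known_ids = {event["run_id"] for event in events}
    spans = []
    for event in events:
        parent = event.get("parent_run_id")
        # Drop parent links that point outside the captured trace.
        if parent not in known_ids:
            parent = None
        spans.append(
            SketchSpan(
                span_id=event["run_id"],
                parent_id=parent,
                kind=event.get("kind", "agent"),
                name=event["name"],
            )
        )
    return spans


# Hypothetical events: one agent span, one child tool span, one orphaned tool span.
events = [
    {"run_id": "a1", "kind": "agent", "name": "flight-agent"},
    {"run_id": "t1", "parent_run_id": "a1", "kind": "tool", "name": "search_flights"},
    {"run_id": "t2", "parent_run_id": "missing", "kind": "tool", "name": "book_flight"},
]
spans = events_to_spans(events)
```

Keeping the input as plain dictionaries is what lets a mapping like this be tested without the framework installed.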

## Quickstart

Clone the repo and install it locally:

```bash
git clone https://github.com/jeffery0929/agentchaos.git
cd agentchaos
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
```

Run the no-API-key demo:

```bash
pytest examples/basic --chaos --chaos-report chaos_reports/basic.json -q
```

Summarize the report:

```bash
agentchaos summarize chaos_reports/basic.json
```

Expected result:

```text
AgentChaos report: pytest_suite
tests: 1
total runs: 3
successful runs: 3
failed runs: 0
matched detections: 3
```

## First Test

```python
from agent_chaos import chaos
from agent_chaos.injectors import ToolTimeout


@chaos(injectors=[ToolTimeout(p=0.2)], runs=10)
def test_agent_handles_tool_timeouts(chaos_tracer):
    with chaos_tracer.invoke_agent("flight-agent"):
        # `my_agent` is a placeholder for your own agent object.
        result = my_agent.run("Book a flight from SFO to NRT")

    assert result.status == "success"
```

Run it with:

```bash
pytest --chaos --chaos-report agentchaos-report.json
agentchaos summarize agentchaos-report.json
```
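
Conceptually, a probabilistic timeout injector wraps a tool call and fails it some fraction of the time. The sketch below is not the `ToolTimeout` implementation; `with_timeout_fault` and `lookup_fare` are hypothetical names used only to show the idea, with a seeded `random.Random` so repeated runs behave the same way.

```python
import random
from typing import Any, Callable


def with_timeout_fault(
    tool: Callable[..., Any], p: float, rng: random.Random
) -> Callable[..., Any]:
    """Return a wrapper that raises TimeoutError with probability p."""

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # rng.random() is in [0.0, 1.0), so p=0 never fails and p=1 always fails.
        if rng.random() < p:
            raise TimeoutError(f"injected timeout in {tool.__name__}")
        return tool(*args, **kwargs)

    return wrapper


def lookup_fare(origin: str, destination: str) -> str:
    """Hypothetical tool used only for this illustration."""
    return f"{origin}->{destination}: ok"


flaky = with_timeout_fault(lookup_fare, p=0.2, rng=random.Random(0))
outcomes = []
for _ in range(10):
    try:
        outcomes.append(flaky("SFO", "NRT"))
    except TimeoutError:
        outcomes.append("timeout")
```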

`@chaos` is lazy-loaded from the package root, so ordinary `import agent_chaos` does not
pull in pytest. Pytest is only needed when using the pytest plugin.
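
One common way to get this behavior is a module-level `__getattr__` (PEP 562) that imports a dependency only on first access. The sketch below shows the pattern on a throwaway module; `make_lazy_module` and `demo_pkg` are hypothetical, and it is an assumption that AgentChaos uses this exact mechanism.

```python
import importlib
import types


def make_lazy_module(name: str, lazy_attrs: dict[str, tuple[str, str]]) -> types.ModuleType:
    """Build a module whose listed attributes are imported on first access."""
    module = types.ModuleType(name)

    def __getattr__(attr: str):
        # Called only when normal attribute lookup on the module fails.
        if attr not in lazy_attrs:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        target_module, target_name = lazy_attrs[attr]
        value = getattr(importlib.import_module(target_module), target_name)
        setattr(module, attr, value)  # cache so later lookups skip this hook
        return value

    module.__getattr__ = __getattr__
    return module


# `json.dumps` stands in for a heavier dependency such as a pytest-backed decorator.
demo = make_lazy_module("demo_pkg", {"dumps": ("json", "dumps")})
loaded_before = "dumps" in vars(demo)  # nothing resolved yet
encoded = demo.dumps({"a": 1})         # first access triggers the import
loaded_after = "dumps" in vars(demo)   # now cached on the module
```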

## Report Contents

The JSON report includes:

- total, successful, and failed run counts
- pass rate
- `step_success_rate_at_k`
- `run_variance`
- `recovery_rate`
- per-run detector results
- optional span payloads with `--chaos-include-spans`
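
Because the report is plain JSON, a CI job can gate on it with a few lines of standard-library code. The sketch below assumes hypothetical keys `total_runs` and `successful_runs`; check the key names in a generated report before relying on them.

```python
import json


def pass_rate(report: dict) -> float:
    """Fraction of successful runs; 0.0 when the report has no runs."""
    total = report.get("total_runs", 0)  # hypothetical key name
    if total == 0:
        return 0.0
    return report.get("successful_runs", 0) / total  # hypothetical key name


def gate(report_text: str, minimum: float = 0.9) -> bool:
    """Return True when the report meets the minimum pass rate."""
    return pass_rate(json.loads(report_text)) >= minimum


# Inline sample shaped like the assumed schema, so the sketch runs without a file.
sample = json.dumps({"total_runs": 3, "successful_runs": 3})
ok = gate(sample)
```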

## Why This Exists

Production agent failures are often not clean assertion failures. They show up as loops,
bad tool arguments, fabricated observations, retry storms, premature stops, and task
drift. AgentChaos focuses on a narrow v0.1 gap:

```text
fault injection + trace-backed failure classification + CI-friendly reports
```

See:

- [docs/motivation.md](docs/motivation.md)
- [docs/theory.md](docs/theory.md)
- [docs/getting-started.md](docs/getting-started.md)
- [docs/v0.1_acceptance_checklist.md](docs/v0.1_acceptance_checklist.md)

## Current v0.1 Scope

In scope:

- pytest-first local workflow
- deterministic local demo
- JSON report as the stable output
- CLI summary for report inspection
- small, testable core modules

Out of scope for v0.1:

- hosted UI
- SaaS dashboard
- HTML report unless the core stabilizes first
- framework-specific adapters
- public leaderboard

## Development Checks

```bash
pytest -q
ruff check agent_chaos tests examples
ruff format --check agent_chaos tests examples
mypy agent_chaos tests examples
pytest examples/basic --chaos --chaos-report chaos_reports/basic.json -q
agentchaos summarize chaos_reports/basic.json
```

## Optional Paid OpenAI Dogfood

After configuring a small API budget and adding `OPENAI_API_KEY` to a git-ignored
local env file, run the manual paid smoke test:

```bash
python examples/openai_paid_dogfood/run_demo.py
agentchaos summarize chaos_reports/openai-paid-dogfood.json
```

The default model is `gpt-5.4-nano` to keep the first paid run cheap.

## License

Apache 2.0. See [LICENSE](LICENSE).
