Metadata-Version: 2.4
Name: agentslow
Version: 0.8.1
Summary: Diagnose and auto-fix AI agent performance bottlenecks.
Author-email: Rico Allen <ricardojallen37@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/ricojallen37-sketch/agentslow
Project-URL: Repository, https://github.com/ricojallen37-sketch/agentslow
Project-URL: Issues, https://github.com/ricojallen37-sketch/agentslow/issues
Keywords: ai,agents,diagnostics,optimization,mcp,langgraph
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Testing
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0
Provides-Extra: langgraph
Requires-Dist: langgraph>=0.2; extra == "langgraph"
Provides-Extra: mcp
Requires-Dist: mcp>=1.0; extra == "mcp"
Provides-Extra: all
Requires-Dist: langgraph>=0.2; extra == "all"
Requires-Dist: mcp>=1.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Dynamic: license-file

# agentslow

**AI agent diagnostics CLI — diagnose, benchmark, and auto-fix AI agent performance.**

Part of [A.I Shovels](https://github.com/ricojallen37-sketch) — tools that dig into AI infrastructure problems.

```bash
pip install agentslow
```

## What it does

One command tells you why your AI agent is slow, expensive, or unreliable — and fixes it.

```bash
agentslow diagnose trace.yaml
```

agentslow classifies your agent into one of four **performance regimes** (Context-Bound, Reasoning-Bound, Tool-Bound, IO-Bound), computes novel metrics like Token Efficiency Ratio and Tool Re-entry Rate, then prescribes auto-applicable fixes with machine-readable config patches.

## Actual CLI Output (not marketing — run it yourself)

```
$ agentslow diagnose examples/openclaw_research_agent.yaml --entropy --fidelity \
    --config examples/agent_config_baseline.yaml --dry-run

═══ agentslow v0.8.0 ═══
Agent: openclaw_research_v2 (langgraph)
Task: Research competitor pricing for SaaS product and compile report
Status: ✓ Success

── REGIME CLASSIFICATION ──
  Primary: CONTEXT-BOUND
  Confidence: 90%

── KEY METRICS ──
  Token Efficiency Ratio (TER): 0.0147
  Tool Re-entry Rate:           0.2500
  Time-to-First-Action:         1800ms
  Reasoning Ratio:              0.0983
  Total Cost:                   $0.5403
  Total Duration:               32850ms

── PRESCRIPTIVE FIXES ──
  1. [CRITICAL] [AUTO-FIX] Implement context compaction (summarization)
     Token Efficiency Ratio is 0.015 (healthy: >0.15). Your context is bloated
     with irrelevant tokens. Add a summarization step every N turns to compress
     conversation history.
     → Expected: 50-70% token reduction, major cost savings

  2. [HIGH] [AUTO-FIX] Enable prompt/prefix caching
     Total input tokens: 110,100. Enable prefix caching to avoid re-processing
     the same system prompt on every LLM call.
     → Expected: 30-50% latency reduction on repeated calls

  3. [MEDIUM] Tune RAG retrieval — retrieve less, retrieve better
     You're likely stuffing too many documents into context. Reduce top_k, add
     re-ranking, or switch to semantic chunking.
     → Expected: Fewer tokens = lower cost + faster inference

═══ CONTEXT ENTROPY ANALYSIS ═══
Session: openclaw_research_v2
Total Turns: 7

── ENTROPY METRICS ──
  Average Entropy:         0.9374
  Max Entropy:             1.0000
  Entropy Trend:           INCREASING
  Semantic Drift Ratio:    0.1034
  Noise Ratio:             0.2857
  Compaction Integrity:    1.0000

── VERDICT ──
  ✗ CRITICAL — Context is critical

═══ DRY-RUN: Implement context compaction (summarization) ═══
  context_compaction.enabled: false → true  [CAUTION]
  context_compaction.strategy: (none) → summarize_every_n  [CAUTION]
  context_compaction.n_turns: (none) → 5  [CAUTION]
  Rollback: agentslow rollback --fix-id context-001

═══ DRY-RUN: Enable prompt/prefix caching ═══
  enable_prompt_caching: false → true  [SAFE]
  Rollback: agentslow rollback --fix-id context-002

═══ GOLDEN SET FIDELITY TEST ═══
Tests Run: 15
Passed: 15 | Failed: 0
Overall Fidelity: 1.0000

── VERDICT ──
  ✓ PASSED — Safe to apply in production.
```

That's one command. Regime classification + metrics + fixes + entropy audit + dry-run diffs + fidelity verification (15 golden cases covering all 4 regimes).

## Benchmark: Before/After Proof

```
$ agentslow benchmark examples/agent_config_baseline.yaml --compare --tasks 5

═══ BENCHMARK COMPARISON ═══
Tasks: 5

── BEFORE (baseline) ──
  Tokens:  42,576 avg
  Cost:    $0.31 avg
  Success: 80%

── AFTER (optimized) ──
  Tokens:  23,218 avg  (-45.5%)
  Cost:    $0.22 avg   (-30.2%)
  Success: 80%

Fixes Applied: 3
MICRO-EVAL GUARD: ALL CLEAR
```

## P99 Jitter Audit

```
$ agentslow benchmark examples/agent_config_baseline.yaml --jitter-audit --jitter-runs 5

═══ P99 JITTER AUDIT ═══
Runs: 5

── STABILITY ──
  Overall: STABLE
  Worst Jitter: 1.26x (p99/p50)

Production-ready: variance is within acceptable bounds.
```

## CI/CD Integration

```bash
# Fails pipeline on CRITICAL entropy
agentslow diagnose trace.yaml --entropy --ci > junit_report.xml
echo $?  # exit code 1 on CRITICAL
```

JUnit XML output integrates directly with GitHub Actions, GitLab CI, Jenkins.

## Key Concepts

### Performance Regimes

| Regime | Symptom | Auto-Fix |
|--------|---------|----------|
| **Context-Bound** | Low TER (<0.15), bloated context | Context compaction, prompt caching |
| **Reasoning-Bound** | High reasoning ratio (>0.5) | Reasoning budget limits, task decomposition |
| **Tool-Bound** | High tool re-entry (>0.3) | Tool call batching, result caching |
| **IO-Bound** | High TTA (>2000ms) | Parallel tool execution, connection pooling |
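The thresholds in the table reduce to a simple decision rule. A minimal sketch of that logic (the function name and metric-dict keys are illustrative, not agentslow's actual API):

```python
# Decision rule implied by the regime table above. Thresholds come
# straight from the table; the dict schema is assumed for illustration.

def classify_regime(metrics: dict) -> str:
    """Return the primary performance regime for a set of agent metrics."""
    if metrics["ter"] < 0.15:
        return "CONTEXT-BOUND"    # bloated context the model mostly ignores
    if metrics["reasoning_ratio"] > 0.5:
        return "REASONING-BOUND"  # most tokens spent thinking, not acting
    if metrics["tool_reentry_rate"] > 0.3:
        return "TOOL-BOUND"       # agent keeps re-calling the same tools
    if metrics["tta_ms"] > 2000:
        return "IO-BOUND"         # slow to reach the first tool call
    return "HEALTHY"

# Metrics from the sample diagnosis above:
print(classify_regime({"ter": 0.0147, "reasoning_ratio": 0.0983,
                       "tool_reentry_rate": 0.25, "tta_ms": 1800}))
# → CONTEXT-BOUND
```

Note the ordering: a trace can trip several thresholds at once, so the checks act as a priority list rather than a scorecard.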

### Novel Metrics

- **Token Efficiency Ratio (TER)**: `output_tokens / input_tokens`. Healthy: >0.15. Below that, you're paying for context the model ignores.
- **Tool Re-entry Rate**: Fraction of steps that re-call the same tool. High = your agent is retrying or looping.
- **Time-to-First-Action (TTA)**: Milliseconds from prompt to first tool call. Measures reasoning overhead.
- **Reasoning Ratio**: `reasoning_tokens / total_tokens`. How much compute goes to thinking vs. acting.
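The four metrics above can be derived from a step-level trace. A sketch under an assumed schema (a list of per-step dicts); agentslow's real trace format may differ, and it treats "re-entry" here as consecutive calls to the same tool, which is one plausible reading of the definition:

```python
# Hypothetical trace schema: each step may carry token counts,
# an optional "tool" name, and a timestamp "t_ms" relative to the prompt.

def compute_metrics(steps: list[dict]) -> dict:
    input_toks = sum(s.get("input_tokens", 0) for s in steps)
    output_toks = sum(s.get("output_tokens", 0) for s in steps)
    reasoning_toks = sum(s.get("reasoning_tokens", 0) for s in steps)
    tool_calls = [s["tool"] for s in steps if s.get("tool")]
    # Re-entry: a tool call that repeats the immediately preceding tool
    reentries = sum(1 for a, b in zip(tool_calls, tool_calls[1:]) if a == b)
    # Time-to-First-Action: timestamp of the first step that calls a tool
    first_tool_ms = next((s["t_ms"] for s in steps if s.get("tool")), 0)
    total = input_toks + output_toks
    return {
        "ter": output_toks / input_toks if input_toks else 0.0,
        "tool_reentry_rate": reentries / len(tool_calls) if tool_calls else 0.0,
        "tta_ms": first_tool_ms,
        "reasoning_ratio": reasoning_toks / total if total else 0.0,
    }
```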

### Context Entropy Monitor

Measures context health over long-running sessions:
- Per-turn entropy scores (0-1)
- Noise ratio (garbage accumulation)
- Semantic drift via TER sliding windows
- Compaction integrity validation
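One way a per-turn entropy score in [0, 1] could be computed is normalized Shannon entropy over the turn's token distribution. This is a sketch of that idea only; agentslow's actual estimator is not documented here:

```python
import math
from collections import Counter

def turn_entropy(tokens: list[str]) -> float:
    """Normalized Shannon entropy of a turn's token distribution, in [0, 1].
    1.0 means every token is unique (nothing redundant to compact);
    values near 0 mean highly repetitive context."""
    if len(tokens) < 2:
        return 0.0
    counts = Counter(tokens)
    probs = [c / len(tokens) for c in counts.values()]
    h = -sum(p * math.log2(p) for p in probs)
    return h / math.log2(len(tokens))  # divide by max possible entropy
```

A rising trend in this score across turns, as in the INCREASING verdict above, would mean each new turn adds less compressible, less structured context.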

### Safe Mode (Trust Architecture)

By default, agentslow shows fixes but doesn't write anything. The `--safe-mode` flag adds an extra preview layer; `--apply` is the explicit opt-in to write config patches.

```bash
# Preview only (default) — never writes files
agentslow diagnose trace.yaml --dry-run

# Safe mode — extra verbose preview with risk annotations
agentslow diagnose trace.yaml --safe-mode

# Apply — writes config patches after full safety chain
agentslow diagnose trace.yaml --apply
```

Six-gate safety chain: diagnose → classify → prescribe → golden-set verify → dry-run preview → human review → apply.

### Auto-Fix Pipeline

1. **Diagnose** → Classify regime + compute metrics
2. **Fix** → Generate prescriptive fixes with config patches
3. **Dry-Run** → Show diffs with SAFE/CAUTION/WARNING risk levels
4. **Fidelity** → Verify fixes don't change agent behavior (15 golden cases)
5. **Human Review** → `--safe-mode` preview before any writes
6. **Apply** → Machine-readable JSON config patches (explicit `--apply` opt-in)
7. **CI Gate** → Non-zero exit on CRITICAL issues
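The dry-run diffs earlier show patches as dotted config keys (`context_compaction.enabled: false → true`). A minimal sketch of applying such a patch to a nested config dict while capturing rollback state; the patch format is inferred from that output, not agentslow's documented schema:

```python
def apply_patch(config: dict, patch: dict) -> dict:
    """Apply dotted-key patches (e.g. 'context_compaction.enabled': True)
    to a nested config dict. Returns a rollback dict of the prior values."""
    rollback = {}
    for dotted, new_value in patch.items():
        *parents, leaf = dotted.split(".")
        node = config
        for key in parents:
            node = node.setdefault(key, {})  # create missing sections
        rollback[dotted] = node.get(leaf)    # None if the key didn't exist
        node[leaf] = new_value
    return rollback
```

Recording the prior value per key is what makes a later `rollback --fix-id` style undo possible: re-applying the rollback dict restores the original config.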

## Framework Support

- **LangGraph** (primary) — native trace parser
- **MCP** — tool call analysis
- **CrewAI** — multi-agent traces
- **Claude Code** — session analysis

## Project Stats

- 4,510+ lines across 12 modules
- 101 tests, 15 golden fidelity cases
- 10 atomic commits (v0.1.0 → v0.8.0)
- CLI-first, no dashboard — the moat is auto-fixing
- Six-gate safety chain with `--safe-mode` trust architecture

## Quick Start

```bash
# Install
pip install agentslow

# Diagnose a trace
agentslow diagnose your_trace.yaml

# Full analysis with all features
agentslow diagnose your_trace.yaml \
  --entropy \
  --fidelity \
  --config your_config.yaml \
  --dry-run

# Safe mode — preview everything before writing
agentslow diagnose your_trace.yaml --safe-mode

# Apply fixes (explicit opt-in)
agentslow diagnose your_trace.yaml --apply

# Benchmark before/after
agentslow benchmark your_config.yaml --compare --tasks 5

# CI mode (JUnit XML + exit codes)
agentslow diagnose your_trace.yaml --entropy --ci
```

## License

MIT
