Metadata-Version: 2.4
Name: base-sequence-toolkit
Version: 0.1.0
Summary: XEPV Base Sequence Analysis Toolkit for AI Agent Execution Traces
Author: DunCrew Team
License: MIT
Project-URL: Homepage, https://github.com/FatBy/base-sequence-toolkit
Project-URL: Documentation, https://github.com/FatBy/base-sequence-toolkit#readme
Keywords: ai-agent,execution-trace,base-sequence,xepv,swe-bench
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: swe-agent
Requires-Dist: datasets>=2.14; extra == "swe-agent"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: ruff>=0.1; extra == "dev"
Dynamic: license-file

# Base Sequence Toolkit

**XEPV Base Sequence Analysis Framework for AI Agent Execution Traces**

[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)

## What is this?

When an AI agent solves a task, it executes a sequence of actions: reading files, writing code, running tests, searching for information. **Base Sequence Toolkit** classifies each action into one of four *base types*:

| Base | Name | Description | Examples |
|------|------|-------------|----------|
| **X** | eXplore | Information gathering | `readFile`, `webSearch`, `ls`, `grep` |
| **E** | Execute | State-changing actions | `writeFile`, `edit`, `npm install` |
| **P** | Plan | Pure reasoning/strategy | LLM thinking without tool calls |
| **V** | Verify | Validation & testing | `pytest`, `tsc --noEmit`, read-after-write |

A task execution becomes a **base sequence** like `XEEVXEV` — a compact fingerprint of the agent's behavioral strategy.

This toolkit provides:
- **Classifier**: Map tool calls → XEPV base types (stateful, context-aware)
- **Analyzer**: N-gram patterns, transition matrices, positional effects, risk profiles
- **Adapters**: Ready-made integrations for SWE-agent and DunCrew trace formats
- **CLI**: One-command analysis from terminal

## Key Findings

Analysis of 2,000+ SWE-agent trajectories and 500+ DunCrew execution traces revealed:

| Finding | Detail |
|---------|--------|
| **V-base deficit** | V (Verify) comprises only ~3.3% of all bases in SWE-agent traces. Resolved tasks have significantly more V bases than unresolved ones. |
| **E→V transition bottleneck** | The probability of transitioning from E (Execute) to V (Verify) is only 0.6% — agents almost never verify after executing. |
| **Exploration spirals** | Consecutive X runs (XXX+) are strongly associated with task failure. |
| **Late planning is harmful** | P bases appearing in the second half of execution correlate with lower success rates. |
| **E-V pairing predicts success** | The "golden path" pattern (Execute then Verify) is the strongest positive predictor. |

## Installation

```bash
pip install base-sequence-toolkit

# With SWE-agent HuggingFace support:
pip install base-sequence-toolkit[swe-agent]

# For development:
pip install base-sequence-toolkit[dev]
```

Or install from source:

```bash
git clone https://github.com/user/base-sequence-toolkit.git
cd base-sequence-toolkit
pip install -e ".[dev,swe-agent]"
```

## Quick Start

### Analyze SWE-agent Trajectories

```python
from base_sequence_toolkit.adapters.swe_agent import load_from_huggingface
from base_sequence_toolkit.core.analyzer import run_full_analysis

# Load and classify 500 SWE-agent trajectories
results = load_from_huggingface(max_records=500)

# Extract sequences and outcomes
sequences = [r.base_sequence for r in results]
outcomes = [r.resolved for r in results]

# Run comprehensive analysis
report = run_full_analysis(sequences, outcomes)
print(report.format_summary())
```

### Classify Your Own Agent's Actions

```python
from base_sequence_toolkit import classify_step, create_context, StepClassification, BaseType

ctx = create_context()

steps = [
    StepClassification(tool_name="readFile", args={"path": "src/main.py"}),
    StepClassification(tool_name="writeFile", args={"path": "src/main.py", "content": "..."}, status="success"),
    StepClassification(tool_name="readFile", args={"path": "src/main.py"}),  # read-after-write → V
    StepClassification(tool_name="runCmd", shell_command="pytest tests/"),
]

for step in steps:
    base = classify_step(step, ctx)
    print(f"{step.tool_name:15s} → {base}")

# Output:
# readFile        → X  (first access to unknown file)
# writeFile       → E  (state-changing action)
# readFile        → V  (reading file we just wrote)
# runCmd          → V  (running tests after writes)
```

### Analyze Custom Sequences

```python
from base_sequence_toolkit.core.analyzer import (
    compute_sequence_stats,
    compute_transition_matrix,
    extract_risk_profile,
    find_discriminative_patterns,
)

# Compute stats for a single sequence
stats = compute_sequence_stats("XEEVXEV")
print(f"Length: {stats.length}")
print(f"EV pairs: {stats.ev_pairs}")
print(f"Has verification: {stats.has_verification}")
print(f"Switch rate: {stats.switch_rate}")

# Transition matrix across multiple sequences
matrix = compute_transition_matrix(["XEEVXEV", "XXXEEE", "XEVEV"])
print(matrix.format())

# Risk profile
risk = extract_risk_profile("XXXEEEEE")
print(f"Risk flags: {risk.flags()}")
# → ['consecutive_x≥3', 'no_verification']

# Find patterns that distinguish success from failure
patterns = find_discriminative_patterns(
    resolved_sequences=["XEVEV", "XEEVE"],
    unresolved_sequences=["XXXEE", "XXXXX"],
    min_count=1,
)
for p in patterns[:5]:
    print(f"  {p.pattern}: lift={p.lift:.2f}")
```

## CLI Usage

```bash
# Analyze SWE-agent trajectories
bst-analyze swe-agent -n 500 -o results/

# Analyze DunCrew execution traces
bst-analyze duncrew /path/to/exec_traces/ -o results/

# Analyze pre-computed JSON data
bst-analyze json data/my_sequences.json
```

## Writing Your Own Adapter

To integrate with a new agent framework, implement a function that converts your trace format into `StepClassification` objects:

```python
from base_sequence_toolkit import classify_step, create_context, StepClassification

def classify_my_agent_trace(trace: dict) -> str:
    """Convert your agent's trace into an XEPV base sequence."""
    ctx = create_context()
    bases = []

    for action in trace["actions"]:
        step = StepClassification(
            tool_name=action["tool"],
            args=action.get("arguments", {}),
            status="success" if action["succeeded"] else "error",
            shell_command=action.get("command"),
        )
        base = classify_step(step, ctx)
        bases.append(str(base))

    return "".join(bases)
```

## Project Structure

```
base-sequence-toolkit/
├── base_sequence_toolkit/
│   ├── __init__.py
│   ├── cli.py                    # CLI entry point
│   ├── core/
│   │   ├── classifier.py         # Generic XEPV classifier
│   │   └── analyzer.py           # Analysis primitives
│   └── adapters/
│       ├── swe_agent.py          # SWE-agent trajectory adapter
│       └── duncrew.py            # DunCrew trace adapter
├── examples/
│   └── swe_agent_analysis.py     # End-to-end example
├── tests/
│   ├── test_classifier.py
│   ├── test_analyzer.py
│   └── test_swe_agent.py
├── data/                         # Pre-computed results (optional)
├── pyproject.toml
├── LICENSE
└── README.md
```

## Citation

If you use this toolkit in your research, please cite:

```bibtex
@software{base_sequence_toolkit,
  title = {Base Sequence Toolkit: XEPV Analysis Framework for AI Agent Execution Traces},
  year = {2025},
  url = {https://github.com/FatBy/base-sequence-toolkit}
}
```

## License

MIT License. See [LICENSE](LICENSE) for details.

## Related

- **XEPV Framework**: The base classification system described in our paper
- **DunCrew**: AI operating system with XEPV-based execution governance
- **SWE-bench**: Software engineering benchmark for AI agents
