Metadata-Version: 2.4
Name: traceix
Version: 0.1.0
Summary: Trajectory-based CI testing for AI agents
Project-URL: Homepage, https://github.com/itgoujie2/traceix
Project-URL: Repository, https://github.com/itgoujie2/traceix
Project-URL: Bug Tracker, https://github.com/itgoujie2/traceix/issues
Author-email: traceix <itgoujie2@gmail.com>
License: MIT
Keywords: agents,ci,llm,testing,trajectory
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.11
Requires-Dist: click>=8.0
Requires-Dist: httpx>=0.25
Requires-Dist: pydantic>=2.0
Requires-Dist: pyyaml>=6.0
Provides-Extra: crewai
Requires-Dist: crewai>=0.1; extra == 'crewai'
Provides-Extra: dev
Requires-Dist: anthropic>=0.40; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-httpx>=0.30; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Provides-Extra: langgraph
Requires-Dist: langchain-core>=0.2; extra == 'langgraph'
Requires-Dist: langgraph>=0.2; extra == 'langgraph'
Description-Content-Type: text/markdown

# traceix

**Trajectory-based CI testing for AI agents.**

traceix lets you declare which tools your agent should call — and in what order — as plain YAML, then run those assertions in CI the same way you'd run unit tests. No LLM-as-judge, no flaky eval pipelines: if the trajectory doesn't match, the build fails.

```bash
pip install traceix
```

---

## Why traceix?

LLM-powered agents are non-deterministic. The same prompt might call `search_flights` then `confirm_booking` today, but skip straight to `confirm_booking` tomorrow. traceix makes that observable and enforceable:

- **Declare the expected trajectory** in YAML — which tools, in what order, with what args.
- **Run it in CI** — the agent still calls the real LLM; only the tool *responses* are mocked.
- **Get a clear pass/fail** — no prompt-engineering an evaluator, no statistical thresholds.

---

## Quick start

```bash
pip install traceix
traceix init          # detects your framework, scaffolds traceix.yaml + tests/example.yaml
```

Write a test (`tests/book_flight.yaml`):

```yaml
name: book-flight-basic
input: "Book me the cheapest flight from NYC to SFO"

mocks:
  search_flights:
    return: { flights: [{ id: F1, price: 390 }, { id: F2, price: 420 }] }
  confirm_booking:
    return: { booking_id: BK-001, status: confirmed }

expected:
  trajectory:
    mode: contains        # these steps must appear in order (others allowed)
    steps:
      - tool: search_flights
        args: { origin: NYC, destination: SFO }
        arg_mode: partial  # only check the keys listed above
      - tool: confirm_booking
        arg_mode: ignore
  forbidden_tools: [cancel_booking]
```

Run it:

```bash
traceix run tests/ --handler mypackage.agent:run
```

```
  ✓  book-flight-basic   1/1   2 steps   142ms
  ──────────────────────────────────────────────
  1 passed · 0 failed
```

---

## Integration

### `tools=` parameter (any framework)

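Your agent handler accepts a `tools` list injected by traceix. The example below uses LangGraph, but the same signature works with any framework:

```python
# mypackage/agent.py
from langchain_core.messages import HumanMessage

# build_graph is your own factory; it returns a graph wired to the given tools.


def run(input: str, tools: list) -> str:
    graph = build_graph(tools)   # rebuild with injected (mocked) tools
    result = graph.invoke({"messages": [HumanMessage(content=input)]})
    return result["messages"][-1].content  # text of the final message
```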
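
The `(input, tools)` contract can also be exercised without any framework or LLM. The following is a minimal sketch only: `RecordingTool` and `toy_handler` are hypothetical names made up for illustration, not part of traceix, and the objects traceix actually injects may look different (for example, LangChain tool instances).

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RecordingTool:
    """Hypothetical stand-in for an injected tool: returns a canned value, logs calls."""
    name: str
    canned: Any
    calls: list[dict] = field(default_factory=list)

    def __call__(self, **kwargs: Any) -> Any:
        self.calls.append(kwargs)  # remember the args for later assertions
        return self.canned


def toy_handler(input: str, tools: list) -> str:
    """Toy handler with the same (input, tools) shape; the 'agent' logic is hard-coded."""
    by_name = {t.name: t for t in tools}
    flights = by_name["search_flights"](origin="NYC", destination="SFO")["flights"]
    cheapest = min(flights, key=lambda f: f["price"])
    booking = by_name["confirm_booking"](flight_id=cheapest["id"])
    return f"Booked {cheapest['id']} as {booking['booking_id']}"


search = RecordingTool(
    "search_flights",
    {"flights": [{"id": "F1", "price": 390}, {"id": "F2", "price": 420}]},
)
confirm = RecordingTool("confirm_booking", {"booking_id": "BK-001", "status": "confirmed"})
reply = toy_handler("Book me the cheapest flight from NYC to SFO", [search, confirm])
```

After the call, `search.calls` and `confirm.calls` hold the recorded arguments, which is the kind of trajectory data a test can assert against.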

### `@traceix_tool` decorator (LangChain / LangGraph)

No handler signature changes needed — just decorate your tools:

```python
from langchain_core.tools import tool
from traceix import traceix_tool

@traceix_tool
@tool
def search_flights(origin: str, destination: str) -> dict:
    """Search available flights."""
    ...  # real implementation
```

During test runs, traceix automatically replaces each decorated tool with its mock.

---

## CLI commands

| Command | What it does |
|---|---|
| `traceix init` | Detect framework, scaffold `traceix.yaml` + example test |
| `traceix run tests/` | Run tests; exit code 0 if all pass, 1 on any failure |
| `traceix run tests/ --fixture record` | Save real tool responses to `.traceix/fixtures/` |
| `traceix run tests/ --fixture replay` | Replay recorded responses in CI |
| `traceix snapshot tests/` | Save golden trajectory baselines |
| `traceix check tests/` | Compare live run against saved baselines |
| `traceix compare tests/ --a "model=X" --b "model=Y"` | A/B test two model configs side by side |

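The record/replay idea behind the two `--fixture` modes can be sketched in a few lines. This illustrates the concept only: `with_fixture`, the one-JSON-file-per-tool layout, and the argument-based key are assumptions made for this example, not traceix's actual fixture format.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def with_fixture(tool: Callable[..., Any], name: str, mode: str, root: Path) -> Callable[..., Any]:
    """Wrap `tool`: 'record' calls it and saves the response, 'replay' reads it back."""
    path = root / f"{name}.json"  # hypothetical layout: one JSON file per tool

    def wrapped(**kwargs: Any) -> Any:
        key = json.dumps(kwargs, sort_keys=True)  # stable key for these args
        store = json.loads(path.read_text()) if path.exists() else {}
        if mode == "replay":
            return store[key]  # raises KeyError if this call was never recorded
        result = tool(**kwargs)  # record mode: call the real tool
        store[key] = result
        root.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(store, indent=2))
        return result

    return wrapped


real_calls = []


def real_search(origin: str, destination: str) -> dict:
    real_calls.append((origin, destination))  # track how often the real tool runs
    return {"flights": [{"id": "F1", "price": 390}]}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / ".traceix" / "fixtures"
    recorded = with_fixture(real_search, "search_flights", "record", root)(origin="NYC", destination="SFO")
    replayed = with_fixture(real_search, "search_flights", "replay", root)(origin="NYC", destination="SFO")
```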
---

## Trajectory modes

| Mode | Meaning |
|---|---|
| `strict` | Exact tool order and count |
| `contains` | Listed steps must appear in order (extra steps allowed) |
| `unordered` | All steps present, any order |
| `within` | Steps appear as a contiguous block |

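The four modes can be read as four different sequence comparisons. The sketch below is one interpretation of the table above, not traceix's implementation. `trajectory_matches` is a hypothetical helper, and it assumes that `unordered` tolerates extra calls, which the table does not state.

```python
from collections import Counter


def trajectory_matches(mode: str, expected: list[str], actual: list[str]) -> bool:
    """Compare tool-name sequences under one reading of each mode."""
    if mode == "strict":
        return actual == expected  # same tools, same order, same count
    if mode == "contains":
        remaining = iter(actual)
        # Each membership test consumes the iterator, so order is enforced.
        return all(step in remaining for step in expected)
    if mode == "unordered":
        # Assumption: extra calls are tolerated; each listed step must occur often enough.
        return not (Counter(expected) - Counter(actual))
    if mode == "within":
        n = len(expected)
        return any(actual[i:i + n] == expected for i in range(len(actual) - n + 1))
    raise ValueError(f"unknown mode: {mode}")


actual = ["lookup_user", "search_flights", "get_weather", "confirm_booking"]
expected = ["search_flights", "confirm_booking"]
results = {
    mode: trajectory_matches(mode, expected, actual)
    for mode in ("strict", "contains", "unordered", "within")
}
```

Here `contains` and `unordered` match, while `strict` and `within` do not, because `get_weather` sits between the two expected steps.
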
## Arg modes

| Mode | Meaning |
|---|---|
| `exact` | Args must match exactly |
| `partial` | Listed keys must be present with matching values (other keys are not checked) |
| `ignore` | Args not checked |

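As a rough illustration of the three arg modes, the comparison for a single step could look like the sketch below. `args_match` is a hypothetical helper, not a traceix function, and it compares values with plain equality, which may differ from what traceix does for nested or typed values.

```python
def args_match(arg_mode: str, expected: dict, actual: dict) -> bool:
    """Compare the args of one tool call under one reading of each arg mode."""
    if arg_mode == "exact":
        return actual == expected  # same keys, same values
    if arg_mode == "partial":
        # Only the listed keys are checked; extra keys in `actual` are fine.
        return all(k in actual and actual[k] == v for k, v in expected.items())
    if arg_mode == "ignore":
        return True  # args are not inspected at all
    raise ValueError(f"unknown arg_mode: {arg_mode}")


expected_args = {"origin": "NYC", "destination": "SFO"}
actual_args = {"origin": "NYC", "destination": "SFO", "date": "2025-01-01"}
outcome = {m: args_match(m, expected_args, actual_args) for m in ("exact", "partial", "ignore")}
```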
---

## Framework support

| Framework | Integration |
|---|---|
| LangChain / LangGraph | `@traceix_tool` decorator or `tools=` injection |
| CrewAI | `tools=` injection |
| Anthropic SDK | `tools=` injection |
| OpenAI SDK | `tools=` injection |
| Any other | `tools=` injection |

---

## Configuration

Set defaults in `traceix.yaml` (or `[tool.traceix]` in `pyproject.toml`):

```yaml
handler: mypackage.agent:run
runs: 3          # runs per test case (increase in CI for confidence)
tolerance: 0.67  # fraction of runs that must pass
fixture_mode: replay
```
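
How `runs` and `tolerance` interact depends on the exact comparison traceix applies, which is not spelled out here. The sketch below assumes the simplest rule, `passed / runs >= tolerance`. Both helper names are hypothetical.

```python
def case_passes(passed: int, runs: int, tolerance: float) -> bool:
    """Assumed rule: the fraction of passing runs must be at least `tolerance`."""
    return passed / runs >= tolerance


def min_passes(runs: int, tolerance: float) -> int:
    """Smallest number of passing runs that satisfies the assumed rule."""
    return next(k for k in range(runs + 1) if case_passes(k, runs, tolerance))


needed_default = min_passes(3, 0.67)   # the values from the config above
needed_relaxed = min_passes(3, 0.66)
needed_ten_runs = min_passes(10, 0.67)
```

Under this assumption, 2 of 3 passing runs gives 0.666..., which is just below 0.67, so all 3 runs would have to pass. A tolerance of 0.66 would accept 2 of 3, and 10 runs at 0.67 would need 7. If you intend "2 out of 3", check how traceix rounds before relying on the default.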

---

## License

MIT
