Metadata-Version: 2.4
Name: agentguardCI
Version: 0.1.0
Summary: Open-source CI contract testing for tool-using AI agents.
Project-URL: Homepage, https://github.com/yazeed-km/AgentGuard
Project-URL: Repository, https://github.com/yazeed-km/AgentGuard
Project-URL: Issues, https://github.com/yazeed-km/AgentGuard/issues
Project-URL: Releases, https://github.com/yazeed-km/AgentGuard/releases
Project-URL: Documentation, https://github.com/yazeed-km/AgentGuard#readme
Author: AgentGuard CI Contributors
License-Expression: MIT
License-File: LICENSE
Keywords: agents,ai,ci,evals,llm,testing
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.11
Requires-Dist: httpx>=0.27.0
Requires-Dist: jsonpath-ng>=1.6.1
Requires-Dist: jsonschema>=4.22.0
Requires-Dist: pydantic>=2.7.0
Requires-Dist: pyyaml>=6.0.1
Requires-Dist: rich>=13.7.1
Requires-Dist: tenacity>=8.2.3
Requires-Dist: typer>=0.12.3
Provides-Extra: dev
Requires-Dist: build>=1.2.1; extra == 'dev'
Requires-Dist: mypy>=1.10.0; extra == 'dev'
Requires-Dist: pytest-cov>=5.0.0; extra == 'dev'
Requires-Dist: pytest>=8.2.0; extra == 'dev'
Requires-Dist: ruff>=0.4.4; extra == 'dev'
Requires-Dist: twine>=5.1.0; extra == 'dev'
Provides-Extra: langchain
Requires-Dist: langchain>=0.2.0; extra == 'langchain'
Provides-Extra: openai
Requires-Dist: openai>=1.30.0; extra == 'openai'
Provides-Extra: openai-agents
Requires-Dist: openai-agents>=0.0.0; extra == 'openai-agents'
Description-Content-Type: text/markdown

<p align="center">
  <img src="assets/logo.svg" alt="AgentGuard CI logo" width="128" height="128">
</p>

<h1 align="center">AgentGuard CI</h1>

<p align="center">
  Open-source CI contract testing for tool-using AI agents.
</p>

<p align="center">
  <a href="https://github.com/yazeed-km/AgentGuard/releases">
    <img alt="Early Beta" src="https://img.shields.io/badge/status-early_beta-111827?style=for-the-badge">
  </a>
  <a href="https://github.com/yazeed-km/AgentGuard/releases">
    <img alt="Latest Release" src="https://img.shields.io/github/v/release/yazeed-km/AgentGuard?style=for-the-badge&logo=github&label=release">
  </a>
  <a href="https://github.com/yazeed-km/AgentGuard/stargazers">
    <img alt="GitHub Stars" src="https://img.shields.io/github/stars/yazeed-km/AgentGuard?style=for-the-badge&logo=github">
  </a>
  <a href="LICENSE">
    <img alt="License" src="https://img.shields.io/github/license/yazeed-km/AgentGuard?style=for-the-badge">
  </a>
  <img alt="Python 3.11+" src="https://img.shields.io/badge/python-3.11%2B-3776AB?style=for-the-badge&logo=python&logoColor=white">
</p>

<p align="center">
  <a href="README.md">English</a> ·
  <a href="docs/i18n/README.zh-CN.md">简体中文</a> ·
  <a href="docs/i18n/README.ja.md">日本語</a> ·
  <a href="docs/i18n/README.ko.md">한국어</a>
</p>

AgentGuard CI helps teams block risky prompt, tool, and agent changes before they reach
production. Define trace-level contracts, mock tools, run only the tests impacted by a change,
enforce latency and token budgets, and fail CI when agent behavior regresses.

## Why AgentGuard CI Exists

Final-answer evals are not enough for production agents. A response can look acceptable while
the trace is unsafe: the wrong tool was called, a required policy lookup was skipped, a refund
tool ran before confirmation, or token usage doubled.

AgentGuard CI focuses on deterministic engineering contracts:

- Required and forbidden tool calls.
- Tool call ordering.
- Routing decisions.
- Structured output schemas.
- Latency, token, cost, and tool-call budgets.
- Mocked tool results for safe offline CI.
- Baseline comparison for regression prevention.

LLM-as-judge is available as an optional path, but deterministic assertions are the default.

## Installation

```bash
pip install agentguardCI
```

For local development from this repository:

```bash
pip install -e ".[dev]"
```

OpenAI judge support is optional:

```bash
pip install "agentguardCI[openai]"
```

## Quick Start

```bash
agentguard init
agentguard test
```

Starter test:

```yaml
suite: "calendar-agent"

tests:
  - id: "uses_calendar_tool"
    input: "Book a meeting tomorrow at 5 PM."
    assert:
      trace:
        must_call:
          - tool: "calendar.create_event"
      output:
        contains:
          - "meeting"
```

## CLI

```bash
agentguard --help
agentguard --version
agentguard init
agentguard test
agentguard test --config agentguard.yml
agentguard test --suite calendar-agent
agentguard test --case uses_calendar_tool
agentguard test --tag smoke
agentguard test --changed-only --base origin/main
agentguard test --report json --report junit
agentguard test --fail-fast
agentguard test --update-baseline
agentguard test --strict
agentguard list
agentguard diff
agentguard baseline update
agentguard baseline list
agentguard validate
python -m agentguard --help
```

Exit codes:

- `0`: all blocking tests passed.
- `1`: at least one blocking test failed.
- `2`: configuration error.
- `3`: agent runtime error.
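
A CI wrapper script can branch on these exit codes. The sketch below is illustrative only: `describe_exit_code` and `should_retry` are hypothetical helpers, not part of AgentGuard, and the retry policy is an assumption rather than documented behavior.

```python
# Hypothetical helpers; the mapping is copied from the exit-code list above.
EXIT_CODES = {
    0: "all blocking tests passed",
    1: "at least one blocking test failed",
    2: "configuration error",
    3: "agent runtime error",
}


def describe_exit_code(code: int) -> str:
    """Return a human-readable label for an agentguard exit code."""
    return EXIT_CODES.get(code, f"unknown exit code {code}")


def should_retry(code: int) -> bool:
    """Assumption: only agent runtime errors (3) are worth retrying in CI."""
    return code == 3
```

Test failures (`1`) and configuration errors (`2`) are deterministic, so retrying them would likely only slow the pipeline down.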

## Agent Entrypoint Contract

Your agent must expose a sync or async Python function that accepts an `AgentRequest` and returns an `AgentResult`:

```python
from agentguard import AgentRequest, AgentResult, TraceEvent, Usage


async def run_agent(request: AgentRequest) -> AgentResult:
    trace = [
        TraceEvent(
            type="tool_call",
            name="calendar.create_event",
            input={"time": "5 PM"},
        )
    ]
    return AgentResult(
        output="Meeting booked for 5 PM.",
        trace=trace,
        usage=Usage(total_tokens=512, latency_ms=1200),
    )
```
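
Because the entrypoint may be sync or async, a runner has to handle both. The following is a minimal sketch of one way to do that, not AgentGuard's actual runner: `call_agent` is a hypothetical name, the 30-second default is arbitrary, and the timeout is only applied to async entrypoints here.

```python
import asyncio
import inspect
from typing import Any, Callable


def call_agent(func: Callable[[Any], Any], request: Any, timeout_seconds: float = 30) -> Any:
    """Call a sync or async entrypoint and return its result."""
    if inspect.iscoroutinefunction(func):
        # Async entrypoints are awaited under a timeout.
        return asyncio.run(asyncio.wait_for(func(request), timeout_seconds))
    # Sync entrypoints are called directly; this sketch applies no timeout to them.
    return func(request)
```

Enforcing a timeout on a sync function would need a thread or subprocess, which is left out to keep the sketch short.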

Configure it in `agentguard.yml`:

```yaml
version: "0.1"

project:
  name: "customer-support-agents"

agent:
  entrypoint: "my_package.agent:run_agent"
  timeout_seconds: 30

paths:
  tests: "agentguard-tests"
  baselines: ".agentguard/baselines"
  reports: ".agentguard/reports"

mocks:
  require_registered: false
```
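
The `entrypoint` value follows the common `module:function` convention. As a rough sketch of how such a string can be resolved with the standard library (the `resolve_entrypoint` helper is hypothetical and may differ from what AgentGuard does internally):

```python
import importlib
from typing import Any, Callable


def resolve_entrypoint(spec: str) -> Callable[..., Any]:
    """Resolve a "package.module:function" string to a callable."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    # The module must be importable from the current working environment.
    module = importlib.import_module(module_name)
    func = getattr(module, attr)
    if not callable(func):
        raise TypeError(f"{spec!r} does not point to a callable")
    return func
```

For example, `resolve_entrypoint("json:dumps")` returns the standard library's `json.dumps`. In practice this means the package named in `entrypoint` has to be installed or on `sys.path` when the tests run.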

## Tool Mocking

AgentGuard does not monkeypatch tools in the MVP. Mocks are opt-in: the agent either reads them
from the request directly or looks them up with `get_mock`.

```yaml
mocks:
  orders.get_order:
    match_args:
      order_id: "ORD-123"
    output:
      order_id: "ORD-123"
      status: "delivered"
```

```python
from agentguard import get_mock

tool_output = get_mock(
    request,
    "orders.get_order",
    args={"order_id": "ORD-123"},
)
```
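
To make the `match_args` idea concrete, here is a self-contained sketch of one plausible matching rule: a mock applies when every key in `match_args` is present in the call arguments with an equal value. The `lookup_mock` function and the `MOCKS` dictionary are hypothetical stand-ins, and the real `get_mock` may match differently.

```python
from typing import Any, Optional

# Mirrors the YAML mock definition above as plain Python data.
MOCKS = {
    "orders.get_order": {
        "match_args": {"order_id": "ORD-123"},
        "output": {"order_id": "ORD-123", "status": "delivered"},
    }
}


def lookup_mock(mocks: dict, tool: str, args: dict) -> Optional[Any]:
    """Return the mocked output if the tool is mocked and its match_args are satisfied."""
    entry = mocks.get(tool)
    if entry is None:
        return None
    expected = entry.get("match_args", {})
    # Assumption: extra call arguments are ignored; only listed keys must match.
    if all(key in args and args[key] == value for key, value in expected.items()):
        return entry["output"]
    return None
```

An agent could call the real tool only when the lookup returns `None`, which keeps production code paths unchanged outside of CI.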

## Assertions

Trace assertions:

```yaml
assert:
  trace:
    must_call:
      - tool: "orders.get_order"
    must_not_call:
      - tool: "payments.issue_refund"
    ordered:
      - tool: "orders.get_order"
      - tool: "policy.lookup_refund_policy"
    max_tool_calls: 5
```
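
The semantics of these checks can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not AgentGuard's implementation: `check_trace` is a hypothetical function, the trace is reduced to a list of tool names, and `ordered` is interpreted as "appears in this relative order, with other calls allowed in between".

```python
def check_trace(tool_calls: list[str], spec: dict) -> list[str]:
    """Return failure messages; an empty list means the trace satisfies the spec."""
    failures = []
    for tool in spec.get("must_call", []):
        if tool not in tool_calls:
            failures.append(f"missing required call: {tool}")
    for tool in spec.get("must_not_call", []):
        if tool in tool_calls:
            failures.append(f"forbidden call: {tool}")
    # Ordered check: the listed tools must form a subsequence of the trace.
    remaining = iter(tool_calls)
    for tool in spec.get("ordered", []):
        # any() consumes the iterator up to and including the first match.
        if not any(call == tool for call in remaining):
            failures.append(f"out of order or missing: {tool}")
            break
    limit = spec.get("max_tool_calls")
    if limit is not None and len(tool_calls) > limit:
        failures.append(f"too many tool calls: {len(tool_calls)} > {limit}")
    return failures
```

With the spec above, a trace that calls `policy.lookup_refund_policy` before `orders.get_order` would fail the ordered check even though both tools were called.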

Output assertions:

```yaml
assert:
  output:
    contains:
      - "confirmation"
    not_contains:
      - "refunded"
    json_schema:
      type: object
      required: ["category"]
      properties:
        category:
          type: string
          enum: ["billing", "technical", "account", "other"]
    jsonpath:
      - path: "$.category"
        equals: "billing"
```
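
As a simplified illustration of the text and JSONPath checks, the sketch below uses only the standard library. It is not AgentGuard's implementation: `check_output` is a hypothetical function, substring matching is assumed to be case-sensitive, only top-level paths of the form `$.key` are handled, and `json_schema` validation is omitted.

```python
import json


def check_output(output: str, spec: dict) -> list[str]:
    """Return failure messages for contains, not_contains, and simple $.key rules."""
    failures = []
    for needle in spec.get("contains", []):
        if needle not in output:
            failures.append(f"missing text: {needle!r}")
    for needle in spec.get("not_contains", []):
        if needle in output:
            failures.append(f"forbidden text: {needle!r}")
    for rule in spec.get("jsonpath", []):
        # Simplification: treat "$.key" as a lookup on a top-level JSON object.
        key = rule["path"].removeprefix("$.")
        try:
            actual = json.loads(output).get(key)
        except (json.JSONDecodeError, AttributeError):
            failures.append("output is not a JSON object")
            continue
        if actual != rule["equals"]:
            failures.append(f"{rule['path']}: expected {rule['equals']!r}, got {actual!r}")
    return failures
```

A full implementation would delegate to a JSONPath library and a JSON Schema validator instead of the hand-rolled lookup shown here.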

Budget assertions:

```yaml
assert:
  budgets:
    max_latency_ms: 5000
    max_total_tokens: 3000
    max_cost_usd: 0.05
```
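
Conceptually, each `max_*` budget is an upper bound on a measured metric. A minimal sketch, assuming usage is available as a dictionary keyed by metric name (`check_budgets` is hypothetical, and treating a missing metric as "not checked" is an assumption):

```python
def check_budgets(usage: dict, budgets: dict) -> list[str]:
    """Compare measured usage against max_* limits and return failure messages."""
    failures = []
    for key, limit in budgets.items():
        metric = key.removeprefix("max_")  # "max_latency_ms" -> "latency_ms"
        actual = usage.get(metric)
        # Values equal to the limit pass; only strictly larger values fail.
        if actual is not None and actual > limit:
            failures.append(f"{metric}: {actual} exceeds {limit}")
    return failures
```

For example, a run that costs 0.08 USD against `max_cost_usd: 0.05` would fail even if latency and tokens stay within budget.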

## Impact-Aware Testing

Create `agentguard-impact.yml`:

```yaml
mappings:
  - files:
      - "agents/refund/**"
      - "prompts/refund/**"
      - "tools/refund.py"
    tests:
      - "refund-agent"
      - "refund-regression"
```

Run impacted suites only:

```bash
agentguard test --changed-only --base origin/main
```

If no mapping matches, AgentGuard runs smoke-tagged tests when present.
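
The mapping logic can be approximated with glob matching over the list of changed files. This sketch is an assumption about the behavior, not the actual selector: `select_suites` is hypothetical, and it uses `fnmatchcase`, whose `*` also matches `/`, so `agents/refund/**` matches files at any depth under that directory.

```python
from fnmatch import fnmatchcase

# Mirrors the agentguard-impact.yml example above as plain Python data.
MAPPINGS = [
    {
        "files": ["agents/refund/**", "prompts/refund/**", "tools/refund.py"],
        "tests": ["refund-agent", "refund-regression"],
    }
]


def select_suites(changed_files: list[str], mappings: list[dict]) -> list[str]:
    """Return suites whose file patterns match any changed file, without duplicates."""
    selected: list[str] = []
    for mapping in mappings:
        hit = any(
            fnmatchcase(path, pattern)
            for path in changed_files
            for pattern in mapping["files"]
        )
        if hit:
            for suite in mapping["tests"]:
                if suite not in selected:
                    selected.append(suite)
    return selected
```

An empty result corresponds to the "no mapping matches" case, where the smoke-tagged fallback would apply.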

## Baseline Regression Testing

Store baselines for passing tests:

```bash
agentguard test --update-baseline
```

Enable baseline comparison:

```yaml
assert:
  regression:
    compare_to_baseline: true
    allowed_output_similarity_drop: 0.1
    allowed_extra_tool_calls: 1
    allowed_latency_increase_pct: 30
```

Baselines are stored under:

```text
.agentguard/baselines/{suite}/{test_id}.json
```
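
To illustrate how the three tolerances might be evaluated, here is a sketch under several assumptions: `baseline_path` and `check_regression` are hypothetical, output similarity is measured with `difflib.SequenceMatcher` (AgentGuard may use a different metric), and the baseline and current run are reduced to dictionaries with `output`, `tool_calls`, and `latency_ms` keys.

```python
from difflib import SequenceMatcher
from pathlib import PurePosixPath


def baseline_path(suite: str, test_id: str, root: str = ".agentguard/baselines") -> str:
    """Build the baseline file path following the layout shown above."""
    return str(PurePosixPath(root) / suite / f"{test_id}.json")


def check_regression(baseline: dict, current: dict, spec: dict) -> list[str]:
    """Return failure messages when the current run drifts beyond the allowed tolerances."""
    failures = []
    similarity = SequenceMatcher(None, baseline["output"], current["output"]).ratio()
    if 1.0 - similarity > spec["allowed_output_similarity_drop"]:
        failures.append(f"output similarity dropped to {similarity:.2f}")
    extra = current["tool_calls"] - baseline["tool_calls"]
    if extra > spec["allowed_extra_tool_calls"]:
        failures.append(f"{extra} extra tool calls")
    if baseline["latency_ms"] > 0:  # avoid dividing by zero on an empty baseline
        increase = (current["latency_ms"] - baseline["latency_ms"]) / baseline["latency_ms"] * 100
        if increase > spec["allowed_latency_increase_pct"]:
            failures.append(f"latency increased by {increase:.0f}%")
    return failures
```

With the tolerances from the YAML above, a run that goes from 1000 ms to 1200 ms (a 20% increase) passes, while 1400 ms (40%) fails.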

## Reports

AgentGuard prints a terminal report and can write machine-readable reports:

```bash
agentguard test --report json
agentguard test --report junit
```

Outputs:

```text
.agentguard/reports/report.json
.agentguard/reports/junit.xml
```
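
Downstream tooling can consume the JUnit file with the standard library. The sketch below assumes the conventional JUnit layout (`<testcase>` elements with optional `<failure>` or `<error>` children); the exact elements AgentGuard emits are not specified here, and `summarize_junit` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text: str) -> dict[str, int]:
    """Count total, failed, and passed test cases in a JUnit-style XML document."""
    root = ET.fromstring(xml_text)
    cases = list(root.iter("testcase"))
    failed = [
        case
        for case in cases
        if case.find("failure") is not None or case.find("error") is not None
    ]
    return {"tests": len(cases), "failed": len(failed), "passed": len(cases) - len(failed)}
```

To summarize a real run, read the report first, for example with `pathlib.Path(".agentguard/reports/junit.xml").read_text()`, and pass the text to the function.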

## GitHub Actions

```yaml
name: AgentGuard CI

on:
  pull_request:
    branches: [main]

jobs:
  agentguard:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install agentguardCI

      - name: Run impacted agent tests
        run: |
          agentguard test --changed-only --base origin/main --report junit
```

Branch protection setup:

1. Open repository settings.
2. Enable branch protection for `main`.
3. Require status checks before merging.
4. Select the `agentguard` job from the AgentGuard CI workflow as a required check.

## Optional LLM Judge

LLM judge tests are opt-in and should be used selectively because provider calls can be slower,
more expensive, and less deterministic than trace contracts.

```yaml
assert:
  llm_judge:
    enabled: true
    rubric: |
      Score whether the answer correctly explains the refund policy and does not invent
      unsupported exceptions.
    threshold: 0.85
```

Use the fake provider for deterministic local tests or configure OpenAI with `OPENAI_API_KEY`.
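
The threshold check itself is simple once a judge returns a score. As a rough sketch of the idea, with a deterministic stand-in for a provider (`fake_judge` and `judge_passes` are hypothetical names and do not represent AgentGuard's built-in fake provider; a score range of 0.0 to 1.0 is assumed):

```python
from typing import Callable


def fake_judge(output: str, rubric: str) -> float:
    """Deterministic stand-in: returns a fixed score and makes no provider call."""
    return 0.9


def judge_passes(
    output: str,
    rubric: str,
    threshold: float,
    judge: Callable[[str, str], float] = fake_judge,
) -> bool:
    """Pass when the judge's score meets or exceeds the threshold."""
    return judge(output, rubric) >= threshold
```

Keeping the judge behind a plain callable makes it straightforward to swap a real provider in for the fake one without touching the threshold logic.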

## Example

```bash
cd examples/simple_agent
agentguard test
```

The example agent uses a mocked `calendar.create_event` tool and validates the trace, output,
latency, token budget, and tool-call budget.

## Roadmap

### v0.1 Alpha

- YAML tests.
- CLI runner.
- Python agent entrypoint.
- Trace assertions.
- Output assertions.
- Budgets.
- Mocks.
- Reports.
- GitHub Actions example.

### v0.2 Core Hardening

- Public model contract cleanup.
- Complete deterministic assertion coverage.
- Strict validation mode.
- Clear domain exceptions and failure messages.
- Expanded unit coverage.

### v0.3 Baselines and Reports

- Versioned baseline artifacts.
- Baseline diff improvements.
- Stable JSON report schema.
- Improved JUnit output.
- Baseline CLI subcommands.

### v0.4 Mocking and Impact

- Stricter mock argument matching.
- Registered mock failure mode.
- Hardened changed-only behavior.
- Smoke fallback rules.

### v0.5 Adapters

- Default custom Python adapter.
- OpenAI Agents SDK adapter.
- LangChain adapter.
- Optional extras and adapter docs.

### v1.0 Release Candidate

- Frozen public models and JSON report schema.
- Complete docs.
- Clean install verification.
- TestPyPI package validation.
- Stable local and CI runtime behavior.

## Contributing

Keep the open-source core CI-first, deterministic by default, and framework-agnostic. New
integrations should preserve the same trace contract model instead of hiding behavior behind
provider-specific abstractions.

## Release

Release steps are documented in [docs/RELEASE.md](docs/RELEASE.md).

## Documentation

- [Documentation Index](docs/README.md)
- [Developer Guide](docs/DEVELOPER_GUIDE.md)
- [Quick Start](docs/QUICKSTART.md)
- [CLI Reference](docs/CLI_REFERENCE.md)
- [Python API](docs/PYTHON_API.md)
- [YAML Reference](docs/YAML_REFERENCE.md)
- [Assertion Reference](docs/ASSERTIONS.md)
- [Baseline Guide](docs/BASELINES.md)
- [Mocking Guide](docs/MOCKING.md)
- [Impact Selection](docs/IMPACT_SELECTION.md)
- [CI Guide](docs/CI.md)
- [Reports](docs/REPORTS.md)
- [Adapter Guide](docs/ADAPTERS.md)
- [Architecture](docs/ARCHITECTURE.md)
- [File Structure](docs/FILE_STRUCTURE.md)
