Metadata-Version: 2.4
Name: wendell
Version: 0.1.7
Summary: Wendell CLI for playbook-generated agent test suites and run uploads
License-Expression: LicenseRef-Proprietary
Project-URL: Homepage, https://www.wendellai.com
Project-URL: Repository, https://github.com/croppia/wendellai
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: psycopg[binary]>=3.3
Requires-Dist: pydantic>=2.12
Requires-Dist: sqlalchemy>=2.0
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == "dev"

# Wendell CLI

Wendell is a Python CLI for evaluating agents against Wendell-managed worlds and scenario packs.

The CLI is advisory by default: it reports scores, captures traces, and exits with a success status unless the project explicitly enables blocking gates with `mode = "blocking"`. In blocking mode, `wendell run` and `wendell test` return a nonzero exit code when gates fail.

## Intended split

- Wendell system: creates worlds, versions scenario packs, owns rubrics, stores traces, and reports regressions.
- Wendell CI runner: runs in a repo or CI job, fetches or reads a scenario pack, invokes the customer's agent adapter, captures traces, evaluates gates, uploads results, and prints a concise summary.

## Install

Install the latest published CLI:

```bash
uv tool install wendell
```

Alternative installers:

```bash
pipx install wendell
python3 -m pip install --user wendell
```

The packaged CLI includes Wendell's local compiler/runtime engine. It does not
require an OpenAI, OpenRouter, Anthropic, or other LLM provider key to install,
register, compile, or run tests. Your own `agent_command` may need provider
credentials if your agent uses an LLM.

For unreleased changes from current main:

```bash
python3 -m pip install "git+https://github.com/croppia/wendellai.git#subdirectory=wendell-ci"
```

For release validation from this monorepo, run:

```bash
scripts/smoke_wendell_production_app_install.sh
```

That smoke test creates a clean app directory, installs the packaged `wendell`
CLI into a fresh virtualenv, runs `wendell init`, and runs `wendell test`
without using a repo-local `PYTHONPATH`. The smoke adapter asserts that the
customer-facing payload uses `wendell.agent_input.v1` and does not expose
rubrics, hidden facts, source lineage, terminal outcomes, or success/failure
criteria.

The package installs the `wendell` command. It also installs the temporary `local-wendell` alias for existing local dogfood scripts.

## Local development

```bash
cd wendell-ci
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest
wendell --help
```

## Login

For a first-time Wendell runner, install and register the CLI:

```bash
curl -fsSL https://www.wendellai.com/install | bash
wendell register
```

Registration creates a short-lived runner session, asks for email/password and agent details, and stores a scoped local runner credential.

For local development with an existing InkPass API key, store it once:

```bash
wendell login --api-key-stdin --validate
wendell auth status
```

Credentials are stored at `~/.config/wendell/credentials.json` by default, or `$WENDELL_CONFIG_HOME/credentials.json` when set. The directory is created with `0700` permissions and the credentials file with `0600` permissions.

CI should continue to use `WENDELL_INKPASS_API_KEY`; environment variables take precedence over stored credentials.

## Minimal config

```toml
project = "support-agent"
mode = "advisory"
api_key_env = "WENDELL_INKPASS_API_KEY"
world = "world_support_ops_v1"
world_version = "2026-05-08.1"
scenario_pack = "smoke"
scenario_pack_version = "1.0.0"
agent_command = "python examples/simple_cli_agent.py"
upload_traces = true

[gates]
suite_min_score = 0.80
scenario_min_score = 0.75
critical_failures_allowed = 0
```

## Production app test loop

Use `wendell test` as the local and CI entry point for production agents. Unlike
the legacy top-level command, `wendell test` requires a real `agent_command` and
will not silently fall back to a demo agent.

Fast start from an application repo:

```bash
wendell init --project support-agent
export WENDELL_APP_AGENT_COMMAND="python scripts/run_my_agent.py"
wendell test --config wendell.toml
```

Your adapter receives a production-facing JSON payload. Wendell does not send
the scoring rubric, hidden facts, expected outcomes, or success/failure criteria
to the agent process.

```json
{
  "schema_version": "wendell.agent_input.v1",
  "task": "Respond as a support agent for Support Agent.",
  "policies": ["Inspect the request before completing the workflow."],
  "transcript": [{"speaker": "customer", "text": "I need help with this request."}],
  "available_tools": [
    {
      "name": "workflow_console.inspect_request",
      "arguments": {"case_id": "str"},
      "description": "inspect request"
    }
  ],
  "case": {"case_id": "case_123", "request": "I need help with this request."},
  "scenario": {"id": "case_123", "kind": "realistic"}
}
```

The adapter must print JSON:

```json
{
  "message": "I inspected the request and recorded the required evidence.",
  "tool_calls": [
    {"name": "workflow_console.inspect_request", "args": {"case_id": "case_123"}}
  ],
  "metrics": {}
}
```
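
A minimal adapter built around these two JSON shapes might look like the sketch below. Several details are assumptions and are not confirmed by this README: that the payload arrives on stdin, the `Agent` callable signature, and the choice to reject tool names that were not offered in `available_tools`. Replace the `agent` argument with a call into your production agent.

```python
import json
from typing import Any, Callable, TextIO

SCHEMA_VERSION = "wendell.agent_input.v1"

# Hypothetical agent signature:
# (task, transcript, case) -> (message, [(tool_name, args), ...])
Agent = Callable[[str, list, dict], tuple[str, list[tuple[str, dict]]]]


def adapt(payload: dict[str, Any], agent: Agent) -> dict[str, Any]:
    """Translate a Wendell input payload into the response JSON shape."""
    version = payload.get("schema_version")
    if version != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version: {version!r}")
    allowed = {tool["name"] for tool in payload.get("available_tools", [])}
    message, actions = agent(
        payload["task"], payload.get("transcript", []), payload.get("case", {})
    )
    tool_calls = []
    for name, args in actions:
        # ASSUMPTION: fail loudly instead of silently dropping unknown tools.
        if name not in allowed:
            raise ValueError(f"agent used a tool that was not offered: {name}")
        tool_calls.append({"name": name, "args": args})
    return {"message": message, "tool_calls": tool_calls, "metrics": {}}


def run(stdin: TextIO, stdout: TextIO, agent: Agent) -> None:
    """Read one payload, call the agent, and print the response as JSON."""
    payload = json.load(stdin)  # ASSUMPTION: payload is delivered on stdin
    json.dump(adapt(payload, agent), stdout)
    stdout.write("\n")
```

To use it as a script, call `run(sys.stdin, sys.stdout, my_agent)` under an `if __name__ == "__main__":` guard, where `my_agent` wraps your real agent.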

`wendell init` creates:

- `playbook.md`: the business behavior contract to edit with your team
- `.wendell/suite.json`: the generated, reviewable test suite
- `scripts/wendell_agent_adapter.py`: the adapter boundary for your production agent
- `wendell.toml`: the CI-ready test config

The generated adapter is intentionally not a fake passing agent. It requires
`WENDELL_APP_AGENT_COMMAND` or replacement code that calls your production agent
and maps its real actions back to Wendell tool names. For install-only smoke
validation, run `wendell init --example-agent`; do not use that adapter as a
production test.

To compile an existing Playbook instead:

```bash
wendell playbook compile \
  --source ./playbook.md \
  --name "Support Agent Regression" \
  --workflow-summary "Evaluate support agents against the approved workflow." \
  --output ./.wendell/support-agent-suite.json \
  --config-output ./wendell.toml \
  --agent-command "python scripts/run_agent_adapter.py"

wendell test --config wendell.toml
wendell test --config wendell.toml --json
```

`wendell playbook compile` is a local, deterministic bootstrap path. It turns a
markdown Playbook into a reviewable customer-input suite JSON using the same
Playbook assertion compiler that powers generated suites. Teams can commit the
generated suite, review it in code review, and run it in CI.

For CI, set `mode = "blocking"` so failed gates return a nonzero exit code:

```toml
project = "support-agent"
mode = "blocking"
worldsim_input = "configs/customer_inputs/support_agent_suite.json"
agent_command = "python scripts/run_agent_adapter.py"
upload_traces = false

[gates]
suite_min_score = 0.90
scenario_min_score = 0.85
critical_failures_allowed = 0
```
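
The relationship between the `[gates]` table, `mode`, and the process exit code can be sketched as follows. This is not Wendell's scoring code: the names `Gates`, `failed_gates`, and `exit_code` are hypothetical, thresholds are assumed to be inclusive, the suite score is passed in because this README does not say how it is aggregated, and the exit status `1` stands in for "nonzero".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gates:
    suite_min_score: float
    scenario_min_score: float
    critical_failures_allowed: int


def failed_gates(
    gates: Gates,
    suite_score: float,
    scenario_scores: dict[str, float],
    critical_failures: int,
) -> list[str]:
    """Return a human-readable reason for every gate that failed."""
    reasons = []
    if suite_score < gates.suite_min_score:
        reasons.append(f"suite score {suite_score:.2f} < {gates.suite_min_score:.2f}")
    for scenario_id, score in sorted(scenario_scores.items()):
        if score < gates.scenario_min_score:
            reasons.append(
                f"scenario {scenario_id} score {score:.2f} < {gates.scenario_min_score:.2f}"
            )
    if critical_failures > gates.critical_failures_allowed:
        reasons.append(
            f"{critical_failures} critical failures > {gates.critical_failures_allowed} allowed"
        )
    return reasons


def exit_code(mode: str, reasons: list[str]) -> int:
    """Advisory runs always succeed; blocking runs fail when any gate failed."""
    return 1 if mode == "blocking" and reasons else 0
```

With the blocking config above, a suite score of `0.88` would fail the `suite_min_score = 0.90` gate and produce a nonzero exit, while the same result in advisory mode would still exit successfully.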

Failure output includes the failed scenario, gate reason, incomplete workflow
steps, Playbook rule id, assertion id, trajectory event indexes, and fix prompts
when the suite provides assertions.

### What to commit

Commit these files:

- `playbook.md`
- `.wendell/suite.json`
- `wendell.toml`
- your production adapter, usually `scripts/wendell_agent_adapter.py`

Do not commit secrets, raw customer transcripts, API keys, or generated run
outputs. Keep production credentials in your CI secret store.

### GitHub Actions

```yaml
name: Wendell Agent Tests

on:
  pull_request:
  push:
    branches: [main]

jobs:
  wendell:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: actions/setup-python@v6
        with:
          python-version: "3.13"
      - name: Install Wendell
        run: |
          python -m pip install --upgrade pip
          python -m pip install wendell
      - name: Run Wendell
        env:
          WENDELL_INKPASS_API_KEY: ${{ secrets.WENDELL_INKPASS_API_KEY }}
          WENDELL_APP_AGENT_COMMAND: python scripts/run_my_agent.py
        run: wendell run --suite refund-agent-regression --config wendell.toml
```

For hosted Wendell reporting, keep `upload_traces = true` and provide
`WENDELL_INKPASS_API_KEY` from CI secrets. Add `api_url` only when targeting a
staging, preview, or local API.

For a newly published hosted suite, create the repo-local config and adapter
boundary first:

```bash
wendell suites configure --suite refund-agent-regression --project refund-agent
```

## Wendell dogfood suite

This repo includes a production-style dogfood suite for Wendell's own
Playbook-to-test-suite behavior:

```bash
uv run --project wendell-ci --extra dev --with-editable . wendell test --config wendell.dogfood.toml
```

The suite lives at
`configs/customer_inputs/wendell_playbook_compiler_quality_input.json` and is
derived from `configs/playbooks/wendell_playbook_compiler_quality.md`.

The same command runs the real Wendell Playbook compiler adapter, which is expected to pass:

```bash
uv run --project wendell-ci --extra dev --with-editable . wendell test --config wendell.dogfood.toml
```

The intentionally broken agent is expected to fail with rule-linked evidence:

```bash
uv run --project wendell-ci --extra dev --with-editable . wendell test \
  --config wendell.dogfood.toml \
  --agent-command "python examples/wendell_playbook_compiler_broken_agent.py"
```

The intended production flow is:

1. `wendell` (or its temporary `local-wendell` alias) runs on the developer machine or CI worker.
2. It fetches the pinned world, scenario pack, rubrics, and scoring contract from Wendell.
3. It invokes the agent locally through an adapter such as `agent_command`.
4. It captures local traces, redacts them, and uploads traces/results back to Wendell.
5. Advisory mode exits `0`; blocking mode, which must be explicitly enabled, exits nonzero when gates fail.

Remote uploads authenticate with an InkPass API key sent as `X-API-Key`. The key needs at least `wendell:worlds:read`, `wendell:runs:create`, and `wendell:runs:read`.
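
As a rough sketch of a pre-flight check for these requirements, the snippet below builds the header and reports missing scopes. The helper names are hypothetical, and how a client learns which scopes a key holds is not described here, so `granted` is simply passed in.

```python
from collections.abc import Iterable

# Scopes listed above as the minimum for remote uploads.
REQUIRED_SCOPES = frozenset(
    {"wendell:worlds:read", "wendell:runs:create", "wendell:runs:read"}
)


def missing_scopes(granted: Iterable[str]) -> list[str]:
    """Return the required scopes that the key does not hold, sorted."""
    return sorted(REQUIRED_SCOPES - set(granted))


def auth_headers(api_key: str) -> dict[str, str]:
    """Build the request header that carries the InkPass API key."""
    if not api_key:
        raise ValueError("an InkPass API key is required for remote uploads")
    return {"X-API-Key": api_key}
```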

## Local worldsim dogfood config

While Wendell Cloud APIs are still taking shape, the runner can call the existing local `worldsim.services` layer directly:

```toml
project = "workspace-access-agent"
mode = "advisory"
world = "workspace_access_support"
scenario_pack = "smoke"
worldsim_input = "../configs/customer_inputs/workspace_access_support_input.json"
agent = "careful"

[gates]
suite_min_score = 0.80
scenario_min_score = 0.75
critical_failures_allowed = 0
```

Supported local built-in agents are `careful` and `risky`.
