# Changelog

## v0.12.1 — 2026-05-15
- On REPL exit with an active session, print `agent-tester repl --session <name>  to resume` so the user always knows how to get back

## v0.12.0 — 2026-05-14
- **`agent-tester serve`**: new subcommand that opens an HTTP receiver (`POST /result`, `GET /health`) where external agents can POST completion results; rendered as Rich panels in real-time
- **`notify` tool**: when `--notify-url` is passed to `agent-tester repl`, models gain a `notify` tool they can call to POST their final result back to the server; the tool is absent from the tool list when no URL is configured
- **`agent-tester repl --notify-url`**: new flag that wires the notify tool to a running `agent-tester serve` instance

## v0.11.5 — 2026-05-14
- **Live multi-model display**: when querying multiple models the REPL now renders all response panels immediately (with animated spinners), then fills each in as its model completes — built with `rich.Live` + `rich.Spinner`

## v0.11.4 — 2026-05-14
- **Lazy branch creation**: branches are no longer created upfront at session start; a branch is created on the first `write_file` or `git_commit` call, named after the prompt that triggered it (`agenttester/<model>/<prompt-slug>`)
- **Streaming responses**: when querying multiple models the REPL now prints each response as it arrives rather than waiting for all models to finish
- **git.md skill**: updated branch format docs to reflect prompt-derived names; added a "Merge conflicts" section instructing agents to resolve conflicts automatically

## v0.11.3 — 2026-05-14
- **Cost package refactor**: `cost.py` split into `cost/` package (`base.py` — `CostEntry` + `CostBackend` ABC, `local.py` — `LocalCostBackend`, `tracker.py` — `CostTracker`); all existing imports unchanged

## v0.11.2 — 2026-05-14
- **Branch name sanitization**: all agent and run names are sanitized to valid git ref components before branch/worktree creation, preventing collisions from spaces, slashes, or other special characters
- **git.md skill update**: documents the `agenttester/<model-name>/<session-name>` branch format and instructs models not to create new branches
- **CWD default for workdir**: `agent-tester repl` now defaults to the current directory for tool use and branch creation; pass `--workdir` to override; branches always land in the repo the REPL is called from
- **Providers package refactor**: `providers.py` split into a `providers/` package (`base.py`, `anthropic.py`, `openai_compat.py`, `bedrock.py`) with a re-exporting `__init__.py`; all existing imports unchanged

## v0.11.1 — 2026-05-14
- **Anthropic tool use**: `AnthropicProvider` now supports tool use in the REPL agent loop; messages and tool definitions are automatically converted between OpenAI and Anthropic wire formats so the same agent loop drives all provider types
- Fix: `AnthropicProvider.call` now correctly separates system messages into Anthropic's top-level `system` field instead of passing them in the `messages` array (which the API rejects)

## v0.11.0 — 2026-05-14
- **Tool use in the REPL**: OpenAI-compatible models in the REPL now run a full agent loop with access to `bash`, `read_file`, `write_file`, `git_clone`, `git_commit`, and `git_push` tools; pass `--workdir` to activate tool use with a target directory
- **Automatic per-model branches**: when `--workdir` points to a git repo each model gets its own worktree on branch `agenttester/<model-name>/<session-name>`; all changes are automatically committed and can be pushed with `--push`
- **Session persistence**: pass `--session <name>` to save conversation history on exit and restore it on the next `repl` invocation; sessions are stored in `~/.config/agenttester/sessions/`
- **SSH PEM key support**: `--pem <path>` sets `GIT_SSH_COMMAND` for all git operations (clone, commit push) that need SSH auth
- **`git_push` and `git_commit` tools** added to `ToolExecutor`; `orchestrator run` now supports `--push`, `--remote`, and `--pem` flags for pushing agent branches after a run

## v0.10.1 — 2026-05-14
- Render REPL model responses as markdown so code blocks are syntax-highlighted and ``` fences are not shown as literal text

## v0.10.0 — 2026-05-14
- **Bedrock models in the REPL**: add a `models:` config section that lets you define REPL models using any named provider — including AWS Bedrock — by referencing a provider defined in the `providers:` block; inline `endpoint:` still works for OpenAI-compatible models without a named provider
- Provider instances are now built lazily per-reference, so legacy configs with provider entries that only specify `api_key_env` (no `endpoint`) continue to work in the backward-compat agent-command path

## v0.9.1 — 2026-05-14
- Fix Bedrock tests failing on CI when boto3 is not installed: use `patch.dict(sys.modules)` instead of `patch("boto3.Session")` so tests pass without the `[aws]` extras group

## v0.9.0 — 2026-05-14
- **Provider class hierarchy**: replace the flat `ProviderConfig` dataclass with an abstract `Provider` base class and three concrete implementations — `AnthropicProvider`, `OpenAICompatProvider`, and `BedrockProvider`; provider calling logic now lives with the provider class rather than in `evaluator.py`
- **AWS Bedrock support**: `BedrockProvider` calls the Bedrock Converse API via boto3 (install with `pip install agenttester[aws]`); supports three auth modes: named AWS CLI profile (`aws_profile`), explicit credential env vars (`aws_access_key_id_env` / `aws_secret_access_key_env` / `aws_session_token_env`), and the default boto3 credential chain
- **`type` field on provider config entries**: providers now declare their type explicitly (`type: anthropic`, `type: openai`, `type: bedrock`); inline evaluator forms (`api: anthropic`, bare `endpoint:`) remain supported for backward compatibility

## v0.8.0 — 2026-05-13
- **Provider-level credentials**: add a `providers` block to define shared endpoint and API key env vars for cloud LLM providers (AWS Bedrock, Azure AI Foundry, GCP Vertex, etc.); evaluators and REPL model agents can reference a provider by name and inherit its credentials
- **Model-level API key override**: model-level `api_key_env` and `endpoint` fields always take precedence over the provider-level defaults
- **OpenAI-compatible auth**: `Authorization: Bearer` header is now sent automatically when `api_key_env` is set on an evaluator or REPL model agent, enabling any authenticated OpenAI-compatible endpoint

## v0.7.0 — 2026-05-13
- **REPL `@`-autocomplete**: typing `@` in the REPL now autocompletes model names as you type; `@modelname message` routes a message to a single model instead of broadcasting to all
- **Skills injected into REPL context**: skill instructions are loaded at REPL startup and seeded as a system message for each model; `/reset` restores to this seeded state rather than fully empty history

## v0.6.0 — 2026-05-13
- **Descriptive branch names**: branches are now `agenttester/<agent-name>/<run-name>` instead of `agenttester/<run-id>/<agent-name>`; use `--name` to provide a human-readable run name, or a slug is derived from the prompt automatically
- **LLM-based code evaluation**: configure multiple independent LLM evaluators (Anthropic API or any OpenAI-compatible endpoint such as vllmd) to review each agent's diff for accuracy, readability, code smells, and correctness
- **Aggregate reports**: evaluator critiques are synthesized into a single aggregate assessment per agent; raw per-evaluator reports are preserved in the markdown report
- **Iterative refinement loop**: after evaluation the user selects which agents (1–all) to re-run; selected agents receive the aggregate feedback as context and commit each refinement to the same branch (tracked via `iter-N` commit messages)
- **Feedback summarization**: aggregate feedback is summarized before injection if it exceeds `max_aggregate_tokens`; set `inject_raw_reports: true` to send all raw evaluator reports instead
- Worktrees are now kept alive across iterations and only cleaned up when the user stops iterating

## v0.5.2 — 2026-05-12
- Fix test suite running real agent orchestrator: mock `Orchestrator` in CLI test to prevent worktree/branch/report side-effects and cut test time from 220s to 5s

## v0.5.1 — 2026-05-12
- Consolidate duplicated patterns across `agent_runner`, `repl`, and `skills`
- Consolidate config path resolution into `get_config_paths()`; fix REPL global config fallback
- Fix CI publish workflow: move `id-token: write` to workflow level for trusted publishing

## v0.5.0 — 2026-05-12
- Add interactive agent input routing and idle pause/resume

## v0.4.5 — 2026-05-12
- Include `skills.py` priority logic and lock file

## v0.4.4 — 2026-05-12
- Add git/bash built-in skills; prioritise user skills over built-ins

## v0.4.3 — 2026-05-12
- Remove agent count cap

## v0.4.2 — 2026-05-12
- Store reports in global config dir by default
- Support `.yml` and `.yaml` config file extensions

## v0.4.1 — 2026-05-12
- Update packages for wheel compatibility

## v0.4.0 — 2026-05-12
- Consolidate global config to `~/.config/agenttester/config.yml`

## v0.3.7 — 2026-05-11
- Add skills directory system and fix test path resolution

## v0.3.6 — 2026-05-11
- Add connection check on REPL startup with `--skip-checks` / `-S` flag

## v0.3.5 — 2026-05-11
- Add spinner while querying models in REPL

## v0.3.4 — 2026-05-11
- Open REPL by default when run with no subcommand

## v0.3.3 — 2026-05-11
- Fix `agent-tester` binary name in REPL, config example, and tests

## v0.3.2 — 2026-05-11
- Standardize on `agent-tester` binary and config filename

## v0.3.1 — 2026-05-11
- Read version from package metadata
- Replace `at` alias with `agent-tester` to avoid conflict with Unix job scheduler

## v0.3.0 — 2026-05-11
- Add branch name injection, auto-pull, and global config support

## v0.2.0 — 2026-05-11
- Add cost tracking system with local storage and CLI interface

## v0.1.1 — 2026-05-08
- Add README

## v0.1.0 — 2026-05-08
- Initial release
