# Changelog

## v2.1.0 — 2026-05-13
- **Tool-use agent loop** (`vllmd task`): run an agentic task against any vLLM-compatible endpoint using an iterative tool-call loop
- Six built-in tools: `bash` (shell execution), `read_file`, `write_file`, `git_clone`, `git_commit`, `git_push`
- `--pem` flag on `vllmd task` for SSH PEM key authentication in git operations (sets `GIT_SSH_COMMAND`)
- `--workdir`, `--max-turns`, `--system`, `--api-key` options for the task command
- New `AgentLoop` / `create_loop` public API in `vllmd.loop`; `ToolExecutor` and `TOOL_DEFINITIONS` exported from `vllmd.tools`
- Tool output is truncated to 8 KB to avoid context-window overflow
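The task loop follows the familiar iterative tool-call pattern: ask the model, execute any tools it requests, append the (truncated) results, and repeat. A minimal self-contained sketch of that shape, assuming `call_model` and `call_tool` are caller-supplied callables (the real `AgentLoop` in `vllmd.loop` will differ):

```python
MAX_TOOL_OUTPUT = 8 * 1024  # tool results are clipped to 8 KB


def truncate(text, limit=MAX_TOOL_OUTPUT):
    """Clip tool output so it cannot blow past the model's context window."""
    return text if len(text) <= limit else text[:limit] + "\n[truncated]"


def run_loop(call_model, call_tool, messages, max_turns=10):
    """Iterate: ask the model, run any requested tools, feed results back."""
    for _ in range(max_turns):
        reply = call_model(messages)
        messages.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:
            return reply  # no tool requests: the model gave its final answer
        for call in calls:
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": truncate(call_tool(call)),
            })
    raise RuntimeError("max turns exceeded")
```

The `--max-turns` option maps onto the loop bound, and the 8 KB clip is applied to every tool result before it re-enters the conversation.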

## v2.0.2 — 2026-05-13
- Orchestrator persists registry state to `~/.local/share/vllmd/registry.json` after every mutation (cluster up/down, agent health refresh)
- On restart the orchestrator loads the last-known registry from disk before querying agents, so it can immediately route requests while the background refresh runs
- `ModelRegistry.dump()` / `load()` for serialization; save/load errors are silently suppressed so a missing or corrupt file never blocks startup
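A minimal sketch of the save/load contract described above, written as standalone functions rather than the real `ModelRegistry` methods (only the path and the error-suppression behaviour come from this entry):

```python
import json
from pathlib import Path

STATE = Path.home() / ".local/share/vllmd/registry.json"


def dump(registry, path=STATE):
    """Persist registry state; failures must never take down the orchestrator."""
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(registry))
    except OSError:
        pass  # best-effort: a read-only disk should not block a mutation


def load(path=STATE):
    """Return the last-known registry, or empty state on any error."""
    try:
        return json.loads(path.read_text())
    except (OSError, json.JSONDecodeError):
        return {}  # missing or corrupt file never blocks startup
```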

## v2.0.1 — 2026-05-13
- Move `container_runtime` from `ClusterConfig` to `NodeConfig` (YAML: `nodes[*].container_runtime`, default `"docker"`): each node configures its own runtime, and the orchestrator carries no container-runtime dependency
- Remove `--runtime` from `vllmd orchestrator start` (the orchestrator never invokes a container runtime)
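A hedged illustration of the per-node layout in `vllmd.yaml` — the `host` key and the `podman` value are guesses for illustration; see `config.example.yaml` for the real schema:

```yaml
nodes:
  - host: gpu-a.internal
    container_runtime: docker   # the default when omitted
  - host: gpu-b.internal
    container_runtime: podman   # illustrative alternative value
```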

## v2.0.0 — 2026-05-13
- **Agent daemon** (`vllmd agent start/stop`): lightweight FastAPI service that manages Docker containers on a single node, with GPU auto-allocation via `VLLMD_NODE_GPUS`
- **Orchestrator service** (`vllmd orchestrator start/stop`): FastAPI control plane that proxies OpenAI-compatible API requests (`/v1/chat/completions`, `/v1/completions`, `/v1/embeddings`) to the correct node/model with round-robin load balancing; supports node pinning via `X-Vllmd-Node` header
- **Cluster commands**: `vllmd up [--model]`, `vllmd down [--model]`, `vllmd nodes` for declarative model lifecycle management
- **Cluster config**: `nodes`, `models`, `orchestrator`, and `agent` top-level keys in `vllmd.yaml`; see `config.example.yaml`
- **Registry + router**: in-memory `ModelRegistry` tracking healthy endpoints per model; `pick_endpoint()` for round-robin and pinned routing
- `runner.py`: add `gpu_devices` field to `RunConfig` (overrides `--gpus all` with `--gpus device=N`); add `GPUS_LABEL` tracking; `list_containers()` now returns `gpu_devices` list
- Add `server` optional extras: `fastapi>=0.110`, `uvicorn[standard]>=0.29`, `httpx>=0.27`
- Update CI to install `.[dev,aws,server]`
- `AgentClient` (`vllmd.agent.client`): async httpx wrapper for the agent API
- New packages: `vllmd.agent`, `vllmd.orchestrator`, `vllmd.cluster`
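The round-robin and pinned routing can be sketched as follows — a simplified stand-in, since the real `ModelRegistry` also tracks endpoint health and its signatures may differ:

```python
class ModelRegistry:
    """Toy in-memory registry: model -> list of (node, url) endpoints."""

    def __init__(self):
        self._endpoints = {}  # model -> [(node, url), ...]
        self._next = {}       # model -> round-robin counter

    def add(self, model, node, url):
        self._endpoints.setdefault(model, []).append((node, url))
        self._next.setdefault(model, 0)

    def pick_endpoint(self, model, node=None):
        """Round-robin over a model's endpoints; `node` pins the choice
        (the behaviour exposed via the X-Vllmd-Node header)."""
        eps = self._endpoints[model]
        if node is not None:
            eps = [e for e in eps if e[0] == node]
        i = self._next[model] % len(eps)
        self._next[model] += 1
        return eps[i][1]
```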

## v1.2.2 — 2026-05-13
- Fix CI: install `.[dev,aws]` so boto3/botocore are available for S3 provider tests

## v1.2.1 — 2026-05-13
- Fix ruff lint errors introduced in v1.2.0: remove redundant string quotes from type annotations (UP037), sort `__all__` (RUF022), remove unused `noqa` directive (RUF100), remove unused `pytest` import (F401)

## v1.2.0 — 2026-05-13
- Add pluggable session storage: `BaseSessionStore` ABC with `LocalSessionStore` (existing behaviour) and `S3SessionStore` (boto3)
- Add `S3VectorStore`: wraps `LocalVectorStore` (ChromaDB) and syncs the database archive to/from S3 after each write
- Add `sessions/providers/` subpackage mirroring the `vectordb/providers/` pattern
- Add `get_session_store()` factory reading `sessions.store` from the project/global config
- Update `config.example.yaml` with `sessions` block and `vectordb.backend: s3` example
- `Session.save/load/list_all/delete` now accept a `BaseSessionStore` instead of a `Path`; callers use `LocalSessionStore(path)` to retain prior behaviour
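The store abstraction might look roughly like this — the method set and the on-disk file naming are assumptions; only the ABC-plus-`LocalSessionStore` shape comes from the entries above:

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BaseSessionStore(ABC):
    """Storage-backend contract; the real ABC may define more methods."""

    @abstractmethod
    def save(self, name, data): ...

    @abstractmethod
    def load(self, name): ...


class LocalSessionStore(BaseSessionStore):
    """Filesystem backend preserving the pre-v1.2.0 behaviour."""

    def __init__(self, root):
        self.root = Path(root)

    def save(self, name, data):
        self.root.mkdir(parents=True, exist_ok=True)
        (self.root / f"{name}.json").write_text(data)

    def load(self, name):
        return (self.root / f"{name}.json").read_text()
```

An `S3SessionStore` then implements the same two methods against boto3, and `Session.save(store)` stays agnostic about where bytes land.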

## v1.1.2 — 2026-05-12
- Switch config file format from TOML to YAML, matching the agent-tester config pattern
- Global config: `~/.config/vllmd/config.yml` (or `.yaml`)
- Local config: auto-detected from `vllmd.yaml`, `vllmd.yml`, `.vllmd.yaml`, `.vllmd.yml`
- Add `config.example.yaml`
- Replace `tomli` dependency with `PyYAML>=6.0`
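Local-config auto-detection can be sketched as a first-match scan over the documented filenames (the precedence order here simply follows the list above and is an assumption):

```python
from pathlib import Path

# Candidate project-config names, tried in order
LOCAL_NAMES = ["vllmd.yaml", "vllmd.yml", ".vllmd.yaml", ".vllmd.yml"]


def find_local_config(start):
    """Return the first matching project config file in `start`, else None."""
    for name in LOCAL_NAMES:
        candidate = Path(start) / name
        if candidate.is_file():
            return candidate
    return None
```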

## v1.1.1 — 2026-05-12
- Move provider backends (`base.py`, `local.py`, `aws.py`) into `vectordb/providers/` subpackage
- Add `get_vector_store()` factory that reads backend from `.vllmd.toml` (project) or `~/.vllmd/config.toml` (global), defaulting to `LocalVectorStore`
- Add `tomli>=2.0` as a conditional dependency for Python < 3.11

## v1.1.0 — 2026-05-12
- Add `BaseVectorStore` ABC defining the full vector store interface
- Add `AWSVectorStore` backed by Amazon OpenSearch Service (`pip install 'vllmd[aws]'`)
- Refactor `VectorStore` → `LocalVectorStore(BaseVectorStore)`; `VectorStore` alias kept for backward compatibility
- Move shared helpers (`_chunk_text`, `_file_id`) and constants to `vectordb.base`
- Add `[aws]` optional extras: `opensearch-py>=2.0`, `boto3>=1.34`
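The rename-with-alias pattern from this refactor can be sketched as follows (the method set on `BaseVectorStore` is illustrative, not the actual interface):

```python
from abc import ABC, abstractmethod


class BaseVectorStore(ABC):
    """Interface sketch; the real ABC defines the full vector-store surface."""

    @abstractmethod
    def add(self, doc_id, text): ...

    @abstractmethod
    def query(self, text, k=5): ...


class LocalVectorStore(BaseVectorStore):
    """Stand-in for the ChromaDB-backed implementation."""

    def __init__(self):
        self._docs = {}

    def add(self, doc_id, text):
        self._docs[doc_id] = text

    def query(self, text, k=5):
        # Naive substring match standing in for embedding similarity
        return [d for d, t in self._docs.items() if text in t][:k]


# Old name kept working after the rename:
VectorStore = LocalVectorStore
```

`AWSVectorStore` implements the same ABC against OpenSearch, so callers switch backends without touching call sites.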

## v1.0.1 — 2026-05-12
- Deduplicate message deserialization via `_parse_messages()` in `load()` and `list_all()`
- Move embedder factory to `embeddings.make_embedder()`, remove duplicate closures in CLI and chat
- Extract `_post_json()` into `embeddings.py`, use it across chat completion and embedding calls
- Extract `_load_session_or_exit()` to deduplicate session error handling across five commands
- Extract `_ingest_chunks()` to deduplicate `ingest_document` and `ingest_code_file`
- Fix CI publish workflow: move `id-token: write` to workflow level for trusted publishing

## v1.0.0 — 2026-05-12
- Initial stable release
