Metadata-Version: 2.4
Name: agentversion
Version: 0.1.0
Summary: An open specification for versioning agent runtimes and keeping datasets valid.
Project-URL: Homepage, https://github.com/decimal-labs/agentversion
Project-URL: Documentation, https://github.com/decimal-labs/agentversion/tree/main/spec
Project-URL: Repository, https://github.com/decimal-labs/agentversion
Project-URL: Issues, https://github.com/decimal-labs/agentversion/issues
Author: Decimal AI
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: agent,agentversion,ai,compatibility,drift,llm,spec,version,versioning
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: click>=8.0
Requires-Dist: jcs>=0.2
Requires-Dist: pydantic>=2.0
Requires-Dist: rich>=13.0
Provides-Extra: dev
Requires-Dist: jsonschema>=4.0; extra == 'dev'
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Description-Content-Type: text/markdown

# AgentVersion

**Your agent changed. Is your saved data still valid?**

`agentversion` turns an agent version into a diffable, hashable contract — so when prompts, tools, models, or graphs change, you know exactly what broke and which traces, eval sets, and training data survived.

[![CI](https://github.com/decimal-labs/agentversion/actions/workflows/ci.yml/badge.svg)](https://github.com/decimal-labs/agentversion/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/agentversion)](https://pypi.org/project/agentversion/)
[![Python](https://img.shields.io/badge/python-3.10%2B-blue)](https://pypi.org/project/agentversion/)
[![Spec](https://img.shields.io/badge/spec-v1.0-success)](https://github.com/decimal-labs/agentversion/tree/main/spec)
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://github.com/decimal-labs/agentversion/blob/main/LICENSE)

When you ship a new version of an agent, everything you collected against the old one — production traces, eval datasets, SFT examples — quietly drifts out of date. There's no `package.json` to pin an agent's contract, and no `git diff` to tell you what changed. `agentversion` is that missing format: a JSON **manifest** describing an agent version, a **diff** that classifies every change as breaking or non-breaking, and a **compatibility decision** that tells you whether to keep, repair, replay, or drop your old data.

It's a dependency-light Python package with a CLI — and an open spec any tool can implement.

---

## See it in action

Two production manifests of the same `finance-agent`, `v1` and `v2`. One command:

```console
$ agentversion diff finance-agent-v1.json finance-agent-v2.json --compat
```

```
                                     Manifest Diff
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Surface         ┃ Change Type  ┃ Details                                             ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ environment     │ non_breaking │ environment added                                   │
│ model_runtime   │ breaking     │ provider: 'google' → 'openai'                       │
│                 │              │ runtime_version: 'app-runtime@1.5.0' →              │
│                 │              │ 'app-runtime@1.8.2'                                 │
│                 │              │ envelope changed                                    │
│ output_contract │ breaking     │ format: 'text' → 'json'                             │
│                 │              │ strict: False → True                                │
│                 │              │ output schema changed                               │
│ prompt_stack    │ non_breaking │ system_prompt hash changed                          │
│                 │              │ developer_prompt hash changed                       │
│ subagents       │ breaking     │ subagents added: ['finance_subagent',               │
│                 │              │ 'spreadsheet_subagent']                             │
│ tool_registry   │ breaking     │ search_population removed                            │
│                 │              │ get_population added                                │
│                 │              │ write_spreadsheet_cell added                        │
│                 │              │ get_market_cap modified (non-schema)                │
│ workflow        │ breaking     │ graph topology changed                              │
│                 │              │ routing_policy_version: '2' → '4'                   │
│                 │              │ graph_version: '3' → '6'                            │
│                 │              │ graph_name: 'finance-simple-graph' →                │
│                 │              │ 'finance-router-graph'                              │
└─────────────────┴──────────────┴─────────────────────────────────────────────────────┘

  Breaking: 5  Non-breaking: 2

  Recommendation: replay
  Breaking changes in model_runtime, output_contract, subagents, tool_registry,
  workflow — existing data should be replayed against the new agent version.
```

Between v1 and v2 the team swapped the model provider (Google → OpenAI), renamed a tool (`search_population` → `get_population`), added two subagents, switched to strict JSON output, and moved the workflow to a new router graph. `agentversion` caught all five breaking surfaces and told you the old traces need a replay: not a guess, but a classification you can gate CI on.

> Try it yourself — both manifests live in [`examples/manifest/`](https://github.com/decimal-labs/agentversion/tree/main/examples/manifest).

---

## Why an agent needs a version contract

You probably already have observability tooling and a trace store. Neither answers *"what is this agent version, and is my old data still compatible with the new one?"*

| You already have | What it gives you | What it doesn't |
|---|---|---|
| OpenTelemetry / LangSmith / Langfuse | rich execution traces | a versioned contract for the agent that produced them |
| A2A / ACP agent cards | runtime discovery + I/O types | version identity or data-compatibility |
| OpenAI JSONL / SFT files | a training format | provenance — *which agent version* produced each row |

**Isn't this A2A?** No — and they compose. A2A and ACP answer *"how does Agent A discover and talk to Agent B?"*. `agentversion` answers *"what changed in this agent, and what does that mean for my data?"*. An A2A Agent Card can carry an `agentversion` manifest hash so you know both at once.

---

## Install

```bash
pip install agentversion
```

Licensed under Apache-2.0, with no configuration needed; it requires only Python 3.10+. It implements the frozen **v1.0 spec**, but the Python package itself is early: `0.1.0`, pre-1.0, with the API still settling.

## Quickstart

**Diff two versions** (table by default; add `--json` for machine output, `--compat` for a keep/repair/replay/drop recommendation):

```bash
agentversion diff old-manifest.json new-manifest.json --compat
```

**Gate breaking changes in CI** — `--fail-on-breaking` exits non-zero when any surface is breaking:

```yaml
# .github/workflows/agent.yml
- name: Block breaking agent changes
  run: agentversion diff baseline-manifest.json current-manifest.json --fail-on-breaking
```
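
For pipelines scripted in Python rather than YAML, the same gate can be sketched as a thin wrapper around the CLI. Only the `diff` subcommand and the `--fail-on-breaking` flag come from this README; the helper names `build_diff_command` and `gate_passes` are hypothetical, and the wrapper treats any non-zero exit code as a failed gate, which would also cover CLI errors such as an unreadable manifest.

```python
import subprocess
from typing import Any, Callable


def build_diff_command(baseline: str, current: str) -> list[str]:
    """Assemble the documented CLI call as an argument list (no shell involved)."""
    return ["agentversion", "diff", baseline, current, "--fail-on-breaking"]


def gate_passes(
    baseline: str,
    current: str,
    run: Callable[..., Any] = subprocess.run,
) -> bool:
    """Return True when the CLI exits with code 0.

    Assumption: any non-zero exit fails the gate, so a CLI error
    blocks the pipeline just like a breaking change does.
    `run` is injectable so the logic can be tested without the CLI installed.
    """
    completed = run(build_diff_command(baseline, current))
    return completed.returncode == 0
```

A deployment script could call `gate_passes("baseline-manifest.json", "current-manifest.json")` and abort when it returns `False`.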

**Scaffold, hash, and validate** a manifest:

```bash
agentversion init                     # interactively create a manifest
agentversion hash manifest.json       # canonical JCS-SHA256 identity hash
agentversion validate manifest.json   # check it against the spec
```
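
To make "canonical JCS-SHA256" more concrete, the sketch below hashes a JSON object after serializing it deterministically. It is an approximation, not the package's implementation: it uses only the standard library (`json.dumps` with sorted keys and compact separators), which matches JCS (RFC 8785) for simple objects but can differ in edge cases such as number formatting. The `sha256:` prefix follows the hash format shown in this README; `approx_canonical_hash` is a hypothetical name.

```python
import hashlib
import json
from typing import Any


def approx_canonical_hash(obj: Any) -> str:
    """Hash a JSON-compatible object independently of key order.

    Assumption: sorted keys plus compact separators stand in for real
    JCS canonicalization; use a JCS library for interoperable hashes.
    """
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same content in a different key order yields the same identity.
a = approx_canonical_hash({"name": "finance-agent", "tools": ["get_population"]})
b = approx_canonical_hash({"tools": ["get_population"], "name": "finance-agent"})
print(a == b)  # True
```

The point of canonicalization is that whitespace and key order stop mattering, so two tools serializing the same manifest arrive at the same hash.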

**Use it from Python** — every line below is exercised by the test suite:

```python
import json
from agentversion import AgentManifest, validate_manifest_file, hash_manifest
from agentversion.diff import diff_manifests
from agentversion.compatibility import classify_compatibility

old = json.load(open("finance-agent-v1.json"))
new = json.load(open("finance-agent-v2.json"))

# Validate + identify a version
assert validate_manifest_file("finance-agent-v2.json").valid
m = AgentManifest.model_validate(new)
print(m.agent_name, m.identity.overall_hash)   # finance-agent  sha256:767ebff1...

# Diff, then ask what to do with old data
result = diff_manifests(old, new)
print(result.summary.breaking_surfaces)                       # 5
print(classify_compatibility(result).recommended_decision)   # replay
```

---

## What's in the box

A typed reference implementation: Pydantic models, canonical hashing, the diff/compatibility algorithms, and a CLI.

**CLI**

| Command | What it does |
|---|---|
| `agentversion diff A B` | Classify changes by surface (`--json`, `--compat`, `--fail-on-breaking`) |
| `agentversion validate M` | Validate a manifest against the spec |
| `agentversion hash M` | Compute the canonical JCS-SHA256 hash |
| `agentversion init` | Scaffold a new manifest interactively |
| `agentversion upgrade M --to X` | Bump a manifest to a newer spec version |
| `agentversion {decision,replay,dataset} validate` | Validate the other spec objects |

**Library** — top-level `agentversion` exports `AgentManifest`, `validate_manifest` / `validate_manifest_file`, `hash_manifest` / `hash_surface`, and `SPEC_VERSION`. The algorithms live in `agentversion.diff` and `agentversion.compatibility`; the other spec models live in `agentversion.dataset`, `agentversion.replay`, and `agentversion.decision`.

The manifest is organized as a **contract surface** per component — `prompt_stack`, `model_runtime`, `tool_registry`, `skill_registry`, `workflow`, `subagents`, `output_contract`, `guardrails`, `context_config`, `environment` — each independently hashed so the diff is surface-level and precise.
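
Independent per-surface hashes are what make a surface-level diff cheap: two versions only need their hashes compared, surface by surface. The following is an illustrative sketch, not the package's algorithm. It assumes a manifest is a plain dict keyed by surface name, approximates canonicalization with sorted-key JSON, and reports only added, removed, or changed surfaces; classifying a change as breaking or non-breaking is left to the spec's rules. `surface_hashes` and `changed_surfaces` are hypothetical helpers, and the toy manifest values are made up.

```python
import hashlib
import json
from typing import Any, Mapping

SURFACES = (
    "prompt_stack", "model_runtime", "tool_registry", "skill_registry", "workflow",
    "subagents", "output_contract", "guardrails", "context_config", "environment",
)


def surface_hashes(manifest: Mapping[str, Any]) -> dict[str, str]:
    """Hash each surface present in the manifest on its own."""
    hashes = {}
    for name in SURFACES:
        if name in manifest:
            blob = json.dumps(manifest[name], sort_keys=True, separators=(",", ":"))
            hashes[name] = hashlib.sha256(blob.encode("utf-8")).hexdigest()
    return hashes


def changed_surfaces(old: Mapping[str, Any], new: Mapping[str, Any]) -> dict[str, str]:
    """Map each differing surface to 'added', 'removed', or 'changed'."""
    old_h, new_h = surface_hashes(old), surface_hashes(new)
    result = {}
    for name in SURFACES:
        if name in new_h and name not in old_h:
            result[name] = "added"
        elif name in old_h and name not in new_h:
            result[name] = "removed"
        elif name in old_h and old_h[name] != new_h[name]:
            result[name] = "changed"
    return result


v1 = {"model_runtime": {"provider": "google"}, "workflow": {"graph_version": "3"}}
v2 = {"model_runtime": {"provider": "openai"}, "workflow": {"graph_version": "3"},
      "environment": {"region": "us"}}
print(changed_surfaces(v1, v2))  # {'model_runtime': 'changed', 'environment': 'added'}
```

Because `workflow` hashes identically in both versions, it never appears in the result, which is the precision the per-surface design is after.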

---

## Use it anywhere — no platform required

The protocol is fully useful standalone:

1. **Track versions locally** — `init` to scaffold, `hash` for a stable id, `diff` between any two. No account, fully offline.
2. **Gate CI/CD** — `diff --fail-on-breaking` stops a breaking agent change from reaching production.
3. **Annotate traces** — stamp `identity.overall_hash` onto your OpenTelemetry spans as `agentversion.manifest_hash` for version-scoped filtering. See [`examples/integrations/otel_mapping.md`](https://github.com/decimal-labs/agentversion/blob/main/examples/integrations/otel_mapping.md).
4. **Classify data compatibility** — `diff --compat` (or `decision generate`) gives a per-episode keep / repair / replay / drop verdict you can act on.
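
Stamping the hash onto spans (step 3) amounts to setting one attribute. The sketch below is duck-typed so that it runs without OpenTelemetry installed: it accepts any object with a `set_attribute(key, value)` method, which is the method OpenTelemetry's Python `Span` exposes. It assumes the manifest has been loaded as a dict with the hash at `identity.overall_hash`; `stamp_span` and `RecordingSpan` are hypothetical names, and the linked mapping document remains the authority on attribute naming.

```python
from typing import Any, Mapping, Protocol

MANIFEST_HASH_ATTR = "agentversion.manifest_hash"  # attribute name from the list above


class SupportsSetAttribute(Protocol):
    def set_attribute(self, key: str, value: Any) -> None: ...


def stamp_span(span: SupportsSetAttribute, manifest: Mapping[str, Any]) -> str:
    """Copy identity.overall_hash from a manifest dict onto a span-like object."""
    manifest_hash = manifest["identity"]["overall_hash"]
    span.set_attribute(MANIFEST_HASH_ATTR, manifest_hash)
    return manifest_hash


class RecordingSpan:
    """Stand-in for a real span, used only to demonstrate the call."""

    def __init__(self) -> None:
        self.attributes: dict[str, Any] = {}

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


span = RecordingSpan()
stamp_span(span, {"identity": {"overall_hash": "sha256:abc123"}})
print(span.attributes)  # {'agentversion.manifest_hash': 'sha256:abc123'}
```

With OpenTelemetry installed, the object returned by `opentelemetry.trace.get_current_span()` could be passed in place of `RecordingSpan`.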

It works alongside LangSmith, Langfuse, Phoenix, and W&B: annotate their traces and datasets with a manifest hash, or read and write compatibility decisions alongside your eval pipeline.
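
Annotating a dataset with a manifest hash can be sketched as a provenance field on each row plus a filter. Everything here is an assumption for illustration: rows are plain dicts (as in a JSONL file), and the `agentversion_manifest_hash` key and both helper names are hypothetical rather than part of the spec, which defines its own dataset object in `spec/dataset.md`.

```python
from typing import Any, Iterable

HASH_KEY = "agentversion_manifest_hash"  # hypothetical field name, not from the spec


def annotate_rows(rows: Iterable[dict[str, Any]], manifest_hash: str) -> list[dict[str, Any]]:
    """Return copies of the rows tagged with the agent version that produced them."""
    return [{**row, HASH_KEY: manifest_hash} for row in rows]


def rows_for_version(rows: Iterable[dict[str, Any]], manifest_hash: str) -> list[dict[str, Any]]:
    """Keep only rows produced by the given agent version."""
    return [row for row in rows if row.get(HASH_KEY) == manifest_hash]


old_rows = annotate_rows([{"prompt": "p1"}, {"prompt": "p2"}], "sha256:v1")
new_rows = annotate_rows([{"prompt": "p3"}], "sha256:v2")
print(len(rows_for_version(old_rows + new_rows, "sha256:v1")))  # 2
```

Once rows carry the hash, a compatibility decision for a version pair can be applied to exactly the rows that version produced.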

---

## The spec & conformance

`agentversion` is an open spec so any tool, in any language, can produce interoperable manifests and diffs:

- [`spec/manifest.md`](https://github.com/decimal-labs/agentversion/blob/main/spec/manifest.md) — the agent manifest
- [`spec/diff.md`](https://github.com/decimal-labs/agentversion/blob/main/spec/diff.md) — surface diffs, breaking vs non-breaking
- [`spec/compatibility-decision.md`](https://github.com/decimal-labs/agentversion/blob/main/spec/compatibility-decision.md) — keep / repair / replay / drop
- [`spec/replay.md`](https://github.com/decimal-labs/agentversion/blob/main/spec/replay.md) · [`spec/dataset.md`](https://github.com/decimal-labs/agentversion/blob/main/spec/dataset.md) — replay jobs and dataset objects with provenance
- [`spec/reference.md`](https://github.com/decimal-labs/agentversion/blob/main/spec/reference.md) — full schemas and validation rules · [`schemas/`](https://github.com/decimal-labs/agentversion/tree/main/schemas) — JSON Schemas

[`CONFORMANCE.md`](https://github.com/decimal-labs/agentversion/blob/main/CONFORMANCE.md) and [`compatibility-tests/`](https://github.com/decimal-labs/agentversion/tree/main/compatibility-tests) together define conformance: the golden in/out pairs in `compatibility-tests/` are what any implementation must reproduce to claim it.
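
A conformance run boils down to feeding each golden input to an implementation and comparing the result with the golden output. The sketch below shows that loop with in-memory cases. The case structure (`name`, `input`, `expected`) and the `run_conformance` helper are hypothetical; the actual file layout and comparison rules are whatever `CONFORMANCE.md` specifies.

```python
from typing import Any, Callable, Iterable, Mapping


def run_conformance(
    implementation: Callable[[Any], Any],
    cases: Iterable[Mapping[str, Any]],
) -> list[str]:
    """Return the names of cases whose output differs from the golden output."""
    failures = []
    for case in cases:
        if implementation(case["input"]) != case["expected"]:
            failures.append(case["name"])
    return failures


# Toy implementation under test: counts the keys of a JSON object.
toy_cases = [
    {"name": "empty", "input": {}, "expected": 0},
    {"name": "two-keys", "input": {"a": 1, "b": 2}, "expected": 2},
    {"name": "deliberately-wrong", "input": {"a": 1}, "expected": 5},
]
print(run_conformance(len, toy_cases))  # ['deliberately-wrong']
```

An empty list means every golden pair was reproduced.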

---

## Pairs with skillevaluation

A manifest can carry the eval results that gated its release in `evaluation.gates[]`:

```json
{
  "evaluation": {
    "gates": [
      { "name": "regression-suite", "threshold": 0.95, "actual_score": 0.972, "passed": true }
    ]
  }
}
```

Those scores come from [`skillevaluation`](https://github.com/decimal-labs/skillevaluation), the sibling open spec for A/B benchmarking skills. `agentversion` records *what an agent version is*; `skillevaluation` measures *whether it's better*.
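
A consumer of `evaluation.gates[]` might check two things before trusting a release: that every gate passed, and that each `passed` flag agrees with its numbers. The sketch below assumes `passed` means `actual_score >= threshold`, which the example above is consistent with but which this README does not state; `failing_gates` is a hypothetical helper.

```python
from typing import Any, Mapping


def failing_gates(manifest: Mapping[str, Any]) -> list[str]:
    """Names of gates that failed or whose 'passed' flag contradicts the scores.

    Assumption: a gate passes when actual_score >= threshold.
    """
    gates = manifest.get("evaluation", {}).get("gates", [])
    bad = []
    for gate in gates:
        meets_threshold = gate["actual_score"] >= gate["threshold"]
        if not gate["passed"] or gate["passed"] != meets_threshold:
            bad.append(gate["name"])
    return bad


manifest = {
    "evaluation": {
        "gates": [
            {"name": "regression-suite", "threshold": 0.95, "actual_score": 0.972, "passed": True}
        ]
    }
}
print(failing_gates(manifest))  # []
```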

The [`decimalai`](https://github.com/decimal-labs/decimalai-python) Python SDK builds on `agentversion` to add framework adapters (capture a manifest straight from your LangGraph/CrewAI app), trace capture, and managed replay — but you never need it to use the spec.

---

## Project

The **spec** is stable at v1.0, with a frozen wire format and conformance suite. The **package** is `0.1.0`: pre-1.0 under semantic versioning, so the Python API may still change before the package reaches 1.0. Design decisions are logged in [`adrs/`](https://github.com/decimal-labs/agentversion/tree/main/adrs), releases in [`CHANGELOG.md`](https://github.com/decimal-labs/agentversion/blob/main/CHANGELOG.md). Contributions, especially new conformance cases, are welcome; see [`CONTRIBUTING.md`](https://github.com/decimal-labs/agentversion/blob/main/CONTRIBUTING.md):

```bash
git clone https://github.com/decimal-labs/agentversion
cd agentversion
pip install -e ".[dev]"
pytest
```

Licensed under [Apache 2.0](https://github.com/decimal-labs/agentversion/blob/main/LICENSE).
