Metadata-Version: 2.4
Name: adra
Version: 0.4.0
Summary: ADRA — Adversarial Dev Review Agent: a deterministic-first, adversarial-validation engine for the software lifecycle (PR review, validation/refutation experiments, auto-docs, human escalation).
Author: Felipe Santibáñez-Leal
License: Apache-2.0
Keywords: llm,agent,code-review,adversarial-validation,llm-as-judge,devops,provenance
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Quality Assurance
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: llm
Requires-Dist: pydantic-ai-slim[anthropic,google,groq,openai]<2,>=1; extra == "llm"
Provides-Extra: github
Requires-Dist: httpx>=0.27; extra == "github"
Provides-Extra: azuredevops
Requires-Dist: httpx>=0.27; extra == "azuredevops"
Provides-Extra: databricks
Requires-Dist: databricks-sdk>=0.30; extra == "databricks"
Provides-Extra: azure
Requires-Dist: azure-identity>=1.17; extra == "azure"
Requires-Dist: azure-monitor-query>=1.3; extra == "azure"
Provides-Extra: all
Requires-Dist: pydantic-ai-slim[anthropic,google,groq,openai]<2,>=1; extra == "all"
Requires-Dist: httpx>=0.27; extra == "all"
Requires-Dist: databricks-sdk>=0.30; extra == "all"
Requires-Dist: azure-identity>=1.17; extra == "all"
Requires-Dist: azure-monitor-query>=1.3; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: ruff>=0.6; extra == "dev"
Requires-Dist: httpx>=0.27; extra == "dev"
Requires-Dist: pydantic-ai-slim[anthropic]<2,>=1; extra == "dev"
Requires-Dist: databricks-sdk>=0.30; extra == "dev"
Requires-Dist: azure-identity>=1.17; extra == "dev"
Requires-Dist: azure-monitor-query>=1.3; extra == "dev"
Dynamic: license-file

# ADRA — Adversarial Dev Review Agent

> A **client-agnostic, deterministic-first, adversarial-validation engine** that supports
> the software lifecycle: it reviews PRs, designs and runs validation/refutation
> experiments, writes documentation back, and **escalates to a human** exactly where a
> senior engineer would. Governed by **LLM-as-Judge + a blocking adversarial critic**,
> grounded by **deterministic tools**, with **immutable provenance**. Runs **offline with
> no API key**.

`pip install adra` · Python ≥ 3.11 · Apache-2.0 · status: `v0.4.0`

---

## What's different

The AI-code-review market splits in two, and both miss the same spot:

- **Reviewers** (CodeRabbit, Greptile, Qodo, Korbit, Sourcery, Bito) feed linters into an
  LLM, but the **model's prose is the verdict**: hallucinated and
  "consistently-stated-but-false" findings leak through; the deterministic signals are
  inputs, never the gate.
- **Autonomous coders** (Devin, OpenHands, SWE-agent, Sweep) *write* code and treat
  "tests pass" as success rather than adversarially trying to prove the change **wrong**.

ADRA occupies the gap: a **deterministic spine** (git / CI / static analysis / SQL probes)
that *grounds* a **blocking adversarial critic** whose job is to **refute**, not bless, each
artifact — every finding carrying its evidence, with disciplined **human escalation** when
nothing deterministic backs the verdict. Existing tools generate opinions; ADRA generates
proofs and refutations, and escalates when it can't.

## The six capabilities

| Skill | What it does |
|---|---|
| `code_review` | Review a diff: language/leak scan + test-discoverability + exact CI command + semantic findings |
| `pr_eval` | Evaluate a PR: merge-base health → `bundle validate` → conformance → verdict + PR body |
| `experiment` | Hypothesis-driven validation experiment: SQL-warehouse probes + synthesis |
| `improve` | Minimum-functional improvement proposal (prune filler, smallest safe diff) |
| `document` | Turn a run record into a PR page / experiment page / methodology-history row |
| `decide` | Route analysis: candidate routes + trade-offs + recommendation — **human-owned** |

Each skill runs the same loop; skills differ only in their domain prompt and deterministic tools.

## Why deterministic-first

Tools (`git`, the exact CI command, `bundle validate`, language scan, SQL probe) run
**first** and become both the grounding the model may not contradict and the evidence in
the provenance log. Because the deterministic floor carries the verdict, the whole loop
runs — and the test suite passes — **offline with no API key**. Connecting a real provider
adds the semantic layer on top.
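
A minimal sketch of how a deterministic floor might gate the verdict. The `Severity` and `ToolResult` names come from the domain model listed in this README, but their fields here, and the `decide_verdict` helper, are assumptions made for illustration rather than ADRA's actual API.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    INFO = 1
    WARNING = 2
    BLOCKER = 3


@dataclass(frozen=True)
class ToolResult:
    # Hypothetical fields; the real ToolResult may look different.
    tool: str
    ok: bool
    severity: Severity
    evidence: str


def decide_verdict(tool_results: list[ToolResult], llm_opinion: str) -> str:
    """Return 'blocked' if any deterministic tool raised a blocker.

    The model's opinion is consulted only when the deterministic floor is
    clean, so model prose can never overturn a failing tool.
    """
    blockers = [r for r in tool_results
                if not r.ok and r.severity is Severity.BLOCKER]
    if blockers:
        return "blocked"
    # Assumption: with no provider connected, the opinion is an empty string.
    if llm_opinion == "reject":
        return "blocked"
    return "accepted"


results = [
    ToolResult("git", ok=True, severity=Severity.INFO,
               evidence="base is current"),
    ToolResult("bundle validate", ok=False, severity=Severity.BLOCKER,
               evidence="resource missing"),
]
print(decide_verdict(results, llm_opinion="accept"))  # blocked
```

In this sketch the semantic layer can only add objections on top of a clean floor, which is why the offline path (an empty opinion) still produces a verdict.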

## Architecture

```
intake ─▶ plan ─▶ ground (deterministic tools) ─▶ generate ─▶ CRITIC ─┐
                                                      ▲   revise ◀──────┘
                                                      └── accepted / escalate ─▶ artifacts + run record
```
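
The diagram above can be read as a bounded revise loop. The following is a sketch under stated assumptions: the callable signatures, the `run_loop` name, and the `max_revisions` limit are all hypothetical and do not describe the code in `adra/orchestrator.py`.

```python
from typing import Callable

# Hypothetical signatures, for illustration only.
Ground = Callable[[str], list[str]]                     # task -> evidence
Generate = Callable[[str, list[str], list[str]], str]   # task, evidence, objections -> draft
Critic = Callable[[str, list[str]], list[str]]          # draft, evidence -> objections


def run_loop(task: str, ground: Ground, generate: Generate, critic: Critic,
             max_revisions: int = 2) -> tuple[str, str]:
    """Return (status, draft); status is 'accepted' or 'escalate'."""
    evidence = ground(task)              # deterministic tools run first
    objections: list[str] = []
    draft = ""
    for _ in range(max_revisions + 1):
        draft = generate(task, evidence, objections)
        objections = critic(draft, evidence)
        if not objections:               # the critic could not refute the draft
            return "accepted", draft
    return "escalate", draft             # still refuted: hand over to a human


def toy_ground(task: str) -> list[str]:
    return ["tests: 11 passed"]


def toy_generate(task: str, evidence: list[str], objections: list[str]) -> str:
    if objections:                       # revise: cite the evidence
        return f"{task}: cites {evidence[0]}"
    return f"{task}: looks fine"


def toy_critic(draft: str, evidence: list[str]) -> list[str]:
    return [] if evidence[0] in draft else ["draft cites no evidence"]


status, draft = run_loop("review", toy_ground, toy_generate, toy_critic)
print(status, "|", draft)  # accepted | review: cites tests: 11 passed
```

The toy critic rejects the first draft because it cites no evidence, and the revised draft is accepted. A critic that never stops objecting ends in `escalate` once the revision budget is spent.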

- `adra/state.py` — the typed **domain model** (`Severity`, `Finding`, `ToolResult`,
  `CriticVerdict`, `RunState`). One contract end to end.
- `adra/rubric.py` — the shared adversarial **rubric** (criteria as typed data); drives both
  the deterministic critic and the critic prompt, so "what we check" never drifts.
- `adra/orchestrator.py` — the hand-rolled, **framework-free** state machine.
- `adra/critic.py` — deterministic red-team pass (rubric-driven) + LLM semantic attacks.
- `adra/judge.py` — rubric scoring with **swap-and-average** + reference anchoring.
- `adra/llm.py` — the tiny ADRA-owned `ChatModel` seam: `mock` (offline) | any real provider
  via **pydantic-ai** (`provider:model`, config-only). No LangChain/LangGraph.
- `adra/tools/` — each returns a `ToolResult` (git / CI / bundle / lang / discovery / sql).
- `adra/skills/` — the `Skill` base + the six skills.
- `adra/clients/` — client governance suites (the bundled fictional **Northwind Data
  Platform**); selectable via `ADRA_CLIENT_DIR`.
- `adra/provenance.py` — the immutable run record (the deep change-history layer).
- `cli/` — the `adra` command. `refs/` — annotated bibliography + papers. `docs/` — deep
  docs + engine ADRs (`docs/adr/`).
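
The swap-and-average idea named for `adra/judge.py` can be illustrated with a pairwise judge. This is one common reading of the technique, not a description of ADRA's implementation: the `PairJudge` type, the `swap_and_average` function, and the toy judge are hypothetical.

```python
from typing import Callable

# A pairwise judge returns its preference for the first-shown text, in [0, 1].
PairJudge = Callable[[str, str], float]


def swap_and_average(judge: PairJudge, candidate: str, reference: str) -> float:
    """Score `candidate` against `reference` with position bias averaged out."""
    first = judge(candidate, reference)          # candidate shown first
    second = 1.0 - judge(reference, candidate)   # candidate shown second
    return (first + second) / 2.0


def biased_judge(shown_first: str, shown_second: str) -> float:
    """Toy judge: prefers the longer text, plus a 0.2 bonus for position one."""
    if len(shown_first) > len(shown_second):
        base = 0.7
    elif len(shown_first) < len(shown_second):
        base = 0.3
    else:
        base = 0.5
    return min(1.0, base + 0.2)


# Equal-length texts: the position bonus cancels and the score is neutral.
print(round(swap_and_average(biased_judge, "abcd", "wxyz"), 3))
# A longer candidate keeps its real advantage without the position bonus.
print(round(swap_and_average(biased_judge, "abcd", "ab"), 3))
```

Scoring the pair in both orders means that a constant preference for whichever text is shown first contributes equally and oppositely to the two runs, so it drops out of the average.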

## Quickstart (offline, no key)

```bash
python -m venv .venv && . .venv/bin/activate         # Windows: .venv\Scripts\activate
pip install -e ".[dev]"
pytest -q                                            # 11 passing, fully offline
python scripts/demo_offline.py                       # end-to-end demo, all six skills
```

Expected: the stale-base PR is **blocked + escalated** (12 commits behind, a notebook
deletion, a `.yml → .yml.t` rename dropping a bundle resource, `bundle validate` failing);
the language/leak review is **blocked**; the clean experiment / improve / document / decide
runs are **accepted**.

```bash
adra review path/to.diff --ci-command 'python -m coverage run -m unittest discover -s . -p "test*.py"'
adra decide "Raise the refresh cadence" "edit the shared CI template" "change it in the owning repo"
```

## Enable a real provider

```bash
pip install -e ".[llm]"                  # pydantic-ai; the seam is provider-agnostic
export ANTHROPIC_API_KEY=...             # or put it in .env (any provider's key works)
export ADRA_PROVIDER=anthropic           # adding a provider is config only: ADRA_PROVIDER / ADRA_MODEL / ADRA_MODEL_<ROLE>
adra pr-eval --source task/123/x --repo /path/to/repo --external
```

`--external` (or `ADRA_ALLOW_EXTERNAL=1`) lets the tools actually run git / the CI command.
Default is **dry-run / read-only**.

## Client-agnostic grounding

A *client* = a governance suite (conventions, ADRs, CI standards, glossary, incident cases)
the engine grounds on. ADRA ships a complete, **fictional** client — **Northwind Data
Platform** — under `adra/clients/synthetic/northwind/`. Point ADRA at any client:

```bash
export ADRA_CLIENT_DIR=/path/to/your/standards   # or Settings(client_dir=...)
```

The rubric references the suite by id and the prompts cite it — the engine code does not
change per client.

## Connectors & emulator

The engine grounds through one connector `Protocol` so the same skills run against a real
platform or a synthetic one:

- **Real**: GitHub (PRs/reviews/issues/contents via a thin `httpx` REST v3 client), Azure DevOps
  (REST 7.1), Databricks (`databricks-sdk` + bundles CLI), Azure (`azure-identity` + monitor / health).
- **Emulator**: a self-contained platform (synthetic git repos + PRs + wiki + boards + CI +
  a SQLite warehouse) so the full flow runs offline.

> The GitHub, Azure DevOps, Databricks, and Azure connectors are **implemented**; the offline
> emulator runs the full flow with no external calls. Each is enabled by its `pip install adra[...]`
> extra and exercised read-only by default.
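
To show what grounding through a single `Protocol` can look like, here is a sketch with a tiny in-memory emulator. The `Connector` method names, the `EmulatorConnector` class, and the `summarize` function are hypothetical and are not taken from ADRA's connector interface.

```python
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@runtime_checkable
class Connector(Protocol):
    # Hypothetical method names, for illustration only.
    def get_pr_diff(self, pr_id: int) -> str: ...
    def list_changed_files(self, pr_id: int) -> list[str]: ...


@dataclass
class EmulatorConnector:
    """In-memory stand-in: PR id -> unified diff text."""
    diffs: dict[int, str] = field(default_factory=dict)

    def get_pr_diff(self, pr_id: int) -> str:
        return self.diffs[pr_id]

    def list_changed_files(self, pr_id: int) -> list[str]:
        prefix = "+++ b/"
        return [line[len(prefix):]
                for line in self.diffs[pr_id].splitlines()
                if line.startswith(prefix)]


def summarize(connector: Connector, pr_id: int) -> str:
    """Code written against the Protocol works with any conforming connector."""
    files = connector.list_changed_files(pr_id)
    return f"PR {pr_id}: {len(files)} file(s) changed"


emu = EmulatorConnector({1: "--- a/x.py\n+++ b/x.py\n@@ -0,0 +1 @@\n+print(1)\n"})
print(summarize(emu, 1))  # PR 1: 1 file(s) changed
```

Because `Protocol` uses structural typing, the emulator does not inherit from `Connector`; a real connector built on `httpx` would only need to offer the same methods.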

## Security model

Deterministic floor (tools are ground truth; the LLM cannot overturn a blocker) · read-only
by default (writes require `--external` **and** explicit human confirmation) · human gates on
PR create / push / merge and any risk claim · English-only + AI-authorship-leak scan on
anything written to disk · immutable provenance for every run. The agent reads **untrusted**
repo/PR/issue content — a dual-LLM / capability split + sandboxed, egress-filtered execution are
planned for the connector phase (not yet implemented; OWASP LLM/Agentic Top-10).

## Two-repo layout

ADRA is the **OSS engine destined for public release** (this repo): no secrets, ever. A separate
private **ADRA Console** (a private web app + backend) *consumes* this engine for
experiments and real connections behind access control. The engine is the serious tool you
can run anywhere with your own tokens; the console is the connected instance.

## Extending

- **New criterion:** add a `RubricItem` to `adra/rubric.py`; it shows up in the critic
  prompt, and if `kind="deterministic"` you also wire its check in `critic.py`.
- **New capability:** add a `Skill` subclass + a `prompts/<skill>.md`, register it in
  `adra/skills/__init__.py`, add a `Node`.
- **New tool:** a function returning a `ToolResult`; call it from a skill's `ground`.
- **New provider:** config only — set `ADRA_PROVIDER` / `ADRA_MODEL` / `ADRA_MODEL_<ROLE>`
  (pydantic-ai resolves the `provider:model`); no new code.
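
The config-only provider selection could be resolved along these lines. This is a sketch under assumptions: the variable names come from this README, but the precedence (a role-specific `ADRA_MODEL_<ROLE>` overriding `ADRA_MODEL`), the `mock` default, and the `resolve_model` helper are hypothetical, and the model names are placeholders.

```python
from collections.abc import Mapping


def resolve_model(role: str, env: Mapping[str, str]) -> str:
    """Build a `provider:model` string for `role` from environment-style config."""
    provider = env.get("ADRA_PROVIDER", "mock")   # assumed default: offline mock
    if provider == "mock":
        return "mock"
    # Assumed precedence: role-specific override first, then the shared model.
    model = env.get(f"ADRA_MODEL_{role.upper()}") or env.get("ADRA_MODEL")
    if not model:
        raise ValueError(f"no model configured for role {role!r}")
    return f"{provider}:{model}"


env = {
    "ADRA_PROVIDER": "anthropic",
    "ADRA_MODEL": "base-model",            # placeholder model name
    "ADRA_MODEL_CRITIC": "strong-model",   # placeholder model name
}
print(resolve_model("critic", env))  # anthropic:strong-model
print(resolve_model("judge", env))   # anthropic:base-model
print(resolve_model("judge", {}))    # mock
```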

## Status

`v0.4.0`: the engine is complete and green offline, and the GitHub / Azure DevOps / Databricks
/ Azure connectors plus the offline emulator are implemented. Multi-industry synthetic clients and
the web console are the next phases. The version stays at `0.x` while some connectors remain
untested against live services.

## License

Apache-2.0 — see [LICENSE](LICENSE).

## References

`refs/` holds an annotated bibliography (`refs/README.md`) + BibTeX (`refs/references.bib`)
+ the core papers (ReAct, Reflexion, Self-Refine, Constitutional AI, LLM-as-judge bias,
agentic security / CaMeL, NIST AI RMF, OWASP LLM/Agentic, provenance, code-review agents).
