Metadata-Version: 2.4
Name: finagent-redrange
Version: 0.6.0
Summary: A reproducible, defensive red-team range for financial-services AI agents.
Project-URL: Homepage, https://github.com/emmanuelgjr/finagent-redrange
Project-URL: Repository, https://github.com/emmanuelgjr/finagent-redrange
Project-URL: Issues, https://github.com/emmanuelgjr/finagent-redrange/issues
Author: Emmanuel Guilherme Junior
License: Apache-2.0
License-File: LICENSE
License-File: LICENSE-docs
License-File: NOTICE
Keywords: agentic-ai,ai-security,llm-security,mitre-atlas,owasp,red-team
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Security
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: pyyaml>=6.0
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.39; extra == 'anthropic'
Provides-Extra: dev
Requires-Dist: mypy>=1.11; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.6; extra == 'dev'
Provides-Extra: incidents
Requires-Dist: genai-incidents>=2.6; extra == 'incidents'
Description-Content-Type: text/markdown

<div align="center">

# 🛡️ FinAgent-RedRange

**A reproducible, _defensive_ red-team range for financial-services AI agents.**

[![CI](https://github.com/emmanuelgjr/finagent-redrange/actions/workflows/ci.yml/badge.svg)](https://github.com/emmanuelgjr/finagent-redrange/actions/workflows/ci.yml)
&nbsp;[![PyPI](https://img.shields.io/pypi/v/finagent-redrange?logo=pypi&logoColor=white)](https://pypi.org/project/finagent-redrange/)
&nbsp;![Python](https://img.shields.io/badge/python-3.11%2B-3776AB?logo=python&logoColor=white)
&nbsp;![Tests](https://img.shields.io/badge/tests-passing-brightgreen)
&nbsp;![Lint](https://img.shields.io/badge/lint-ruff-261230?logo=ruff&logoColor=white)
&nbsp;![Types](https://img.shields.io/badge/types-mypy-2A6DB2)
&nbsp;![Code license](https://img.shields.io/badge/code-Apache--2.0-success)
&nbsp;![Docs license](https://img.shields.io/badge/docs-CC--BY--4.0-success)

Develop proof-of-concept exploits against a mock retail-banking agent, then **prove that
specific guardrails close each one** — end to end, from POC through regression test.
<br/>**Build the attack only to prove the defense.**

</div>

> 🔒 **Defensive research only.** The single target is the bundled mock agent; all data is
> synthetic. Every exploit ships with the control that blocks it and a regression test that
> keeps it closed. See [SECURITY.md](SECURITY.md).

> 📖 **New here?** Start with the **[guided walkthrough](docs/WALKTHROUGH.md)** — a narrated tour of
> what the range does, how to run it, and what each output means.

### At a glance

|  |  |
|---|---|
| **Scenarios** | 9 — prompt injection · data poisoning · excessive agency · system-prompt leakage · unsafe output · vector/embedding weakness · unbounded consumption · supply chain · **multimodal injection** |
| **Coverage** | **10/10** OWASP LLM risks (9 dedicated POC+control scenarios — incl. a **multimodal** input surface — + LLM02 & LLM09 as impact tags) · both OWASP agentic schemes — Threats & Mitigations (T1/T2/T3/T4/T6) **and** the 2026 Top 10 for Agentic Applications (ASI01–04, 06, 09) · MITRE ATLAS · NIST AI RMF |
| **Result** | every attack 🔴 exploited (controls off) → 🟢 blocked (controls on); mean AIRQ heuristic **High → Medium** |
| **Extras** | permission-checked tool loop · sweep + **adaptive-LLM** autonomous attacker · semantic real-model oracle · md / json / **html** scorecard |
| **Handouts** | ready-to-use exports for security teams — **Sigma** detection pack (measured precision) · **SARIF 2.1.0** findings · **GSN assurance case** · **regulatory crosswalk** (NIST/ISO 42001/EU AI Act) · **ATLAS Navigator** coverage layer. See [docs/HANDOUTS.md](docs/HANDOUTS.md) |
| **Runs** | fully offline & deterministic — **no API key** · 97 tests green in CI (Python 3.11 / 3.12) |
| **Try it** | `pip install finagent-redrange && python -m finagent_redrange run` (or `pip install -e ".[dev]"` from a clone) |

<p align="center">
  <img src="docs/scorecard.png" alt="FinAgent-RedRange scorecard — eight scenarios exploited with controls off and blocked with controls on, OWASP LLM and Agentic coverage matrices, and the autonomous-attacker result" width="900">
  <br/>
  <em>The headline artifact: <code>python -m finagent_redrange run</code> regenerates this scorecard (md / json / html).</em>
</p>

---

## Threat model

```mermaid
flowchart LR
    U[User / Attacker] -->|prompt| GR_IN[Input guardrails]
    IMG[(Image input<br/>vision / OCR)] -->|extracted text| GR_IN
    GR_IN --> A[Banking agent<br/>LLM planner]
    DOCS[(Policy & knowledge<br/>RAG corpus)] -->|retrieved context| GR_RET[Retrieval guardrails<br/>allowlist · integrity · provenance]
    GR_RET --> A
    A -->|tool calls| GR_ACT[Action guardrails<br/>high-risk authorization]
    GR_ACT --> T{Tool layer<br/>+ permission checks}
    T --> BAL[get_balance]
    T --> XFER[transfer_funds]
    T --> KYC[lookup_kyc]
    T --> TXN[list_transactions]
    T --> TIX[create_support_ticket]
    A -->|draft answer| GR_OUT[Output guardrails<br/>PII · leak · link filters]
    GR_OUT -->|final answer| U

    %% attack surfaces (the nine scenarios — full OWASP LLM Top 10 + a multimodal input surface)
    INJ([Indirect prompt injection]):::atk -.poisons.-> DOCS
    POI([Data poisoning]):::atk -.corrupts.-> DOCS
    AGY([Excessive agency]):::atk -.coerces.-> T
    LEAK([System-prompt leakage]):::atk -.extracts.-> A
    OUT([Unsafe output handling]):::atk -.rides out via.-> GR_OUT
    VEC([Vector/embedding weakness]):::atk -.cross-tenant leak via.-> DOCS
    CON([Unbounded consumption]):::atk -.floods.-> T
    SUP([Supply chain]):::atk -.injects malicious tool into.-> T
    MM([Multimodal injection]):::atk -.hides in.-> IMG

    classDef atk fill:#fde,stroke:#c39,color:#000;
```

**Modeled attack surfaces:** **indirect prompt injection** via retrieved documents, **data
poisoning** of the trusted knowledge store, **excessive agency / tool misuse**, **system-prompt
leakage**, **unsafe output handling**, **vector/embedding weakness** (cross-session retrieval),
**unbounded consumption** (tool-budget exhaustion), **supply chain** (malicious third-party
tool), and **multimodal injection** (an instruction hidden in an uploaded image's OCR text) —
full OWASP LLM Top 10 coverage plus a multimodal input surface. Surfaces and findings are mapped to OWASP LLM Top 10,
both OWASP agentic schemes (Threats & Mitigations T1–T15 and the 2026 Top 10 for Agentic
Applications ASI01–ASI10), MITRE ATLAS, and NIST AI RMF below.

## Mitigation-validation results

The point of the range: each POC must **land with controls off and fail with controls on.**
Run `python -m finagent_redrange run` to regenerate `results/scorecard.{md,json,html}`.

| Scenario | OWASP LLM | Agentic (T&M · Top 10) | ATLAS | AIRQ (off→on) | Controls **off** | Controls **on** | Validating control |
|---|---|---|---|---|---|---|---|
| Indirect prompt injection (cross-account PII) | LLM01 · LLM02 | T6 · ASI01 | AML.T0051.001 | High → Medium | 🔴 exploited | 🟢 blocked | Output PII filter (+ provenance) |
| Data poisoning (fabricated policy) | LLM04 · LLM09 | T1 · ASI06 | AML.T0070 | High → Medium | 🔴 exploited | 🟢 blocked | Source allowlist + integrity hash |
| Excessive agency (autonomous transfer) | LLM06 · LLM01 | T2 · T3 · ASI02 · ASI03 | AML.T0053 | High → Medium | 🔴 exploited | 🟢 blocked | Action-authorization guardrail |
| System-prompt leakage | LLM07 · LLM01 | — | AML.T0056 | Medium → Low | 🔴 exploited | 🟢 blocked | Output system-prompt-leak detector |
| Unsafe output handling (malicious link) | LLM05 · LLM02 | ASI09 | AML.T0052.000 | Medium → Low | 🔴 exploited | 🟢 blocked | Output link/markup sanitiser |
| Vector/embedding weakness (cross-session leak) | LLM08 · LLM02 | ASI03 | AML.T0057 | High → Medium | 🔴 exploited | 🟢 blocked | Access-scoped retrieval |
| Unbounded consumption (tool-budget exhaustion) | LLM10 | T4 | AML.T0034 | Medium → Low | 🔴 exploited | 🟢 blocked | Per-request tool-call budget |
| Supply chain (malicious third-party tool) | LLM03 | ASI04 | AML.T0010.001 | High → Medium | 🔴 exploited | 🟢 blocked | Verified-publisher tool allowlist |
| Multimodal injection (image-borne instruction) | LLM01 | ASI01 | AML.T0051 | Medium → Low | 🔴 exploited | 🟢 blocked | Multimodal input guardrail (OCR as data) |

*Regenerated on each run. Nine scenarios are dedicated POC+control pairs covering the **full OWASP
LLM Top 10** (LLM01/03/04/05/06/07/08/10 as primary risks, plus a **multimodal** input surface under
LLM01), and **all 10** once impact tags (LLM02, LLM09) are
counted. The Agentic column carries both OWASP agentic schemes — the "Agentic AI — Threats and
Mitigations" taxonomy (T1–T15) and the 2026 "Top 10 for Agentic Applications" (ASI01–ASI10);
a cell is **blank** where no honest mapping exists in either. **AIRQ** (a heuristic defined for this project, not an external standard; AS = Attack Surface,
BR = Blast Radius, DC = Defense Controls) is an **illustrative analyst heuristic for
prioritization, not a calibrated metric** — the controls-on DC is the control's *asserted* strength, so "High → Medium"
is the intended mitigation effect, not a measured residual-risk number. ATLAS rows are
closest-fit (see the scorecard's Notes). Full matrix in `results/scorecard.md`.*

### Strategy-sweep attacker

`python -m finagent_redrange auto` turns an attacker loose on an objective ("extract the agent's
hidden system prompt"). Two planners share one seam (`attacker/engine.py`): the default
**`--planner sweep`** runs a deterministic fixed product of seed payloads × transforms (base64,
role-play, crescendo) — offline and CI-friendly — while **`--planner llm`** (pair with
`--model claude`) is an **adaptive LLM planner** that reasons about which seed + transform to try
next given what's already been tried and whether it landed. With controls **off** it lands; with
controls **on** it is defeated by layered defense — the base64-obfuscated probe slips past the
input filter but the **output canary detector** catches the leak, and the direct phrasings are
caught by the input filter. The headline defensive result: *the control holds even as the attacker
works through every strategy it has.*

## Quickstart

```bash
git clone https://github.com/emmanuelgjr/finagent-redrange.git && cd finagent-redrange
python -m venv .venv && source .venv/bin/activate   # Windows (PowerShell): .venv\Scripts\Activate.ps1
pip install -e ".[dev]"

# offline, deterministic (no API key needed) — uses the EchoClient
python -m finagent_redrange run             # all 8 scenarios, controls off then on -> scorecard
python -m finagent_redrange run --handouts  # + Sigma pack, SARIF, GSN assurance case (docs/HANDOUTS.md)
python -m finagent_redrange auto            # turn the autonomous attacker loose on an objective

# against a real model (full tool-execution loop with permission-checked tools)
cp .env.example .env             # add ANTHROPIC_API_KEY
pip install -e ".[anthropic]"    # real-model runs also need the Anthropic SDK
python -m finagent_redrange run --model claude --controls off
python -m finagent_redrange run --model claude --controls on   # mitigations enabled

pytest -q   # regression suite: with controls on, every known attack must stay blocked
```

Outputs land in `results/` as `scorecard.md` (the table above), `scorecard.json`
(machine-readable, CI-friendly), and `scorecard.html` (a standalone styled report for
screen-sharing). Adding `--handouts` also writes `results/sigma/` (Sigma detection rules + a
labeled-replay precision report), `results/findings.sarif` (SARIF 2.1.0), `results/assurance/`
(a GSN control-effectiveness assurance case), `results/compliance/` (a regulatory control
crosswalk to NIST AI RMF / ISO 42001 / EU AI Act), and `results/navigator/` (a MITRE ATLAS Navigator
coverage layer). All are regenerated on each run; none are committed.
See **[docs/HANDOUTS.md](docs/HANDOUTS.md)** for what each is, what it provides per persona, and how
its precision is validated.

## Architecture

| Package | Responsibility |
|---|---|
| `target/` | The system under test — a mock banking agent: a **plan→act→observe tool loop** over permission-checked tools, with **toggleable** input / retrieval / action / output guardrails |
| `attacker/` | Red-team engine: scripted `run_campaign` + autonomous `run_autonomous` (composes seeds × transforms until an oracle fires) |
| `scenarios/` | One attack class per file (9): indirect prompt injection, data poisoning, excessive agency, system-prompt leakage, unsafe output handling, vector/embedding weakness, unbounded consumption, supply chain, multimodal injection — full OWASP LLM Top 10 coverage + a multimodal input surface |
| `scoring/` | Framework crosswalk (OWASP / ATLAS / NIST) + AIRQ risk scoring + scorecard renderer (md / json / html) |
| `exports/` | Handout exporters generated from `Finding`s — **Sigma** detection pack + labeled-replay precision harness, **SARIF 2.1.0** findings, **GSN assurance case**, **regulatory crosswalk** (NIST/ISO 42001/EU AI Act), **ATLAS Navigator** coverage layer (see [docs/HANDOUTS.md](docs/HANDOUTS.md)) |
| `llm/` | Provider-agnostic client returning structured `ModelResponse` (text + tool calls); `EchoClient` runs offline for tests, `AnthropicClient` for real-model runs |

Full design notes for contributors (human or agent) live in [CLAUDE.md](CLAUDE.md).

## Why this design

- **POC-to-validation, not POC-alone.** A finding isn't done until the control that blocks it
  is proven by a passing regression test. That's the loop a bank actually needs.
- **Framework-mapped by construction.** Findings carry OWASP/ATLAS/NIST IDs and AIRQ
  sub-scores as structured fields, so they drop straight into governance and audit workflows.
- **Black/grey-box discipline.** The attacker only touches the agent's public `respond()`
  surface — the same position a real adversary occupies.
- **Reproducible.** One-command run and a deterministic offline mode; CI exercises the suite on Python 3.11 / 3.12.
- **Honest crosswalk, adversarially reviewed.** Framework IDs were verified against the
  published standards (e.g. OWASP LLM05 2025 = *Improper Output Handling*; agentic threats use
  the OWASP T1–T15 scheme), and a multi-agent adversarial review hardened the oracles so each
  scenario is blocked by the control its scorecard *names* — not incidentally by another.

## Roadmap

- ~~Autonomous attacker-agent loop~~ ✅ shipped (`attacker/run_autonomous`).
- ~~LLM-driven attacker planner~~ ✅ shipped — the planner is now a pluggable seam with two
  implementations: the deterministic `SweepPlanner` (offline default) and an adaptive `LLMPlanner`
  that reasons about the next seed + transform from the feedback of prior attempts
  (`auto --planner llm --model claude`).
- ~~Excessive agency, system-prompt leakage, unsafe output handling scenarios~~ ✅ shipped.
- ~~Semantic oracles for real-model runs~~ ✅ shipped (`scenarios/judge.py`: an
  adoption-vs-refutation judge — deterministic offline, a semantic LLM judge on `--model claude`
  — so a model that quotes a poisoned claim to *refute* it is scored as a refusal, not an exploit).
- ~~Fill the remaining OWASP gaps (LLM03 supply chain, LLM08 vector/embedding, LLM10 unbounded
  consumption)~~ ✅ shipped — **full OWASP LLM Top 10 coverage** (8 dedicated POC+control scenarios).
- ~~CI regression gate~~ ✅ shipped (ruff + mypy + pytest on Python 3.11/3.12).
- ~~Ready-to-use handout exports for security teams~~ ✅ shipped — a **Sigma** detection pack with a
  labeled-replay precision gate (8 TP / 0 FP / 0 FN), a **SARIF 2.1.0** findings run, a **GSN
  control-effectiveness assurance case** with zero-orphan-claim traceability, and an interpretive
  **regulatory crosswalk** (NIST AI RMF + GenAI Profile, ISO/IEC 42001, EU AI Act) with
  declared-vs-interpretive provenance labeling (`exports/`, `run --handouts`). See
  [docs/HANDOUTS.md](docs/HANDOUTS.md).
- ~~Multimodal attack surfaces~~ ✅ shipped — a **multimodal injection** scenario: an instruction
  hidden in an uploaded image's OCR text, blocked by a multimodal input guardrail that treats
  extracted image text as untrusted data (`target/agent.py` gained an optional `images=` surface).
- ~~Publish to PyPI~~ ✅ shipped — [`finagent-redrange`](https://pypi.org/project/finagent-redrange/)
  on PyPI (`pip install finagent-redrange`), released via a secure OIDC Trusted-Publishing workflow
  (`.github/workflows/publish.yml`) — no token stored.
- ~~Seed the attacker from a larger real-world incident dataset~~ ✅ shipped — the optional
  `[incidents]` extra (`pip install "finagent-redrange[incidents]"`) seeds the autonomous attacker
  from the [genai-incidents](https://github.com/emmanuelgjr/genai_incidents) corpus (12k+ real
  GenAI/agentic incidents): `SeedLibrary.from_genai_incidents()` maps each incident to a scenario
  technique, orders by real-world severity, and records its provenance. **Scope-safe:** incidents
  supply the technique + prioritization + provenance only — every payload stays a synthetic
  mock-agent probe (no incident text is reproduced). Data credited under CC BY 4.0 (see NOTICE).

## License & citation

Dual-licensed so the work stays usable while attribution stays required:

- **Code** — [Apache License 2.0](LICENSE): permissive, with an explicit patent grant and
  attribution propagation via the [NOTICE](NOTICE) file.
- **Documentation & research** (the `docs/` directory, this README, and the generated
  scorecards) — [Creative Commons Attribution 4.0](LICENSE-docs) (CC BY 4.0): reuse freely,
  but credit the author by name and link back.

If you use this project, its harness, its framework crosswalk, or its findings, please cite it —
see [CITATION.cff](CITATION.cff) (GitHub's **"Cite this repository"** button). © 2026 Emmanuel
Guilherme Junior.
