Metadata-Version: 2.4
Name: chainreaper
Version: 1.1.0
Summary: Autonomous AI Supply Chain Attack Simulator by Breachline Labs
Project-URL: Homepage, https://github.com/BreachLine/chainreaper
Project-URL: Documentation, https://github.com/BreachLine/chainreaper#readme
Project-URL: Repository, https://github.com/BreachLine/chainreaper
Project-URL: Issues, https://github.com/BreachLine/chainreaper/issues
Project-URL: Changelog, https://github.com/BreachLine/chainreaper/blob/main/CHANGELOG.md
Author-email: Breachline Labs <security@breachline.io>
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: attack-simulation,autonomous,cve,cyclonedx,dependency-confusion,devsecops,llm,sarif,sbom,sca,security,supply-chain,typosquatting
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: anthropic>=0.40.0
Requires-Dist: anyio>=4.4.0
Requires-Dist: google-generativeai>=0.8.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: openai>=1.55.0
Requires-Dist: pydantic-settings>=2.3.0
Requires-Dist: pydantic>=2.7.0
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: rich>=13.7.0
Requires-Dist: typer>=0.12.0
Provides-Extra: all
Requires-Dist: textual>=0.70.0; extra == 'all'
Requires-Dist: weasyprint>=62.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: mypy>=1.10.0; extra == 'dev'
Requires-Dist: pre-commit>=3.7.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest-cov>=5.0.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.5.0; extra == 'dev'
Requires-Dist: types-pyyaml>=6.0.0; extra == 'dev'
Requires-Dist: vcrpy>=6.0.0; extra == 'dev'
Provides-Extra: pdf
Requires-Dist: weasyprint>=62.0; extra == 'pdf'
Provides-Extra: tui
Requires-Dist: textual>=0.70.0; extra == 'tui'
Description-Content-Type: text/markdown

# ChainReaper

**Autonomous AI Supply Chain Attack Simulator** by [Breachline Labs](https://breachline.io)

[![PyPI](https://img.shields.io/pypi/v/chainreaper.svg)](https://pypi.org/project/chainreaper/)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)
[![Python](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://python.org)
![Tests](https://img.shields.io/badge/tests-299%20passing-brightgreen.svg)
![Vectors](https://img.shields.io/badge/attack%20vectors-13-red.svg)
![Ecosystems](https://img.shields.io/badge/ecosystems-7-blue.svg)

---

The world's first **fully LLM-driven** supply chain security tool. Zero hardcoded rules -- an autonomous AI agent reasons about attack paths, simulates real supply chain attacks across 7 ecosystems, and generates interactive attack graph visualizations.

## Why ChainReaper?

| Feature | Traditional Scanners | ChainReaper |
|---------|---------------------|-------------|
| Analysis | Hardcoded rules + CVE matching | LLM reasons about attacks autonomously |
| Coverage | Known CVEs only | **13 attack vectors** including zero-day patterns |
| Attack Chains | Individual alerts | Chains findings into multi-step exploitation paths |
| Verification | None | LLM validates every finding, scores exploitability |
| Code Inspection | Metadata only | **Downloads and AST-parses** actual package source |
| Confidence | Binary yes/no | Calibrated against CVE data + threat intel |
| Visualization | Text lists | Interactive D3.js attack graph (zoom, click, search) |
| SBOM | No | CycloneDX 1.5 with linked vulnerabilities |
| CI/CD | Basic | GitHub Action, GitLab CI, PR comments, SARIF upload |
| False Positives | High | 3-layer anti-hallucination pipeline (CVE grounding + LLM verify + calibration) |

## Real-World Use Cases

### Pre-Merge Dependency Audit
Before merging a PR that adds new dependencies, catch risks that `npm audit` and Dependabot miss:
```bash
chainreaper scan . --format sarif --format json --ci --fail-on high
```
**What it catches:** typosquatting, dependency confusion, abandoned packages, install script attacks.
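
To give a feel for the typosquatting signal, here is a minimal sketch that flags names within a small edit distance of well-known packages. The `POPULAR` list, the `max_distance` threshold, and both function names are assumptions made for illustration; they are not ChainReaper's analyzer, which leaves the final judgment to the LLM.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from `a`
                curr[j - 1] + 1,     # insert a character into `a`
                prev[j - 1] + cost,  # substitute (or keep) a character
            ))
        prev = curr
    return prev[-1]


# Hypothetical reference list; a real check would draw on registry popularity data.
POPULAR = ["express", "lodash", "requests", "react"]


def typosquat_candidates(name: str, max_distance: int = 1) -> list[str]:
    """Return popular packages that `name` closely resembles without being equal."""
    name = name.lower()
    return [p for p in POPULAR if p != name and edit_distance(name, p) <= max_distance]
```

A plain Levenshtein distance counts a swapped pair of letters as two edits, so a real check would likely add transposition handling or a keyboard-adjacency model.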

### Single Package Risk Assessment
Evaluate a package before adopting it:
```bash
chainreaper scan express@4.21.0 --format html --format json
chainreaper scan requests==2.31.0
chainreaper scan https://www.npmjs.com/package/lodash
```
**Real result (express@4.21.0):** 32 deps analyzed, 4 CVEs found, 13/13 analyzers pass, 5 attack paths, interactive D3.js graph -- $0.06.

### CI/CD Security Gate
Block deployments on critical supply chain risks:
```yaml
- uses: BreachLine/chainreaper@main
  with:
    target: "."
    format: "sarif,json,cyclonedx"
    fail-on: "high"
    api-key: ${{ secrets.GEMINI_API_KEY }}
```

### SBOM for Compliance (EO 14028)
```bash
chainreaper scan . --format cyclonedx --format json
# Generates CycloneDX 1.5 SBOM with components, PURLs, linked vulnerabilities
```

### Incremental Scanning in CI
Scan only the dependencies that changed since the baseline instead of running a full scan:
```bash
chainreaper scan . --diff-only --baseline ./baseline/chainreaper-results.json --ci
```
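
The idea behind `--diff-only` can be sketched as a comparison between two dependency sets. The layout of the baseline results file is not documented here, so this sketch assumes both sides have already been reduced to a flat `name -> version` mapping; `changed_dependencies` is a hypothetical helper.

```python
def changed_dependencies(baseline: dict[str, str], current: dict[str, str]) -> dict[str, str]:
    """Return dependencies that are new or whose version differs from the baseline."""
    return {
        name: version
        for name, version in current.items()
        if baseline.get(name) != version
    }


# Assumed, simplified inputs: one entry per resolved dependency.
baseline = {"express": "4.21.0", "lodash": "4.17.21"}
current = {"express": "4.21.0", "lodash": "4.17.20", "left-pad": "1.3.0"}

to_scan = changed_dependencies(baseline, current)
```

Removed dependencies are ignored in this sketch because there is nothing left to scan for them.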

### Security Dashboard
Generate interactive attack graphs for security reviews:
```bash
chainreaper scan https://github.com/org/repo --format html
# D3.js graph: clickable nodes, attack path highlighting, severity colors, search, PNG export
```

### Policy Enforcement
```yaml
# .chainreaper-policy.yml
fail_on: high
block_rules:
  - name: block-critical-cves
    severity: [critical]
  - name: block-malicious
    attack_vectors: [malicious_package]
allow_rules:
  - name: accept-known-lodash-cves
    packages: ["lodash"]
    cve_ids: ["CVE-2020-28500"]
```
```bash
chainreaper scan . --ci --policy .chainreaper-policy.yml
```

## 13 Attack Vectors

| # | Vector | What It Detects | CWE |
|---|--------|----------------|-----|
| 1 | **Dependency Confusion** | Internal package names available on public registries | CWE-427 |
| 2 | **Typosquatting** | Package names suspiciously similar to popular packages | CWE-349 |
| 3 | **Compromised Maintainer** | Weak maintainer trust signals (no 2FA, inactive accounts) | CWE-522 |
| 4 | **Build System Attacks** | CI/CD pipeline poisoning (unpinned actions, secret exposure) | CWE-829 |
| 5 | **Malicious Package** | Suspicious metadata patterns (bot inflation, obfuscation) | CWE-506 |
| 6 | **Install Script Attacks** | Suspicious pre/post install hooks executing code | CWE-829 |
| 7 | **Shadow Dependencies** | Hidden risks in transitive dependency trees | CWE-1357 |
| 8 | **Lockfile Manipulation** | Manifest vs lockfile tampering, phantom dependencies | CWE-345 |
| 9 | **Registry Spoofing** | Misconfigured private/public registry mixing | CWE-346 |
| 10 | **Abandoned Takeover** | Dormant packages vulnerable to maintainer takeover | CWE-1104 |
| 11 | **Source-Binary Mismatch** | Published artifacts diverging from source code | CWE-345 |
| 12 | **Protestware/Wiper** | Conditional destructive behavior patterns | CWE-912 |
| 13 | **Code Inspection** | AST-parsed suspicious calls (eval, exec, subprocess, env access) | CWE-506 |
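
As an illustration of vector 13, the sketch below walks a Python AST and reports the call patterns named in that row (`eval`, `exec`, `subprocess`, environment access). It is a simplified stand-in under stated assumptions: the function name and the exact match rules are hypothetical, and it only handles Python source.

```python
import ast

# Names treated as suspicious in this sketch, taken from the table row above.
SUSPICIOUS_CALLS = {"eval", "exec"}
SUSPICIOUS_MODULES = {"subprocess"}


def inspect_source(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, description) pairs for suspicious constructs."""
    hits: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name) and func.id in SUSPICIOUS_CALLS:
                hits.append((node.lineno, f"call to {func.id}()"))
            elif (
                isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Name)
                and func.value.id in SUSPICIOUS_MODULES
            ):
                hits.append((node.lineno, f"call to {func.value.id}.{func.attr}()"))
        elif (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == "os"
            and node.attr == "environ"
        ):
            hits.append((node.lineno, "access to os.environ"))
    return sorted(hits)


sample = (
    "import os, subprocess\n"
    "subprocess.run(['ls'])\n"
    "token = os.environ.get('TOKEN')\n"
    "eval('1 + 1')\n"
)
report = inspect_source(sample)
```

Matching on names alone misses aliased imports such as `import subprocess as sp`, which is one reason a raw hit is better treated as a signal to review than as a verdict.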

## 7 Ecosystems

| Ecosystem | Manifest Files | Lockfiles | Registry |
|-----------|---------------|-----------|----------|
| **npm** | package.json | package-lock.json, yarn.lock | npmjs.com |
| **PyPI** | requirements.txt, pyproject.toml, setup.cfg | poetry.lock, Pipfile.lock | pypi.org |
| **Maven** | pom.xml, build.gradle | -- | search.maven.org |
| **Go** | go.mod | go.sum | proxy.golang.org |
| **Cargo** | Cargo.toml | Cargo.lock | crates.io |
| **RubyGems** | Gemfile | Gemfile.lock | rubygems.org |
| **NuGet** | *.csproj, packages.config | -- | nuget.org |
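
Ecosystem auto-detection can be pictured as matching file names against the manifest column above. This is a sketch, not the project's discovery code: the `npm` and `pypi` identifiers appear in the `--ecosystem` examples, while the other identifiers and the `detect_ecosystems` function are assumptions.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Manifest patterns copied from the table above.
MANIFESTS = {
    "npm": ["package.json"],
    "pypi": ["requirements.txt", "pyproject.toml", "setup.cfg"],
    "maven": ["pom.xml", "build.gradle"],
    "go": ["go.mod"],
    "cargo": ["Cargo.toml"],
    "rubygems": ["Gemfile"],
    "nuget": ["*.csproj", "packages.config"],
}


def detect_ecosystems(paths: list[str]) -> list[str]:
    """Return the sorted ecosystems whose manifest files appear in `paths`."""
    found = set()
    for path in paths:
        filename = PurePosixPath(path).name
        for ecosystem, patterns in MANIFESTS.items():
            # fnmatchcase keeps matching case-sensitive on every platform.
            if any(fnmatchcase(filename, pattern) for pattern in patterns):
                found.add(ecosystem)
    return sorted(found)
```

For example, `detect_ecosystems(["package.json", "api/App.csproj", "README.md"])` yields `["npm", "nuget"]`.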

## Quick Start

### Install

```bash
pip install chainreaper
```

Or from source:

```bash
git clone https://github.com/BreachLine/chainreaper.git
cd chainreaper
pip install -e .
```

### Set LLM API Key

```bash
export GEMINI_API_KEY="your-key"        # Google (cheapest ~$0.01/scan)
export ANTHROPIC_API_KEY="your-key"     # Anthropic Claude
export OPENAI_API_KEY="your-key"        # OpenAI GPT
```

### Scan

```bash
# Scan any project (auto-detects ecosystems)
chainreaper scan ./my-project

# Full output: JSON + SARIF + CycloneDX SBOM + HTML attack graph
chainreaper scan ./my-project --format json --format sarif --format cyclonedx --format html

# Choose LLM model + cost limit
chainreaper scan ./project --model gemini/gemini-2.5-flash --cost-limit 1.0

# Filter to specific ecosystems
chainreaper scan ./project --ecosystem npm --ecosystem pypi

# CI mode with policy enforcement
chainreaper scan . --ci --fail-on high --policy policy.yml --quiet

# Incremental scan (only changed deps since last scan)
chainreaper scan . --diff-only --baseline ./previous-results.json

# Scan a GitHub repo directly
chainreaper scan https://github.com/org/repo

# Scan a single package
chainreaper scan express@4.17.1
chainreaper scan requests==2.31.0
```

## How It Works

```
Target -> Discovery -> CVE Scan -> Threat Intel -> Source Check
       -> LLM Plan -> ReAct Analysis -> LLM Verify -> Calibrate
       -> Attack Graph -> Report

Phase 1:  DISCOVERY        Auto-detect ecosystems, parse manifests + lockfiles
Phase 1b: CVE SCAN         Query OSV.dev for every dependency (known vulnerabilities)
Phase 1b: THREAT INTEL     Check OpenSSF malicious packages database (MAL- advisories)
Phase 1c: SOURCE CHECK     Verify provenance (repo link, integrity hash, Sigstore)
Phase 2:  LLM PLANNING     AI plans attack strategy based on what it found so far
Phase 3:  ReAct LOOP       AI autonomously selects + runs 13 analyzers
                           Thought -> Action -> Observation -> Thought -> ...
Phase 4:  LLM VERIFY       AI validates each finding, scores exploitability, filters FPs
Phase 4b: CALIBRATE        Ground LLM confidence against CVE/threat intel data
Phase 5:  ATTACK GRAPH     AI chains findings into multi-step attack paths
Phase 6:  OUTPUT           JSON, SARIF, CycloneDX SBOM, interactive HTML
```

**Zero hardcoded rules.** The LLM makes ALL decisions -- which analyzers to run, in what order, when to stop, how findings chain together, and which are real vs false positives.
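
The Thought -> Action -> Observation cycle in Phase 3 can be sketched as a small loop. Here a scripted `decide` function stands in for the LLM so the example runs offline; `react_loop`, `scripted_decide`, and the analyzer names are hypothetical, and `max_iterations` mirrors the `--max-iterations` option.

```python
from typing import Callable

Analyzer = Callable[[], list[str]]


def react_loop(
    decide: Callable[[list[str]], str],
    analyzers: dict[str, Analyzer],
    max_iterations: int = 10,
) -> list[str]:
    """Run Thought -> Action -> Observation until `decide` returns "stop"."""
    observations: list[str] = []
    for _ in range(max_iterations):
        action = decide(observations)      # Thought: choose the next action
        if action == "stop" or action not in analyzers:
            break
        findings = analyzers[action]()     # Action: run the chosen analyzer
        observations.append(f"{action}: {len(findings)} finding(s)")  # Observation
    return observations


# Stand-in for the LLM: run each analyzer once in a fixed order, then stop.
def scripted_decide(observations: list[str]) -> str:
    order = ["typosquatting", "install_scripts"]
    return order[len(observations)] if len(observations) < len(order) else "stop"


analyzers = {
    "typosquatting": lambda: ["expres resembles express"],
    "install_scripts": lambda: [],
}
trace = react_loop(scripted_decide, analyzers)
```

In the real tool the decision step is an LLM call that sees prior observations, which is what lets it reorder or skip analyzers.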

### Anti-Hallucination Pipeline

LLMs can hallucinate findings. ChainReaper uses 3 layers of defense:

1. **Phase 1b Ground Truth**: The CVE scanner and threat intel feed provide factual data before the LLM runs. These findings carry 100% confidence and are never adjusted.
2. **Phase 4 Verification**: A separate LLM call reviews every finding, assigns an exploitability score (0-10) and a false-positive likelihood, and filters out findings whose false-positive probability exceeds 70%.
3. **Phase 4b Calibration**: Cross-references LLM claims against ground truth. Findings corroborated by CVE data get a confidence boost. "Malicious package" claims not confirmed by threat intel take an 85% confidence penalty and are dropped. Findings with no API evidence take a 30% penalty.
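
The verification filter and calibration penalties can be sketched as follows. Several details are assumptions, since the text above gives only the percentages: penalties are subtracted from a 0.0-1.0 confidence value, the CVE boost size (`cve_boost`) and the drop threshold (`drop_below`) are invented for illustration, and the `Finding` fields are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Finding:
    vector: str
    confidence: float                 # 0.0-1.0, as reported by the LLM
    fp_probability: float = 0.0       # from the verification step
    cve_backed: bool = False
    threat_intel_confirmed: bool = False
    has_api_evidence: bool = True


def calibrate(
    finding: Finding, cve_boost: float = 0.10, drop_below: float = 0.10
) -> Optional[Finding]:
    """Return the adjusted finding, or None when it should be dropped."""
    if finding.fp_probability > 0.70:          # verification filter
        return None
    confidence = finding.confidence
    if finding.cve_backed:
        confidence += cve_boost                # assumed boost size
    if finding.vector == "malicious_package" and not finding.threat_intel_confirmed:
        confidence -= 0.85
    if not finding.has_api_evidence:
        confidence -= 0.30
    confidence = min(max(confidence, 0.0), 1.0)
    if confidence < drop_below:                # assumed drop threshold
        return None
    return replace(finding, confidence=confidence)
```

Under these assumptions an unconfirmed "malicious package" claim at 0.9 confidence falls to about 0.05 and is dropped, while a typosquatting finding at 0.8 with no API evidence survives at about 0.5.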

## Output Formats

| Format | Flag | Use Case |
|--------|------|----------|
| **JSON** | `--format json` | Machine-readable results |
| **SARIF** | `--format sarif` | GitHub Security tab integration |
| **CycloneDX** | `--format cyclonedx` | SBOM for compliance (EO 14028) |
| **HTML** | `--format html` | Interactive D3.js attack graph |
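
For readers unfamiliar with SARIF, the sketch below builds a minimal SARIF 2.1.0 document from a list of findings. The finding keys (`vector`, `severity`, `message`, `manifest`) and the severity-to-level mapping are assumptions; ChainReaper's actual output will carry more detail.

```python
import json

# SARIF levels are "error", "warning", and "note"; this mapping is an assumption.
LEVELS = {"critical": "error", "high": "error", "medium": "warning", "low": "note"}


def to_sarif(findings: list[dict]) -> dict:
    """Wrap findings in the smallest structure a SARIF consumer expects."""
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "chainreaper"}},
            "results": [
                {
                    "ruleId": f["vector"],
                    "level": LEVELS.get(f["severity"], "warning"),
                    "message": {"text": f["message"]},
                    "locations": [{
                        "physicalLocation": {
                            "artifactLocation": {"uri": f["manifest"]}
                        }
                    }],
                }
                for f in findings
            ],
        }],
    }


doc = to_sarif([{
    "vector": "typosquatting",
    "severity": "high",
    "message": "expres resembles express",
    "manifest": "package.json",
}])
sarif_text = json.dumps(doc, indent=2)
```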

## CI/CD Integration

### GitHub Actions

```yaml
# .github/workflows/security.yml
- uses: BreachLine/chainreaper@main
  with:
    target: "."
    format: "sarif,json,cyclonedx"
    fail-on: "high"
    api-key: ${{ secrets.GEMINI_API_KEY }}
```

SARIF results automatically appear in the **Security** tab. PR comments summarize findings.

### GitLab CI

```yaml
include:
  - remote: 'https://raw.githubusercontent.com/BreachLine/chainreaper/main/ci-templates/gitlab-ci.yml'
```

### Policy Engine

Create `.chainreaper-policy.yml`:

```yaml
fail_on: high
block_rules:
  - name: block-critical-cves
    severity: [critical]
  - name: block-malicious
    attack_vectors: [malicious_package]
allow_rules:
  - name: accept-lodash-cves
    packages: ["lodash"]
    cve_ids: ["CVE-2020-28500"]
warn_rules:
  - name: warn-medium
    severity: [medium]
```

```bash
chainreaper scan . --ci --policy .chainreaper-policy.yml
```
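
One plausible reading of how such a policy is evaluated is sketched below, with the YAML already loaded into a dict. The precedence (allow rules first, then block rules, then the `fail_on` threshold, then warn rules) and the finding keys are assumptions, not documented behavior.

```python
SEVERITY_ORDER = ["low", "medium", "high", "critical"]

# The policy above, as it would look after YAML parsing.
policy = {
    "fail_on": "high",
    "block_rules": [
        {"name": "block-critical-cves", "severity": ["critical"]},
        {"name": "block-malicious", "attack_vectors": ["malicious_package"]},
    ],
    "allow_rules": [
        {"name": "accept-lodash-cves", "packages": ["lodash"], "cve_ids": ["CVE-2020-28500"]},
    ],
    "warn_rules": [{"name": "warn-medium", "severity": ["medium"]}],
}


def matches(rule: dict, finding: dict) -> bool:
    """A rule matches when every criterion it lists is met by the finding."""
    checks = {
        "severity": finding.get("severity"),
        "attack_vectors": finding.get("vector"),
        "packages": finding.get("package"),
        "cve_ids": finding.get("cve_id"),
    }
    return all(value in rule[key] for key, value in checks.items() if key in rule)


def evaluate(policy: dict, finding: dict) -> str:
    """Return "allow", "block", "warn", or "pass" for a single finding."""
    if any(matches(r, finding) for r in policy.get("allow_rules", [])):
        return "allow"
    if any(matches(r, finding) for r in policy.get("block_rules", [])):
        return "block"
    threshold = SEVERITY_ORDER.index(policy["fail_on"])
    if SEVERITY_ORDER.index(finding["severity"]) >= threshold:
        return "block"
    if any(matches(r, finding) for r in policy.get("warn_rules", [])):
        return "warn"
    return "pass"
```

With this ordering, the accepted lodash CVE is allowed even if its severity would otherwise trip `fail_on`, which is usually the point of an allow rule.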

## CLI Reference

```bash
chainreaper scan <target> [OPTIONS]    # Run supply chain security scan
chainreaper list [--all|--analyzers|--ecosystems|--formats]
chainreaper config [--init|--show|--validate|--path|--set KEY=VALUE]
chainreaper version
```

### Scan Options

| Option | Description |
|--------|-------------|
| `--format, -f` | Output: `json`, `sarif`, `cyclonedx`, `html` (repeatable) |
| `--output, -o` | Output directory |
| `--model, -m` | LLM model override |
| `--cost-limit` | Max LLM cost per scan (USD) |
| `--ecosystem, -e` | Filter ecosystems (repeatable) |
| `--no-verify` | Skip LLM findings verification |
| `--ci` | CI mode (exit non-zero on findings) |
| `--fail-on` | Min severity to fail: critical/high/medium/low |
| `--policy` | Path to policy YAML file |
| `--baseline, -b` | Previous scan result for incremental diff |
| `--diff-only` | Only scan changed dependencies |
| `--max-iterations` | Max LLM reasoning iterations |
| `--verbose, -v` | Debug logging |
| `--quiet, -q` | Minimal output for CI |
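
The `--cost-limit` option caps LLM spend per scan, and the Security section mentions checks both before and after each call. A minimal sketch of that pattern follows; `CostTracker`, `CostLimitExceeded`, and the idea of a per-call cost estimate are assumptions for illustration.

```python
class CostLimitExceeded(RuntimeError):
    """Raised when a scan would exceed, or has exceeded, its budget."""


class CostTracker:
    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def check_before(self, estimated_usd: float) -> None:
        """Refuse a call whose estimated cost would push spend past the limit."""
        if self.spent_usd + estimated_usd > self.limit_usd:
            raise CostLimitExceeded(
                f"estimated spend would exceed ${self.limit_usd:.2f}"
            )

    def record_after(self, actual_usd: float) -> None:
        """Record real usage and stop if the limit has now been crossed."""
        self.spent_usd += actual_usd
        if self.spent_usd > self.limit_usd:
            raise CostLimitExceeded(
                f"spent ${self.spent_usd:.2f} of ${self.limit_usd:.2f}"
            )
```

The pre-call check guards against starting an expensive request, and the post-call check catches cases where the estimate was too low.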

### Supported Targets

| Target Type | Example | How It Works |
|-------------|---------|-------------|
| Local path | `./my-project` | Scans directory for manifests |
| GitHub repo | `https://github.com/org/repo` | Shallow-clones the repo, then scans it |
| npm package | `express@4.21.0` | Fetches deps from npm registry |
| PyPI package | `requests==2.31.0` | Fetches deps from PyPI |
| Registry URL | `https://npmjs.com/package/lodash` | Auto-detects ecosystem |
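
The target types above can be told apart with a few string checks. The following is a sketch of one way to do it; `classify_target` and its return labels are hypothetical, and the npm pattern is a simplification of real package-name rules.

```python
import re
from urllib.parse import urlparse


def classify_target(target: str) -> str:
    """Map a scan target string to one of the target types in the table."""
    if target.startswith(("http://", "https://")):
        host = urlparse(target).hostname or ""
        if host in ("github.com", "www.github.com"):
            return "github_repo"
        return "registry_url"
    if "==" in target:
        return "pypi_package"
    # name@version, optionally scoped, e.g. @scope/name@1.0.0
    if re.fullmatch(r"(@[\w.-]+/)?[\w.-]+@[\w.+-]+", target):
        return "npm_package"
    return "local_path"
```

Anything that is not a URL and carries no version separator falls through to `local_path`, so `./my-project` and `.` are both treated as directories.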

## Security

ChainReaper is a **read-only** analysis tool with defense-in-depth:

- All registry calls are GET-only with SSRF protection
- Package name validation per ecosystem before any API call
- Domain allowlists for package downloads (registry.npmjs.org, files.pythonhosted.org)
- Rate limiting (at most 10 concurrent requests) on all registry calls
- LLM cost limits enforced per-scan with pre+post call checks
- Prompt injection defense (XML data tags + input sanitizer)
- Target path validation prevents directory traversal
- Registry tokens excluded from serialization
- HTTP redirects disabled on all clients
- Confidence calibration catches LLM hallucinations
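
As an example of one of these controls, a download-domain allowlist can be sketched in a few lines. The two hosts come from the list above; requiring `https` and the `is_allowed_download` name are assumptions.

```python
from urllib.parse import urlparse

# Hosts named in the list above.
ALLOWED_DOWNLOAD_HOSTS = {"registry.npmjs.org", "files.pythonhosted.org"}


def is_allowed_download(url: str) -> bool:
    """Accept only HTTPS URLs whose exact hostname is on the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_DOWNLOAD_HOSTS
```

Comparing the parsed hostname exactly, instead of using a substring or suffix match, is what rejects lookalikes such as `registry.npmjs.org.evil.example` and userinfo tricks such as `https://registry.npmjs.org@evil.example/`.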

Report vulnerabilities: security@breachline.io

## Development

```bash
pip install -e ".[dev]"
pytest tests/ -v                    # 299 tests
ruff check src/ tests/              # Lint
ruff format src/ tests/             # Format
mypy src/ --ignore-missing-imports  # Type check
```

## Supported LLM Providers

Direct SDK integration (no LiteLLM dependency):

| Provider | Models | Env Variable |
|----------|--------|-------------|
| **Google Gemini** | gemini-2.5-flash, gemini-2.5-pro | `GEMINI_API_KEY` |
| **Anthropic** | claude-sonnet-4, claude-opus-4 | `ANTHROPIC_API_KEY` |
| **OpenAI** | gpt-4o, gpt-4o-mini | `OPENAI_API_KEY` |

Cheapest option: Gemini 2.5 Flash at ~$0.01/scan.
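
The `--model gemini/gemini-2.5-flash` example suggests a `provider/model` naming scheme. Assuming that convention also applies to the other providers (the `anthropic/` and `openai/` prefixes are a guess), picking the provider and its API key variable could look like this sketch; `resolve_provider` is a hypothetical helper.

```python
# Env variable names from the table above; the prefixes are assumed.
PROVIDER_ENV = {
    "gemini": "GEMINI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def resolve_provider(model: str) -> tuple[str, str, str]:
    """Split "provider/model" and return (provider, model_name, env_variable)."""
    provider, sep, model_name = model.partition("/")
    if not sep or provider not in PROVIDER_ENV or not model_name:
        raise ValueError(f"expected '<provider>/<model>', got {model!r}")
    return provider, model_name, PROVIDER_ENV[provider]
```

For instance, `resolve_provider("gemini/gemini-2.5-flash")` returns `("gemini", "gemini-2.5-flash", "GEMINI_API_KEY")`.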

## License

Apache License 2.0 -- see [LICENSE](LICENSE)

---

**Built by [Breachline Labs](https://breachline.io)** -- Autonomous AI Security
