Metadata-Version: 2.4
Name: chainreaper
Version: 0.4.0
Summary: Autonomous AI Supply Chain Attack Simulator by Breachline Labs
Project-URL: Homepage, https://github.com/BreachLine/chainreaper
Project-URL: Documentation, https://github.com/BreachLine/chainreaper#readme
Project-URL: Repository, https://github.com/BreachLine/chainreaper
Project-URL: Issues, https://github.com/BreachLine/chainreaper/issues
Project-URL: Changelog, https://github.com/BreachLine/chainreaper/blob/main/CHANGELOG.md
Author-email: Breachline Labs <security@breachline.io>
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: attack-simulation,autonomous,cve,cyclonedx,dependency-confusion,devsecops,llm,sarif,sbom,sca,security,supply-chain,typosquatting
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: anthropic>=0.40.0
Requires-Dist: anyio>=4.4.0
Requires-Dist: google-generativeai>=0.8.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: openai>=1.55.0
Requires-Dist: pydantic-settings>=2.3.0
Requires-Dist: pydantic>=2.7.0
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: rich>=13.7.0
Requires-Dist: typer>=0.12.0
Provides-Extra: all
Requires-Dist: textual>=0.70.0; extra == 'all'
Requires-Dist: weasyprint>=62.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: mypy>=1.10.0; extra == 'dev'
Requires-Dist: pre-commit>=3.7.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest-cov>=5.0.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.5.0; extra == 'dev'
Requires-Dist: types-pyyaml>=6.0.0; extra == 'dev'
Requires-Dist: vcrpy>=6.0.0; extra == 'dev'
Provides-Extra: pdf
Requires-Dist: weasyprint>=62.0; extra == 'pdf'
Provides-Extra: tui
Requires-Dist: textual>=0.70.0; extra == 'tui'
Description-Content-Type: text/markdown

# ChainReaper

**Autonomous AI Supply Chain Attack Simulator** by [Breachline Labs](https://breachline.io)

[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)
[![Python](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://python.org)
[![Tests](https://img.shields.io/badge/tests-299%20passing-brightgreen.svg)]()
[![Vectors](https://img.shields.io/badge/attack%20vectors-13-red.svg)]()
[![Ecosystems](https://img.shields.io/badge/ecosystems-7-blue.svg)]()

---

ChainReaper is a **fully LLM-driven** supply chain security tool with zero hardcoded rules: an autonomous AI agent reasons about attack paths, simulates real supply chain attacks across 7 ecosystems, and generates interactive attack graph visualizations.

## Why ChainReaper?

| Feature | Traditional Scanners | ChainReaper |
|---------|---------------------|-------------|
| Analysis | Hardcoded rules + CVE matching | LLM reasons about attacks autonomously |
| Coverage | Known CVEs only | **13 attack vectors** including zero-day patterns |
| Attack Chains | Individual alerts | Chains findings into multi-step exploitation paths |
| Verification | None | LLM validates every finding, scores exploitability |
| Code Inspection | Metadata only | **Downloads and AST-parses** actual package source |
| Confidence | Binary yes/no | Calibrated against CVE data + threat intel |
| Visualization | Text lists | Interactive D3.js attack graph (zoom, click, search) |
| SBOM | No | CycloneDX 1.5 with linked vulnerabilities |
| CI/CD | Basic | GitHub Action, GitLab CI, PR comments, SARIF upload |

## 13 Attack Vectors

| # | Vector | What It Detects | CWE |
|---|--------|----------------|-----|
| 1 | **Dependency Confusion** | Internal package names available on public registries | CWE-427 |
| 2 | **Typosquatting** | Package names suspiciously similar to popular packages | CWE-349 |
| 3 | **Compromised Maintainer** | Weak maintainer trust signals (no 2FA, inactive accounts) | CWE-522 |
| 4 | **Build System Attacks** | CI/CD pipeline poisoning (unpinned actions, secret exposure) | CWE-829 |
| 5 | **Malicious Package** | Suspicious metadata patterns (bot inflation, obfuscation) | CWE-506 |
| 6 | **Install Script Attacks** | Suspicious pre/post install hooks executing code | CWE-829 |
| 7 | **Shadow Dependencies** | Hidden risks in transitive dependency trees | CWE-1357 |
| 8 | **Lockfile Manipulation** | Manifest vs lockfile tampering, phantom dependencies | CWE-345 |
| 9 | **Registry Spoofing** | Misconfigured private/public registry mixing | CWE-346 |
| 10 | **Abandoned Takeover** | Dormant packages vulnerable to maintainer takeover | CWE-1104 |
| 11 | **Source-Binary Mismatch** | Published artifacts diverging from source code | CWE-345 |
| 12 | **Protestware/Wiper** | Conditional destructive behavior patterns | CWE-912 |
| 13 | **Code Inspection** | AST-parsed suspicious calls (eval, exec, subprocess, env access) | CWE-506 |
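
As a rough illustration of vector 13, the sketch below walks a Python AST and flags the call patterns named in the table. It is not ChainReaper's analyzer: the watchlists `SUSPICIOUS_NAMES` and `SUSPICIOUS_MODULES` and the function `find_suspicious` are hypothetical names chosen for this example, and only the standard-library `ast` module is used.

```python
import ast

# Hypothetical watchlists for illustration; the real analyzer's lists are not documented here.
SUSPICIOUS_NAMES = {"eval", "exec"}
SUSPICIOUS_MODULES = {"subprocess"}


def find_suspicious(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, pattern) pairs for suspicious constructs in Python source."""
    hits: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # Bare calls such as eval(...) or exec(...)
            if isinstance(func, ast.Name) and func.id in SUSPICIOUS_NAMES:
                hits.append((node.lineno, func.id))
            # Module-qualified calls such as subprocess.run(...)
            elif (
                isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Name)
                and func.value.id in SUSPICIOUS_MODULES
            ):
                hits.append((node.lineno, f"{func.value.id}.{func.attr}"))
        # Environment access such as os.environ["TOKEN"] or os.environ.get(...)
        elif (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == "os"
            and node.attr == "environ"
        ):
            hits.append((node.lineno, "os.environ"))
    return sorted(hits)


sample = (
    "import os, subprocess\n"
    "exec(payload)\n"
    "subprocess.run(['sh'])\n"
    "token = os.environ['TOKEN']\n"
)
for line, pattern in find_suspicious(sample):
    print(f"line {line}: {pattern}")
```

Parsing never executes the inspected code, which is why an AST pass is a reasonable fit for untrusted package source. Aliased imports (`import subprocess as sp`) would slip past this simple name match.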

## 7 Ecosystems

| Ecosystem | Manifest Files | Lockfiles | Registry |
|-----------|---------------|-----------|----------|
| **npm** | package.json | package-lock.json, yarn.lock | npmjs.com |
| **PyPI** | requirements.txt, pyproject.toml, setup.cfg | poetry.lock, Pipfile.lock | pypi.org |
| **Maven** | pom.xml, build.gradle | -- | search.maven.org |
| **Go** | go.mod | go.sum | proxy.golang.org |
| **Cargo** | Cargo.toml | Cargo.lock | crates.io |
| **RubyGems** | Gemfile | Gemfile.lock | rubygems.org |
| **NuGet** | *.csproj, packages.config | -- | nuget.org |
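
Ecosystem auto-detection can be pictured as a filename lookup over the manifest column above. The following is a minimal sketch under that assumption, not the tool's actual discovery code: `MANIFESTS` and `detect_ecosystems` are hypothetical names, and apart from `npm` and `pypi` (which appear in the CLI examples) the lowercase ecosystem identifiers are guesses.

```python
import fnmatch
from pathlib import Path

# Manifest filename patterns taken from the table above.
# Ecosystem identifiers other than "npm" and "pypi" are assumed spellings.
MANIFESTS = {
    "package.json": "npm",
    "requirements.txt": "pypi",
    "pyproject.toml": "pypi",
    "setup.cfg": "pypi",
    "pom.xml": "maven",
    "build.gradle": "maven",
    "go.mod": "go",
    "Cargo.toml": "cargo",
    "Gemfile": "rubygems",
    "packages.config": "nuget",
    "*.csproj": "nuget",
}


def detect_ecosystems(root: str) -> set[str]:
    """Walk a project tree and collect every ecosystem whose manifest is present."""
    found: set[str] = set()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        for pattern, ecosystem in MANIFESTS.items():
            # fnmatchcase keeps matching case-sensitive on every platform
            if fnmatch.fnmatchcase(path.name, pattern):
                found.add(ecosystem)
    return found
```

A real implementation would likely skip vendored directories such as `node_modules`; this sketch does not.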

## Quick Start

### Install

```bash
git clone https://github.com/BreachLine/chainreaper.git
cd chainreaper
pip install -e .
```

### Set LLM API Key

```bash
export GEMINI_API_KEY="your-key"        # Google Gemini (cheapest, ~$0.01/scan)
export ANTHROPIC_API_KEY="your-key"     # Anthropic Claude
export OPENAI_API_KEY="your-key"        # OpenAI GPT
```

### Scan

```bash
# Scan any project (auto-detects ecosystems)
chainreaper scan ./my-project

# Full output: JSON + SARIF + CycloneDX SBOM + HTML attack graph
chainreaper scan ./my-project --format json --format sarif --format cyclonedx --format html

# Choose LLM model + cost limit
chainreaper scan ./project --model gemini/gemini-2.5-flash --cost-limit 1.0

# Filter to specific ecosystems
chainreaper scan ./project --ecosystem npm --ecosystem pypi

# CI mode with policy enforcement
chainreaper scan . --ci --fail-on high --policy policy.yml --quiet

# Incremental scan (only changed deps since last scan)
chainreaper scan . --diff-only --baseline ./previous-results.json

# Scan a GitHub repo directly
chainreaper scan https://github.com/org/repo

# Scan a single package
chainreaper scan express@4.17.1
chainreaper scan requests==2.31.0
```
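
The examples above pass three kinds of target: a local path, a repository URL, and a single package spec. One way such a target string could be told apart is sketched below. This is an assumption about the behavior, not the CLI's real parser; `classify_target` and the returned dictionary keys are hypothetical.

```python
import os
import re
from urllib.parse import urlparse

# name==version (PyPI style) and name@version (npm style, optionally @scope/name)
PYPI_SPEC = re.compile(r"([A-Za-z0-9._-]+)==([^\s=]+)")
NPM_SPEC = re.compile(r"(@?[A-Za-z0-9._~-]+(?:/[A-Za-z0-9._~-]+)?)@([^\s@]+)")


def classify_target(target: str) -> dict:
    """Classify a scan target as a repo URL, a local path, or a package spec."""
    if urlparse(target).scheme in ("http", "https"):
        return {"kind": "repo_url", "url": target}
    # Anything that looks like, or exists as, a filesystem path wins over package specs
    if target.startswith((".", "/", "~")) or os.path.exists(target):
        return {"kind": "path", "path": target}
    match = PYPI_SPEC.fullmatch(target)
    if match:
        return {"kind": "package", "ecosystem": "pypi",
                "name": match[1], "version": match[2]}
    match = NPM_SPEC.fullmatch(target)
    if match:
        return {"kind": "package", "ecosystem": "npm",
                "name": match[1], "version": match[2]}
    # Fall back to treating the argument as a path and let later validation reject it
    return {"kind": "path", "path": target}


for example in ("./my-project", "https://github.com/org/repo",
                "express@4.17.1", "requests==2.31.0"):
    print(example, "->", classify_target(example)["kind"])
```

The `name@version` form is ambiguous across ecosystems, so a real parser would need more context than this sketch uses; mapping it to npm here is purely illustrative.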

## How It Works

```
Target -> Discovery -> CVE Scan -> Threat Intel -> Source Check -> LLM Plan -> ReAct Analysis -> LLM Verify -> Calibrate -> Attack Graph -> Report

Phase 0: CVE SCAN         Query OSV.dev for every dependency (known CVEs + malicious packages)
Phase 0: THREAT INTEL     Check OpenSSF malicious packages database (MAL- advisories)
Phase 0: SOURCE CHECK     Verify provenance (repo link, integrity hash, Sigstore attestation)
Phase 1: DISCOVERY        Auto-detect ecosystems, parse manifests + lockfiles
Phase 2: LLM PLANNING     AI plans attack strategy based on ecosystem types
Phase 3: ReAct LOOP       AI autonomously selects + runs 13 analyzers
                          Thought -> Action -> Observation -> Thought -> ...
Phase 4: LLM VERIFY       AI validates each finding, scores exploitability, filters FPs
Phase 4b: CALIBRATE       Ground LLM confidence against CVE/threat intel data
Phase 5: ATTACK GRAPH     AI chains findings into multi-step attack paths
Phase 6: OUTPUT           JSON, SARIF, CycloneDX SBOM, interactive HTML
```
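
For a feel of what the CVE scan step involves, the sketch below queries the public OSV.dev `v1/query` endpoint for one dependency and separates `MAL-` advisories from the rest. It is not ChainReaper's client (the project depends on `httpx`; this example uses `urllib` only to stay self-contained), and `build_osv_query`, `query_osv`, and `split_advisories` are hypothetical helper names.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(name: str, version: str, ecosystem: str) -> dict:
    """Build the JSON body for an OSV query; ecosystem uses OSV spelling, e.g. 'PyPI' or 'npm'."""
    return {"version": version, "package": {"name": name, "ecosystem": ecosystem}}


def query_osv(name: str, version: str, ecosystem: str, timeout: float = 10.0) -> list[str]:
    """Return advisory IDs affecting one package version (performs a network call)."""
    body = json.dumps(build_osv_query(name, version, ecosystem)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # OSV returns an empty object when nothing matches, so "vulns" may be absent
    return [vuln["id"] for vuln in payload.get("vulns", [])]


def split_advisories(ids: list[str]) -> tuple[list[str], list[str]]:
    """Separate malicious-package advisories (MAL- prefix) from other advisories."""
    malicious = [i for i in ids if i.startswith("MAL-")]
    other = [i for i in ids if not i.startswith("MAL-")]
    return malicious, other
```

Usage would look like `split_advisories(query_osv("requests", "2.31.0", "PyPI"))`; the result depends on the live database, so no output is shown here.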

**Zero hardcoded rules.** The LLM makes ALL decisions -- which analyzers to run, in what order, when to stop, how findings chain together, and which are real vs false positives.
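
The Thought -> Action -> Observation cycle can be sketched as a small loop in which a decision function picks the next analyzer until it decides to stop. This is a simplified model, not the project's agent: `react_loop` and the `decide` callback are hypothetical, and the scripted `decide` below stands in for a real LLM call.

```python
from typing import Callable

Analyzer = Callable[[], list[str]]
Decide = Callable[[list[dict]], tuple[str, str]]


def react_loop(decide: Decide, analyzers: dict[str, Analyzer],
               max_iterations: int = 10) -> tuple[list[str], list[dict]]:
    """Run analyzers chosen by `decide` until it returns 'stop' or the budget runs out."""
    history: list[dict] = []
    findings: list[str] = []
    for _ in range(max_iterations):
        thought, action = decide(history)          # Thought: the model reasons over history
        if action == "stop" or action not in analyzers:
            break                                  # stop on request or on an unknown action
        observation = analyzers[action]()          # Action: run the selected analyzer
        findings.extend(observation)               # Observation: results feed the next thought
        history.append({"thought": thought, "action": action,
                        "observation": observation})
    return findings, history


# Scripted stand-in for the LLM: it follows a fixed plan, one step per iteration.
def scripted_decide(history: list[dict]) -> tuple[str, str]:
    plan = ["typosquatting", "install_scripts", "stop"]
    return "next step in plan", plan[len(history)]


demo_analyzers = {
    "typosquatting": lambda: ["reqeusts resembles requests"],
    "install_scripts": lambda: [],
}
demo_findings, demo_history = react_loop(scripted_decide, demo_analyzers)
print(demo_findings, len(demo_history))
```

The `max_iterations` bound plays the same role as the `--max-iterations` option: it caps how long the agent may keep reasoning.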

## Output Formats

| Format | Flag | Use Case |
|--------|------|----------|
| **JSON** | `--format json` | Machine-readable results |
| **SARIF** | `--format sarif` | GitHub Security tab integration |
| **CycloneDX** | `--format cyclonedx` | SBOM for compliance (EO 14028) |
| **HTML** | `--format html` | Interactive D3.js attack graph |

## CI/CD Integration

### GitHub Actions

```yaml
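# .github/workflows/security.yml
name: security
on: [push, pull_request]        # example triggers; adjust as needed
jobs:
  chainreaper:
    runs-on: ubuntu-latest      # example runner
    steps:
      - uses: actions/checkout@v4
      - uses: breachline/chainreaper@main
        with:
          target: "."
          format: "sarif,json,cyclonedx"
          fail-on: "high"
          api-key: ${{ secrets.GEMINI_API_KEY }}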
```

SARIF results automatically appear in the **Security** tab. PR comments summarize findings.

### GitLab CI

```yaml
include:
  - remote: 'https://raw.githubusercontent.com/breachline/chainreaper/main/ci-templates/gitlab-ci.yml'
```

### Policy Engine

Create `.chainreaper-policy.yml`:

```yaml
fail_on: high
block_rules:
  - name: block-critical-cves
    severity: [critical]
  - name: block-malicious
    attack_vectors: [malicious_package]
allow_rules:
  - name: accept-lodash-cves
    packages: ["lodash"]
    cve_ids: ["CVE-2020-28500"]
warn_rules:
  - name: warn-medium
    severity: [medium]
```

```bash
chainreaper scan . --ci --policy .chainreaper-policy.yml
```
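
How the rule lists might interact is sketched below. The precedence used here (allow rules first, then block rules, then the `fail_on` threshold, then warn rules) is an assumption for illustration, as are the finding keys and the names `rule_matches` and `evaluate`; the policy is written as a Python dict mirroring the YAML above so the example needs no YAML parser.

```python
SEVERITY_ORDER = ["low", "medium", "high", "critical"]

# Same content as the YAML policy above, as a plain dict.
POLICY = {
    "fail_on": "high",
    "block_rules": [
        {"name": "block-critical-cves", "severity": ["critical"]},
        {"name": "block-malicious", "attack_vectors": ["malicious_package"]},
    ],
    "allow_rules": [
        {"name": "accept-lodash-cves", "packages": ["lodash"],
         "cve_ids": ["CVE-2020-28500"]},
    ],
    "warn_rules": [{"name": "warn-medium", "severity": ["medium"]}],
}


def rule_matches(rule: dict, finding: dict) -> bool:
    """A rule matches when every criterion it declares contains the finding's value."""
    values = {
        "severity": finding["severity"],
        "attack_vectors": finding["attack_vector"],
        "packages": finding["package"],
        "cve_ids": finding.get("cve_id"),
    }
    return all(values[key] in rule[key] for key in values if key in rule)


def evaluate(policy: dict, finding: dict) -> str:
    """Return 'allow', 'block', 'warn', or 'pass' for one finding (assumed precedence)."""
    if any(rule_matches(r, finding) for r in policy.get("allow_rules", [])):
        return "allow"
    if any(rule_matches(r, finding) for r in policy.get("block_rules", [])):
        return "block"
    threshold = policy.get("fail_on")
    if threshold and (SEVERITY_ORDER.index(finding["severity"])
                      >= SEVERITY_ORDER.index(threshold)):
        return "block"
    if any(rule_matches(r, finding) for r in policy.get("warn_rules", [])):
        return "warn"
    return "pass"


accepted = {"package": "lodash", "severity": "medium",
            "attack_vector": "known_cve", "cve_id": "CVE-2020-28500"}
print(evaluate(POLICY, accepted))
```

Under these assumed semantics the accepted lodash CVE is suppressed even though a warn rule would otherwise match its severity.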

## CLI Reference

```bash
chainreaper scan <target> [OPTIONS]    # Run supply chain security scan
chainreaper list [--all|--analyzers|--ecosystems|--formats]
chainreaper config [--init|--show|--validate|--path|--set KEY=VALUE]
chainreaper version
```

### Scan Options

| Option | Description |
|--------|-------------|
| `--format, -f` | Output: `json`, `sarif`, `cyclonedx`, `html` (repeatable) |
| `--output, -o` | Output directory |
| `--model, -m` | LLM model override |
| `--cost-limit` | Max LLM cost per scan ($) |
| `--ecosystem, -e` | Filter ecosystems (repeatable) |
| `--no-verify` | Skip LLM findings verification |
| `--ci` | CI mode (exit non-zero on findings) |
| `--fail-on` | Min severity to fail: critical/high/medium/low |
| `--policy` | Path to policy YAML file |
| `--baseline, -b` | Previous scan result for incremental diff |
| `--diff-only` | Only scan changed dependencies |
| `--max-iterations` | Max LLM reasoning iterations |
| `--verbose, -v` | Debug logging |
| `--quiet, -q` | Minimal output for CI |
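
The idea behind `--diff-only` with `--baseline` can be illustrated as a comparison of two dependency snapshots. The sketch assumes each snapshot has already been reduced to a mapping from an `ecosystem:name` key to a version string; that shape and the function `changed_dependencies` are hypothetical, since the baseline JSON format is not documented here.

```python
from typing import Optional


def changed_dependencies(
    baseline: dict[str, str], current: dict[str, str]
) -> dict[str, tuple[Optional[str], str]]:
    """Return dependencies that are new or whose version differs from the baseline.

    Values are (old_version, new_version); old_version is None for new dependencies.
    Dependencies removed since the baseline are ignored because nothing is left to scan.
    """
    return {
        key: (baseline.get(key), version)
        for key, version in current.items()
        if baseline.get(key) != version
    }


previous = {"npm:express": "4.17.1", "pypi:requests": "2.31.0", "npm:lodash": "4.17.21"}
latest = {"npm:express": "4.18.2", "pypi:requests": "2.31.0", "npm:axios": "1.6.0"}
print(changed_dependencies(previous, latest))
```

Only the upgraded and newly added packages are returned, which is what keeps an incremental scan cheaper than a full one.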

## Security

ChainReaper is a **read-only** analysis tool with defense-in-depth measures:

- All registry calls are GET-only with SSRF protection
- Package name validation per ecosystem before any API call
- Domain allowlists for package downloads (npmjs.org, pythonhosted.org)
- Rate limiting (at most 10 concurrent requests) on all registry calls
- LLM cost limits enforced per-scan with pre+post call checks
- Prompt injection defense (XML data tags + input sanitizer)
- Target path validation prevents directory traversal
- Registry tokens excluded from serialization
- HTTP redirects disabled on all clients
- Confidence calibration catches LLM hallucinations
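
A download allowlist of the kind listed above could be checked along these lines. This is a sketch, not the project's validator: `ALLOWED_DOWNLOAD_DOMAINS` contains only the two domains named in the list, `is_allowed_download` is a hypothetical name, and the HTTPS-only requirement is an added assumption.

```python
from urllib.parse import urlparse

# The two download domains named in the list above.
ALLOWED_DOWNLOAD_DOMAINS = {"npmjs.org", "pythonhosted.org"}


def is_allowed_download(url: str) -> bool:
    """Accept only HTTPS URLs whose host is an allowlisted domain or a subdomain of one."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme != "https" or not host:
        return False
    # Match on a dot boundary so 'evilnpmjs.org' and 'npmjs.org.evil.example' are rejected
    return any(
        host == domain or host.endswith("." + domain)
        for domain in ALLOWED_DOWNLOAD_DOMAINS
    )


for candidate in (
    "https://registry.npmjs.org/express/-/express-4.17.1.tgz",
    "https://npmjs.org.evil.example/express.tgz",
    "https://169.254.169.254/latest/meta-data/",
):
    print(candidate, is_allowed_download(candidate))
```

Because redirects are disabled on all clients, a check like this on the initial URL is not undone by a server redirecting to an internal address.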

Report vulnerabilities: security@breachline.io

## Development

```bash
pip install -e ".[dev]"
pytest tests/ -v                    # 299 tests
ruff check src/ tests/              # Lint
ruff format src/ tests/             # Format
mypy src/ --ignore-missing-imports  # Type check
```

## Supported LLM Providers

Google Gemini, Anthropic Claude, OpenAI GPT -- via direct SDKs (no LiteLLM dependency).

## License

Apache License 2.0 -- see [LICENSE](LICENSE)

---

**Built by [Breachline Labs](https://breachline.io)** -- Autonomous AI Security
