Metadata-Version: 2.4
Name: argus-ai-scanner
Version: 1.3.1
Summary: AI-native code security scanner with cascade analysis and Firecracker-microVM DAST runtime validation
Project-URL: Homepage, https://github.com/dshochat/Argus_Scanner
Project-URL: Documentation, https://github.com/dshochat/Argus_Scanner#readme
Project-URL: Repository, https://github.com/dshochat/Argus_Scanner
Project-URL: Issues, https://github.com/dshochat/Argus_Scanner/issues
Project-URL: Changelog, https://github.com/dshochat/Argus_Scanner/releases
Author-email: David Shochat <davidsho1131@gmail.com>
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: ai-security,anthropic,claude,code-scanner,dast,prompt-injection,sarif,security,static-analysis,supply-chain-security,vulnerability-scanner
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: System Administrators
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Software Development :: Testing
Classifier: Typing :: Typed
Requires-Python: >=3.12
Requires-Dist: anthropic>=0.40.0
Requires-Dist: google-genai>=0.3.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: pathspec>=0.12.0
Requires-Dist: pydantic>=2.8.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: requests>=2.32.0
Requires-Dist: structlog>=24.4.0
Requires-Dist: tiktoken>=0.7.0
Provides-Extra: dev
Requires-Dist: mypy>=1.11.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24.0; extra == 'dev'
Requires-Dist: pytest-cov>=5.0.0; extra == 'dev'
Requires-Dist: pytest>=8.3.0; extra == 'dev'
Requires-Dist: respx>=0.21.0; extra == 'dev'
Requires-Dist: ruff>=0.6.0; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5.0; extra == 'docs'
Requires-Dist: mkdocs>=1.6.0; extra == 'docs'
Description-Content-Type: text/markdown

# Argus Scanner

**We don't flag what we can't exploit.**

Argus is an AI-native code security scanner combining a cost-tiered LLM **harness** (Gemini Flash-Lite triage → Sonnet 4.6 → Opus 4.6 escalation) with runtime DAST detonation in a Firecracker microVM and sandbox-verified remediation. Whether the bug is in code your team wrote (SQL injection, auth bypass, deserialization, command injection, crypto misuse) or in code your stack quietly pulled in (a malicious package, a poisoned `CLAUDE.md`, a backdoored `setup.py`, a tampered ML checkpoint loader about to run on someone's machine) — Argus detonates it in the sandbox, captures the exploit firing, generates a patch, replays the same exploit against the patched source, and ships the result as a CI gate.

It targets the gap between *"this looks suspicious"* (pattern-matching SAST) and *"this actually exploits something"* (manual reverse engineering).

**One scanner. Two threat models. Zero false-positive triage.**

Open source. BYOK. Apache 2.0.

You pay your providers directly — Anthropic + Google for the cascade, Fly.io for the optional DAST sandbox. Argus collects nothing.

---

## Quick Start

Get from install to first scan in under 60 seconds:

```bash
pip install argus-ai-scanner
export ANTHROPIC_API_KEY="your-anthropic-key"
export GEMINI_API_KEY="your-gemini-key"

# Single file
argus scan path/to/suspicious.py

# Whole repo (current directory)
argus scan-repo .

# CI mode — only files changed vs main, SARIF for GitHub Code Scanning
argus scan-repo . --diff origin/main --output sarif --output-file findings.sarif

# Pre-install supply-chain gate — scan a PyPI package + its dep closure
# BEFORE pip installs anything. Blocks day-zero malware at the ingestion boundary.
argus install requests
argus install -r requirements.txt --dry-run        # CI gate without installing
argus install litellm --strict-coverage             # extra-paranoid mode
```

Without DAST configured, the CLI degrades gracefully to cascade-only verdicts. DAST mode (Firecracker sandbox) requires a Fly.io account; see [docs/dast-setup.md](./docs/dast-setup.md).
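The CI-mode command above writes SARIF to `findings.sarif`. As a rough sketch of how a CI step might summarize that file, the snippet below counts results per severity level. It assumes Argus populates the standard SARIF v2.1.0 `runs[].results[].level` field; `summarize_sarif` and `load_and_summarize` are hypothetical helpers, not part of the Argus CLI.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_sarif(sarif: dict) -> Counter:
    """Count results per SARIF level across all runs in the log."""
    counts: Counter = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # SARIF treats a missing level as "warning" by default.
            counts[result.get("level", "warning")] += 1
    return counts


def load_and_summarize(path: str = "findings.sarif") -> Counter:
    """Read a SARIF file from disk and summarize it."""
    return summarize_sarif(json.loads(Path(path).read_text(encoding="utf-8")))
```

A CI job could fail the build when `load_and_summarize()["error"]` is non-zero, though the right threshold depends on how you map Argus severities to your gate.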

## Benchmark Performance

Adversarial regression suite, labeled by a 4-LLM consensus oracle. Methodology, sample size, and per-file breakdown: [`bench_results/v1_1_launch/launch_report.md`](./bench_results/v1_1_launch/launch_report.md).

```
                       Verdict-exact (higher = better)
Argus (cascade + DAST) ██████████████████░░  91.3%
Gemini 3.1 Pro         █████████████████░░░  82.6%
Grok 4.3               █████████████████░░░  82.6%
Opus 4.6               ████████████████░░░░  78.3%
GPT 5.4                ███████████████░░░░░  73.9%
```

## How Argus works (the three pillars)

Argus has three pillars. The coverage matrix below shows what each pillar does for each file type.

### Pillar 1 — Cascade harness (static + AI analysis)

Every recognized file flows through a cost-tiered model cascade. Deterministic preprocessing first (free, no models): SHA-256, multi-stage deobfuscation (base64 / hex / eval-chain), dependency graphing, attack-vector flagging, AI-file-pattern detection. Files with no outbound intent get dropped before a single token is spent.

Survivors route through a model cascade:

| Cascade stage | Model | Cost / file | Decides |
|---|---|---|---|
| Triage | **Gemini Flash-Lite** | ~$0.001 | `CLEAN` / `LOW` / `HIGH` routing |
| Cheap analysis (LOW tier) | **Gemini Flash** | ~$0.02 | findings on low-priority files |
| Default deep analysis (HIGH tier) | **Anthropic Claude Sonnet 4.6** | ~$0.07 | findings on high-priority files |
| High-stakes / borderline escalation | **Anthropic Claude Opus 4.6** | ~$0.15 | findings on escalated files (~20% of HIGH files) |

The harness emits structured findings: CWE, line, severity, code, explanation, suggested fix, proof-of-concept, behavioral profile, attack chains, composite risk score. Aggregate cost is ~$4.65 per 100-file scan on a realistic workload mix; hard per-file + per-scan cost caps abort runs that exceed your declared budget.
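The routing table and cost caps can be sketched as follows. This is an illustrative model only: the per-stage costs are the approximate figures from the table, while `plan_route`, `within_budget`, the `escalate` flag, and the assumption that an Opus escalation is billed on top of the Sonnet pass are all hypothetical.

```python
from dataclasses import dataclass

# Approximate per-file costs in USD, copied from the cascade table.
STAGE_COST = {"triage": 0.001, "flash": 0.02, "sonnet": 0.07, "opus": 0.15}


@dataclass
class Route:
    stages: list[str]
    estimated_cost: float


def plan_route(triage_verdict: str, escalate: bool = False) -> Route:
    """Map a triage verdict (CLEAN / LOW / HIGH) to the stages a file visits."""
    stages = ["triage"]
    if triage_verdict == "LOW":
        stages.append("flash")
    elif triage_verdict == "HIGH":
        stages.append("sonnet")
        if escalate:  # high-stakes or borderline files only
            stages.append("opus")
    elif triage_verdict != "CLEAN":
        raise ValueError(f"unknown triage verdict: {triage_verdict!r}")
    return Route(stages, round(sum(STAGE_COST[s] for s in stages), 4))


def within_budget(routes: list[Route], max_cost: float) -> bool:
    """Per-scan cap check; a cap of 0 is treated as 'disabled'."""
    if max_cost == 0:
        return True
    return sum(r.estimated_cost for r in routes) <= max_cost
```

Estimating the route before spending tokens is one way a cap could abort early; the real harness may instead meter actual spend as responses arrive.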

### Pillar 2 — DAST runtime detonation

When the harness flags suspicion at sufficient verdict tier, the file moves to a Firecracker microVM (`minimal-v1`, `networked-v1`, or `ml_tools-v1` image profile) for two phases:

* **Phase A — exploit testing.** Plan an exploit per harness finding, run it in the sandbox, capture syscalls / egress / filesystem writes, classify each finding as `CONFIRMED` / `BLOCKED` / `UNREACHED` / `NOT_TESTED` based on what actually happened.
* **Phase B — exploit discovery.** Given accumulated evidence, propose NEW hypotheses the harness missed. A deterministic validator gates the proposals; survivors carry forward into the next iteration's Phase A. Up to 3 iterations or until convergence.

This is the layer that kills false positives: a "looks like SQL injection" pattern that the file's own escaping defends against gets `BLOCKED`, not flagged. It also surfaces what static analysis missed: Phase B has found vulnerabilities the harness didn't catch.
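The Phase A / Phase B iteration can be sketched as a small loop. This is a simplified model under assumptions: findings are plain strings, and `test_exploit`, `propose`, and `validate` are hypothetical callables standing in for the sandbox run, the hypothesis generator, and the deterministic validator.

```python
from typing import Callable

MAX_ITERATIONS = 3  # "Up to 3 iterations or until convergence"


def run_dast_loop(
    findings: list[str],
    test_exploit: Callable[[str], str],
    propose: Callable[[dict[str, str]], list[str]],
    validate: Callable[[str], bool],
) -> dict[str, str]:
    """Alternate exploit testing (A) and discovery (B) until nothing new survives."""
    results: dict[str, str] = {}
    pending = list(findings)
    for _ in range(MAX_ITERATIONS):
        if not pending:
            break  # converged: no untested hypotheses remain
        # Phase A: test each pending finding and record its status.
        for finding in pending:
            results[finding] = test_exploit(finding)
        # Phase B: propose new hypotheses from the accumulated evidence,
        # keeping only those that are new and pass the validator.
        proposals = propose(results)
        pending = [p for p in proposals if p not in results and validate(p)]
    return results
```

In this sketch, proposals made after the final iteration are simply dropped; a real implementation might record them as `NOT_TESTED` instead.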

### Pillar 3 — Remediation (fix-and-verify)

When Phase A confirms an exploit on **text source** (Python, JS / TS, shell), Argus generates a patched version, replays the same exploit attempts against the patched code in the same sandbox, and emits per-finding `NEUTRALIZED` / `STILL_EXPLOITABLE` / `UNVERIFIABLE` with sandbox-grounded evidence. You don't get a remediation *suggestion*; you get a remediation that's been *tested*.

**Binary artifact policy.** For ML artifacts (`.pkl` / `.pt` / `.bin` / `.safetensors` / `.h5` / `.onnx`), Argus does NOT auto-patch the binary: the LLM can't emit valid bytecode-level patches, and a corrupt patched pickle would mislead the replay. Instead, the remediation pillar emits structured guidance: regenerate the model from a clean training pipeline and serialize using `safetensors` (which is structurally incapable of carrying executable `__reduce__` payloads). Status is `UNVERIFIABLE` with the guidance in `fix_summary`.

**Opt-out:** pass `--no-remediation` to skip this pillar entirely while keeping the harness + DAST active. Use it for compliance scans, CI gates that don't allow source-modification suggestions, read-only audits, or to save ~$0.05/file in patch-generation tokens. The result still includes a structured `phase_c` block (Phase C is the remediation phase) with `skipped_reason: "phase_c_disabled_by_config"` so downstream consumers can distinguish "remediation off" from "ran and found nothing to fix."
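A downstream consumer might branch on that block roughly as shown here. Only `phase_c`, `skipped_reason`, and the three remediation statuses come from this README; the nested `findings` key and the returned labels are assumptions made for illustration.

```python
def remediation_state(result: dict) -> str:
    """Classify a scan result by what the remediation phase did."""
    phase_c = result.get("phase_c") or {}
    if phase_c.get("skipped_reason") == "phase_c_disabled_by_config":
        return "remediation_off"
    # "findings" is an assumed key name for per-finding remediation entries.
    statuses = {f.get("status") for f in phase_c.get("findings", [])}
    if not statuses:
        return "nothing_to_fix"
    if "STILL_EXPLOITABLE" in statuses:
        return "still_exploitable"
    if "UNVERIFIABLE" in statuses:
        return "needs_review"
    return "neutralized"
```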

---

## Coverage matrix

What each pillar does, per file type. ✅ = supported, ⚠️ = supported with policy nuance, ⏳ = roadmap, ❌ N/A = not applicable to this format.

| File type | Harness analysis | DAST exploit testing | DAST exploit discovery | Remediation |
|---|:-:|:-:|:-:|:-:|
| Python (`.py`, `.pyw`, `.pyi`, `.pth`) | ✅ | ✅ | ✅ | ✅ patch + replay |
| JavaScript / TypeScript (`.js`, `.mjs`, `.cjs`, `.jsx`, `.ts`, `.tsx`) | ✅ | ✅ | ✅ | ✅ patch + replay |
| Shell (`.sh`, `.bash`, `.zsh`) | ✅ | ✅ | ✅ | ✅ patch + replay |
| Jupyter notebooks (`.ipynb`) | ✅ cell-by-cell decomposition | ⏳ roadmap | ⏳ roadmap | ⏳ roadmap |
| ML model artifacts (`.pkl`, `.pickle`, `.pt`, `.bin`, `.safetensors`, `.h5`, `.hdf5`, `.keras`, `.onnx`) | ✅ pickletools disassembly | ✅ load-detonation in sandbox | ❌ | ⚠️ guidance only (no auto-patch — see binary policy) |
| GitHub Actions workflows (`.github/workflows/*.yml`) | ✅ deterministic CI-pattern sweep | ⏳ roadmap | ⏳ roadmap | ⏳ roadmap |
| Supply-chain manifests (`package.json`, `requirements.txt`, `Cargo.lock`, `go.mod`, `Gemfile`, `Pipfile`, `setup.py`, `pyproject.toml`, `pom.xml`, `build.gradle`, `*.csproj`, etc.) | ✅ parsed for deps + lifecycle hooks | ❌ N/A (no runtime to detonate) | ❌ N/A | ❌ N/A |
| AI-agent config sentinels (`CLAUDE.md`, `AGENTS.md`, `SKILL.md`, `.cursorrules`, `.clinerules`, `mcp.json`, `plugin.json`, `openapi.{yaml,json}`, `agent-config.{yaml,json,toml}`, etc.) | ✅ prompt-injection surface | ❌ N/A | ❌ N/A | ❌ N/A |
| Other languages tagged for harness (Java, Kotlin, Scala, Go, Rust, Ruby, PHP, C#, C/C++, PowerShell, Lua, Perl, R, Swift, Terraform, HCL) | ✅ generic harness analysis | ⏳ roadmap | ⏳ roadmap | ⏳ roadmap |

## Per-finding verdicts (where the FP kill happens)

Every finding ships with one of these statuses:

| Status | Meaning |
|---|---|
| `CONFIRMED` | Sandbox observed the exploit firing. PoC + event trace surfaced with the finding. |
| `BLOCKED` | Attack was tested; the file's own code defended against it (sanitization, escaping, allowlist). |
| `UNREACHED` | Attack was tested; the code path is genuinely unreachable. |
| `NOT_TESTED` | Sandbox couldn't execute the test. Sub-reason: `infra_stub` / `inconclusive` / `not_planned`. |

A `CONFIRMED` finding looks like this:

```json
{
  "cwe": "CWE-200",
  "type": "data_exfiltration",
  "severity": "critical",
  "status": "CONFIRMED",
  "confidence": 1.0,
  "runtime_evidence": "Mock HTTP server at 127.0.0.1:8000 captured POST body containing 'FAKE_PRIVATE_KEY_CONTENT' and 'ssh-rsa AAAAFAKEKEY user@host'. The malware decoded its base64 payload and POSTed the contents of ~/.ssh/ to the rewritten C2 endpoint.",
  "proof_of_concept": "On any Unix host with SSH keys present, execution sends the full contents of ~/.ssh/ to the remote C2 server over HTTPS."
}
```

DAST cuts three ways: it **confirms** exploits with sandbox-captured evidence, **refutes** false positives with proof of non-exploitability, and **verifies remediations** by replaying the same exploits against the patched source.
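Because every finding carries a status, a gate can act only on what the sandbox actually confirmed. The sketch below filters a parsed JSON report using the `status` and `severity` fields from the example above. The top-level `findings` key and the four-step severity scale are assumptions, since this README only shows a single finding with severity `critical`.

```python
# Assumed ordering; only "critical" appears in the example finding.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def confirmed_findings(report: dict, min_severity: str = "high") -> list[dict]:
    """Return findings the sandbox confirmed at or above a severity floor."""
    floor = SEVERITY_RANK[min_severity]
    return [
        finding
        for finding in report.get("findings", [])
        if finding.get("status") == "CONFIRMED"
        and SEVERITY_RANK.get(finding.get("severity", ""), -1) >= floor
    ]
```

A wrapper script could exit non-zero whenever this list is non-empty, leaving `BLOCKED` and `UNREACHED` findings as informational output.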

## Enterprise Invariants

Anthropic's Claude Security and OpenAI's Codex Security are enterprise-tier and vendor-cloud-only. Argus is the open alternative.

* **BYOK.** You control LLM access; bills go to your API meter, not ours.
* **Zero telemetry.** Argus itself collects nothing. In cascade-only mode, file content goes only to the LLM providers you configure with your own keys. In DAST mode, file content is also sent to a Fly.io app *you own and control*, never to Argus-operated infrastructure.
* **Local execution.** The pipeline runs on your machine; there is no Argus-hosted SaaS in the loop.

## CLI Reference

### `argus scan <file>` — single-file scan

| Flag | Purpose |
|---|---|
| `--output {json,markdown}` | Output format (default: `json`) |
| `--no-dast` | Skip DAST verification (cascade-only) |
| `--no-remediation` | Skip Phase C (fix-and-verify). Phase A + B still run; no patch is generated. Compliance / CI-gate / read-only-audit use cases. Saves ~$0.05/file. |
| `--max-cost USD` | Abort this file's scan if **per-file** API spend exceeds USD (default: $1.00; pass `0` to disable) |
| `--enable-discovery` | Proactive payload sweep: runs a library of attack payloads against the file in the sandbox and surfaces runtime-confirmed CWEs as new findings (+~$0.25/file) |
| `--dast-trigger-verdicts LIST` | Comma-separated L1 verdicts that trigger DAST. Default: `malicious,critical_malicious`. Allowed: `clean,suspicious,malicious,critical_malicious` |

### `argus scan-repo <path>` — directory tree scan

| Flag | Purpose |
|---|---|
| `--diff REF` | Only scan files differing vs git ref (e.g., `--diff origin/main` for PR/CI) |
| `--output {markdown,json,sarif}` | Output format (default: `markdown`); `sarif` is SARIF v2.1.0 for GitHub Code Scanning |
| `--output-file PATH` | Write to file instead of stdout |
| `--max-cost USD` | Abort the run when **cumulative** API spend across all files exceeds USD; remaining files are marked `cost_cap_reached`. Pass `0` or omit to disable |
| `--exclude GLOB` | Additional gitignore-style exclude pattern (repeatable) |
| `--no-gitignore` | Ignore `.gitignore` during walk (default: respected) |
| `--max-file-bytes BYTES` | Skip files larger than BYTES (default: 1 MiB) |
| `--no-dast` | Skip DAST verification on every file |
| `--no-remediation` | Skip Phase C on every file. Phase A + B still run; no patches generated. |
| `--enable-discovery` | Proactive payload sweep on every DAST-eligible file |
| `--dast-trigger-verdicts LIST` | Same as `scan` |
| `--continue-on-error` / `--no-continue-on-error` | On per-file exception, record and continue (default) or abort run |

### `argus install <pkg>` — pre-install supply-chain gate

Stages the package via `pip download` (no `setup.py` execution), runs the full Argus pipeline on every wheel/sdist in the dependency closure, then either calls real `pip install` or blocks with the analysis printed. Catches day-zero supply-chain malware at the ingestion boundary — exactly the class advisory-based scanners (`pip-audit`, `safety`) miss.

| Flag | Purpose |
|---|---|
| `<pkg>` | Package spec (e.g. `'requests'`, `'litellm==1.50.0'`, `'fastapi[all]'`). Mutually exclusive with `-r`. |
| `-r PATH` / `--requirement PATH` | Install from a requirements.txt; Argus scans every wheel in the resolved closure. |
| `--block-on LIST` | Comma-separated verdict tiers that block install. Default: `malicious,critical_malicious`. Use `suspicious,malicious,critical_malicious` for stricter gating. |
| `--no-dast` | Cascade-only — skip DAST runtime detonation even if Fly is configured. Faster + cheaper, but leaves runtime-only exploits (load-time RCE in pickles, etc.) un-validated. |
| `--no-cache` | Ignore the wheel-hash verdict cache. Re-scans every artifact from scratch. |
| `--cache-dir PATH` | Override cache directory (default: `~/.cache/argus/install`). |
| `--dry-run` | Run the scan + report verdict; do NOT call `pip install`. For CI gating without side effects. |
| `--strict-coverage` | Escalate verdict to `suspicious` when Argus could only statically analyze <70% of files in a wheel (rest are typically native binaries: `.so`, `.pyd`, `.dll`, `.dylib`, `.exe`). For security-paranoid users / strict CI gates. |
| `--max-cost USD` | Per-file cost cap (default: $1.00). |
| `--max-total-cost USD` | Aggregate cost cap across the whole dependency-closure scan (default: **$10**). When tripped, remaining wheels are flagged `suspicious / unscanned-due-to-cost-cap` and the install fails closed. Pass `0` to disable. |
| `--deep` | Full-fidelity scan — `thinking_budget=24000` on every Sonnet/Opus call, sequential per-file scan, 4 wheels concurrent. ~5–10× more expensive but catches subtle multi-step exploits the default mode might miss. |
| `--no-thinking` | Explicit way to set `thinking_budget=0`. Already the install default; flag exists for script readability. Mutually exclusive with `--deep`. |
| `--parallel N` | Max number of artifacts scanned concurrently (default: **8**). Pass lower if you hit API rate limits. |
| `--pip EXEC` | Pip executable. Default: `pip`. Pass `'uv pip'` for uv-managed envs. |
| `--output {text,json}` | Output format. Default: text. JSON for CI consumption. |

**Phase C is always disabled on the install path.** Remediation for a not-yet-installed package is "don't install", not "patch + replay." If the cascade flags a `malicious` verdict, the install is blocked; the user sees the analysis (CWE, runtime evidence, exfil destination) and decides.
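The blocking decision driven by `--block-on` can be sketched as a simple set-membership check. The helper name `blocking_artifacts` and the input shape (artifact name mapped to verdict) are hypothetical, and the set of valid tiers is assumed to match the list documented for `--dast-trigger-verdicts`.

```python
DEFAULT_BLOCK_ON = "malicious,critical_malicious"
# Assumed to be the same tiers accepted by --dast-trigger-verdicts.
ALLOWED_TIERS = {"clean", "suspicious", "malicious", "critical_malicious"}


def blocking_artifacts(
    verdicts: dict[str, str], block_on: str = DEFAULT_BLOCK_ON
) -> list[str]:
    """Return the artifacts whose verdict tier should block the install."""
    tiers = {t.strip() for t in block_on.split(",") if t.strip()}
    unknown = tiers - ALLOWED_TIERS
    if unknown:
        raise ValueError(f"unknown verdict tiers: {sorted(unknown)}")
    return sorted(name for name, verdict in verdicts.items() if verdict in tiers)
```

If the returned list is non-empty, the install would be refused and the analysis printed; an empty list means `pip install` may proceed.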

**Wheel-hash caching.** Verdicts are cached at `~/.cache/argus/install/<sha256>.json`. Wheel bytes are immutable on PyPI (re-uploads of the same name+version are rejected), so a verdict is permanently valid for that exact artifact. First-run cost is real; subsequent installs of the same wheel are free.

**Coverage transparency.** A "clean" verdict on a wheel that's 50% native binaries (`.so`, `.pyd`) is honestly weaker evidence than a clean verdict on a wheel that's 100% Python — the report says so. Every artifact verdict reports `n_files_unscanned` + extension histogram. Native binaries are not silently scrubbed from the verdict — coverage warnings surface. `--strict-coverage` opt-in escalates the verdict on low-coverage artifacts.
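The coverage arithmetic behind `--strict-coverage` can be illustrated as follows. This sketch treats every file that is not a listed native binary as statically analyzable, which is a simplification. `n_files_unscanned` is the field named above, while the other keys and both function names are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath

NATIVE_EXTS = {".so", ".pyd", ".dll", ".dylib", ".exe"}
STRICT_THRESHOLD = 0.70  # escalate when coverage is below 70%


def coverage_report(names: list[str]) -> dict:
    """Summarize how much of an archive could be statically analyzed."""
    suffixes = (PurePosixPath(name).suffix.lower() for name in names)
    unscanned = Counter(ext for ext in suffixes if ext in NATIVE_EXTS)
    n_unscanned = sum(unscanned.values())
    total = len(names)
    coverage = (total - n_unscanned) / total if total else 1.0
    return {
        "n_files_unscanned": n_unscanned,
        "unscanned_extensions": dict(unscanned),
        "coverage": coverage,
    }


def apply_strict_coverage(verdict: str, coverage: float) -> str:
    """Escalate a clean verdict to suspicious on low-coverage artifacts."""
    if verdict == "clean" and coverage < STRICT_THRESHOLD:
        return "suspicious"
    return verdict
```

Since a wheel is a zip archive, the file list could come from `zipfile.ZipFile(path).namelist()`.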

## Security & Isolation

Argus deliberately detonates potentially malicious code. Host protection is non-negotiable.

* **Hardware-level isolation.** Execution happens inside Firecracker microVMs using KVM hardware virtualization.
* **Ephemeral state.** Every detonation spins up a pristine microVM, which is destroyed post-execution. Zero persistence.
* **Strict egress control.** Network profiles enforced at the hypervisor level prevent lateral movement during DAST verification.

## Documentation

| Topic | Page |
|---|---|
| Install guide | [docs/install.md](./docs/install.md) |
| API key sourcing | [docs/api-keys.md](./docs/api-keys.md) |
| Architecture deep dive | [docs/architecture.md](./docs/architecture.md) |
| DAST sandbox setup | [docs/dast-setup.md](./docs/dast-setup.md) |
| Cost guide | [docs/cost-guide.md](./docs/cost-guide.md) |
| Roadmap | [ROADMAP.md](./ROADMAP.md) |
| Contributing | [CONTRIBUTING.md](./CONTRIBUTING.md) |
| Security disclosures | [SECURITY.md](./SECURITY.md) |

## License

[Apache License 2.0](./LICENSE).
