Metadata-Version: 2.4
Name: entro-scan
Version: 1.2.4
Summary: Entropy-based secret scanner for source code — detects API keys, tokens, passwords, and other sensitive data leaks
Author: entro-scan contributors
License: MIT
Project-URL: Homepage, https://github.com/vyofgod/entro-scan
Project-URL: Repository, https://github.com/vyofgod/entro-scan
Project-URL: Bug Tracker, https://github.com/vyofgod/entro-scan/issues
Keywords: security,secret-scanning,entropy,leak-detection,devsecops
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Typing :: Typed
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# entro-scan

**Entropy-based secret scanner** for source code — detects API keys, tokens, passwords, and other sensitive data leaks before they reach production.

[![CI](https://github.com/vyofgod/entro-scan/actions/workflows/ci.yml/badge.svg)](https://github.com/vyofgod/entro-scan/actions/workflows/ci.yml)
[![Python Version](https://img.shields.io/badge/python-3.11%2B-blue)](https://python.org)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

---

## Demo

<p align="center">
  <img src="./demo.gif" alt="entro-scan demo" width="100%">
</p>


## Features

- **AI-triaged scanning** — local qwen2.5 LLM classifies each finding as REAL or NOISE;
  generates pattern-specific remediation advice (rotate, restrict, move to env var, etc.)
- **Shannon entropy analysis** — finds high-entropy strings that look like secrets
- **60+ provider patterns** — AWS (incl. session tokens), GitHub (classic, fine-grained, OAuth, app
  install), GitLab, Slack (bot + webhook), Discord (bot + webhook), Telegram bot, Stripe (live,
  test, publishable, webhook), GCP service-account JSON, Azure Storage / SAS / client secret,
  OpenAI (sk-/sk-proj-/sk-svcacct-), Anthropic (sk-ant-), Supabase service role + JWT, Vercel,
  Clerk (publishable + secret), Linear, Notion, Figma, HuggingFace, PyPI, Shopify, PlanetScale,
  Netlify, Asana, Atlassian, Twilio, SendGrid, Mailgun, Mailchimp, Dropbox, Square, Heroku,
  database connection URLs with embedded credentials, private keys, and more
- **Confidence + risk scoring** — each finding has `severity`, `confidence` (0-100), and a
  composite `risk_score`; gate CI on `--min-confidence`
- **Finding fingerprints** — stable 16-char IDs for dedup, baselining, and SARIF
  `partialFingerprints`
- **Git history / staged / diff scanning** — pre-commit ready, with `--diff-filter=ACMR`
- **Reporters** — terminal, JSON, CSV, SARIF 2.1.0 (with `security-severity`), Markdown
  (PR-ready), and GitHub Actions workflow-command annotations
- **GitHub-native integration** — `--pr-comment` (create-or-update with marker),
  `--github-step-summary`, `--github-annotations`, auto-detected inside Actions
- **Triage workflows** — explain findings by fingerprint, install pre-commit hooks,
  generate `.env.example` placeholders, group monorepo results, and run safe provider verification
- **Dashboard/editor outputs** — single-file HTML dashboard and VS Code problem-matcher friendly output
- **Low false-positive rate** — filters for placeholders, UUIDs, hex blobs, regex source,
  template/format strings, URLs without credentials, and inline `# entro-scan: ignore`
  directives (`ignore`, `ignore-line`, `ignore-next-line`)
- **Smart file walker** — `.gitignore` honored, binary sniffer, configurable
  `max_file_size_mb`, generated reports (`scan.json`, `scan.sarif`, baselines) auto-skipped
- **Redacted baselines** — by default baselines store sha256 + fingerprint only; opt-in raw
  storage with `--no-redact-baseline`
- **Configurable** — TOML config (`.entro-scan.toml` or `[tool.entro-scan]` in
  `pyproject.toml`)
- **Zero dependencies** — pure Python 3.11+ standard library, including the GitHub API client
- **Pre-commit hook ready** — `--git-staged --fail-on-severity critical`
- **CI/CD friendly** — `--fail-on-severity`, exit codes 0/1/2
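
To make the fingerprint and redacted-baseline bullets above concrete, here is a minimal Python sketch. It is not entro-scan's actual implementation: the function names, the record fields, and the choice to derive the 16-character ID from a truncated SHA-256 of rule, path, and secret are all assumptions made for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the hex SHA-256 digest of text (UTF-8 encoding is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def fingerprint(rule_id: str, path: str, secret: str) -> str:
    """Hypothetical stable 16-char ID: truncated hash of rule, path, and secret."""
    return sha256_hex(f"{rule_id}\0{path}\0{secret}")[:16]


def redacted_entry(rule_id: str, path: str, secret: str) -> dict:
    """Build a baseline record that never stores the raw secret."""
    return {
        "fingerprint": fingerprint(rule_id, path, secret),
        "sha256": sha256_hex(secret),
    }
```

Because the record holds only digests, a committed baseline of this shape would not leak the secret it suppresses, while the same finding still maps to the same ID on every run.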

---

## Installation

### Recommended: pipx

Install `entro-scan` as an isolated global CLI tool.

```bash
pipx install entro-scan
entro-scan --help
```

### Python / pip

```bash
pip install entro-scan
```

### npm / npx

```bash
npx entro-scan
```

> The npm package is currently a lightweight wrapper around the Python CLI.

### Install from source

```bash
git clone https://github.com/vyofgod/entro-scan.git
cd entro-scan
pip install -e .
```

---

## Usage

```bash
# Scan current directory
entro-scan

# Scan a specific path
entro-scan /path/to/project

# Custom entropy threshold (lower = more findings)
entro-scan /path --threshold 4.0

# Output in JSON format
entro-scan /path --format json

# Output to file
entro-scan /path --format json -o results.json

# Scan git history (last 100 commits)
entro-scan /path --git

# Scan git history with custom depth
entro-scan /path --git --max-commits 500

# Scan only staged git files (--staged is an alias for --git-staged)
entro-scan /path --staged

# Scan only modified git files (--diff alias)
entro-scan /path --diff

# Parallel scan with 8 workers
entro-scan /path --workers 8

# Quiet mode (only findings, no banner)
entro-scan /path --quiet

# Mask secrets in output (default: true, --no-mask-secrets to disable)
entro-scan /path --mask-secrets

# Show unmasked secrets
entro-scan /path --no-mask-secrets

# Fail CI if critical findings are found (--fail-on-findings alias)
entro-scan /path --fail-on-severity critical

# Use a baseline file to ignore known findings
entro-scan /path --baseline .entro-scan.baseline.json

# Save current findings as a new baseline (--update-baseline alias)
entro-scan /path --save-baseline .entro-scan.baseline.json

# Generate a default config file
entro-scan --init

# Markdown PR-style report
entro-scan . --format markdown -o entro-scan-report.md

# Single-file HTML dashboard report
entro-scan . --format html -o entro-scan-report.html

# VS Code / problem matcher friendly output
entro-scan . --format vscode

# GitHub Actions: annotations + PR comment + step summary (all auto-on in Actions)
entro-scan . --github-annotations --github-step-summary --pr-comment

# Filter low-confidence noise
entro-scan . --min-confidence 70

# Disable .gitignore honoring
entro-scan . --no-use-gitignore

# Skip files larger than 1 MB
entro-scan . --max-file-size 1

# Apply scan profiles
entro-scan . --profile ci
entro-scan . --profile paranoid
entro-scan . --profile frontend
entro-scan . --profile backend

# Only report findings not present in a baseline
entro-scan . --only-new --baseline .entro-scan.baseline.json

# Attach git blame / first-seen metadata
entro-scan . --blame --format json

# Group monorepo findings
entro-scan . --group-by package

# Verify supported provider tokens (GitHub, OpenAI, Slack bot tokens)
entro-scan verify .
```

## AI-Triaged Scanning

**New in v1.2.0**: Use the local AI model to automatically classify findings and get pattern-specific recommendations.

```bash
# Setup: check system requirements and download qwen2.5:0.5b (398 MB)
entro-scan ai

# Scan with AI triage (REAL vs NOISE) + pattern-specific advice
entro-scan ai /path/to/project

# Force model re-download or setup
entro-scan ai --setup
```

The AI triage:
- **Detects common false positives** (CSS values, test files, minified JS, placeholders)
- **Classifies findings** as REAL (rotate) or NOISE (ignore)
- **Generates actionable recommendations** per pattern (e.g., "Rotate Google API key in Cloud
  Console, restrict by referrer")
- **Analyzes repo context** to infer project type and credential risk level

Requires **Ollama** + **~1 GB free RAM** (installed automatically on first run).

### Inline ignore directives

Add a comment on the same line, or the line above, to suppress a finding:

```python
TOKEN = "ghp_..." # entro-scan: ignore
# entro-scan: ignore-next-line
TOKEN = "ghp_..."
```

Supported comment leaders: `#`, `//`, `/* */`, `--`.
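
As a rough illustration of how directives like these can be recognized, the sketch below scans source text and returns the line numbers a scanner would skip. The regular expression, the function names, and the assumption that `ignore` and `ignore-line` both apply to their own line are mine, not taken from entro-scan's code.

```python
import re
from typing import Optional, Set

# "<comment leader> entro-scan: <directive>" for the leaders #, //, /* and --.
DIRECTIVE_RE = re.compile(
    r"(?:#|//|/\*|--)\s*entro-scan:\s*(ignore-next-line|ignore-line|ignore)\b"
)


def directive_on(line: str) -> Optional[str]:
    """Return the directive named on this line, or None if there is none."""
    match = DIRECTIVE_RE.search(line)
    return match.group(1) if match else None


def suppressed_lines(source: str) -> Set[int]:
    """Return the 1-based line numbers whose findings would be suppressed."""
    suppressed: Set[int] = set()
    for number, line in enumerate(source.splitlines(), start=1):
        directive = directive_on(line)
        if directive in ("ignore", "ignore-line"):
            suppressed.add(number)      # directive applies to its own line
        elif directive == "ignore-next-line":
            suppressed.add(number + 1)  # directive applies to the line below
    return suppressed
```

Run against the three-line example above, `suppressed_lines` returns lines 1 and 3, the two lines that hold a token.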

## Workflow Commands

```bash
# Explain why a finding was reported
entro-scan explain abc123def4567890 .

# Include first-seen git metadata and safe provider verification in the explanation
entro-scan explain abc123def4567890 . --blame --verify

# Review findings with remediation and fingerprints
entro-scan fix .

# Accept current findings into a redacted baseline
entro-scan fix . --action baseline --baseline .entro-scan.baseline.json

# Add placeholder keys to .env.example for detected providers
entro-scan fix . --action env-example

# Install a pre-commit hook that scans staged files
entro-scan install-hook
```

---

## Exit Codes

| Code | Meaning |
|------|---------|
| 0    | Success: no findings, or none that meet the severity threshold |
| 1    | Error (invalid config, file not found, etc.) |
| 2    | Blocking findings detected |
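
A wrapper script can branch on these codes. The following is a small sketch, assuming `entro-scan` is on `PATH`; the helper names and the wording of the messages are hypothetical, while the flags come from the usage section above.

```python
import subprocess

# Exit codes as documented in the table above.
EXIT_MEANINGS = {
    0: "clean: no findings, or none that meet the severity threshold",
    1: "error: invalid config, file not found, etc.",
    2: "blocked: blocking findings detected",
}


def describe_exit(code: int) -> str:
    """Translate an entro-scan exit code into a short message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_scan(path: str = ".") -> int:
    """Run entro-scan on path and report what its exit code means."""
    result = subprocess.run(
        ["entro-scan", path, "--fail-on-severity", "critical"],
        check=False,  # a non-zero exit is an expected outcome, not an exception
    )
    print(describe_exit(result.returncode))
    return result.returncode
```

Keeping code 1 (tool error) distinct from code 2 (findings) lets a pipeline retry or alert on a broken scan without treating it as a leaked secret.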

---

## Output Formats

### Terminal (default)

Color-coded output with severity levels:

- **Red** (score > 4.5): Critical — likely a secret
- **Yellow** (score > 3.9): High — suspicious
- **Green** (score <= 3.9): Medium — low-confidence finding
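
The boundaries are strict on the upper side, which is easy to misread. A tiny sketch of the mapping, with a hypothetical function name and using only the thresholds listed above:

```python
from typing import Tuple


def terminal_bucket(score: float) -> Tuple[str, str]:
    """Map a finding's score to the (color, label) pair used in terminal output."""
    if score > 4.5:
        return ("red", "Critical")
    if score > 3.9:
        return ("yellow", "High")
    return ("green", "Medium")
```

A score of exactly 4.5 is therefore yellow, and a score of exactly 3.9 is green.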

### JSON

Machine-readable output for CI/CD pipelines.

### CSV

Spreadsheet-friendly output for reporting.

### SARIF

Static Analysis Results Interchange Format 2.1.0 — compatible with GitHub code scanning. Each
rule includes a `security-severity` property and each result carries a stable
`partialFingerprints` entry so GitHub deduplicates findings across scans.
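
For orientation, this is roughly where those two fields sit in a SARIF 2.1.0 document. The sketch builds a one-result report as a plain dictionary; the function name, the rule ID, and the `entroScanFingerprint/v1` key are placeholders I chose, so the real output may use different names.

```python
import json


def sarif_report(rule_id, security_severity, path, line, message, fingerprint):
    """Build a minimal one-result SARIF 2.1.0 document as a dict."""
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {
                "name": "entro-scan",
                "rules": [{
                    "id": rule_id,
                    # GitHub reads this as a string score from 0.0 to 10.0.
                    "properties": {"security-severity": security_severity},
                }],
            }},
            "results": [{
                "ruleId": rule_id,
                "level": "error",
                "message": {"text": message},
                "locations": [{"physicalLocation": {
                    "artifactLocation": {"uri": path},
                    "region": {"startLine": line},
                }}],
                # Hypothetical key name; the value is the stable finding ID.
                "partialFingerprints": {"entroScanFingerprint/v1": fingerprint},
            }],
        }],
    }


report = sarif_report(
    "github-token", "9.0", "app/settings.py", 12,
    "GitHub token detected", "abc123def4567890",
)
print(json.dumps(report, indent=2))
```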

### Markdown

Pull-request friendly summary with severity badges, a table, and a remediation appendix. Used
by `--pr-comment` and `--github-step-summary`.

### HTML

Single-file dashboard with severity cards, filter buttons, remediation text, and optional
verification / first-seen metadata.

### VS Code

Problem-matcher friendly `file:line:column: severity: message` output for editor and CI log
integrations.
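
Lines in this shape are easy to consume from other tools. The sketch below parses one into a typed record; the regular expression and the sample line in the usage note are assumptions based only on the `file:line:column: severity: message` layout described above.

```python
import re
from typing import NamedTuple, Optional

PROBLEM_RE = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+):(?P<column>\d+): "
    r"(?P<severity>\w+): (?P<message>.*)$"
)


class Problem(NamedTuple):
    file: str
    line: int
    column: int
    severity: str
    message: str


def parse_problem(text: str) -> Optional[Problem]:
    """Parse one 'file:line:column: severity: message' line, or return None."""
    match = PROBLEM_RE.match(text.strip())
    if match is None:
        return None
    return Problem(
        file=match["file"],
        line=int(match["line"]),
        column=int(match["column"]),
        severity=match["severity"],
        message=match["message"],
    )
```

For example, `parse_problem("src/app.py:12:5: critical: GitHub Token detected")` yields a `Problem` with `line=12` and `column=5`, while any line that does not match the layout yields `None`.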

### GitHub Annotations

Emits `::error file=...::` workflow commands so GitHub renders inline annotations on the PR
diff. Auto-enabled when running inside Actions, or via `--github-annotations`.
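
A workflow command is just a specially formatted line on stdout. As a sketch of the format (not entro-scan's own code; the function names are hypothetical), the helper below builds an `::error` command and applies the percent-escaping GitHub expects for message data and property values:

```python
def _escape_data(value: str) -> str:
    """Escape the message part of a workflow command."""
    return value.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def _escape_property(value: str) -> str:
    """Escape a property value such as file=..., which also reserves ':' and ','."""
    return _escape_data(value).replace(":", "%3A").replace(",", "%2C")


def error_annotation(path: str, line: int, message: str) -> str:
    """Format an ::error workflow command for one finding."""
    return f"::error file={_escape_property(path)},line={line}::{_escape_data(message)}"


print(error_annotation("app/settings.py", 12, "GitHub token detected"))
# prints: ::error file=app/settings.py,line=12::GitHub token detected
```

Without the escaping, a newline in the message would end the command early and the rest of the text would appear as ordinary log output.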

---

## Configuration

Create `.entro-scan.toml` in your project root:

```toml
threshold = 3.5
workers = 2
quiet = false
verbose = false
output_format = "terminal"
git_enabled = false
git_staged = false
git_diff = false
max_commits = 100
mask_secrets = true
# fail_on_severity = "critical"  # Options: critical, high, medium, any
# baseline_path = ".entro-scan.baseline.json"

exclude_dirs = [
    ".git", "node_modules", "venv", "__pycache__",
    ".idea", ".vscode", "build", "dist", "target",
]

exclude_files = [
    "package-lock.json", "yarn.lock", "pnpm-lock.yaml",
    "cargo.lock", "go.sum", "poetry.lock",
]

include_extensions = [
    ".py", ".rs", ".js", ".ts", ".go", ".java", ".kt", ".swift",
    ".rb", ".php", ".sh", ".json", ".yaml", ".yml", ".toml", ".env",
]

# Allowlist patterns (text containing these will be ignored)
# allowlist_patterns = [
#     "test-secret",
#     "example-key",
# ]

# Allowlist hashes (sha256 of the secret text)
# allowlist_hashes = [
#     "abc123...",
# ]
```

Alternatively, config can live under `[tool.entro-scan]` in your `pyproject.toml`.

---

## Supported Patterns

| Pattern | Severity |
|---------|----------|
| JWT (JSON Web Tokens) | Critical |
| AWS Access Key ID | Critical |
| AWS Secret Key | Critical |
| GitHub Token | Critical |
| Slack Token | Critical |
| Private Keys (RSA/DSA/EC/OpenSSH) | Critical |
| Stripe API Key | Critical |
| Mailchimp API Key | Critical |
| SendGrid API Key | Critical |
| Dropbox API Key | Critical |
| PayPal Braintree Access Token | Critical |
| NPM Token | Critical |
| Docker Hub Token | Critical |
| OpenAI API Key | Critical |
| Anthropic API Key | Critical |
| Supabase Key | Critical |
| Vercel Token | Critical |
| Linear API Key | Critical |
| Facebook Access Token | Critical |
| GitLab Token | High |
| Heroku API Key | High |
| Database URLs (Postgres, MySQL, MongoDB, Redis, SQLite, MariaDB, Oracle) | High |
| Square Access Token | High |
| Square OAuth Secret | High |
| Twitter API Key | High |
| Twitter Access Token | High |
| Google API Key | High |
| Twilio API Key | High |
| Twilio Account SID | High |
| Basic Auth | High |
| Clerk Key | High |
| API Key in URL | Medium |
| Generic API Keys / Secrets | Medium |

---

## Pre-commit Hook

Add to your `.pre-commit-config.yaml`:

```yaml
repos:
  - repo: https://github.com/vyofgod/entro-scan
    rev: v1.2.4
    hooks:
      - id: entro-scan
        args: ["--staged", "--fail-on-severity", "critical"]
```

---

## CI/CD Integration

### GitHub Actions

Create `.github/workflows/entro-scan.yml`:

```yaml
name: Secret Scan

on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          
      - name: Install entro-scan
        run: pip install entro-scan
        
      - name: Run entro-scan
        run: |
          entro-scan . \
            --format sarif \
            -o entro-scan-results.sarif \
            --fail-on-severity critical
            
      - name: Upload SARIF to GitHub
        if: always()  # upload even when the scan step exits with code 2
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: entro-scan-results.sarif
```

Or use the official GitHub Action — it wires PR comments, annotations, the step summary, and
SARIF upload in one step:

```yaml
name: Secret Scan
on:
  pull_request:
  push:
    branches: [main]

permissions:
  contents: read
  pull-requests: write   # for --pr-comment
  security-events: write # for SARIF upload to code scanning

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history for --git
      - name: Run entro-scan
        uses: vyofgod/entro-scan@v1
        with:
          fail-on-severity: high
          format: sarif
          output: entro-scan.sarif
          pr-comment: "true"
          github-annotations: "true"
          step-summary: "true"
          upload-sarif: "true"
```

The action exports `findings_count` and `critical_count` outputs you can branch on in
downstream steps.

---

## Development

```bash
# Install dev dependencies
pip install pytest ruff mypy

# Run tests
pytest tests/ -v

# Lint
ruff check .

# Type check
mypy entro_scan/
```

---

## Why entropy?

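Secrets such as API keys, tokens, and passwords are typically random strings: their characters are spread almost evenly across a large alphabet, so their Shannon entropy (measured in bits per character) is high. Natural-language text and code identifiers reuse a small set of characters and score much lower.

By measuring the Shannon entropy of strings in your codebase and comparing it against the configured `threshold`, `entro-scan` flags likely secrets. The provider patterns and false-positive filters described under Features then narrow the results.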
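
The measurement itself is short. Shannon entropy over a string's character frequencies is `H = -sum(p * log2(p))`, where `p` is the fraction of the string taken up by each distinct character. The sketch below is a textbook implementation with a function name of my choosing; entro-scan's own tokenization and scoring may differ.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in counts.values()
    )


print(shannon_entropy("password"))          # 2.75
print(shannon_entropy("0123456789abcdef"))  # 4.0
```

A string of `n` characters can never score above `log2(n)`, so very short strings cannot reach a threshold such as 4.0 however random they are. That is one reason entropy is combined with provider-specific patterns rather than used alone.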

---

## License

MIT — see [LICENSE](LICENSE) for details.
