Metadata-Version: 2.4
Name: entro-scan
Version: 1.0.0
Summary: Entropy-based secret scanner for source code — detects API keys, tokens, passwords, and other sensitive data leaks
Author: entro-scan contributors
License: MIT
Project-URL: Homepage, https://github.com/vyofgod/entro-scan
Project-URL: Repository, https://github.com/vyofgod/entro-scan
Project-URL: Bug Tracker, https://github.com/vyofgod/entro-scan/issues
Keywords: security,secret-scanning,entropy,leak-detection,devsecops
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Typing :: Typed
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# entro-scan

**Entropy-based secret scanner** for source code — detects API keys, tokens, passwords, and other sensitive data leaks before they reach production.

[![CI](https://github.com/vyofgod/entro-scan/actions/workflows/ci.yml/badge.svg)](https://github.com/vyofgod/entro-scan/actions/workflows/ci.yml)
[![Python Version](https://img.shields.io/badge/python-3.11%2B-blue)](https://python.org)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

## Features

- **Shannon entropy analysis** — finds high-entropy strings that look like secrets
- **Known pattern detection** — regex matching for JWT, AWS keys, GitHub tokens, private keys, DB URLs, and more
- **Git history scanning** — scan commit history for accidentally committed secrets
- **Multiple output formats** — terminal (colorized), JSON, CSV, SARIF (GitHub code scanning compatible)
- **Fast parallel scanning** — multi-process worker pool for large codebases
- **Low false-positive rate** — smart filters reject common strings, hex dumps, and natural language
- **Configurable** — TOML-based config with custom thresholds, excludes, and file types
- **Zero external dependencies** — pure Python 3.11+, uses only the standard library
- **Pre-commit hook ready** — catch secrets before they're committed

## Installation

```bash
pip install entro-scan
```

Or install from source:

```bash
git clone https://github.com/vyofgod/entro-scan.git
cd entro-scan
pip install -e .
```

## Usage

```bash
# Scan current directory
entro-scan

# Scan a specific path
entro-scan /path/to/project

# Custom entropy threshold (lower = more findings)
entro-scan /path --threshold 4.0

# Output in JSON format
entro-scan /path --format json

# Output to file
entro-scan /path --format json -o results.json

# Scan git history (last 100 commits)
entro-scan /path --git

# Scan git history with custom depth
entro-scan /path --git --max-commits 500

# Parallel scan with 8 workers
entro-scan /path --workers 8

# Quiet mode (only findings, no banner)
entro-scan /path --quiet

# Generate a default config file
entro-scan --init
```

## Output Formats

### Terminal (default)
Color-coded output with severity levels:
- **Red** (score > 4.5): Critical — likely a secret
- **Yellow** (score > 3.9): High — suspicious
- **Green** (score <= 3.9): Medium — low-confidence finding
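
The cut-offs above read as a simple cascade, checked from the highest band down. The sketch below assumes the boundaries are exclusive exactly as written; `severity_for` is a hypothetical helper for illustration, not part of entro-scan's API:

```python
def severity_for(score: float) -> tuple[str, str]:
    """Map a score to (color, severity) using the cut-offs listed above.

    Hypothetical helper: the function name and return shape are assumptions.
    """
    if score > 4.5:
        return ("red", "Critical")
    if score > 3.9:
        return ("yellow", "High")
    return ("green", "Medium")


print(severity_for(4.7))  # ('red', 'Critical')
print(severity_for(4.5))  # ('yellow', 'High'): 4.5 itself is not above the red cut-off
print(severity_for(3.9))  # ('green', 'Medium')
```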

### JSON
Machine-readable output for CI/CD pipelines.

### CSV
Spreadsheet-friendly output for reporting.

### SARIF
Static Analysis Results Interchange Format — compatible with GitHub code scanning.
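
For orientation, this is roughly the shape of a minimal SARIF 2.1.0 document. The sketch builds one from a list of findings; the keys of the input dictionaries (`rule`, `message`, `path`, `line`) and the `to_sarif` helper are assumptions made for illustration, and entro-scan's actual SARIF output may carry more or different fields:

```python
import json


def to_sarif(findings: list[dict]) -> dict:
    """Wrap findings in a minimal SARIF 2.1.0 log (illustrative only)."""
    results = []
    for finding in findings:
        results.append({
            "ruleId": finding["rule"],
            "level": "error",  # SARIF levels: none, note, warning, error
            "message": {"text": finding["message"]},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": finding["path"]},
                    "region": {"startLine": finding["line"]},
                }
            }],
        })
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "entro-scan"}},
            "results": results,
        }],
    }


# Hypothetical finding, used only to show the resulting structure.
example = [{
    "rule": "aws-access-key-id",
    "message": "Possible AWS Access Key ID",
    "path": "src/app.py",
    "line": 12,
}]
print(json.dumps(to_sarif(example), indent=2))
```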

## Configuration

Create `.entro-scan.toml` in your project root:

```toml
threshold = 3.5
workers = 4
quiet = false
verbose = false
output_format = "terminal"
git_enabled = false
max_commits = 100

exclude_dirs = [
    ".git", "node_modules", "venv", "__pycache__",
    ".idea", ".vscode", "build", "dist", "target",
]

exclude_files = [
    "package-lock.json", "yarn.lock", "pnpm-lock.yaml",
    "cargo.lock", "go.sum",
]

include_extensions = [
    ".py", ".rs", ".js", ".ts", ".go", ".java", ".kt", ".swift",
    ".rb", ".php", ".sh", ".json", ".yaml", ".yml", ".toml", ".env",
]
```

Alternatively, config can live under `[tool.entro-scan]` in your `pyproject.toml`.

## Supported Patterns

| Pattern | Severity |
|---------|----------|
| JWT (JSON Web Tokens) | Critical |
| AWS Access Key ID | Critical |
| AWS Secret Key | Critical |
| GitHub Token | Critical |
| Slack Token | Critical |
| Private Keys (RSA/DSA/EC/OpenSSH) | Critical |
| GitLab Token | High |
| Heroku API Key | High |
| Database URLs (Postgres, MySQL, MongoDB, Redis) | High |
| Generic API Keys / Secrets | Medium |
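
To illustrate what known-pattern detection looks like, here is a sketch using three publicly documented token formats. These regular expressions are common community patterns and are assumptions for this example; they are not necessarily the expressions entro-scan ships, and `match_patterns` is a hypothetical helper.

```python
import re

# Illustrative patterns only; entro-scan's own rules may differ.
PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "AWS Access Key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Classic GitHub personal access tokens: "ghp_" plus 36 alphanumerics.
    "GitHub Token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    # PEM header of an RSA/DSA/EC/OpenSSH (or generic) private key.
    "Private Key": re.compile(
        r"-----BEGIN (?:RSA |DSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def match_patterns(line: str) -> list[str]:
    """Return the names of all patterns that match somewhere in `line`."""
    return [name for name, regex in PATTERNS.items() if regex.search(line)]


# AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
print(match_patterns('aws_key = "AKIAIOSFODNN7EXAMPLE"'))  # ['AWS Access Key ID']
print(match_patterns("greeting = 'hello world'"))          # []
```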

## Pre-commit Hook

Add to your `.pre-commit-config.yaml`:

```yaml
repos:
  - repo: https://github.com/vyofgod/entro-scan
    rev: v1.0.0
    hooks:
      - id: entro-scan
```

## CI/CD Integration

### GitHub Actions
See [.github/workflows/ci.yml](.github/workflows/ci.yml) for a complete example that runs entro-scan on every push and PR.

### Exit Codes
- `0`: No secrets found (or scan completed successfully)
- `1`: Error (config issue, path not found)

## Development

```bash
# Install dev dependencies
pip install pytest ruff mypy

# Run tests
pytest tests/ -v

# Lint
ruff check .

# Type check
mypy entro_scan/
```

## Why entropy?

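Secrets such as API keys, tokens, and generated passwords are typically random strings with high entropy (information density). Natural-language text and code identifiers have much lower entropy. By measuring the Shannon entropy of strings in your codebase, entro-scan flags candidates that look random; because hashes, hex dumps, and other benign strings can also score high, the filters and known-pattern rules described above are applied to keep false positives down.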
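
For a string in which character $c$ occurs with relative frequency $p_c$, the Shannon entropy is $H = \sum_c p_c \log_2 (1/p_c)$, measured in bits per character. A minimal sketch of that calculation follows; `shannon_entropy` is a name chosen for this example and is not necessarily how entro-scan implements or exposes it.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    # Sum p * log2(1/p) over the distinct characters, where p = n / total.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(shannon_entropy("aaaaaaaa"))  # 0.0: one repeated symbol carries no information
print(shannon_entropy("abcdefgh"))  # 3.0: eight equally likely symbols
print(shannon_entropy("password"))  # 2.75: the repeated "s" lowers the score
```

Note that the score depends on both the alphabet and the length: a string of length $n$ can never exceed $\log_2 n$ bits per character, so very short strings cannot reach a high threshold.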

## License

MIT — see [LICENSE](LICENSE) for details.
