Metadata-Version: 2.4
Name: cryptohound
Version: 0.1.0
Summary: Cryptographic discovery scanner: find quantum-vulnerable cryptography in your code and emit a CycloneDX CBOM.
Author: cryptoscan contributors
License: MIT
Project-URL: Homepage, https://github.com/your-org/cryptoscan
Project-URL: Issues, https://github.com/your-org/cryptoscan/issues
Keywords: pqc,post-quantum,cbom,cyclonedx,semgrep,cryptography,sbom
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Security :: Cryptography
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: semgrep>=1.50
Requires-Dist: cyclonedx-python-lib<10,>=8
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"
Requires-Dist: pytest-xdist>=3; extra == "dev"
Dynamic: license-file

# cryptohound

A cryptographic discovery scanner for the post-quantum migration. Point it at a
Python repository and it finds uses of cryptography that a sufficiently large
quantum computer could break, then emits a **CycloneDX 1.6 CBOM** (cryptographic
bill of materials) and a **human-readable report**.

> **This finds problems. It does not fix them.** cryptohound is the
> "you can't migrate what you can't see" step: discovery and reporting only. It
> does not change your code, manage keys, or perform a migration.

## What it detects

Anything whose security rests on integer factorization or discrete logarithms —
the families Shor's algorithm breaks:

| Family | Detected via |
|--------|--------------|
| RSA | `cryptography`, `pycryptodome`/PyCrypto (incl. `pkcs1_15`/`pss`), `rsa`, `pyOpenSSL`, `paramiko` |
| DSA | `cryptography`, `pycryptodome`, `pyOpenSSL`, `paramiko` |
| ECDSA / EC | `cryptography`, `ecdsa`, `pycryptodome` (`DSS`), `paramiko` |
| EdDSA (Ed25519/Ed448) | `cryptography`, PyNaCl, `paramiko` |
| Diffie-Hellman (DH) | `cryptography` |
| ECDH (incl. X25519/X448) | `cryptography` |
| ElGamal | `pycryptodome` |
| JWT asymmetric algs (RS*/PS*/ES*/EdDSA) | PyJWT, python-jose, Authlib |
| Asymmetric private-key loading | `cryptography` (`load_*_private_key`) |

Detection works in two ways: **source-code patterns** (via a Semgrep rule pack)
and an optional **dependency-manifest scan** (`requirements*.txt`,
`pyproject.toml`, enabled with `--include-deps`) that flags known classical-crypto
libraries as a second, lower-confidence signal.
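To make the manifest signal concrete, here is a minimal sketch of scanning a `requirements*.txt` file for known crypto libraries. It is not cryptohound's actual parser: `KNOWN_CRYPTO` is an assumed list taken from the libraries in the detection table, and `crypto_deps` is a hypothetical helper. It handles only requirements files; `pyproject.toml` would need a TOML parser (`tomllib` is built in only from Python 3.11), which is omitted here.

```python
import re

# Assumed list, taken from the libraries named in the detection table above.
KNOWN_CRYPTO = {
    "cryptography", "pycryptodome", "pycrypto", "rsa", "ecdsa", "pyopenssl",
    "paramiko", "pynacl", "pyjwt", "python-jose", "authlib",
}
_NAME = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def crypto_deps(requirements_text):
    """Return (normalized_name, line_number) for known crypto libraries (hypothetical helper)."""
    hits = []
    for lineno, raw in enumerate(requirements_text.splitlines(), start=1):
        line = raw.split("#", 1)[0]  # drop comments
        if not line.strip() or line.lstrip().startswith("-"):
            continue  # blank line or a pip option such as -r / -e
        m = _NAME.match(line)
        if m:
            name = re.sub(r"[-_.]+", "-", m.group(1)).lower()  # PEP 503 name normalization
            if name in KNOWN_CRYPTO:
                hits.append((name, lineno))
    return hits


req = """
# web stack
flask==3.0
cryptography>=41  # keys and TLS helpers
PyJWT[crypto]==2.8.0
-r dev-requirements.txt
"""
print(crypto_deps(req))
# [('cryptography', 4), ('pyjwt', 5)]
```

A manifest hit only says a library is installed, not that a vulnerable primitive is called, which is why this signal is lower-confidence than a source match.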

Rules match **fully qualified** names, so Semgrep's import resolution matches only genuine
crypto-library calls: a local variable named `rsa` or a generic `generate_key()`
method is never flagged. On a labeled benchmark cryptohound scores **100% precision
and 100% recall**, and on a corpus of 107 real repositories it produced **zero
false positives**; see [EVALUATION.md](EVALUATION.md).

### What it deliberately does NOT flag

To keep false positives low, quantum-resistant primitives are ignored:
symmetric ciphers at adequate key sizes (**AES-128/256**) and **SHA-256/384/512**
hashing. JWT **HS\*** (HMAC) algorithms are likewise not flagged.
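The JWT distinction comes down to the `alg` header value. A minimal sketch of that classification follows; `jwt_alg_is_quantum_vulnerable` is a hypothetical helper for illustration, not cryptohound's code.

```python
# Hypothetical helper mirroring the rule described above; not cryptohound's code.
ASYMMETRIC_PREFIXES = ("RS", "PS", "ES")  # RSA PKCS#1 v1.5, RSA-PSS, ECDSA


def jwt_alg_is_quantum_vulnerable(alg):
    """True for asymmetric JWS algorithms (RS*/PS*/ES*/EdDSA), False for HMAC (HS*)."""
    if alg == "EdDSA":
        return True
    if alg.startswith("HS"):
        return False  # HMAC-SHA2 is symmetric, so Shor's algorithm does not apply
    return alg.startswith(ASYMMETRIC_PREFIXES)


algs = ["HS256", "RS256", "PS384", "ES256", "EdDSA", "none"]
print([a for a in algs if jwt_alg_is_quantum_vulnerable(a)])
# ['RS256', 'PS384', 'ES256', 'EdDSA']
```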

## Install

cryptohound uses [Semgrep](https://semgrep.dev) as its detection engine; Semgrep
is installed automatically as a dependency.

```bash
pip install cryptohound        # from a release
# or, from a clone:
pip install -e .
```

Requires Python 3.9+.

## Usage

```bash
cryptohound path/to/repo
```

This writes `cbom.json` and `report.md` to the current directory and prints a
severity-ranked summary:

```
Found 7 quantum-vulnerable crypto asset(s):

  [    high] ECDH             keys.py:30
             key agreement is exposed to harvest-now-decrypt-later: traffic
             captured today can be decrypted retroactively; found in first-party
             source; 256-bit parameter
  [  medium] RSA              keys.py:9
             public-key encryption protects confidentiality that breaks
             retroactively; found in first-party source; 2048-bit key
  ...
```

### Flags

| Flag | Description |
|------|-------------|
| `-o, --output-dir DIR` | Where to write artifacts (default: current directory). |
| `-f, --format {json,md,both}` | Which artifacts to emit (default: `both`). |
| `--fail-on-severity {info,low,medium,high,critical}` | Exit non-zero if any finding is at or above this level. For CI. |
| `--include-deps` | Also report known crypto libraries listed in dependency manifests. Off by default to keep false positives low. |
| `-q, --quiet` | Suppress the console summary. |
| `--version` | Print version. |

### Use in CI

```yaml
- run: pip install cryptohound
- run: cryptohound . --format json --fail-on-severity high
```

The command exits `1` when a finding meets the threshold, failing the build.

## Output

### `cbom.json`

A valid CycloneDX 1.6 BOM. Each finding is a component of type
`cryptographic-asset` with `cryptoProperties` (primitive, curve, key size) and
cryptohound metadata under `properties`:

```json
{
  "type": "cryptographic-asset",
  "name": "RSA-2048",
  "cryptoProperties": {
    "assetType": "algorithm",
    "algorithmProperties": { "primitive": "pke", "classicalSecurityLevel": 2048 }
  },
  "properties": [
    { "name": "cryptohound:quantum_vulnerable", "value": "true" },
    { "name": "cryptohound:quantum_reason", "value": "RSA relies on integer factorization, broken by Shor's algorithm." },
    { "name": "cryptohound:severity", "value": "medium" },
    { "name": "cryptohound:severity_reason", "value": "..." },
    { "name": "cryptohound:location", "value": "keys.py:9" }
  ]
}
```
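Because the findings are ordinary CycloneDX components, downstream tooling can read them with the standard library alone. The sketch below assumes the standard CycloneDX layout (components in a top-level `components` array) and that `cryptohound:severity` uses the same level names as `--fail-on-severity`; the sample BOM is hand-written and abbreviated, and its component names are illustrative. It mirrors the "at or above" threshold offline; the CLI's own exit code remains the authoritative gate.

```python
import json

SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def props(component):
    """Flatten a CycloneDX `properties` list into a dict."""
    return {p["name"]: p["value"] for p in component.get("properties", [])}


def findings_at_or_above(bom, threshold):
    """Yield (name, severity, location) for cryptographic assets at or above `threshold`."""
    floor = SEVERITY_ORDER.index(threshold)
    for comp in bom.get("components", []):
        if comp.get("type") != "cryptographic-asset":
            continue
        p = props(comp)
        sev = p.get("cryptohound:severity", "info")
        if SEVERITY_ORDER.index(sev) >= floor:
            yield comp["name"], sev, p.get("cryptohound:location")


# For a real scan: with open("cbom.json") as f: bom = json.load(f)
sample = json.loads("""
{"bomFormat": "CycloneDX", "specVersion": "1.6", "components": [
  {"type": "cryptographic-asset", "name": "RSA-2048", "properties": [
    {"name": "cryptohound:severity", "value": "medium"},
    {"name": "cryptohound:location", "value": "keys.py:9"}]},
  {"type": "cryptographic-asset", "name": "ECDH", "properties": [
    {"name": "cryptohound:severity", "value": "high"},
    {"name": "cryptohound:location", "value": "keys.py:30"}]}
]}
""")
print(list(findings_at_or_above(sample, "high")))
# [('ECDH', 'high', 'keys.py:30')]
```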

### `report.md`

A Markdown report with the total number of findings, a severity-ranked table, and
a short per-family "what to do next" note pointing at the relevant NIST PQC
standard: ML-KEM (FIPS 203), ML-DSA (FIPS 204), or SLH-DSA (FIPS 205).

## How severity is decided

True data-sensitivity needs human judgment, so cryptohound does **not** guess it.
Severity is an explainable heuristic built only from signals the tool can
observe, and the reasoning is always emitted alongside the level:

- **Primitive** — key-agreement ranks highest (harvest-now-decrypt-later:
  captured traffic is decryptable retroactively), then signatures, then
  public-key encryption.
- **Key size** — smaller classical keys are already weaker, raising urgency.
- **Detection locus** — first-party source ranks above a dependency-only mention.
- **Third-party touch** — usage flowing through a declared crypto dependency.

Treat the ranking as a starting point for triage, not a verdict.
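The heuristic above can be sketched as an additive score that records a reason for every point it adds. Everything concrete here is invented for illustration: the weights, the per-family "already weak" cut-offs, the level boundaries, and the `rate` function itself are not cryptohound's actual values or API. The third-party-touch signal is omitted because its effect on the ranking is not specified here.

```python
# Invented weights: key agreement > signatures > public-key encryption.
PRIMITIVE_WEIGHT = {"key-agree": 3, "signature": 2, "pke": 1}
# Invented "already weak" cut-offs in bits, per key family.
WEAK_BELOW = {"RSA": 2048, "DSA": 2048, "DH": 2048, "EC": 256}


def rate(primitive, family, key_bits=None, first_party=True):
    """Return (level, reasons) for one finding under invented weights."""
    score = PRIMITIVE_WEIGHT.get(primitive, 0)
    reasons = [f"{primitive} primitive"]
    cutoff = WEAK_BELOW.get(family)
    if key_bits is not None and cutoff is not None and key_bits < cutoff:
        score += 1  # a classically weak key raises urgency
        reasons.append(f"{key_bits}-bit key is below {cutoff} bits")
    if first_party:
        score += 1  # first-party source outranks a dependency-only mention
        reasons.append("found in first-party source")
    else:
        reasons.append("dependency-only mention")
    if score >= 5:
        level = "critical"
    elif score == 4:
        level = "high"
    elif score >= 2:
        level = "medium"
    elif score == 1:
        level = "low"
    else:
        level = "info"
    return level, reasons


print(rate("key-agree", "EC", 256))
# ('high', ['key-agree primitive', 'found in first-party source'])
print(rate("pke", "RSA", 2048))
# ('medium', ['pke primitive', 'found in first-party source'])
```

Keeping the reasons alongside the score is what makes the level explainable: a reviewer can see exactly which observed signals pushed a finding up.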

## Extending the rules

Detection rules live in `src/cryptohound/rules/` as standard Semgrep YAML, one
file per algorithm family. To add coverage, add a rule that carries this metadata
block; its findings flow through to the CBOM and report automatically:

```yaml
rules:
  - id: cryptohound-<family>-<context>
    languages: [python]
    severity: WARNING
    message: "..."
    metadata:
      algorithm: <Name>       # e.g. RSA
      primitive: <primitive>  # signature | key-agree | pke | encryption
      quantum_vulnerable: true
      reason: "one-line why"
      library: <lib>
      family: <family>        # e.g. rsa
    patterns:
      - pattern-either:
          - pattern: your.api.call(...)
```

Run `pytest` to confirm the fixtures still pass.

## Scope (v1)

In scope: Python source + manifest scanning, CBOM, report, CI gating.

Out of scope: other languages, hosted dashboards/UI/DB, network or TLS/cert
scanning, HSM inspection, and any automated migration or code-fixing.

## Troubleshooting

**`ModuleNotFoundError: No module named 'pkg_resources'` when scanning.**
Semgrep depends (transitively) on `pkg_resources`, which ships inside
`setuptools`. setuptools 81+ removed it, which breaks Semgrep. Pin an older
setuptools in your environment:

```bash
pip install "setuptools<81"
```

**`pip install -e .` fails with "editable mode currently requires a
setuptools-based build".** Your pip predates PEP 660 editable installs (added in
pip 21.3), so it cannot install `pyproject.toml`-only projects in editable mode.
Upgrade it first: `pip install --upgrade pip setuptools wheel`.

**First scan is slow.** Semgrep initializes its engine on first run (10–30s);
subsequent runs are fast.

## Development

```bash
pip install -e ".[dev]"
pytest -n auto             # ~100 tests, parallel (detection tests invoke Semgrep)
python benchmark/run_benchmark.py   # precision / recall on the labeled benchmark
```

The suite covers per-library detection, explicit false-positive guards (AES,
SHA-2, HMAC, Fernet, locally-named variables), CLI flags and exit codes,
dependency parsing, the severity heuristic, CycloneDX 1.6 validity, and a
benchmark regression gate.

cryptohound is validated two ways — a labeled benchmark and a corpus of 107 real
repositories; see [EVALUATION.md](EVALUATION.md) for the precision/recall and
false-negative analysis.
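For reference, precision and recall on a labeled benchmark reduce to set arithmetic over locations. This is a minimal sketch, assuming findings are matched by `(file, line)`; `benchmark/run_benchmark.py` may match differently, and the file names below are hypothetical. Precision is the share of reported findings that are real; recall is the share of real findings that were reported.

```python
def precision_recall(expected, reported):
    """Precision and recall over labeled (file, line) locations."""
    expected, reported = set(expected), set(reported)
    true_pos = len(expected & reported)
    # Convention chosen here: an empty set scores 1.0 rather than dividing by zero.
    precision = true_pos / len(reported) if reported else 1.0
    recall = true_pos / len(expected) if expected else 1.0
    return precision, recall


labels = {("keys.py", 9), ("keys.py", 30), ("jwt_auth.py", 12)}
scan = {("keys.py", 9), ("keys.py", 30), ("util.py", 4)}  # one false positive, one miss
p, r = precision_recall(labels, scan)
print(round(p, 3), round(r, 3))
# 0.667 0.667
```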

## License

MIT — see [LICENSE](LICENSE).
