Metadata-Version: 2.4
Name: ahh-leek
Version: 0.1.0
Summary: Runtime privacy leak scanner - detects and optionally blocks sensitive data leakage
Project-URL: Homepage, https://github.com/cleonard2341/ahh-leek
Project-URL: Documentation, https://github.com/cleonard2341/ahh-leek#readme
Project-URL: Repository, https://github.com/cleonard2341/ahh-leek
Author: Brody
License-Expression: MIT
License-File: LICENSE
Keywords: leak-detection,pii,privacy,runtime-protection,secrets,security
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Requires-Dist: click>=8.0
Provides-Extra: dev
Requires-Dist: aiohttp>=3.8; extra == 'dev'
Requires-Dist: httpx>=0.24; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: requests>=2.28; extra == 'dev'
Description-Content-Type: text/markdown

# ahh-leek

A runtime privacy leak scanner that detects sensitive data leakage and can optionally block or redact it.

## Features

- **Secret Detection**: API keys, tokens, credentials, private keys
- **PII Detection**: Credit cards (Luhn validated), SSN, email, phone numbers
- **Exfiltration Detection**: Base64-encoded secrets, env var exposure, large payloads
- **Multiple Modes**: Alert, Block, or Redact sensitive data
- **Runtime Hooks**: Intercepts network requests, logging, and file writes
- **Low False Positives**: Entropy validation, Luhn algorithm, format validation
- **CLI Tool**: Scan files or wrap scripts with leak detection

## Installation

```bash
pip install ahh-leek
```

Or install from source:

```bash
git clone https://github.com/cleonard2341/ahh-leek
cd ahh-leek
pip install -e .
```

## Quick Start

### Library Mode

```python
import ahhleek

# Enable with defaults (alert only)
ahhleek.enable()

# Enable with blocking (raises exception on leak)
ahhleek.enable(mode="block")

# Custom configuration
def my_callback(event):
    # Called once per detected leak
    print(f"Leak detected: {event.pattern_name}")

ahhleek.enable(
    detect=["secrets", "pii"],  # Categories to scan
    mode="alert",               # alert | block | redact
    on_leak=my_callback,        # Custom handler
    exclude=[r"/health", r"localhost"],  # URL patterns (regex) to ignore
    confidence_threshold=0.8,   # Min confidence (0.0-1.0)
)

# Disable when done
ahhleek.disable()
```

### One-Shot Scanning

```python
import ahhleek

# Scan a string
result = ahhleek.scan("my api key is sk_live_abc123xyz456")

if result.has_leaks:
    for leak in result:
        print(f"Found: {leak.pattern_name}")
        print(f"  Redacted: {leak.redacted_value}")
        print(f"  Confidence: {leak.confidence:.0%}")
```

### CLI Mode

```bash
# Scan a file
ahh-leek scan config.py

# Scan text directly
ahh-leek scan --text "AKIAIOSFODNN7EXAMPLE"

# JSON output
ahh-leek scan file.py --json

# Run a script with leak detection
ahh-leek run script.py --mode alert

# Run with blocking enabled
ahh-leek run script.py --mode block --detect secrets --detect pii
```

## Detection Patterns

### Secrets (High Confidence)

| Pattern | Example |
|---------|---------|
| AWS Access Key | `AKIAIOSFODNN7EXAMPLE` |
| AWS Secret Key | `wJalrXUtnFEMI/K7MDENG...` |
| GitHub Token | `ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx` |
| Slack Token | `xoxb-1234567890-abcdefghijk` |
| Stripe Key | `sk_live_1234567890abcdefghij` |
| JWT Token | `eyJhbGciOiJIUzI1NiIs...` |
| Private Key | `-----BEGIN RSA PRIVATE KEY-----` |
| Database URI | `postgres://user:pass@host/db` |

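To illustrate how prefix-based secrets like those in the table can be matched, here is a minimal sketch using regular expressions. The `SECRET_PATTERNS` dictionary and `find_secrets` helper are hypothetical names for illustration; the rules ahh-leek actually ships may differ.

```python
import re

# Illustrative patterns only; ahh-leek's actual rules may differ.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def find_secrets(text):
    """Return (pattern_name, matched_text) pairs for every hit in text."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits
```

Fixed prefixes such as `AKIA` or `ghp_` are what make these patterns high confidence: a match is unlikely to occur by accident in ordinary text.
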
### PII (Validated)

| Pattern | Validation |
|---------|------------|
| Credit Card | Luhn algorithm checksum |
| SSN | Format + range validation |
| Email | Format validation |
| Phone | Country code patterns |

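The credit card and SSN checks in the table can be sketched as follows. This is an illustration of the general techniques (the Luhn checksum and the commonly cited SSN area/group/serial rules), not ahh-leek's own code; `luhn_valid` and `ssn_plausible` are hypothetical helper names.

```python
import re


def luhn_valid(number):
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:  # payment card numbers are 13-19 digits long
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


SSN_RE = re.compile(r"^(\d{3})-(\d{2})-(\d{4})$")


def ssn_plausible(value):
    """Return True if value has SSN format and valid area/group/serial ranges."""
    match = SSN_RE.match(value)
    if not match:
        return False
    area, group, serial = match.groups()
    # Areas 000, 666, and 900-999 are never issued.
    if area in ("000", "666") or area >= "900":
        return False
    return group != "00" and serial != "0000"
```

Both checks reject most random digit strings that merely look like a card number or SSN, which is why format matching alone is not enough.
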
### Exfiltration

- Base64-encoded secrets
- Sensitive env vars in outbound data
- Large POST payloads
- Writes to suspicious paths (`/tmp`, `/public`, etc.)

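As a rough sketch of how Base64-encoded secrets might be caught, one approach is to decode every Base64-looking run and rescan the decoded text. The helper name `find_encoded_secrets` and the single inner AWS pattern are assumptions made for illustration; the library's real logic is not shown in this README.

```python
import base64
import binascii
import re

# Runs of Base64 alphabet characters long enough to hide a credential.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
# Illustrative inner pattern; a real scanner would reuse its full rule set.
AWS_KEY = re.compile(r"AKIA[0-9A-Z]{16}")


def find_encoded_secrets(text):
    """Decode Base64-looking runs in text and return secrets found inside."""
    found = []
    for candidate in BASE64_RUN.findall(text):
        try:
            decoded = base64.b64decode(candidate, validate=True)
        except (binascii.Error, ValueError):
            continue  # not valid Base64, skip it
        plain = decoded.decode("utf-8", errors="ignore")
        found.extend(AWS_KEY.findall(plain))
    return found
```

This sketch only handles a single layer of standard Base64; URL-safe alphabets or nested encodings would need extra passes.
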
## Operating Modes

### Alert Mode (Default)

Logs warnings but allows data through:

```
[ahh-leek] LEAK DETECTED: aws_access_key
  Location: requests.post() -> api.example.com
  Match: AKIA****************
  Confidence: 95%
  Action: Allowed (alert mode)
```

### Block Mode

Raises `LeakBlockedError` to prevent data from leaving:

```python
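import requests
import ahhleek

ahhleek.enable(mode="block")

url = "https://api.example.com/upload"  # placeholder endpoint

try:
    # The request is stopped before the key leaves the process
    requests.post(url, data={"key": "AKIAIOSFODNN7EXAMPLE"})
except ahhleek.LeakBlockedError as e:
    print(f"Blocked: {e.event.pattern_name}")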
```

### Redact Mode

Automatically replaces sensitive data with placeholders:

```python
ahhleek.enable(mode="redact")

# "sk_live_abc123" becomes "[REDACTED:stripe_secret_key]"
```

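Conceptually, redaction is a pattern substitution. The sketch below shows the idea with a single illustrative Stripe rule and the placeholder format from the example above; `REDACTION_RULES` and `redact` are hypothetical names, not part of the ahh-leek API.

```python
import re

# Illustrative rule set; the placeholder format follows the example above.
REDACTION_RULES = {
    "stripe_secret_key": re.compile(r"\bsk_live_[0-9A-Za-z]+"),
}


def redact(text):
    """Replace every match of each rule with a [REDACTED:<name>] placeholder."""
    for name, pattern in REDACTION_RULES.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text
```

Keeping the pattern name in the placeholder preserves enough context for debugging without exposing the value itself.
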
## Configuration Options

```python
ahhleek.enable(
    # Detection categories
    detect=["secrets", "pii", "exfil"],

    # Operating mode
    mode="alert",  # alert | block | redact

    # Custom leak handler
    on_leak=lambda event: print(f"Leak: {event}"),

    # URL patterns to exclude (regex)
    exclude=[r"localhost", r"/health"],

    # Patterns to allowlist (won't trigger)
    allowlist=[r"test_", r"example\.com"],

    # Minimum confidence threshold
    confidence_threshold=0.7,

    # Hook controls
    hook_network=True,   # Intercept HTTP requests
    hook_logging=True,   # Intercept logging/print
    hook_files=False,    # Intercept file writes (more invasive)
)
```

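One way to picture how `allowlist` and `confidence_threshold` interact is as a filter applied to each detection before it is reported. This is a sketch under assumptions: the `Finding` class and `should_report` function are hypothetical, and whether ahh-leek matches allowlist patterns against the matched value or its surrounding context is not stated here.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical stand-in for a single detection result."""
    pattern_name: str
    value: str
    confidence: float


def should_report(finding, allowlist, confidence_threshold):
    """Return True if the finding clears the threshold and is not allowlisted."""
    if finding.confidence < confidence_threshold:
        return False
    # Assumption: allowlist regexes are searched within the matched value.
    return not any(re.search(pattern, finding.value) for pattern in allowlist)
```
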
## Hooks

### Network Hooks

Intercepts outbound network requests:

- `urllib.request.urlopen`
- `requests.Session.request`
- `httpx.Client.request` / `AsyncClient.request`
- `aiohttp.ClientSession._request`
- `socket.socket.send` / `sendall`

### Logging Hooks

Intercepts log output:

- `logging.Handler.emit`
- `builtins.print`
- `sys.stdout` / `sys.stderr`

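Hooks of this kind are typically implemented by wrapping the original callable. The following sketch shows the general technique for `builtins.print`; it is not ahh-leek's implementation, and `install_print_hook` is a hypothetical name.

```python
import builtins


def install_print_hook(on_text):
    """Wrap builtins.print so on_text sees each message before it is written.

    Returns a function that restores the original print.
    """
    original_print = builtins.print

    def hooked_print(*args, sep=" ", end="\n", file=None, flush=False):
        joiner = " " if sep is None else sep  # print() treats sep=None as " "
        on_text(joiner.join(str(arg) for arg in args))
        original_print(*args, sep=sep, end=end, file=file, flush=flush)

    builtins.print = hooked_print

    def uninstall():
        builtins.print = original_print

    return uninstall
```

In alert mode the callback would only record the text; a blocking variant could raise before calling the original function.
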
### File Hooks

Monitors file writes (disabled by default):

- `builtins.open` (write modes)
- `pathlib.Path.write_text` / `write_bytes`

## Low False Positive Strategy

1. **Entropy Validation**: Generic patterns require high entropy (>3.5 bits/char)
2. **Luhn Algorithm**: Credit cards must pass checksum validation
3. **Format + Range**: SSN must have valid area/group/serial
4. **Allowlists**: User-defined patterns to ignore
5. **Confidence Scoring**: Only alert above threshold

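The entropy check in step 1 can be sketched with Shannon entropy measured in bits per character. The 3.5 threshold comes from the list above; the function names are hypothetical and the library's exact calculation may differ.

```python
import math
from collections import Counter


def shannon_entropy(value):
    """Return the Shannon entropy of value in bits per character."""
    if not value:
        return 0.0
    counts = Counter(value)
    length = len(value)
    return -sum((n / length) * math.log2(n / length) for n in counts.values())


def looks_random(value, threshold=3.5):
    """Return True if value is random-looking enough to be a generated secret."""
    return shannon_entropy(value) > threshold
```

Dictionary words and repeated characters score low, while generated keys use many distinct characters and score high, so the threshold filters out most ordinary identifiers.
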
## Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run specific test file
pytest tests/test_patterns.py -v
```

## Examples

### Custom Callback

```python
import logging

import ahhleek

# Dedicated logger for security events
security_log = logging.getLogger("security")

def my_leak_handler(event):
    # Send to security logging system
    security_log.warning(
        "Data leak detected",
        extra={
            "pattern": event.pattern_name,
            "location": event.location,
            "confidence": event.confidence,
        }
    )

ahhleek.enable(on_leak=my_leak_handler)
```

### Context Manager

```python
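from contextlib import contextmanager

import requests
import ahhleek

@contextmanager
def leak_protection():
    """Enable blocking mode for the duration of the with-block."""
    ahhleek.enable(mode="block")
    try:
        yield
    finally:
        ahhleek.disable()

# Placeholder endpoint and payload for illustration
url = "https://api.example.com/upload"
sensitive_data = {"key": "AKIAIOSFODNN7EXAMPLE"}

with leak_protection():
    # All network/logging activity is monitored;
    # a detected leak raises ahhleek.LeakBlockedError
    requests.post(url, data=sensitive_data)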
```

### Integration with Web Frameworks

```python
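# Flask request hooks
from flask import Flask

import ahhleek

app = Flask(__name__)

@app.before_request
def enable_leak_detection():
    if app.config.get("LEAK_DETECTION"):
        ahhleek.enable(mode="alert")

@app.teardown_request
def disable_leak_detection(exception):
    # Only disable if detection was enabled for this request
    if app.config.get("LEAK_DETECTION"):
        ahhleek.disable()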
```

## License

MIT
