Metadata-Version: 2.4
Name: har-capture
Version: 0.8.2
Summary: HAR capture and PII sanitization library for network traffic analysis
Project-URL: Homepage, https://github.com/solentlabs/har-capture
Project-URL: Documentation, https://github.com/solentlabs/har-capture#readme
Project-URL: Repository, https://github.com/solentlabs/har-capture
Project-URL: Issues, https://github.com/solentlabs/har-capture/issues
Project-URL: Changelog, https://github.com/solentlabs/har-capture/blob/main/CHANGELOG.md
Author: Solent Labs™
License-Expression: MIT
License-File: LICENSE
Keywords: capture,har,http-archive,pii,playwright,sanitization
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.10
Provides-Extra: capture
Requires-Dist: playwright>=1.40; extra == 'capture'
Provides-Extra: cli
Requires-Dist: inquirerpy>=0.3.0; extra == 'cli'
Requires-Dist: rich>=13.0.0; extra == 'cli'
Requires-Dist: typer>=0.12; extra == 'cli'
Provides-Extra: dev
Requires-Dist: hypothesis>=6.100; extra == 'dev'
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pip-audit>=2.7; extra == 'dev'
Requires-Dist: pre-commit>=3.7; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest-xdist>=3.5; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.5.0; extra == 'dev'
Requires-Dist: trustme>=1.1; extra == 'dev'
Provides-Extra: full
Requires-Dist: inquirerpy>=0.3.0; extra == 'full'
Requires-Dist: playwright>=1.40; extra == 'full'
Requires-Dist: rich>=13.0.0; extra == 'full'
Requires-Dist: typer>=0.12; extra == 'full'
Description-Content-Type: text/markdown

# har-capture

[![PyPI version](https://img.shields.io/pypi/v/har-capture)](https://pypi.org/project/har-capture/)
[![Downloads](https://img.shields.io/pypi/dm/har-capture)](https://pypi.org/project/har-capture/)
[![codecov](https://codecov.io/gh/solentlabs/har-capture/branch/main/graph/badge.svg)](https://codecov.io/gh/solentlabs/har-capture)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![AI Assisted](https://img.shields.io/badge/AI-Claude%20Assisted-5A67D8.svg)](https://claude.ai)

Capture and sanitize [HAR (HTTP Archive)](https://w3c.github.io/web-performance/specs/HAR/Overview.html) files with deep PII removal. Perfect for support diagnostics, security reviews, and test fixtures.

## Quick Start

<details open>
<summary><b>Windows</b></summary>

1. Install Python from the [Microsoft Store](https://apps.microsoft.com/detail/9NRWMJP3717K) or [python.org](https://www.python.org/downloads/)
1. Open PowerShell and run:

```powershell
pip install har-capture[full]
python -m har_capture https://example.com
```

</details>

<details>
<summary><b>macOS / Linux</b></summary>

```bash
pip install "har-capture[full]"  # quoted so zsh does not treat the brackets as a glob
har-capture https://example.com
```

</details>

<details>
<summary><b>Already have a HAR file?</b></summary>

```bash
pip install har-capture
har-capture sanitize myfile.har
```

</details>

______________________________________________________________________

## Why har-capture?

Chrome DevTools now sanitizes cookies and auth headers, but HAR files can still contain **far more sensitive data** than that: IP addresses, MAC addresses, emails, passwords in form bodies, serial numbers, device names, WiFi credentials, session tokens, and API keys.

**How har-capture compares:**

| Feature                               | har-capture | DevTools | Google/Cloudflare |
| ------------------------------------- | ----------- | -------- | ----------------- |
| Deep sanitization (IPs, MACs, emails) | ✅          | ❌       | ❌                |
| Correlation-preserving hashes         | ✅          | ❌       | ❌                |
| Interactive review                    | ✅          | ❌       | Varies            |
| Custom patterns                       | ✅          | ❌       | Limited           |
| Local + CLI automation                | ✅          | No CLI   | Varies            |

**Key benefits:**

- **Zero dependencies** - Core sanitization uses only Python stdlib
- **Format-preserving hashes** - Track the same device across requests without exposing real values
- **One-command workflow** - Capture, sanitize, and compress in a single step

[See detailed comparison with all tools →](docs/COMPARISON.md)
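
To make the idea of correlation-preserving, format-preserving hashing concrete, here is a minimal sketch that uses only the standard library. It illustrates the general technique and is not har-capture's actual algorithm (see [docs/CORRELATION.md](docs/CORRELATION.md) for that). The function name `pseudonymize_mac` and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib


def pseudonymize_mac(mac: str, salt: str = "") -> str:
    """Map a MAC address to a stable fake MAC derived from a salted hash.

    Hypothetical helper for illustration only; not part of har-capture.
    """
    # Normalize case so "AA:BB:..." and "aa:bb:..." map to the same output.
    digest = hashlib.sha256((salt + mac.lower()).encode("utf-8")).digest()
    octets = bytearray(digest[:6])
    # Set the locally administered bit and clear the multicast bit, so the
    # result still looks like a MAC but cannot be a real vendor-assigned one.
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{b:02x}" for b in octets)


# The same input and salt always give the same replacement, so one device
# can be followed across requests without revealing its real address.
first = pseudonymize_mac("AA:BB:CC:DD:EE:FF", salt="my-secret-key")
second = pseudonymize_mac("aa:bb:cc:dd:ee:ff", salt="my-secret-key")
print(first == second)
```

Reusing a salt keeps replacements consistent across captures, while a different salt yields unrelated replacements.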

______________________________________________________________________

## See It In Action

**1. Sanitization report** — 84 values auto-redacted across 9 PII categories:

![Sanitization Report](https://raw.githubusercontent.com/solentlabs/har-capture/main/docs/images/sanitization-report.png)

**2. Flagged values for review** — passwords, fields, WiFi SSIDs, and phone numbers detected automatically:

![Flagged Values for Review](https://raw.githubusercontent.com/solentlabs/har-capture/main/docs/images/flagged-values-table.png)

**3. Interactive redaction picker** — high-confidence items pre-selected, you choose the rest:

![Redact Picker](https://raw.githubusercontent.com/solentlabs/har-capture/main/docs/images/redact-picker.png)

______________________________________________________________________

## Installation

```bash
# Core only (sanitization - zero dependencies)
pip install har-capture

# With browser capture support
# (quotes keep shells such as zsh from treating the brackets as a glob)
pip install "har-capture[capture]"
playwright install chromium

# Full installation (recommended)
pip install "har-capture[full]"
```

______________________________________________________________________

## Usage

### Command Line

```bash
# Capture and sanitize (interactive review always enabled)
har-capture https://example.com

# Sanitize existing HAR
har-capture sanitize capture.har

# Validate for PII leaks
har-capture validate capture.har
```

[Full CLI reference →](docs/CLI_REFERENCE.md)

### Python API

```python
from har_capture.sanitization import sanitize_html, sanitize_har_file
from har_capture.sanitization.report import HeuristicMode

# Sanitize HTML (correlation-preserving by default)
clean_html = sanitize_html(raw_html)

# Sanitize with consistent salt (correlate across captures)
clean_html = sanitize_html(raw_html, salt="my-secret-key")

# Enable heuristic detection for WiFi credentials, SSIDs, and device names
clean_html = sanitize_html(raw_html, heuristics=HeuristicMode.REDACT)

# Sanitize HAR file
sanitize_har_file("capture.har")  # → capture.sanitized.har

# Custom patterns (e.g., modem serials, customer IDs)
custom = {"patterns": {"modem_sn": {"regex": r"SN[0-9]{10}", "replacement_prefix": "MODEM"}}}
sanitize_har_file("capture.har", custom_patterns=custom)

# Redact device-specific credential FIELD NAMES (not just value patterns).
# See docs/CUSTOM_PATTERNS.md#extending-sensitive-field-detection.
device_fields = {"fields": {"auto_redact_patterns": ["pws"]}}
sanitize_har_file("capture.har", custom_patterns=device_fields)
```
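
Before passing a custom pattern to `sanitize_har_file`, it can help to preview what its regex would actually match. The helper below is a hypothetical, stdlib-only sketch and not part of har-capture. It assumes only the `{"patterns": {name: {"regex": ...}}}` shape shown above.

```python
import re


def preview_matches(custom_patterns: dict, text: str) -> dict[str, list[str]]:
    """Return, per pattern name, the substrings its regex matches in text.

    Hypothetical helper for illustration only; not part of har-capture.
    """
    results: dict[str, list[str]] = {}
    for name, spec in custom_patterns.get("patterns", {}).items():
        # finditer + group(0) returns whole matches even if the regex has groups.
        results[name] = [m.group(0) for m in re.finditer(spec["regex"], text)]
    return results


custom = {"patterns": {"modem_sn": {"regex": r"SN[0-9]{10}", "replacement_prefix": "MODEM"}}}
sample = "serial SN0123456789 and a short one SN123"
print(preview_matches(custom, sample))  # {'modem_sn': ['SN0123456789']}
```

A pattern that matches too much or too little in the preview is easier to correct here than after sanitizing a whole capture.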

______________________________________________________________________

## Documentation

- **[Comparison with Other Tools](docs/COMPARISON.md)** - DevTools, Google, Cloudflare, Edgio
- **[Correlation-Preserving Redaction](docs/CORRELATION.md)** - How format-preserving hashing works
- **[PII Categories](docs/PII_CATEGORIES.md)** - What gets sanitized
- **[Custom Patterns](docs/CUSTOM_PATTERNS.md)** - Add organization-specific patterns
- **[CLI Reference](docs/CLI_REFERENCE.md)** - Detailed command documentation
- **[Interactive Sanitization](docs/INTERACTIVE_SANITIZATION.md)** - Review edge cases manually

______________________________________________________________________

## Use Cases

- **Support diagnostics** - Users submit sanitized HAR files without exposing credentials
- **Security review** - Validate HAR files for PII leaks before sharing
- **Test fixtures** - Generate reproducible traffic captures
- **Modem debugging** - Capture router/modem traffic with sensitive data removed
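
As a rough illustration of what checking a HAR file for leaks involves, the sketch below walks every string in a parsed HAR document and reports values that look like emails or IPv4 addresses. It is not how `har-capture validate` is implemented. The names `LEAK_PATTERNS`, `iter_strings`, and `find_leaks`, and the two deliberately simple regexes, are assumptions for this example.

```python
import re

# Deliberately simple patterns; the IPv4 one also matches version strings
# such as "1.2.3.4", so expect false positives.
LEAK_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def iter_strings(node):
    """Yield every string value nested anywhere inside parsed JSON."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def find_leaks(har: dict) -> list[tuple[str, str]]:
    """Return (category, matched_text) pairs for suspicious-looking values."""
    hits = []
    for text in iter_strings(har):
        for label, pattern in LEAK_PATTERNS.items():
            hits.extend((label, m.group(0)) for m in pattern.finditer(text))
    return hits


# A tiny in-memory HAR-like document; a real file would be read with json.load.
har = {
    "log": {
        "entries": [
            {
                "request": {
                    "url": "http://192.168.1.1/login",
                    "postData": {"text": "email=user@example.com"},
                }
            }
        ]
    }
}
print(find_leaks(har))  # [('ipv4', '192.168.1.1'), ('email', 'user@example.com')]
```

A scan like this only finds values that match known shapes. Free-form data such as device names or passwords needs the field-aware and heuristic detection described elsewhere in this README.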

______________________________________________________________________

## What Gets Sanitized

| Category        | Examples              | Output                                               |
| --------------- | --------------------- | ---------------------------------------------------- |
| **Network**     | IPs, MACs             | `192.168.1.1` → `10.255.42.17`                       |
| **Personal**    | Emails, phones        | `user@example.com` → `user_a1b2@redacted.invalid`    |
| **Credentials** | Passwords, tokens     | `password=secret` → `password=PASS_a1b2c3d4`         |
| **Device**      | Serials, WiFi, SSIDs  | `SN123456` → `SERIAL_a1b2c3d4`                       |
| **HTTP**        | Auth headers, cookies | `Cookie: session=xyz` → `Cookie: session=TOKEN_a1b2` |

[See complete PII categories list →](docs/PII_CATEGORIES.md)

______________________________________________________________________

## Platform Support

| Component    | Windows | macOS | Linux |
| ------------ | ------- | ----- | ----- |
| Sanitization | ✅      | ✅    | ✅    |
| Validation   | ✅      | ✅    | ✅    |
| CLI          | ✅      | ✅    | ✅    |
| Capture      | ✅      | ✅    | ✅    |

______________________________________________________________________

## Contributing

Contributions welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

______________________________________________________________________

## License

MIT License - see [LICENSE](LICENSE) for details.
