Metadata-Version: 2.4
Name: encoding-doctor
Version: 0.2.0
Summary: Scan, fix, and verify file encoding issues. Mojibake, BOM, CRLF, null bytes — fixed in one command.
Author-email: Stateflow Labs <stateflow.labs@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://stateflow-dev.github.io/stateflowlabs/
Project-URL: Repository, https://github.com/stateflow-dev/encoding-doctor
Keywords: encoding,utf-8,mojibake,bom,crlf,file encoding,encoding fix,encoding repair,python encoding,encoding tool,developer tools
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing
Classifier: Topic :: Utilities
Requires-Python: >=3.9
Description-Content-Type: text/markdown

# encoding-doctor

**Scan, fix, and verify file encoding issues across your project in one command.**

Fixes mojibake, BOM, CRLF line endings, null bytes, and non-UTF-8 encoding —
automatically detected and repaired, with backups created before every change.

Built from real encoding bugs found in production Python projects on Windows.

---

## Install

```bash
pip install encoding-doctor
```

---

## Usage

```bash
# Step 1 — scan first, always
enc-doctor scan ./my_project

# Step 2 — preview changes without writing
enc-doctor fix ./my_project --dry-run

# Step 3 — fix (backups created automatically as .bak)
enc-doctor fix ./my_project

# Step 4 — verify everything is clean
enc-doctor verify ./my_project
```

---

## What it fixes

| Problem | Description |
|---|---|
| **Mojibake** | UTF-8 bytes mis-read as cp1252 and saved as garbage |
| **BOM** | `\xef\xbb\xbf` prefix added by Notepad/Excel that breaks parsers |
| **CRLF** | Windows `\r\n` mixed with Unix `\n` — causes Git diff noise |
| **Null bytes** | Binary corruption from FTP or terminal copy-paste |
| **Non-UTF-8** | Detected and flagged for manual conversion |

---

## Warning

> **encoding-doctor modifies files in-place.**
>
> - Always run `scan` first and review the report before running `fix`.
> - Backups are created automatically as `.bak` files.
> - Run on a Git-tracked project so you can always revert with `git checkout .`
> - Do not run `fix` on production files without testing first.
> - `verify` after every fix before committing.

---

## Options

```bash
enc-doctor scan   <path> [--all]       # --all shows clean files too
enc-doctor fix    <path> [--dry-run]   # --dry-run previews without writing
enc-doctor verify <path>
enc-doctor restore <file>              # restore single file from .bak
```

---

## Run tests

```bash
pip install pytest
pytest tests/ -v
```

---

## License

MIT © [Stateflow Labs](https://github.com/stateflow-dev)
