Metadata-Version: 2.4
Name: fedstylevalidator
Version: 0.1.0
Summary: Federal writing-style validator for Word docs — CLI, FastAPI, and React UI.
Author-email: Eric Putnam <fsv@putnamgroup.org>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/eputnam77/FedStyleValidator
Project-URL: Repository, https://github.com/eputnam77/FedStyleValidator
Project-URL: Documentation, https://github.com/eputnam77/FedStyleValidator/tree/main/docs
Project-URL: Bug Tracker, https://github.com/eputnam77/FedStyleValidator/issues
Keywords: federal,government,writing,style,validator,document,docx,accessibility,plain-language,fastapi
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Other Audience
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Text Processing :: Markup
Classifier: Topic :: Office/Business
Classifier: Typing :: Typed
Requires-Python: <4.0,>=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastapi>=0.109.1
Requires-Dist: starlette>=0.49.1
Requires-Dist: python-docx>=0.8.11
Requires-Dist: beautifulsoup4>=4.12.0
Requires-Dist: pdfkit>=1.0.0
Requires-Dist: nltk>=3.9.3
Requires-Dist: pandas>=2.2.1
Requires-Dist: numpy>=1.24.3
Requires-Dist: pydantic>=2.11.4
Requires-Dist: colorama>=0.4.6
Requires-Dist: typing-extensions>=4.9.0
Requires-Dist: filetype>=1.2.0
Requires-Dist: python-multipart>=0.0.22
Requires-Dist: zipp==3.19.1
Requires-Dist: anyio>=4.4.0
Requires-Dist: httpx>=0.28.1
Requires-Dist: tenacity<9.0,>=8.2
Requires-Dist: pyyaml>=6.0
Requires-Dist: defusedxml>=0.7.1
Requires-Dist: PyJWT>=2.8.0
Provides-Extra: dev
Requires-Dist: ruff>=0.5; extra == "dev"
Requires-Dist: mypy>=1.10; extra == "dev"
Requires-Dist: pytest>=8; extra == "dev"
Requires-Dist: pytest-cov>=5; extra == "dev"
Requires-Dist: hypothesis>=6; extra == "dev"
Requires-Dist: bandit[toml]>=1.8.5; extra == "dev"
Requires-Dist: semgrep>=1.123.0; extra == "dev"
Requires-Dist: pip-audit>=2.7; extra == "dev"
Requires-Dist: pre-commit>=3.7; extra == "dev"
Requires-Dist: uvicorn>=0.30; extra == "dev"
Provides-Extra: format
Requires-Dist: ruff>=0.5; extra == "format"
Provides-Extra: lint
Requires-Dist: ruff>=0.5; extra == "lint"
Provides-Extra: typing
Requires-Dist: mypy>=1.10; extra == "typing"
Provides-Extra: test
Requires-Dist: pytest>=8; extra == "test"
Requires-Dist: pytest-cov>=5; extra == "test"
Requires-Dist: hypothesis>=6; extra == "test"
Provides-Extra: security
Requires-Dist: bandit[toml]>=1.8.5; extra == "security"
Requires-Dist: semgrep>=1.123.0; extra == "security"
Requires-Dist: pip-audit>=2.7; extra == "security"
Provides-Extra: qa
Requires-Dist: pre-commit>=3.7; extra == "qa"
Dynamic: license-file

# FedStyleValidator

Federal writing‑style validation for Word docs—CLI, FastAPI, and React UI in one toolkit.

**FedStyleValidator** scans Advisory Circulars, Orders, and other Federal documents for headings, formatting, terminology, and accessibility issues. It delivers:

* a **FastAPI** backend (live Swagger UI)
* a **Vite/React** frontend under `frontend/fedstylevalidator` with real-time preview
* an easy **CLI** for local batch checks
* a display of document metadata (Title, Author, Last Modified By, Created, Modified)

The PyPI distribution installs the packaged `fedstylevalidator` modules, the
CLI entry point, and the built-in rule packs. The repository-local `backend/`
and `frontend/` apps remain source-checkout workflows rather than
PyPI-installed components.

Full technical docs live under **`docs/`**; start with *docs/getting-started.md* when you’re ready to dig deeper.

After cloning, change into the `FedStyleValidator` directory. All commands in this README assume you’re in the repository root—not inside `src`.

## 📄 Supported document formats

DOCX is the gold standard for FedStyleValidator. The CLI/API expect `.docx` input,
while other formats or external tool outputs should flow through adapters that
normalize results into the canonical findings schema for downstream automation.

---

## ✅ Core engine capabilities

FedStyleValidator’s core engine focuses on consistent, repeatable validation you can run locally or in your CI pipeline. It can:

* Parse `.docx` structure and document metadata.
* Validate headings, formatting, accessibility, and terminology rules.
* Normalize document types so CLI, API, and UI agree on naming.
* Produce grouped results by category or severity.
* Power the CLI, FastAPI endpoint, and React UI with the same rules.

Looking for agency-specific rules? Optional packs layer on top of the core
engine without changing your base install.

---

## ✨ Quick install (recommended)

For a one-shot setup run the helper script:

```bash
./scripts/setup_env.sh
```

It creates a local `.venv`, installs locked dependencies, and enables pre-commit hooks.

Prefer to do the steps manually? The manual path uses **pipx** for global tool shims and **uv** for fast Python/venv work, with standard `setuptools` packaging.

```bash
# 0. One-time setup: Python & pipx -------------------------------------------------
python --version                # confirm Python 3.11 or newer
python -m pip install --user pipx
python -m pipx ensurepath       # restart shell if PATH changes
pipx install uv                 # fast resolver, venv mgr, lockfile, tool runner

# 1. Per project -----------------------------------------------------------------
git clone https://github.com/eputnam77/FedStyleValidator.git
cd FedStyleValidator

# 2. Create and activate venv (Python 3.12) --------------------------------------
uv python install 3.12          # Download if not present
uv venv --python 3.12
# Activate the venv:
#   On Windows:
#     .venv\Scripts\activate
#   On Mac/Linux:
source .venv/bin/activate

# 3. Install project + extras (dev, test, security) ------------------------------
uv pip install -e ".[dev,test,security]"
# (Optional) Allow prerelease dependencies if needed:
# uv pip install -e ".[dev,test,security]" --prerelease=allow

# 4. (Optional) Upgrade pip and pre-commit inside the venv -----------------------
uv pip install --upgrade pip pre-commit

# 5. Install Git hooks -----------------------------------------------------------
pre-commit install
```

---

## 🐍 Virtual‑env fallback (no uv)

```bash
python -m venv .venv
source .venv/bin/activate           # Win: .venv\Scripts\activate
pip install --upgrade pip
pip install -e ".[dev,test,security]"
```

---

## 📦 Installation options (core vs packs)

All built-in style packs ship in the main distribution, so no separate install
or license key is required. Available packs: `core`, `accessibility`,
`plain_language`, `faa`, `dot_ogc`.

```bash
pip install fedstylevalidator
```

Enable a pack with the CLI `--pack` option or the API `pack` form field:

```bash
fedstylevalidator check mydoc.docx --type "Advisory Circular" --pack faa
```

Multiple packs can be combined:

```bash
fedstylevalidator check mydoc.docx --type "Advisory Circular" --pack faa --pack accessibility
```
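
When the CLI is driven from a script, the repeated `--pack` flags can be assembled programmatically. The sketch below is illustrative only: `build_check_command` is a hypothetical helper, not part of the package, and it uses only the `check`, `--type`, and `--pack` arguments shown above.

```python
import shlex
from typing import List, Optional, Sequence


def build_check_command(
    doc_path: str, doc_type: str, packs: Optional[Sequence[str]] = None
) -> List[str]:
    """Build an argv list for `fedstylevalidator check` (hypothetical helper)."""
    command = ["fedstylevalidator", "check", doc_path, "--type", doc_type]
    # Each pack gets its own --pack flag, mirroring the CLI examples above.
    for pack in packs or []:
        command += ["--pack", pack]
    return command


argv = build_check_command("mydoc.docx", "Advisory Circular", ["faa", "accessibility"])
# Print a shell-quoted form for review; pass argv to subprocess.run() to execute it.
print(shlex.join(argv))
```

Passing the list form to `subprocess.run` avoids shell-quoting problems with document types that contain spaces.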

---

## 🔑 Environment variables (optional)

| Variable                  | Purpose                                 |
| ------------------------- | --------------------------------------- |
| `FEDSTYLE_MODE`           | Runtime profile: `ui` (default, no login) or `secure` (auth on) |
| `FEDSTYLEVALIDATOR_SECRET_KEY` | JWT signing key for the API             |
| `REQUIRE_AUTH`            | Explicit auth override. If unset, mode controls auth (`ui=false`, `secure=true`) |
| `FEDSTYLEVALIDATOR_AUTH_ISSUER` | Optional required `iss` claim for JWT validation |
| `FEDSTYLEVALIDATOR_AUTH_AUDIENCE` | Optional required `aud` claim for JWT validation |
| `FEDSTYLEVALIDATOR_AUTH_CLOCK_SKEW_SECONDS` | Allowed JWT clock skew (default `60`) |
| `FEDSTYLEVALIDATOR_AUTH_REQUIRE_JTI` | Require JWT `jti` claim (default `false`) |
| `FEDSTYLEVALIDATOR_AUTH_REVOKED_JTIS` | Comma-separated revoked JWT IDs (`jti`) |
| `FEDSTYLEVALIDATOR_AUTH_TOKENS_INVALID_BEFORE` | Global token cutoff epoch seconds (`iat` must be newer) |
| `VITE_API_BASE`           | Override API URL for the React frontend |
| `FEDSTYLE_PACK_ENTITLEMENTS` | Deprecated (REL-3). Accepted but ignored — all built-in packs are always available. |
| `ALLOW_ORIGINS`           | Comma-separated list of allowed CORS origins (no wildcard in production) |
| `I_UNDERSTAND_INSECURE_NO_AUTH` | Required to run no-auth mode on non-localhost interfaces |
| `RESULT_PERSISTENCE`      | Result storage mode (`sqlite`, `memory`, `disabled`) |
| `RESULT_SQLITE_PAYLOAD`   | SQLite payload (`full` or `summary`)    |
| `RESULT_TTL`              | Result retention window in seconds      |
| `RESULT_DB_PATH`          | SQLite DB path when persistence is enabled |

Create a `.env` or export vars before running the backend.
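
The interaction between `FEDSTYLE_MODE` and `REQUIRE_AUTH` is easiest to see as a small function. This is a sketch of the precedence described in the table, not the backend's actual code; the set of accepted truthy spellings is an assumption.

```python
import os
from typing import Mapping

# Assumption: which spellings count as "true" is not specified in this README.
TRUTHY = {"1", "true", "yes", "on"}


def auth_required(env: Mapping[str, str]) -> bool:
    """Resolve whether auth is on, following the table's description."""
    override = env.get("REQUIRE_AUTH")
    if override is not None and override.strip() != "":
        # An explicit REQUIRE_AUTH value wins over the mode.
        return override.strip().lower() in TRUTHY
    # Otherwise the mode decides: ui -> no auth, secure -> auth.
    mode = env.get("FEDSTYLE_MODE", "ui").strip().lower()
    return mode == "secure"


print(auth_required(os.environ))
```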

---

## 🧾 Result persistence and retention

FedStyleValidator stores API results for a rolling retention window to support
downloads and usage metrics. By default, results are persisted to SQLite for
one hour.

```bash
export RESULT_PERSISTENCE=sqlite
export RESULT_TTL=3600
```

Need to avoid disk writes or store less data? Switch to memory-only storage or
store just the summary fields.

```bash
# Keep results in memory only (no disk writes).
export RESULT_PERSISTENCE=memory

# Or keep SQLite persistence but store only the summary payload.
export RESULT_PERSISTENCE=sqlite
export RESULT_SQLITE_PAYLOAD=summary
```

With `RESULT_PERSISTENCE=disabled`, results are not stored and download/report
endpoints will not find past results.
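
To make the retention window concrete, here is a minimal sketch of how these variables could be read and how a TTL check works. `ResultSettings`, `load_result_settings`, and `is_expired` are hypothetical names, and the `full` payload default is an assumption; the `sqlite` and one-hour defaults come from the text above.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ResultSettings:
    persistence: str  # "sqlite", "memory", or "disabled"
    payload: str      # "full" or "summary"
    ttl_seconds: int


def load_result_settings(env: Mapping[str, str]) -> ResultSettings:
    """Read the RESULT_* variables with the documented defaults."""
    persistence = env.get("RESULT_PERSISTENCE", "sqlite")
    if persistence not in {"sqlite", "memory", "disabled"}:
        raise ValueError(f"unknown RESULT_PERSISTENCE: {persistence!r}")
    payload = env.get("RESULT_SQLITE_PAYLOAD", "full")  # default is an assumption
    if payload not in {"full", "summary"}:
        raise ValueError(f"unknown RESULT_SQLITE_PAYLOAD: {payload!r}")
    return ResultSettings(persistence, payload, int(env.get("RESULT_TTL", "3600")))


def is_expired(stored_at: float, settings: ResultSettings, now: Optional[float] = None) -> bool:
    """Return True once a result is older than the retention window."""
    current = time.time() if now is None else now
    return current - stored_at >= settings.ttl_seconds
```

A result stored at time `t` stops being retrievable once `t + RESULT_TTL` has passed, whichever storage mode is in use.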

---

## 🚀 Run the backend API (UI mode, no login)

```bash
python run.py --mode ui --port 8000
# Automatic interactive docs at http://127.0.0.1:8000/docs
```

For secure or remote deployments, use secure mode:

```bash
export FEDSTYLEVALIDATOR_SECRET_KEY="your-long-random-secret"
python run.py --mode secure --host 0.0.0.0 --port 8000
```

### CORS configuration (production)

For public deployments, explicitly set `ALLOW_ORIGINS` to the exact origins that should
reach the API. For example:

```bash
export ALLOW_ORIGINS="https://app.example.com,https://admin.example.com"
```

Local development defaults allow common localhost ports; production should always set
`ALLOW_ORIGINS` to avoid permissive CORS.
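
As an illustration of what "exact origins" means, the sketch below splits an `ALLOW_ORIGINS` value into a list of the kind that Starlette's `CORSMiddleware` accepts as `allow_origins`. `parse_allow_origins` is a hypothetical helper, and rejecting a wildcard outright is an assumption of this sketch, not confirmed backend behavior.

```python
from typing import List


def parse_allow_origins(raw: str) -> List[str]:
    """Split a comma-separated ALLOW_ORIGINS value into exact origins."""
    origins = [item.strip() for item in raw.split(",") if item.strip()]
    if "*" in origins:
        # Assumption for this sketch: treat a wildcard as a configuration error.
        raise ValueError("wildcard origins are not allowed in production")
    return origins


print(parse_allow_origins("https://app.example.com,https://admin.example.com"))
```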

---

## 🔐 API authentication

API auth is intended for secure/remote deployments. In default `ui` mode, the
API runs without login. In `secure` mode, every API route expects
`Authorization: Bearer <TOKEN>`.

1. Set the signing key and turn auth on.

```bash
export FEDSTYLEVALIDATOR_SECRET_KEY="your-long-random-secret"
export REQUIRE_AUTH=true
```

2. Generate an HS256 JWT (example using PyJWT).

```bash
python -m pip install pyjwt
python - <<'PY'
import time
import jwt

secret = "your-long-random-secret"
token = jwt.encode(
    {"sub": "local-dev", "exp": int(time.time()) + 3600},
    secret,
    algorithm="HS256",
)
print(token)
PY
```

3. Send the token with requests.

```bash
curl -X POST http://localhost:8000/process \
  -H "Authorization: Bearer <TOKEN>" \
  -F "doc_file=@mydoc.docx" \
  -F "doc_type=Order"
```

Auth is enabled by default in `secure` mode. If you intentionally disable auth
on non-localhost interfaces, set `I_UNDERSTAND_INSECURE_NO_AUTH=true`.
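
For readers who want to see what HS256 signing and verification involve, here is a standard-library-only sketch. It is not the backend's implementation (the project depends on PyJWT, which is the better choice in practice); `sign_hs256` and `verify_hs256` are hypothetical names, and the 60-second leeway simply mirrors the documented clock-skew default.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, secret: str) -> str:
    """Create a compact JWT signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    digest = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(digest)}"


def verify_hs256(token: str, secret: str, leeway: int = 60) -> dict:
    """Check the signature and `exp` claim; return the claims if valid."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret.encode("utf-8"), f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if "exp" in claims and time.time() > claims["exp"] + leeway:
        raise ValueError("token expired")
    return claims


token = sign_hs256({"sub": "local-dev", "exp": int(time.time()) + 3600}, "your-long-random-secret")
print(verify_hs256(token, "your-long-random-secret")["sub"])
```

The sketch skips the `iss`, `aud`, `jti`, and `iat` checks that the environment variables above configure.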

---

## 🖥️ Run the React frontend (Node 18+)

```bash
cd frontend/fedstylevalidator
npm install
npm run dev -- --host 127.0.0.1 --port 3000
```

Open [http://127.0.0.1:3000/](http://127.0.0.1:3000/) and start uploading `.docx` files.

If a previous dev run left stale listeners or large logs, clean them with:

```bash
make dev-down
```

---

## 🛠️ CLI usage

**Run the core engine only**

```bash
fedstylevalidator check mydoc.docx --type "Advisory Circular"
```

**Run the core engine + FAA pack**

```bash
fedstylevalidator check mydoc.docx --type "Advisory Circular" --pack faa
```

or the bare Python entry point:

```bash
python -m fedstylevalidator.cli check mydoc.docx --type "Order"
```

### Suppressions & baselines

Suppress known false positives and capture the reason:

```bash
fedstylevalidator check mydoc.docx --type "Order" --suppressions suppressions.yml
```

Generate a baseline for existing issues, then fail only on new findings:

```bash
# Create the baseline
fedstylevalidator check mydoc.docx --type "Order" --baseline-out baseline.json

# Later runs: only fail on new issues
fedstylevalidator check mydoc.docx --type "Order" --baseline baseline.json --fail-on-new
```

See `docs/suppressions.md` for syntax and examples.
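
Conceptually, a baseline run compares the current findings against a saved set and reports only the difference. The sketch below shows that idea under loud assumptions: the `rule_id` and `text` fields, the sample rule IDs, and the fingerprint scheme are all hypothetical and do not describe the real `baseline.json` format.

```python
import json
from typing import Dict, Iterable, List, Set, Tuple

Finding = Dict[str, str]


def fingerprint(finding: Finding) -> Tuple[str, str]:
    # Hypothetical identity for a finding: rule ID plus the flagged text.
    return (finding["rule_id"], finding["text"])


def new_findings(current: Iterable[Finding], baseline: Iterable[Finding]) -> List[Finding]:
    """Return findings in `current` that are absent from `baseline`."""
    known: Set[Tuple[str, str]] = {fingerprint(item) for item in baseline}
    return [item for item in current if fingerprint(item) not in known]


baseline = [{"rule_id": "HEAD001", "text": "1. purpose"}]
current = baseline + [{"rule_id": "TERM004", "text": "shall"}]
fresh = new_findings(current, baseline)
exit_code = 1 if fresh else 0  # the idea behind --fail-on-new
print(json.dumps(fresh), exit_code)
```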

---

## 🧪 Quality Checks & Testing Guide

This project uses a multi-tool testing pipeline to ensure code quality, formatting, type safety, security, and robustness. Below is the full suite of commands and best practices for local development and CI validation.

---

### 1. ✅ Lint, Format, and Static Type Checks

**Defined in `.pre-commit-config.yaml`** and run automatically before every commit (after running `pre-commit install`):

* **Ruff:** Linting and formatting for Python code (also handles import sorting)
* **Black:** Auto-formats Python code to a consistent style
* **Mypy:** Static type checking
* **Bandit:** Python code security scanning (see below for details)
* **mdformat:** Markdown linting and formatting, with Ruff rules
* **Codespell:** Checks for common spelling mistakes in code, comments, and docs

**To run all checks across the codebase:**

```bash
pre-commit install           # (First time only) Installs pre-commit hooks
pre-commit run --all-files   # Run all checks across the codebase
```

> **Tip:** This is the recommended first step before committing or pushing code.

---

### 2. ✅ Unit Tests with Coverage

Run the full test suite with code coverage reporting using pytest:

```bash
pytest --cov=fedstylevalidator
```

* Replace `fedstylevalidator` with your package name if different.
* Coverage results can be uploaded to Codecov or other CI tools.

---

### 3. 🔡 Spellchecking

Run [Codespell](https://github.com/codespell-project/codespell) to catch common typos in code, comments, and documentation:

```bash
codespell src tests docs
```

> **Note:** Codespell is also included in pre-commit, so this check runs automatically before each commit.

---

### 4. 📚 Docstring Formatting (Optional)

[docformatter](https://github.com/PyCQA/docformatter) ensures all Python docstrings follow [PEP 257](https://peps.python.org/pep-0257/) conventions.

```bash
docformatter -r src/
```

* Recommended for teams/projects that enforce strict docstring style.

---

### 5. 🛡️ Security Scanning

Run security scanners to identify vulnerabilities:

* **Bandit:** Scans Python source code for security issues

  ```bash
  bandit -r src -lll --skip B101
  ```

  * `-r src`: Recursively scans the `src` directory
  * `-lll`: Only high-severity issues
  * `--skip B101`: Skip assert statement warnings

* **pip-audit:** Checks installed dependencies for known security vulnerabilities

  ```bash
  pip-audit
  pip-audit -r requirements.txt
  ```

* **Safety (Optional):** Another dependency vulnerability scanner

  ```bash
  safety check
  ```

  * Not required if using pip-audit, but can be added for redundancy.

---

### 6. 🧬 Mutation Testing (Optional)

[Mutmut](https://mutmut.readthedocs.io/en/latest/) tests your suite’s effectiveness by making small code changes ("mutations") and checking if your tests catch them.

```bash
mutmut run --paths-to-mutate src
mutmut results
```

* Use this occasionally or in CI for robust projects.
* Mutation testing can be time-consuming.

---

### 7. 📦 Suggested Workflow

```bash
pre-commit run --all-files        # Lint, format, type check, spellcheck, markdown, security
pytest --cov=fedstylevalidator    # Unit tests with coverage
bandit -r src -lll --skip B101    # Security scan (code)
pip-audit                         # Security scan (dependencies)
codespell src tests docs          # Spell check (if not running in pre-commit)
docformatter -r src/              # (Optional) Docstring formatting
mutmut run --paths-to-mutate src  # (Optional) Mutation testing
mutmut results
```

---

### 8. 📋 Quick Reference Table

| Tool         | Purpose                     | Command Example                               |
| ------------ | --------------------------- | --------------------------------------------- |
| Ruff         | Lint/format Python code     | `pre-commit run --all-files`                  |
| Black        | Code formatter              | `pre-commit run --all-files`                  |
| Mypy         | Static type checking        | `pre-commit run --all-files`                  |
| Bandit       | Security (code)             | `bandit -r src -lll --skip B101`              |
| pip-audit    | Security (dependencies)     | `pip-audit` / `pip-audit -r requirements.txt` |
| Codespell    | Spell check                 | `codespell src tests docs`                    |
| mdformat     | Markdown formatting/linting | `pre-commit run --all-files`                  |
| docformatter | Docstring style (optional)  | `docformatter -r src/`                        |
| Mutmut       | Mutation test (optional)    | `mutmut run --paths-to-mutate src`            |
| Pytest       | Unit tests/coverage         | `pytest --cov=fedstylevalidator`              |
| Safety       | Security (deps, optional)   | `safety check`                                |

---

## 📡 Direct API example

**Core engine only**

```bash
curl -F "doc_file=@mydoc.docx" -F "doc_type=Advisory Circular" \
  http://localhost:8000/process
```

**Core engine + FAA pack**

```bash
curl -F "doc_file=@mydoc.docx" -F "doc_type=Advisory Circular" -F "pack=faa" \
  http://localhost:8000/process
```

## POST `/process` endpoint

Uploads a Word document and returns formatting results. The request must use
`multipart/form-data` and accepts these fields:

- `doc_file` – the `.docx` file to check.
- `doc_type` – type of document (e.g. `Advisory Circular`).
- `pack` – optional rule pack to enable (e.g. `faa`).
- `visibility_json` – optional JSON for per-check visibility.
- `group_by` – optional grouping mode (`category` or `severity`).

Example response:

```json
{
  "has_errors": true,
  "rendered": "<html>...",
  "by_category": {"format": []}
}
```

See [`docs/api-reference.md`](docs/api-reference.md) for the full reference.
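
A client will typically branch on `has_errors` and count findings per category. The sketch below does that for the example response above; `summarize` is a hypothetical helper, and it assumes every `by_category` value is a list, as in the example.

```python
import json
from typing import Any, Dict


def summarize(response_text: str) -> Dict[str, Any]:
    """Reduce a /process response to an error flag and per-category counts."""
    data = json.loads(response_text)
    by_category = data.get("by_category", {})
    return {
        "has_errors": bool(data.get("has_errors", False)),
        # Assumption: each category maps to a list of findings.
        "counts": {name: len(items) for name, items in by_category.items()},
    }


example = '{"has_errors": true, "rendered": "<html>...", "by_category": {"format": []}}'
print(summarize(example))
```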

---

## 📚 Terminology, document types, and licensing

Need definitions for document types, terminology rules, or pack licensing? Start
with [`docs/concepts.md`](docs/concepts.md) for a quick overview.

### About the requirement files

`requirements.txt` and `requirements-dev.txt` are generated for legacy tooling. `uv pip sync requirements-dev.txt` (or `pip install -e ".[dev]"`) remains the canonical path to an exact, up-to-date environment.

## License

This project is distributed under the [Apache License 2.0](LICENSE).
