Metadata-Version: 2.4
Name: invoice-analyzer
Version: 0.1.0
Summary: Deterministic migration-risk analysis for OCR-based invoice extraction workloads.
License: MIT
Project-URL: Homepage, https://github.com/bh3r1th/llm-invoice-migration-analyzer
Project-URL: Repository, https://github.com/bh3r1th/llm-invoice-migration-analyzer
Project-URL: Bug Tracker, https://github.com/bh3r1th/llm-invoice-migration-analyzer/issues
Keywords: invoice,ocr,llm,migration,risk-analysis,extraction,validation,cli
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Office/Business :: Financial
Classifier: Topic :: Text Processing
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: price-parser>=0.3.4
Requires-Dist: python-dateutil>=2.9
Requires-Dist: jsonpath-ng>=1.6
Provides-Extra: dev
Requires-Dist: faker>=24.0; extra == "dev"
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Provides-Extra: fuzzy
Requires-Dist: rapidfuzz>=3.0; extra == "fuzzy"

# Invoice Migration Analyzer

Local CLI for deterministic migration-risk analysis of OCR-based invoice extraction workloads.

## What it does / what it does not do

Answers one question: **"Can this workload safely tolerate a cheaper LLM under operational risk constraints?"**

Not an eval harness, benchmarking suite, model quality scorer, or generic document analyzer. Validation is source-supported only — OCR text is the supporting evidence, never ground truth.

## Classification ladder

A row is labeled by the worst status across `total_amount`, `currency`, and `invoice_date`. `vendor_name` is warning-only.

| Label              | Trigger                                                                 |
|--------------------|-------------------------------------------------------------------------|
| `RISK`             | Any critical field has no source support, or unparseable extraction.    |
| `REVIEW_AMBIGUOUS` | Any critical field has multiple competing source candidates.            |
| `REVIEW_INFERRED`  | Any critical field matches by tolerance/normalization, not literally.   |
| `SAFE`             | Every critical field has direct literal source support and zero ambiguity. |

**SAFE invariant**: both direct source support **and** zero unresolved ambiguity must hold on all critical fields. Any deviation demotes the row. `REVIEW_INFERRED` (tolerance/normalization match) never maps to SAFE — enforced by policy and by a runtime guard.

**Guarantee on the shipped corpus**: the false-SAFE count is 0 across all 200 rows.
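
The worst-status rule and the SAFE guard can be sketched in a few lines. This is a minimal illustration, not the tool's actual code: the `Status` enum, the `label_row` function, the severity order (taken from the row order of the table above), and the choice to treat a missing critical field as `RISK` are all assumptions made for the example.

```python
from enum import IntEnum


class Status(IntEnum):
    # Hypothetical ordering, best to worst, so max() returns the worst status.
    SAFE = 0
    REVIEW_INFERRED = 1
    REVIEW_AMBIGUOUS = 2
    RISK = 3


# Only these fields drive the row label; vendor_name is warning-only.
CRITICAL_FIELDS = ("total_amount", "currency", "invoice_date")


def label_row(field_statuses: dict[str, Status]) -> Status:
    """Label a row by the worst status among its critical fields."""
    # Assumption of this sketch: a missing critical field counts as RISK.
    statuses = [field_statuses.get(name, Status.RISK) for name in CRITICAL_FIELDS]
    label = max(statuses)
    # Guard mirroring the SAFE invariant. It is redundant with max() by
    # construction and is kept only to show a policy rule plus a runtime check.
    if label is Status.SAFE and any(s is not Status.SAFE for s in statuses):
        raise AssertionError("SAFE invariant violated")
    return label
```

With this sketch, a row whose `currency` is `REVIEW_INFERRED` and whose other critical fields are `SAFE` is labeled `REVIEW_INFERRED`, while a `RISK` status on `vendor_name` alone leaves the row label unchanged.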

## Installation

```
pip install -e ".[fuzzy]"
python corpus\generate_corpus.py --full
```

## Usage

```
invoice-analyzer run ^
  --sample          PATH    (default: corpus\sample.jsonl) ^
  --keys            PATH    (default: corpus\keys.json) ^
  --output          DIR     (default: .\output) ^
  --cache           DIR     (optional, enables replay cache) ^
  --baseline-cost   FLOAT   (default: 0.015) ^
  --candidate-cost  FLOAT   (default: 0.002) ^
  --volume          INT     (default: 50000) ^
  --max-rows        INT     (default: 1000, hard cap: 1000) ^
  --no-detail ^
  --baseline-model  STR     (default: baseline) ^
  --candidate-model STR     (default: candidate)

invoice-analyzer version
```

## Output files

- `output\report.md` — human-readable report with decision summary, label table, conservative vs optimistic cost projection, and per-row evidence.
- `output\report.html` — same content rendered as a single self-contained HTML page.
- `output\raw_results.jsonl` — one JSON object per row with label, per-field statuses, evidence strings, reasons, cache flag, and any error.
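
Because `raw_results.jsonl` holds one JSON object per line, it can be post-processed with the standard library alone. The sketch below tallies rows per label. The key name `label` and the helper `count_labels` are assumptions for illustration; check the actual field names in your output before relying on them.

```python
import json
from collections import Counter
from pathlib import Path


def count_labels(path: Path) -> Counter:
    """Count rows per label in a JSONL results file."""
    counts: Counter = Counter()
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            row = json.loads(line)
            # "label" is an assumed key name; adjust it to the real schema.
            counts[row.get("label", "UNKNOWN")] += 1
    return counts
```

A call such as `count_labels(Path("output") / "raw_results.jsonl")` would return a `Counter` keyed by label name.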

## Cost model

Two scenarios. **Conservative**: only `SAFE` rows migrate to the cheaper model; all other rows stay on the baseline. **Optimistic**: `REVIEW_INFERRED` and `REVIEW_AMBIGUOUS` rows are also routed to the cheaper model. Conservative is the planning figure. Optimistic is an upper bound on savings, contingent on a human-review pipeline absorbing the review classes.
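
The arithmetic behind the two scenarios can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: it assumes the cost flags are per-invoice costs, that label shares observed in the sample extrapolate linearly to `--volume`, and the function name `project_costs` is hypothetical. The defaults mirror the CLI defaults listed under Usage.

```python
def project_costs(
    label_counts: dict[str, int],
    baseline_cost: float = 0.015,
    candidate_cost: float = 0.002,
    volume: int = 50_000,
) -> dict[str, float]:
    """Project total cost for the baseline, conservative and optimistic scenarios."""
    total = sum(label_counts.values())
    if total == 0:
        raise ValueError("label_counts must contain at least one row")

    safe_share = label_counts.get("SAFE", 0) / total
    review_share = (
        label_counts.get("REVIEW_INFERRED", 0)
        + label_counts.get("REVIEW_AMBIGUOUS", 0)
    ) / total

    def blended(candidate_share: float) -> float:
        # Rows routed to the candidate pay candidate_cost; the rest pay baseline_cost.
        return volume * (
            candidate_share * candidate_cost
            + (1 - candidate_share) * baseline_cost
        )

    return {
        "baseline": volume * baseline_cost,
        "conservative": blended(safe_share),  # only SAFE rows migrate
        "optimistic": blended(safe_share + review_share),  # review classes migrate too
    }
```

For a made-up sample of 50 `SAFE`, 20 `REVIEW_INFERRED`, 10 `REVIEW_AMBIGUOUS`, and 20 `RISK` rows at the default costs and volume, this gives about 750 for the baseline, about 425 for the conservative scenario, and about 230 for the optimistic one.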

## Running tests

```
pytest tests\ -v --tb=short --basetemp=.\pytest-tmp
```

The `--basetemp=.\pytest-tmp` flag is a workaround for Windows environments where permission constraints on `AppData\Temp` prevent pytest from using its default temporary directory.

## Corpus

200 adversarial rows: 100 base rows (seed 42) plus 100 expansion rows (seed 142). The 8 base attack vectors:

- `multi_occurrence` — same total appears in multiple labeled positions.
- `ocr_near_miss` — total garbled by O/0, l/1, spacing artifacts.
- `inferred_equivalence` — extracted matches by tolerance only (e.g. `300.00` vs `300`).
- `multi_currency` — two currencies present in source.
- `date_collision` — invoice/due/PO dates all parseable and distinct.
- `low_ocr_quality` — heavily corrupted OCR throughout.
- `correct_clean` — well-formed invoice with single supporting evidence.
- `repeated_total` — total repeats but supporting amounts disagree.

Expansion adds 11 further vectors (European amount format, ambiguous date format, symbol-only currency, single-line OCR, vendor edge cases, amount-in-words, amount rounded in source, relative dates, currency implied not stated, amount/date/currency collisions, plus a clean control). Every adversarial example exists to attack SAFE credibility.
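
To make the difference between a literal match and a tolerance match concrete (the distinction that `inferred_equivalence` and `ocr_near_miss` probe), here is a minimal sketch for amounts only. The function `amount_support`, its return values, and the token regex are hypothetical and much simpler than a real matcher: it ignores European formats, currency symbols, and ambiguity between multiple candidates.

```python
import re
from decimal import Decimal, InvalidOperation

# Plain numeric tokens such as "300", "300.00" or "1,250.50".
AMOUNT_TOKEN = re.compile(r"\d[\d,]*(?:\.\d+)?")


def amount_support(extracted: str, ocr_text: str) -> str:
    """Classify how the OCR text supports an extracted amount (illustrative only)."""
    try:
        target = Decimal(extracted.replace(",", ""))
    except InvalidOperation:
        return "unparseable"

    # Drop trailing commas picked up from sentence punctuation ("300, paid").
    tokens = [token.rstrip(",") for token in AMOUNT_TOKEN.findall(ocr_text)]

    if extracted in tokens:
        return "direct"  # the exact string appears in the source
    if any(Decimal(token.replace(",", "")) == target for token in tokens):
        return "inferred"  # equal only after numeric normalization
    return "none"
```

Under this sketch, `amount_support("300.00", "Total due: 300")` returns `"inferred"`, which corresponds to `REVIEW_INFERRED` rather than `SAFE`, and an OCR-garbled source such as `"Total: 3OO.00"` (letter O instead of zero) returns `"none"`.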

## Hard limits / scope

- Invoices only.
- Single-turn JSON extraction.
- Local CLI. No SaaS. No Docker. No UI. No telemetry.
- One baseline vs one candidate model.
- 1000-row hard cap.
