Metadata-Version: 2.4
Name: pepsico-document-confidence
Version: 0.1.0
Summary: Proxy-based confidence scoring framework for extraction quality assessment
Project-URL: Homepage, https://github.com/pepsico-ai/document-confidence
Project-URL: Repository, https://github.com/pepsico-ai/document-confidence
Project-URL: Issues, https://github.com/pepsico-ai/document-confidence/issues
Author-email: PepsiCo AI Team <ai@pepsico.com>
License: MIT
Keywords: confidence,document,extraction,quality,scoring
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: plano-core>=0.1.0
Requires-Dist: pydantic>=2.0
Requires-Dist: typing-extensions>=4.0
Provides-Extra: dev
Requires-Dist: black>=23.0; extra == 'dev'
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

# document-confidence

A proxy-based confidence scoring framework for evaluating extraction quality without requiring ground truth labels. This library provides a production-ready architecture for assessing document extraction results and routing them to appropriate recovery workflows.

## Overview

`document-confidence` scores extraction quality along five dimensions: text coverage, table completeness, schema fill, internal consistency, and layout density. Scoring is deterministic and requires no external APIs, LLM calls, or cloud dependencies, which makes it suitable for high-throughput document processing pipelines.

## Architecture

The library follows a modular architecture with:

- **Metric-based Scoring** - Independent metrics for different quality dimensions
- **Weighted Aggregation** - Configurable weights for metric prioritization
- **Deficiency Classification** - Automatic detection of specific failure types
- **Recommendation Engine** - Routing decisions for accept/recover/human review
- **Protocol-based Interfaces** - Type-safe contracts for extensibility
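
As an illustration of the protocol-based approach, the sketch below defines a structural metric contract with `typing.Protocol`. `MetricLike` and `ConstantMetric` are hypothetical names, not the library's actual interfaces, and the `compute` signature is inferred from the metric examples in this README.

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class MetricLike(Protocol):
    """Hypothetical structural contract for a metric (not the library's own)."""

    name: str
    weight: float

    def compute(self, extraction: Any, page_metadata: Any, parse_results: Any) -> float:
        ...


@dataclass
class ConstantMetric:
    """Satisfies MetricLike structurally, without inheriting from it."""

    name: str = "constant"
    weight: float = 0.10

    def compute(self, extraction: Any, page_metadata: Any, parse_results: Any) -> float:
        # Always reports a perfect score; useful only as a placeholder.
        return 1.0
```

Any object with matching attributes and a `compute` method passes an `isinstance(obj, MetricLike)` check, so third-party metrics would not need to share a base class.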

## Installation

```bash
pip install pepsico-document-confidence
```

### Optional Dependencies

```bash
# For development
pip install "pepsico-document-confidence[dev]"
```

## Quick Start

```python
from document_confidence import (
    ConfidenceConfig,
    ConfidenceScorer,
    RecommendationType,
)
from document_confidence.metrics import (
    TextCoverageMetric,
    TableCompletenessMetric,
    SchemaFillMetric,
    ConsistencyMetric,
    DensityMetric,
)

# Configure confidence scoring
config = ConfidenceConfig(
    acceptance_threshold=90.0,
    human_review_threshold=80.0,
    weights={
        "text_coverage": 0.25,
        "table_completeness": 0.20,
        "schema_fill": 0.25,
        "consistency": 0.15,
        "density": 0.15,
    },
)

# Create metrics
metrics = [
    TextCoverageMetric(),
    TableCompletenessMetric(),
    SchemaFillMetric(),
    ConsistencyMetric(),
    DensityMetric(),
]

# Initialize scorer
scorer = ConfidenceScorer(config, metrics)

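# Score extraction. The three inputs (extracted_data, page_metadata,
# parse_results) are produced by your upstream extraction pipeline.
report = scorer.score(
    extraction=extracted_data,
    page_metadata=page_metadata,
    parse_results=parse_results,
)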

# Get recommendation
if report.recommendation == RecommendationType.ACCEPT:
    print("Extraction accepted")
elif report.recommendation == RecommendationType.HUMAN_REVIEW:
    print("Requires human review")
else:
    print("Recovery needed")
```

## Configuration

### Confidence Configuration

```python
from document_confidence import ConfidenceConfig

config = ConfidenceConfig(
    acceptance_threshold=90.0,
    human_review_threshold=80.0,
    weights={
        "text_coverage": 0.25,
        "table_completeness": 0.20,
        "schema_fill": 0.25,
        "consistency": 0.15,
        "density": 0.15,
    },
    enable_metric_explanations=True,
    normalize_weights=True,
    strict_schema_validation=True,
    minimum_metric_score=0.50,
)
```
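
The thresholds above are on a 0-100 scale, while individual metrics return values between 0.0 and 1.0. A minimal sketch of how the two could be reconciled, assuming that `normalize_weights` divides each weight by the sum of all weights and that the overall score is the weighted mean scaled to 0-100. `normalize` and `aggregate` are hypothetical helpers, not part of the library.

```python
def normalize(weights: dict[str, float]) -> dict[str, float]:
    """Rescale weights so they sum to 1.0 (assumed meaning of normalize_weights)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: w / total for name, w in weights.items()}


def aggregate(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of 0-1 metric scores, scaled to 0-100.

    Raises KeyError if a weighted metric has no score.
    """
    normalized = normalize(weights)
    return 100.0 * sum(normalized[name] * scores[name] for name in normalized)


weights = {
    "text_coverage": 0.25,
    "table_completeness": 0.20,
    "schema_fill": 0.25,
    "consistency": 0.15,
    "density": 0.15,
}
scores = {
    "text_coverage": 0.95,
    "table_completeness": 0.80,
    "schema_fill": 0.90,
    "consistency": 1.00,
    "density": 0.70,
}
overall = aggregate(scores, weights)  # roughly 87.75
```

Under these assumptions, the example scores land between the two thresholds, which would route the document to human review.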

## Metrics

### Text Coverage Metric

Detects OCR failures by comparing the amount of extracted text with the amount expected.

```python
from document_confidence.metrics import TextCoverageMetric

metric = TextCoverageMetric(weight=0.25)
score = metric.compute(extraction, page_metadata, parse_results)
```

**Formula:** `extracted_text_length / expected_text_length`

**Threshold:** < 0.70 → OCR_GAP deficiency
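
A minimal sketch of the ratio and threshold above, assuming per-page character counts are available for both the extracted and the expected text. The function name, the input shapes, and the capping at 1.0 are illustrative assumptions, not the library's implementation.

```python
OCR_GAP_THRESHOLD = 0.70  # threshold quoted above


def text_coverage(
    extracted: dict[int, str], expected_chars: dict[int, int]
) -> tuple[float, list[int]]:
    """Return (overall coverage, pages whose own coverage is below the threshold)."""
    total_expected = sum(expected_chars.values())
    if total_expected == 0:
        # Assumption: nothing was expected, so nothing is missing.
        return 1.0, []
    total_extracted = sum(len(extracted.get(page, "")) for page in expected_chars)
    score = min(1.0, total_extracted / total_expected)
    low_pages = [
        page
        for page, expected in sorted(expected_chars.items())
        if expected > 0 and len(extracted.get(page, "")) / expected < OCR_GAP_THRESHOLD
    ]
    return score, low_pages


score, low_pages = text_coverage(
    extracted={1: "a" * 900, 2: "b" * 100},
    expected_chars={1: 1000, 2: 1000},
)
```

Here page 1 is well covered but page 2 is not, so the overall ratio falls below 0.70 and page 2 is the one that would be reported as affected.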

### Table Completeness Metric

Detects missing tables by comparing the number of recovered table cells with the number expected.

```python
from document_confidence.metrics import TableCompletenessMetric

metric = TableCompletenessMetric(weight=0.20)
score = metric.compute(extraction, page_metadata, parse_results)
```

**Formula:** `recovered_cells / expected_cells`

**Threshold:** < 0.80 → TABLE_MISSING deficiency

### Schema Fill Metric

Measures extraction completeness as the share of required schema fields that are populated.

```python
from document_confidence.metrics import SchemaFillMetric

metric = SchemaFillMetric(weight=0.25)
score = metric.compute(extraction, page_metadata, parse_results)
```

**Formula:** `required_fields_populated / required_fields_total`
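
A short sketch of this formula for a single record, assuming a field counts as populated when it is present and not `None`, an empty string, or an empty container. The helper names and the field names in the example are hypothetical.

```python
def is_populated(value: object) -> bool:
    """Assumed rule: None and empty strings/containers count as unfilled."""
    if value is None:
        return False
    if isinstance(value, (str, list, dict, tuple, set)):
        return len(value) > 0
    return True  # numbers (including 0) and booleans count as filled


def schema_fill(record: dict[str, object], required_fields: list[str]) -> float:
    """required_fields_populated / required_fields_total for one record."""
    if not required_fields:
        return 1.0
    populated = sum(1 for field in required_fields if is_populated(record.get(field)))
    return populated / len(required_fields)


record = {"upc": "012000001", "name": "", "shelf": 3, "facings": None}
fill = schema_fill(record, ["upc", "name", "shelf", "facings"])  # 2 of 4 filled
```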

### Consistency Metric

Detects internal contradictions in extraction.

```python
from document_confidence.metrics import ConsistencyMetric

metric = ConsistencyMetric(weight=0.15)
score = metric.compute(extraction, page_metadata, parse_results)
```

**Checks:**
- Duplicate shelf numbers
- Same UPC with different names
- Missing cross-reference mappings
- Orphan products

**Formula:** `1 - error_rate`

**Threshold:** < 0.70 → CROSSREF_BROKEN deficiency
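
The sketch below covers the first two checks, assuming the error rate is the number of offending items divided by the number of items checked. That definition, the function name, and the input shapes are assumptions, since the exact formula for `error_rate` is not specified here.

```python
from collections import Counter, defaultdict


def consistency_score(shelf_numbers: list[int], products: list[tuple[str, str]]) -> float:
    """Return 1 - error_rate, where products are (upc, name) pairs."""
    checked = len(shelf_numbers) + len(products)
    if checked == 0:
        return 1.0

    # Every shelf whose number occurs more than once counts as an error.
    shelf_counts = Counter(shelf_numbers)
    duplicate_shelves = sum(n for n in shelf_counts.values() if n > 1)

    # Every product whose UPC maps to more than one name counts as an error.
    names_by_upc: dict[str, set[str]] = defaultdict(set)
    for upc, name in products:
        names_by_upc[upc].add(name)
    conflicting = sum(1 for upc, _ in products if len(names_by_upc[upc]) > 1)

    return 1.0 - (duplicate_shelves + conflicting) / checked


consistency = consistency_score(
    shelf_numbers=[1, 2, 2, 3],
    products=[("111", "Cola"), ("111", "Cola Zero"), ("222", "Chips"), ("333", "Water")],
)
```

In this example two shelves share a number and two products share a UPC with different names, so four of the eight checked items are in error.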

### Density Metric

Detects implausible shelf layouts by measuring density.

```python
from document_confidence.metrics import DensityMetric

metric = DensityMetric(weight=0.15)
score = metric.compute(extraction, page_metadata, parse_results)
```

**Checks:**
- Products per shelf (too sparse or too dense)
- Facings distribution
- Section density

**Threshold:** < 0.60 → SPATIAL_FAILURE deficiency
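
No formula is given for this metric. One plausible reading of the products-per-shelf check is sketched below: the score is the fraction of shelves whose product count falls inside a plausible range. The bounds `min_products` and `max_products` are invented for illustration and are not library defaults.

```python
def density_score(
    products_per_shelf: list[int],
    min_products: int = 1,   # hypothetical lower bound (too sparse below this)
    max_products: int = 40,  # hypothetical upper bound (too dense above this)
) -> float:
    """Fraction of shelves with a plausible number of products."""
    if not products_per_shelf:
        # Assumption: a layout with no shelves is treated as implausible.
        return 0.0
    plausible = sum(
        1 for count in products_per_shelf if min_products <= count <= max_products
    )
    return plausible / len(products_per_shelf)


density = density_score([12, 0, 15, 90, 8])  # one empty and one overcrowded shelf
```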

## Deficiency Classification

The library automatically classifies deficiencies based on metric scores:

```python
from document_confidence.models import DeficiencyType

deficiency_types = [
    DeficiencyType.OCR_GAP,        # Text coverage < 0.70
    DeficiencyType.TABLE_MISSING,   # Table completeness < 0.80
    DeficiencyType.SPATIAL_FAILURE, # Density < 0.60
    DeficiencyType.CROSSREF_BROKEN, # Consistency < 0.70
]
```

Each deficiency includes:
- Type
- Severity (0.0 - 1.0)
- Affected pages
- Human-readable description
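
The threshold mapping above can be sketched as a lookup table. The sketch uses stand-in types (`Deficiency`, `Finding`) rather than the library's models, and it assumes severity grows linearly with the distance below the threshold; the library's actual severity calculation is not documented here.

```python
from dataclasses import dataclass
from enum import Enum


class Deficiency(Enum):
    """Stand-in for DeficiencyType, for illustration only."""

    OCR_GAP = "ocr_gap"
    TABLE_MISSING = "table_missing"
    SPATIAL_FAILURE = "spatial_failure"
    CROSSREF_BROKEN = "crossref_broken"


# Metric name -> (threshold, deficiency raised when the score is below it).
THRESHOLDS: dict[str, tuple[float, Deficiency]] = {
    "text_coverage": (0.70, Deficiency.OCR_GAP),
    "table_completeness": (0.80, Deficiency.TABLE_MISSING),
    "density": (0.60, Deficiency.SPATIAL_FAILURE),
    "consistency": (0.70, Deficiency.CROSSREF_BROKEN),
}


@dataclass(frozen=True)
class Finding:
    type: Deficiency
    severity: float  # 0.0 (at threshold) to 1.0 (score of zero); assumed scale


def classify(scores: dict[str, float]) -> list[Finding]:
    """Return one finding per metric that scored strictly below its threshold."""
    findings = []
    for metric, (threshold, kind) in THRESHOLDS.items():
        score = scores.get(metric)
        if score is not None and score < threshold:
            findings.append(Finding(kind, (threshold - score) / threshold))
    return findings


findings = classify(
    {"text_coverage": 0.35, "table_completeness": 0.90, "density": 0.60, "consistency": 0.69}
)
```

A score exactly at the threshold (density 0.60 here) raises no deficiency, matching the strict `<` comparisons listed above.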

## Confidence Bands

Overall scores are mapped to confidence bands:

```text
95-100  → EXCELLENT
90-95   → GOOD
80-90   → FAIR
60-80   → POOR
0-60    → CRITICAL
```
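
The ranges above share their endpoints, so a boundary rule is needed. The sketch below assumes each lower bound is inclusive (a score of exactly 95 is EXCELLENT), which matches the `score >= 90` convention used for recommendations; the library may resolve boundaries differently. `confidence_band` is a hypothetical helper.

```python
# (inclusive lower bound, band), checked from highest to lowest.
BANDS: list[tuple[float, str]] = [
    (95.0, "EXCELLENT"),
    (90.0, "GOOD"),
    (80.0, "FAIR"),
    (60.0, "POOR"),
    (0.0, "CRITICAL"),
]


def confidence_band(score: float) -> str:
    """Map a 0-100 overall score to its band name."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score must be within 0-100, got {score}")
    for lower_bound, band in BANDS:
        if score >= lower_bound:
            return band
    raise AssertionError("unreachable: the last band starts at 0.0")
```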

## Recommendations

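The recommendation is based on the overall score and the detected deficiencies. With `acceptance_threshold=90.0` and `human_review_threshold=80.0`, the score maps as follows: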

```text
score >= 90              → ACCEPT
80 <= score < 90         → HUMAN_REVIEW
score < 80               → RECOVER
```
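
The score-based part of this routing can be sketched as a small function. It uses a stand-in enum rather than the library's `RecommendationType` and ignores deficiencies, because how they modify the outcome is not described here.

```python
from enum import Enum


class Recommendation(Enum):
    """Stand-in for RecommendationType, for illustration only."""

    ACCEPT = "accept"
    HUMAN_REVIEW = "human_review"
    RECOVER = "recover"


def recommend(
    score: float,
    acceptance_threshold: float = 90.0,
    human_review_threshold: float = 80.0,
) -> Recommendation:
    """Route a 0-100 overall score using the two configured thresholds."""
    if human_review_threshold > acceptance_threshold:
        raise ValueError("human_review_threshold must not exceed acceptance_threshold")
    if score >= acceptance_threshold:
        return Recommendation.ACCEPT
    if score >= human_review_threshold:
        return Recommendation.HUMAN_REVIEW
    return Recommendation.RECOVER
```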

## Custom Metrics

Create custom metrics by extending `BaseConfidenceMetric` and implementing `_compute`:

```python
from document_confidence.metrics import BaseConfidenceMetric

class CustomMetric(BaseConfidenceMetric):
    def __init__(self, weight: float = 0.10):
        super().__init__(name="custom", weight=weight)
    
    def _compute(self, extraction, page_metadata, parse_results):
        # Custom computation logic
        return 0.9  # Return score between 0.0 and 1.0
```
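
The split between a public `compute` and an overridden `_compute` suggests a template-method design. The sketch below shows how such a base class could work, using a stand-in class (`SketchBaseMetric`) so that it runs without the library installed. The clamping to 0.0-1.0 is an assumption about what the real base class does, not documented behavior.

```python
from abc import ABC, abstractmethod
from typing import Any


class SketchBaseMetric(ABC):
    """Stand-in for BaseConfidenceMetric; illustrative only."""

    def __init__(self, name: str, weight: float) -> None:
        self.name = name
        self.weight = weight

    def compute(self, extraction: Any, page_metadata: Any, parse_results: Any) -> float:
        # Public entry point: delegate to the subclass, then clamp to [0.0, 1.0].
        raw = self._compute(extraction, page_metadata, parse_results)
        return max(0.0, min(1.0, float(raw)))

    @abstractmethod
    def _compute(self, extraction: Any, page_metadata: Any, parse_results: Any) -> float:
        """Subclasses return a raw score here."""


class NonEmptyMetric(SketchBaseMetric):
    """Hypothetical metric: 1.0 if anything was extracted, otherwise 0.0."""

    def __init__(self, weight: float = 0.10) -> None:
        super().__init__(name="non_empty", weight=weight)

    def _compute(self, extraction: Any, page_metadata: Any, parse_results: Any) -> float:
        return 1.0 if extraction else 0.0
```

With this structure, validation and clamping live in one place, and each custom metric only supplies its own scoring logic.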

## Development

### Running Tests

```bash
# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=document_confidence
```

### Code Style

```bash
# Format code
black document_confidence

# Lint code
ruff check document_confidence

# Type check
mypy document_confidence
```

## Design Principles

1. **Deterministic** - No external APIs or LLM calls
2. **Computationally Lightweight** - O(n) complexity where possible
3. **Extensible** - Plugin architecture for custom metrics
4. **Type-safe** - Full type hints with Pydantic validation
5. **Production-ready** - Enterprise-scale performance

## Dependencies

- `plano-core>=0.1.0` - Shared interfaces and models
- `pydantic>=2.0` - Data validation
- `typing-extensions>=4.0` - Type extensions

## Performance

Design targets:
- Documents of 1000+ pages
- O(n) metric calculations
- Minimal memory usage
- No repeated traversals of the input

## License

MIT

## Support

For issues, questions, or contributions, please visit the project repository.
