
Evidence Engine

The EvidenceEngine, together with its per-Doctor component, the EvidenceBuilder, forms the foundation of ModelDoctor's diagnostic pipeline.

Why Evidence Exists

In traditional validation libraries, rules often trigger hard True/False assertions. For example, if a model's accuracy drops below 0.80, the check fails outright.

ModelDoctor uses an evidence-based approach instead. Doctors do not make final decisions; they simply collect Signals (e.g., "The generalization gap is 0.15"). This allows the downstream engines to weigh multiple pieces of evidence together before issuing a final diagnosis.
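To make the contrast concrete, here is a minimal Python sketch. The `Signal` dataclass and the `hard_check` and `collect_gap_signal` helpers are hypothetical illustrations, not ModelDoctor's API. The first function decides pass/fail on its own; the second only records what it measured and leaves the judgment to a later stage.

```python
from dataclasses import dataclass


# Traditional style (hypothetical helper): the rule itself decides pass/fail.
def hard_check(accuracy: float, minimum: float = 0.80) -> bool:
    return accuracy >= minimum


# Evidence style (hypothetical type): the check only records what it measured.
@dataclass(frozen=True)
class Signal:
    name: str
    measured_value: float
    expected_range: str


def collect_gap_signal(train_score: float, val_score: float) -> Signal:
    # Round to avoid floating-point noise such as 0.15000000000000002.
    gap = round(train_score - val_score, 4)
    return Signal("Generalization Gap", gap, "<= 0.15")


print(hard_check(0.78))                              # False
print(collect_gap_signal(0.97, 0.82).measured_value)  # 0.15
```

The signal carries no verdict; a downstream engine can later weigh it against other signals before deciding whether it matters.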

EvidenceBuilder

Each Doctor instantiates an EvidenceBuilder during the examine() phase. When a threshold is crossed, the Doctor adds a piece of structured evidence to the builder.

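```python
# Inside a Doctor's examine(): record evidence only when the critical gap threshold is crossed.
if gap > RULES.overfitting_gap_critical:
    builder.add(
        name="Generalization Gap",
        measured_value=gap,        # raw measured gap
        weight="Very High",        # categorical importance of this signal
        normalized_score=1.0,      # critical threshold crossed: maximum severity
        expected_range="<= 0.15",  # human-readable healthy range
    )
```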
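The snippet above assumes a `builder` object with a keyword-argument `add()` method. A minimal sketch of what such a builder could look like follows, assuming evidence is stored as frozen dataclasses and that `add()` validates the weight category and the score range. `Evidence`, `ALLOWED_WEIGHTS`, and `build()` are hypothetical names, not confirmed parts of ModelDoctor, and a literal `0.15` stands in for `RULES.overfitting_gap_critical`.

```python
from dataclasses import dataclass, field
from typing import List, Union

Number = Union[int, float]

# Hypothetical: the four weight categories named under Weighted Evidence.
ALLOWED_WEIGHTS = ("Low", "Medium", "High", "Very High")


@dataclass(frozen=True)
class Evidence:
    """Hypothetical record for one piece of evidence."""
    name: str
    measured_value: Number
    weight: str
    normalized_score: float
    expected_range: str


@dataclass
class EvidenceBuilder:
    """Hypothetical per-Doctor collector of structured evidence."""
    items: List[Evidence] = field(default_factory=list)

    def add(self, *, name: str, measured_value: Number, weight: str,
            normalized_score: float, expected_range: str) -> None:
        # Reject weights outside the documented categories.
        if weight not in ALLOWED_WEIGHTS:
            raise ValueError(f"unknown weight: {weight!r}")
        # Normalized scores must lie in [0.0, 1.0].
        if not 0.0 <= normalized_score <= 1.0:
            raise ValueError("normalized_score must be between 0.0 and 1.0")
        self.items.append(Evidence(name, measured_value, weight,
                                   normalized_score, expected_range))

    def build(self) -> List[Evidence]:
        # Return a copy so callers cannot mutate the builder's internal list.
        return list(self.items)


# Usage mirroring the Doctor snippet; 0.15 stands in for the RULES threshold.
builder = EvidenceBuilder()
gap = 0.22
if gap > 0.15:
    builder.add(
        name="Generalization Gap",
        measured_value=gap,
        weight="Very High",
        normalized_score=1.0,
        expected_range="<= 0.15",
    )
evidence = builder.build()
```

Validating inside `add()` means a Doctor cannot emit a malformed signal, which is what lets every downstream consumer rely on the same fields.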

Signal Normalization

The EvidenceBuilder forces Doctors to standardize their signals:

- Measured Value: The raw float or int (e.g., 0.15).
- Expected Range: A human-readable string explaining the threshold (e.g., <= 0.15).
- Normalized Score: A float between 0.0 and 1.0 representing how severely the threshold was violated.
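One plausible way to compute a normalized score is a clamped linear ramp: 0.0 at the threshold, 1.0 at some critical value, and linear in between. The sketch below assumes a rule where higher values are worse (such as latency or generalization gap); the `critical` parameter is a hypothetical second bound, and ModelDoctor's actual scoring may differ.

```python
def normalized_score(measured: float, threshold: float, critical: float) -> float:
    """Hypothetical severity score for an upper-bound rule (higher is worse).

    Returns 0.0 at or below `threshold`, 1.0 at or above `critical`,
    and a linear interpolation in between.
    """
    if critical <= threshold:
        raise ValueError("critical must be greater than threshold")
    raw = (measured - threshold) / (critical - threshold)
    # Clamp so the result always stays within [0.0, 1.0].
    return max(0.0, min(1.0, raw))


# Example: latency threshold of 500 ms, assumed critical value of 2,000 ms.
print(normalized_score(1250, 500, 2000))  # 0.5
print(normalized_score(2500, 500, 2000))  # 1.0
print(normalized_score(300, 500, 2000))   # 0.0
```

Clamping keeps scores comparable across Doctors: a wildly exceeded threshold saturates at 1.0 instead of dominating any later aggregation.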

Weighted Evidence

Doctors assign a categorical weight (Low, Medium, High, Very High) to each piece of evidence. This indicates the heuristic importance of the signal. For example, a 50 MB model size might be a Medium-weight signal for production readiness, whereas a 2,000 ms inference latency might be Very High.
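How the categorical weights are combined downstream is not specified here. As an illustration only, the sketch below maps each category to an assumed numeric multiplier and computes a weighted mean of normalized scores; `WEIGHT_VALUES` and `weighted_severity` are hypothetical names and values.

```python
from typing import List, Tuple

# Hypothetical numeric multipliers for the four documented weight categories.
WEIGHT_VALUES = {"Low": 1.0, "Medium": 2.0, "High": 3.0, "Very High": 4.0}


def weighted_severity(evidence: List[Tuple[str, float]]) -> float:
    """Weighted mean of normalized scores; each item is (weight, score).

    Higher-weight evidence pulls the result more strongly toward its score.
    """
    if not evidence:
        return 0.0
    total_weight = sum(WEIGHT_VALUES[weight] for weight, _ in evidence)
    weighted_sum = sum(WEIGHT_VALUES[weight] * score for weight, score in evidence)
    return weighted_sum / total_weight


# A Low-weight signal at 0.0 and a High-weight signal at 1.0.
print(weighted_severity([("Low", 0.0), ("High", 1.0)]))  # 0.75
```

A weighted mean is only one option; a real engine might instead use the maximum score or require several signals to agree before escalating.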

Structured Evidence

The output of the EvidenceEngine is a collection of DiagnosticEvidence objects. These are eventually packaged into the final Finding objects, ensuring that all UI dashboards and exported JSON reports have a consistent schema, regardless of which Doctor generated the evidence.
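A consistent schema is easiest to guarantee when every piece of evidence is the same typed record. The sketch below assumes `DiagnosticEvidence` carries the five fields passed to `builder.add()` and that a `Finding` simply holds a list of them; the `doctor` and `title` fields, the `"overfitting"` value, and `to_json()` are hypothetical. Because `dataclasses.asdict` recurses into nested dataclasses and lists, every Finding serializes to the same JSON keys regardless of which Doctor produced it.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass(frozen=True)
class DiagnosticEvidence:
    """Assumed fields, mirroring the arguments of builder.add()."""
    name: str
    measured_value: float
    weight: str
    normalized_score: float
    expected_range: str


@dataclass
class Finding:
    """Hypothetical container that packages evidence for reports."""
    doctor: str
    title: str
    evidence: List[DiagnosticEvidence] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict converts nested dataclasses (inside the list) to plain dicts.
        return json.dumps(asdict(self), sort_keys=True)


finding = Finding(
    doctor="overfitting",
    title="Possible overfitting",
    evidence=[DiagnosticEvidence("Generalization Gap", 0.22, "Very High", 1.0, "<= 0.15")],
)
payload = json.loads(finding.to_json())
```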