Skip to content

Findings

A Finding is the primary output of a Doctor's examination. It represents a diagnosed issue — a structured object capturing exactly what was found, how severe it is, and the raw evidence that supports the conclusion.

The Finding Object

class Finding(BaseModel):
    title: str
    description: str
    severity: Severity
    confidence: Confidence
    risk_score: float
    risk_level: str
    evidence: Dict[str, Any]
    structured_evidence: List[DiagnosticEvidence]
    tags: List[str]

Fields

Field Type Description
title str A short, human-readable label for the issue (e.g., "Generalization Gap").
description str A detailed explanation of the finding and why it is problematic.
severity Severity The final alert level: INFO, WARNING, or CRITICAL.
confidence Confidence How certain ModelDoctor is that this issue is real: LOW, MEDIUM, HIGH.
risk_score float A continuous risk score from 0.0 to 1.0.
risk_level str The categorical risk level (INFO, LOW, MEDIUM, HIGH, CRITICAL).
evidence dict A flat dictionary of the raw measured values (e.g., {"generalization_gap": 0.23}).
structured_evidence List[DiagnosticEvidence] Structured evidence objects with thresholds and weights.
tags List[str] Categorical tags used for filtering (e.g., ["overfitting", "generalization"]).

Severity Levels

Level Meaning
INFO Healthy or expected behavior. Not shown in primary alert dashboard.
WARNING Degrades performance or risks edge-case failures. Review recommended.
CRITICAL Will likely cause immediate model failure or entirely invalidate evaluation metrics.

DiagnosticEvidence

Each finding is backed by one or more DiagnosticEvidence objects — the raw signals collected by the Doctor before Confidence and Risk scoring:

class DiagnosticEvidence(BaseModel):
    name: str
    measured_value: float
    expected_range: str
    weight: str
    normalized_score: float
Field Description
name Name of the signal (e.g., "Generalization Gap").
measured_value The raw value measured from the model (e.g., 0.23).
expected_range Human-readable threshold (e.g., "<= 0.15").
weight Heuristic importance: Low, Medium, High, Very High.
normalized_score How severely the threshold was violated (0.0 = no violation, 1.0 = maximum violation).

Reading Findings Programmatically

import modeldoctor as md

report = md.diagnose(model, X_train, y_train, X_test, y_test)

# Gather all findings across all doctors
all_findings = [f for d in report.diagnoses for f in (d.findings or [])]

for finding in all_findings:
    if finding.severity.value in ("warning", "critical"):
        print(f"[{finding.severity.value.upper()}] {finding.title}")
        print(f"  {finding.explanation}")
        for ev in finding.structured_evidence:
            print(f"  Evidence: {ev.name} = {ev.measured_value} (expected {ev.expected_range})")

Relationship to Prescriptions

Every WARNING or CRITICAL finding is automatically matched against the Prescription Engine knowledge base. If a match is found, a Prescription is generated and attached to the report with concrete remediation steps.