mldebug

A lightweight Python package for comparing datasets and detecting unexpected changes in machine learning systems.

Provides tools to run validation checks on reference and current datasets and return reports of detected issues.

@dataclass(frozen=True, slots=True)
class Issue:

Represents a detected issue from a validation or monitoring check.

This is the atomic output of all checks and is intended to be consumed by downstream reporting, alerting, or debugging components.

Parameters
  • name (str): Identifier of the issue type (e.g., "ks_test", "missing_values").
  • metric (str): Name of the metric used to detect the issue (e.g., "distribution_shift_score", "missing_rate_increase").
  • severity (Severity): Importance level of the issue.
  • message (str): Human-readable explanation of the issue.
  • feature (str | None): Feature associated with the issue. None for global issues.
  • value (float | None): Observed metric value that triggered the issue (if applicable).
  • threshold (float | None): Threshold used for comparison. Interpretation depends on the metric.
Issue(name: str, metric: str, severity: Severity, message: str, feature: str | None = None, value: float | None = None, threshold: float | None = None)
name: str
metric: str
severity: Severity
message: str
feature: str | None
value: float | None
threshold: float | None
@dataclass(frozen=True, slots=True)
class Report:

Aggregated output of a full ML debugging run.

Parameters
  • issues (list[Issue]): Collection of detected issues.
Report(issues: list[Issue])
issues: list[Issue]
def summary(self) -> dict[str, typing.Any]:

Summarize issues by severity and total count.

def to_dict(self) -> dict[str, typing.Any]:

Serialize report for logging / APIs.

def score(self) -> dict[str, typing.Any]:

Return a dataset quality score.

The score represents data quality based only on feature-level issues. System-level issues (e.g. schema errors, invalid inputs) are not included in the score but are available in the report.

Returns
  • dict[str, Any]: Dictionary containing:
    • overall_score (float): Dataset quality score in [0, 100]. Higher is better.
    • feature_scores (dict[str, float]): Per-feature scores.
    • status (str): One of "pass", "warning", or "fail".
    • system_issue_count (int): Number of system-level issues.

class Severity(enum.Enum):

Severity level of a detected issue.

INFO: Informational issue with no immediate impact.

WARNING: Potential problem that should be reviewed.

CRITICAL: Serious issue likely to affect model performance or reliability.

INFO = 'info'
WARNING = 'warning'
CRITICAL = 'critical'
class FeatureType(enum.Enum):

Supported feature types in mldebug.

Defines the canonical feature categories used across schema validation, normalization, and feature-level checks.

NUMERIC: Numeric features validated using numeric-based validation checks.

CATEGORICAL: Categorical features validated using category-based validation checks.

NUMERIC = 'numeric'
CATEGORICAL = 'categorical'
def run_checks(reference: Mapping[str, ArrayLike], current: Mapping[str, ArrayLike], schema: Mapping[str, FeatureType]) -> Report:

Run checks on reference and current datasets.

This is the main entrypoint of the library. It performs schema analysis (validation and mismatch detection) followed by feature-level checks based on the provided schema, and returns a structured report of issues.

Parameters
  • reference (Mapping[str, ArrayLike]): Reference dataset keyed by feature name (e.g. training data).
  • current (Mapping[str, ArrayLike]): Current dataset keyed by feature name (e.g. production data).
  • schema (Mapping[str, FeatureType]): Mapping of feature names to their expected types.
Returns
  • Report: Aggregated report containing all detected issues.