Diagnose API

The md.diagnose() function is the primary entry point for ModelDoctor. It executes the entire diagnostic pipeline synchronously and returns a populated Report object.

Signature

import modeldoctor as md

report = md.diagnose(
    model,
    X_train,
    y_train,
    X_test,
    y_test,
    X_val=None,
    y_val=None,
    feature_names=None,
    config=None,
    progress_callback=None
)

Inputs

| Parameter | Type | Description |
| --- | --- | --- |
| model | BaseEstimator | A fitted scikit-learn compatible estimator. Must implement fit and predict. |
| X_train | ndarray / DataFrame | The training feature matrix. |
| y_train | ndarray / Series | The training target array. |
| X_test | ndarray / DataFrame | The test (hold-out) feature matrix. |
| y_test | ndarray / Series | The test (hold-out) target array. |
| X_val | ndarray / DataFrame | (Optional) A secondary validation feature matrix. |
| y_val | ndarray / Series | (Optional) A secondary validation target array. |
| feature_names | List[str] | (Optional) Explicit feature names. If not provided and inputs are DataFrames, column names are used. |
| config | ModelDoctorConfig | (Optional) Custom configuration object to override default thresholds and active Doctors. |
| progress_callback | Callable | (Optional) A function fn(message: str) that receives progress updates. |
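
As an illustration of progress_callback, the sketch below builds a callback with the documented fn(message: str) shape that logs each update and keeps a copy for later inspection. The helper name make_progress_logger is hypothetical and not part of ModelDoctor, and the wording of the messages the library emits is not specified here.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger("modeldoctor.progress")


def make_progress_logger() -> Tuple[Callable[[str], None], List[str]]:
    """Return (callback, history). Hypothetical helper, not part of ModelDoctor."""
    history: List[str] = []

    def on_progress(message: str) -> None:
        # Matches the documented callback shape: fn(message: str)
        history.append(message)
        logger.info("diagnose: %s", message)

    return on_progress, history


on_progress, history = make_progress_logger()

# Intended usage (assumes a fitted model and the datasets are in scope):
# report = md.diagnose(model, X_train, y_train, X_test, y_test,
#                      progress_callback=on_progress)
```

Because diagnose() runs synchronously, the callback is presumably invoked on the calling thread, so it is best kept cheap (for example, appending to a list or logging).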

Outputs

The function returns a Report object, which contains:

  • health_score: A float from 0 to 100.
  • findings: A list of Finding objects containing diagnostic evidence.
  • prescriptions: Actionable recommendations generated by the PrescriptionEngine.
  • model_passport: Metadata about the evaluated model and datasets.
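
One way to consume these fields is a small summary helper, sketched below. It relies only on the attribute names listed above and assumes that findings and prescriptions are sized collections. The function name summarize_report and the cutoff of 70 are illustrative choices, not ModelDoctor defaults.

```python
from typing import Any


def summarize_report(report: Any, min_score: float = 70.0) -> str:
    """Build a one-line summary from the documented Report attributes.

    min_score is an illustrative cutoff, not a ModelDoctor default.
    """
    score = float(report.health_score)  # documented range: 0 to 100
    status = "OK" if score >= min_score else "NEEDS ATTENTION"
    return (
        f"[{status}] health_score={score:.1f}, "
        f"findings={len(report.findings)}, "
        f"prescriptions={len(report.prescriptions)}"
    )


# Intended usage, given a report returned by md.diagnose(...):
# print(summarize_report(report))
```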

Configuration

You can override default diagnostic thresholds by passing a ModelDoctorConfig object.

import modeldoctor as md
from modeldoctor import ModelDoctorConfig

# Relax the overfitting threshold
config = ModelDoctorConfig()
config.rules.overfitting_gap_critical = 0.20

report = md.diagnose(..., config=config)

Common Mistakes

  • Passing unfitted models: The model must already be fitted on X_train and y_train.
  • Mismatching shapes: Ensure that X_train and X_test have exactly the same number of features (columns).
  • Target leakage: If y_train is accidentally included as a column in X_train, the LeakageDoctor will flag it immediately.
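
The first two mistakes can be caught before calling diagnose() with a small pre-flight check, sketched below. The helper check_inputs is hypothetical and not part of ModelDoctor. It assumes 2-D inputs that expose .shape as (n_rows, n_features), as both NumPy arrays and pandas DataFrames do.

```python
from typing import Any, Optional


def check_inputs(
    model: Any, X_train: Any, X_test: Any, X_val: Optional[Any] = None
) -> None:
    """Raise early on common input mistakes. Hypothetical helper, not part of ModelDoctor."""
    # Documented contract: the estimator must implement fit and predict.
    for method in ("fit", "predict"):
        if not callable(getattr(model, method, None)):
            raise TypeError(f"model is missing a callable {method}()")

    # ndarray and DataFrame both expose .shape as (n_rows, n_features).
    n_features = X_train.shape[1]
    for name, X in (("X_test", X_test), ("X_val", X_val)):
        if X is not None and X.shape[1] != n_features:
            raise ValueError(
                f"{name} has {X.shape[1]} features, but X_train has {n_features}"
            )
```

This sketch only verifies that fit and predict exist. For scikit-learn estimators, sklearn.utils.validation.check_is_fitted can additionally confirm that the model has actually been fitted.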

Performance Notes

ModelDoctor evaluates metrics lazily. Even so, on large datasets (e.g., >1M rows), permutation importance calculations can be slow if SHAP is not installed. To speed up diagnostics, consider sampling your datasets before passing them to diagnose().
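
A minimal sketch of the sampling step, using only the standard library to choose row positions. The helper sample_row_indices and the 100,000-row cap in the usage comment are illustrative and not part of ModelDoctor. Uniform sampling is assumed here; a stratified sample may be more appropriate for heavily imbalanced targets.

```python
import random
from typing import List


def sample_row_indices(n_rows: int, max_rows: int, seed: int = 0) -> List[int]:
    """Pick up to max_rows distinct row positions uniformly at random, in ascending order."""
    if n_rows <= max_rows:
        return list(range(n_rows))
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    return sorted(rng.sample(range(n_rows), max_rows))


# Apply the same positions to features and target so rows stay aligned:
#   idx = sample_row_indices(len(X_train), max_rows=100_000)
#   X_small, y_small = X_train[idx], y_train[idx]            # NumPy arrays
#   X_small, y_small = X_train.iloc[idx], y_train.iloc[idx]  # pandas objects
```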