Diagnose API
The md.diagnose() function is the primary entry point for ModelDoctor. It executes the entire diagnostic pipeline synchronously and returns a populated Report object.
Signature
```python
import modeldoctor as md

report = md.diagnose(
    model,
    X_train,
    y_train,
    X_test,
    y_test,
    X_val=None,
    y_val=None,
    feature_names=None,
    config=None,
    progress_callback=None,
)
```
Inputs
| Parameter | Type | Description |
|---|---|---|
| `model` | `BaseEstimator` | A fitted, scikit-learn-compatible estimator. Must implement `fit` and `predict`. |
| `X_train` | ndarray / DataFrame | The training feature matrix. |
| `y_train` | ndarray / Series | The training target array. |
| `X_test` | ndarray / DataFrame | The test (hold-out) feature matrix. |
| `y_test` | ndarray / Series | The test (hold-out) target array. |
| `X_val` | ndarray / DataFrame | (Optional) A secondary validation feature matrix. |
| `y_val` | ndarray / Series | (Optional) A secondary validation target array. |
| `feature_names` | `List[str]` | (Optional) Explicit feature names. If not provided and the inputs are DataFrames, their column names are used. |
| `config` | `ModelDoctorConfig` | (Optional) Custom configuration object that overrides the default thresholds and active Doctors. |
| `progress_callback` | `Callable` | (Optional) A function `fn(message: str)` that receives progress updates. |
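Any callable with the shape `fn(message: str)` works as a `progress_callback`. As a minimal sketch, the hypothetical `ProgressLog` class below records each message together with the time elapsed since it was created; the class name, the timing, and the echo behaviour are illustrative assumptions, not part of ModelDoctor. The last lines simulate the calls `diagnose()` would make, because the exact messages it emits are not documented here.

```python
import time
from typing import List, Tuple


class ProgressLog:
    """Hypothetical progress_callback that records (elapsed_seconds, message) pairs."""

    def __init__(self, echo: bool = True) -> None:
        self.start = time.monotonic()
        self.echo = echo
        self.entries: List[Tuple[float, str]] = []

    def __call__(self, message: str) -> None:
        # Matches the documented fn(message: str) shape.
        elapsed = time.monotonic() - self.start
        self.entries.append((elapsed, message))
        if self.echo:
            print(f"[{elapsed:7.2f}s] {message}")


log = ProgressLog()
# Real use (assumed): md.diagnose(..., progress_callback=log)
# Simulated updates, since the actual messages are not documented:
log("starting diagnostics")
log("finished")
```

Using a class rather than a bare function keeps the collected messages available after `diagnose()` returns, which is handy for logging how long each stage took.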
Outputs
The function returns a Report object, which contains:
- health_score: A float from 0 to 100.
- findings: A list of Finding objects containing diagnostic evidence.
- prescriptions: Actionable recommendations generated by the PrescriptionEngine.
- model_passport: Metadata about the evaluated model and datasets.
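Because `health_score` is a single number from 0 to 100, it can act as a simple quality gate, for example in a CI job. The sketch below relies only on the attributes listed above and assumes `findings` and `prescriptions` are sized collections such as lists. The `gate_report` helper and its default cutoff of 70 are hypothetical choices, and `SimpleNamespace` stands in for a real `Report` so the example runs without ModelDoctor installed.

```python
from types import SimpleNamespace


def gate_report(report, min_score: float = 70.0) -> bool:
    """Hypothetical CI gate: pass only if health_score >= min_score.

    Reads only health_score, findings, and prescriptions; assumes the
    latter two support len().
    """
    score = float(report.health_score)
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"health_score out of range: {score}")
    passed = score >= min_score
    status = "PASS" if passed else "FAIL"
    print(
        f"{status}: health_score={score:.1f} (min {min_score:.1f}), "
        f"{len(report.findings)} findings, "
        f"{len(report.prescriptions)} prescriptions"
    )
    return passed


# Stand-in for a Report returned by md.diagnose(...).
fake_report = SimpleNamespace(health_score=64.5, findings=["f1", "f2"], prescriptions=["p1"])
ok = gate_report(fake_report)  # prints a FAIL line and returns False
```

In a CI script, a failing gate would typically be turned into a non-zero exit code, for example with `sys.exit(0 if ok else 1)`.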
Configuration
You can override default diagnostic thresholds by passing a ModelDoctorConfig object.
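```python
import modeldoctor as md
from modeldoctor import ModelDoctorConfig

# Relax the overfitting threshold
config = ModelDoctorConfig()
config.rules.overfitting_gap_critical = 0.20

# "..." stands for the positional arguments shown in the signature above
report = md.diagnose(..., config=config)
```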
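This page does not specify the exact rule behind `overfitting_gap_critical`. One plausible reading, stated purely as an assumption, is that it caps the difference between a training score and a held-out score. The hypothetical `is_critical_overfit` function below sketches that reading to show why raising the value to 0.20 makes the check more lenient; it is not ModelDoctor's implementation, and both the higher-is-better score and the strict `>` comparison are assumptions.

```python
def is_critical_overfit(train_score: float, test_score: float,
                        gap_critical: float) -> bool:
    """Hypothetical rule: flag when the train score beats the test score
    by more than gap_critical.

    Assumes higher scores are better (e.g. accuracy or R^2).
    """
    gap = train_score - test_score
    return gap > gap_critical


# A model scoring 0.98 on train and 0.83 on test has a gap of about 0.15.
print(is_critical_overfit(0.98, 0.83, gap_critical=0.10))  # True
print(is_critical_overfit(0.98, 0.83, gap_critical=0.20))  # False
```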
Common Mistakes
- Passing unfitted models: The `model` must already be fitted on `X_train` and `y_train`.
- Mismatched shapes: Ensure that the number of features in `X_train` exactly matches the number in `X_test`.
- Target leakage: If `y_train` is accidentally included as a column in `X_train`, the `LeakageDoctor` will flag it immediately.
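All three mistakes can be caught before calling `diagnose()`. The hypothetical `preflight` helper below is a sketch, not part of ModelDoctor. It treats a model as fitted if it has an attribute ending in an underscore (the scikit-learn naming convention for learned state, such as `coef_`), compares feature counts, and flags any training column that is identical to `y_train`. That last check only catches exact copies of the target and is far cruder than what a dedicated leakage detector would do.

```python
from typing import Any, List


def _rows(X: Any) -> List[list]:
    """Convert a DataFrame, ndarray, or list of rows into a plain list of rows."""
    if hasattr(X, "to_numpy"):  # pandas DataFrame
        X = X.to_numpy()
    if hasattr(X, "tolist"):  # numpy ndarray
        X = X.tolist()
    return [list(row) for row in X]


def preflight(model: Any, X_train: Any, y_train: Any, X_test: Any) -> List[str]:
    """Hypothetical pre-checks mirroring the Common Mistakes list.

    Returns human-readable problems; an empty list means none were found.
    """
    problems: List[str] = []

    # 1. Fitted? scikit-learn stores learned state in attributes like coef_.
    learned = [name for name in vars(model)
               if name.endswith("_") and not name.startswith("__")]
    if not learned:
        problems.append("model looks unfitted (no attributes ending in '_')")

    train_rows, test_rows = _rows(X_train), _rows(X_test)

    # 2. Same number of features in train and test?
    n_train = len(train_rows[0]) if train_rows else 0
    n_test = len(test_rows[0]) if test_rows else 0
    if n_train != n_test:
        problems.append(f"feature mismatch: X_train has {n_train}, X_test has {n_test}")

    # 3. Naive leakage check: a training column identical to the target.
    target = y_train.tolist() if hasattr(y_train, "tolist") else list(y_train)
    for j in range(n_train):
        if [row[j] for row in train_rows] == target:
            problems.append(f"column {j} of X_train is identical to y_train")

    return problems


class Unfitted:
    pass


# Column 0 copies the target, and X_test has an extra feature.
problems = preflight(Unfitted(), [[1, 0], [2, 1], [3, 0]], [1, 2, 3], [[1, 0, 9]])
for p in problems:
    print(p)
```

Returning a list of problems instead of raising on the first one lets you see every issue in a single run.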
Performance Notes
ModelDoctor evaluates metrics lazily. However, on large datasets (e.g., more than 1M rows), permutation importance calculations can be slow if SHAP is not installed. To speed up diagnostics, consider sampling your datasets before passing them to `diagnose()`.
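A minimal sketch of that sampling advice, assuming the same rows must be kept in both the features and the target. The hypothetical `sample_rows` helper draws a reproducible random subset of row positions with Python's `random` module, then applies it with `.iloc` for pandas objects, NumPy fancy indexing for arrays, and a list comprehension for plain lists. Uniform sampling is only one option; for imbalanced classification targets, a stratified sample may preserve the diagnostics better.

```python
import random
from typing import Any, Sequence, Tuple


def _take(data: Any, idx: Sequence[int]) -> Any:
    """Select rows by position from a DataFrame/Series, an ndarray, or a list."""
    if hasattr(data, "iloc"):  # pandas DataFrame or Series
        return data.iloc[list(idx)]
    if hasattr(data, "shape"):  # numpy ndarray: fancy indexing with a list
        return data[list(idx)]
    return [data[i] for i in idx]


def sample_rows(X: Any, y: Any, n: int, seed: int = 0) -> Tuple[Any, Any]:
    """Hypothetical helper: keep at most n aligned rows of X and y."""
    total = len(y)
    if len(X) != total:
        raise ValueError("X and y must have the same number of rows")
    if n >= total:
        return X, y  # nothing to sample
    rng = random.Random(seed)  # seeded so repeated diagnostics see the same rows
    idx = sorted(rng.sample(range(total), n))  # sorted to keep original row order
    return _take(X, idx), _take(y, idx)


X = [[i, i * 2] for i in range(10)]
y = list(range(10))
X_small, y_small = sample_rows(X, y, n=4, seed=42)
# Each sampled row still lines up with its target.
print(all(row[0] == target for row, target in zip(X_small, y_small)))  # True
```

In practice you would sample only the large splits, for example `X_train, y_train = sample_rows(X_train, y_train, n=200_000)`, and pass the results to `diagnose()` as usual.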