Diagnostic Doctors
The ModelDoctor pipeline is powered by specialized analysis modules called Doctors. Each Doctor is responsible for evaluating a single dimension of model health. Doctors are isolated from one another — they share no direct state — and communicate only through the EvaluationContext.
OverfittingDoctor
Purpose: Detects if the model has memorized the training data and fails to generalize to unseen data.
- Signals collected: Generalization Gap, Memorization, CV Variance, Excessive Capacity, Unrestricted Tree Depth.
- Typical findings: Generalization Gap (train accuracy is significantly higher than test accuracy).
- Typical recommendations: Apply regularization, reduce tree depth, or collect more training data.
- Supported models: Classification and Regression.
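The Generalization Gap signal compares the same metric on training data and held-out data. The following is a minimal sketch of that comparison, assuming both scores are already computed on the same metric. The function names and the 0.10 cutoff are hypothetical and are not taken from ModelDoctor.

```python
def generalization_gap(train_score: float, test_score: float) -> float:
    """How far the training score exceeds the held-out score (same metric on both)."""
    return train_score - test_score


def has_generalization_gap(train_score: float, test_score: float, max_gap: float = 0.10) -> bool:
    # max_gap is a hypothetical cutoff chosen for illustration only
    return generalization_gap(train_score, test_score) > max_gap


# A deep, unpruned tree scoring 0.99 on training data but 0.80 on test data
print(has_generalization_gap(0.99, 0.80))  # True: gap of about 0.19
print(has_generalization_gap(0.91, 0.88))  # False: gap of about 0.03
```

The gap only makes sense when both scores use the same metric. It is also why reducing tree depth is a typical recommendation: an unrestricted tree can fit the training set almost perfectly.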
LeakageDoctor
Purpose: Identifies target leakage where future information or identifiers are mistakenly included in the feature set.
- Signals collected: High Correlation, Feature Importance Concentration.
- Typical findings: Potential Data Leakage Detected (a single feature perfectly predicts the target).
- Typical recommendations: Remove the leaky feature or timestamp from the training data.
- Supported models: Classification and Regression.
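The High Correlation signal can be illustrated with a plain Pearson correlation between each feature and a numeric target. This is a minimal sketch under that assumption. ModelDoctor may use a different measure, and the function names, the example columns and the 0.95 cutoff are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / (std_x * std_y)


def suspected_leaks(features, target, threshold=0.95):
    """Names of features whose absolute correlation with the target meets the (hypothetical) threshold."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) >= threshold]


target = [0, 1, 0, 1, 1, 0]  # e.g. "customer churned"
features = {
    "age": [34, 51, 29, 40, 38, 45],
    "refund_issued": [0, 1, 0, 1, 1, 0],  # recorded only after churn, so it leaks the label
}
print(suspected_leaks(features, target))  # ['refund_issued']
```

A near-perfect correlation does not prove leakage on its own. It flags a feature for review, and the fix is still the recommendation above: remove it if it would not be available at prediction time.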
DataDoctor
Purpose: Assesses the foundational quality of the dataset before the model is even evaluated.
- Signals collected: Missing Values, Class Imbalance, Duplicate Rows, Duplicate Columns.
- Typical findings: Severe Class Imbalance, High Missing Value Ratio.
- Typical recommendations: Apply SMOTE, adjust class weights, or impute missing values.
- Supported models: Classification and Regression.
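All four data-quality signals are simple to compute on a table of rows. The sketch below is a minimal, dependency-free illustration. It assumes rows are tuples and missing values are None. The helper names are hypothetical, and it does not imply any severity thresholds.

```python
from collections import Counter


def missing_ratio(rows):
    """Fraction of all cells that are missing (None)."""
    cells = [value for row in rows for value in row]
    return sum(value is None for value in cells) / len(cells)


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def duplicate_row_count(rows):
    """Number of rows that repeat an earlier row exactly."""
    return sum(count - 1 for count in Counter(rows).values())


def duplicate_column_count(rows):
    """Number of columns that repeat an earlier column exactly."""
    return duplicate_row_count(list(zip(*rows)))  # transpose, then reuse the row check


rows = [(1, 2.0, "a"), (1, 2.0, "a"), (3, None, "b"), (4, 5.0, "b")]
labels = [0] * 9 + [1]
print(missing_ratio(rows))           # 1 missing cell out of 12
print(imbalance_ratio(labels))       # 9.0
print(duplicate_row_count(rows))     # 1
print(duplicate_column_count(rows))  # 0
```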
FeatureDoctor
Purpose: Analyzes feature engineering quality and dimensionality.
- Signals collected: High Dimensionality, Constant Features, Feature Importance Concentration.
- Typical findings: Constant Features (features with zero variance).
- Typical recommendations: Drop zero-variance features, apply PCA, or perform feature selection.
- Supported models: Classification and Regression.
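Constant features, importance concentration and dimensionality can be sketched as follows. The sketch assumes columns are given as a dict of lists and importances as a dict of non-negative scores, such as the feature_importances_ of a fitted scikit-learn tree ensemble. The helper names and the features-per-sample heuristic are hypothetical and are not ModelDoctor's definitions.

```python
def constant_features(columns):
    """Names of columns with zero variance (a single distinct value)."""
    return [name for name, values in columns.items() if len(set(values)) <= 1]


def top_feature_share(importances, k=1):
    """Share of total importance held by the k most important features."""
    top = sorted(importances.values(), reverse=True)[:k]
    return sum(top) / sum(importances.values())


def features_per_sample(n_features, n_samples):
    """Rough dimensionality indicator (hypothetical heuristic): features per training row."""
    return n_features / n_samples


columns = {
    "country": ["US", "US", "US", "US"],
    "age": [30, 41, 25, 30],
    "is_active": [1, 1, 1, 1],
}
print(constant_features(columns))  # ['country', 'is_active']
print(top_feature_share({"a": 0.7, "b": 0.2, "c": 0.1}))  # about 0.7: one feature dominates
print(features_per_sample(n_features=5000, n_samples=800))  # 6.25
```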
PredictionDoctor
Purpose: Evaluates raw predictive power and identifies if the model is learning anything useful.
- Signals collected: Test Accuracy, F1 Score, R² Score.
- Typical findings: Poor Prediction Quality Detected (model performs no better than random guessing).
- Typical recommendations: Ensure data is normalized, tune hyperparameters, or switch to a more complex model architecture.
- Supported models: Classification and Regression.
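One way to make "no better than random guessing" concrete is to compare the model against a trivial baseline. For classification, the baseline is always predicting the majority class. For regression, an R² at or below 0 means the model does no better than always predicting the target mean. This is a sketch under that assumption: the baseline ModelDoctor actually uses is not specified here, and the function names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline_accuracy(y_true):
    """Accuracy of always predicting the most common class."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


def r2_score(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


def no_better_than_baseline(y_true, y_pred, task):
    if task == "classification":
        return accuracy(y_true, y_pred) <= majority_baseline_accuracy(y_true)
    return r2_score(y_true, y_pred) <= 0.0  # no better than predicting the mean


print(no_better_than_baseline([1, 1, 1, 0], [1, 1, 1, 1], "classification"))  # True
print(no_better_than_baseline([1.0, 2.0, 3.0], [1.1, 1.9, 3.0], "regression"))  # False
```

The majority-class baseline is stricter than literal random guessing on imbalanced data. A model that scores 75% accuracy on a 75/25 split has learned nothing beyond the class prior.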
CalibrationDoctor
Purpose: Determines if a classifier's predicted probabilities actually reflect real-world likelihoods.
- Signals collected: Expected Calibration Error (ECE), Brier Score, Overconfidence.
- Typical findings: Poor Calibration (model is highly confident but frequently wrong).
- Typical recommendations: Apply Platt Scaling or Isotonic Regression to calibrate probabilities.
- Supported models: Classification only (models must implement predict_proba).
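For a binary classifier, ECE, the Brier score and overconfidence can all be computed from the positive-class probabilities, which in scikit-learn are column 1 of predict_proba. The sketch below uses the common top-label ECE with equal-width bins. The bin count and function names are illustrative assumptions, and ModelDoctor's exact binning may differ.

```python
def _top_label(y, p):
    """Confidence in the predicted class and whether that prediction was correct."""
    confidence = max(p, 1 - p)
    correct = (p >= 0.5) == (y == 1)
    return confidence, correct


def brier_score(y_true, p_pos):
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pos)) / len(y_true)


def expected_calibration_error(y_true, p_pos, n_bins=10):
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pos):
        confidence, correct = _top_label(y, p)
        index = min(int(confidence * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[index].append((confidence, correct))
    ece = 0.0
    for bucket in bins:
        if bucket:
            avg_conf = sum(c for c, _ in bucket) / len(bucket)
            acc = sum(ok for _, ok in bucket) / len(bucket)
            ece += len(bucket) / len(y_true) * abs(acc - avg_conf)  # weight by bin size
    return ece


def overconfidence(y_true, p_pos):
    """Mean confidence minus accuracy; a positive value means the model is overconfident."""
    pairs = [_top_label(y, p) for y, p in zip(y_true, p_pos)]
    return sum(c for c, _ in pairs) / len(pairs) - sum(ok for _, ok in pairs) / len(pairs)


y_true = [1, 0, 1, 0]
p_pos = [0.95, 0.95, 0.95, 0.95]  # always 95% sure of class 1, right only half the time
print(expected_calibration_error(y_true, p_pos))  # about 0.45
print(brier_score(y_true, p_pos))                 # about 0.4525
print(overconfidence(y_true, p_pos))              # about 0.45
```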
ProductionDoctor
Purpose: Evaluates whether the model is practical to deploy in a production environment, based on its serialized size and inference speed.
- Signals collected: Model Size, Inference Latency.
- Typical findings: Large Serialized Model (model exceeds 100 MB), Inference Latency (predictions take more than 500 ms).
- Typical recommendations: Prune trees, switch to a lighter framework, or export the model to ONNX and serve it with a C++ runtime such as ONNX Runtime.
- Supported models: Classification and Regression.
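Both production signals can be measured with the standard library. The sketch below makes several assumptions. Size is measured with pickle, although ModelDoctor may use joblib or another format. MB is taken as 10^6 bytes. Latency is the median wall-clock time of one predict call on a batch, and this page does not say whether ModelDoctor measures per row or per batch. The 100 MB and 500 ms limits come from the findings above. The function names are hypothetical.

```python
import pickle
import time

SIZE_LIMIT_MB = 100      # from the Large Serialized Model finding
LATENCY_LIMIT_MS = 500   # from the Inference Latency finding


def serialized_size_mb(model) -> float:
    """Size of the pickled model in megabytes (10**6 bytes)."""
    return len(pickle.dumps(model)) / 1_000_000


def median_latency_ms(predict, batch, repeats=5) -> float:
    """Median wall-clock time of predict(batch) over several runs, in milliseconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(batch)
        timings.append((time.perf_counter() - start) * 1000)
    timings.sort()
    return timings[len(timings) // 2]


def production_findings(model, predict, batch):
    findings = []
    if serialized_size_mb(model) > SIZE_LIMIT_MB:
        findings.append("Large Serialized Model")
    if median_latency_ms(predict, batch) > LATENCY_LIMIT_MS:
        findings.append("Inference Latency")
    return findings


model = {"weights": [0.5, -1.2]}  # stand-in for a fitted model object


def predict(batch):
    return [sum(w * x for w, x in zip(model["weights"], row)) for row in batch]


print(production_findings(model, predict, [[1.0, 2.0]] * 1000))  # expected: [] for a model this small
```

Taking the median of several runs reduces the effect of one-off slowdowns such as a cold cache on the first call.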
GeneralizationDoctor
Purpose: Analyzes cross-validation stability and dataset split integrity.
- Signals collected: CV Variance, Small Validation Set.
- Typical findings: High CV Variance (model performance fluctuates wildly across folds).
- Typical recommendations: Increase dataset size or use Stratified K-Fold.
- Supported models: Classification and Regression.
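The two generalization signals reduce to a spread statistic over per-fold scores and a size check on the validation split. This minimal sketch uses the sample standard deviation of fold scores as the variance measure. The 0.05 and 100-row cutoffs and the function names are hypothetical.

```python
import statistics


def generalization_findings(fold_scores, n_validation, max_std=0.05, min_validation=100):
    """Flag unstable cross-validation scores and undersized validation sets (cutoffs are illustrative)."""
    findings = []
    if statistics.stdev(fold_scores) > max_std:  # sample std dev; needs at least 2 folds
        findings.append("High CV Variance")
    if n_validation < min_validation:
        findings.append("Small Validation Set")
    return findings


print(generalization_findings([0.91, 0.62, 0.88, 0.70, 0.95], n_validation=40))
# ['High CV Variance', 'Small Validation Set']
print(generalization_findings([0.85, 0.86, 0.84, 0.85, 0.86], n_validation=500))
# []
```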
Writing a Custom Doctor
You can extend the diagnostic pipeline by writing your own Doctor. See the Plugins & Custom Doctors guide for a complete walkthrough.