# Model Comparison Guide
ModelDoctor evaluates one model per `diagnose()` call. To compare two models, call `diagnose()` once per model on the same training and test data, then compare the two resulting reports.
## Basic Comparison
```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

import modeldoctor as md

# Shared dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train two models
lr_model = LogisticRegression(max_iter=500, random_state=42)
lr_model.fit(X_train, y_train)

rf_model = RandomForestClassifier(n_estimators=100, max_depth=None, random_state=42)
rf_model.fit(X_train, y_train)

# Diagnose both
lr_report = md.diagnose(lr_model, X_train, y_train, X_test, y_test)
rf_report = md.diagnose(rf_model, X_train, y_train, X_test, y_test)
```
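The same pattern extends to more than two candidates. The sketch below fits (or refits) every model in a dictionary on the shared split and diagnoses each one. It takes the diagnose function as a parameter, so you would pass `md.diagnose`, and it relies only on the five-argument call shown above. `diagnose_all` is a hypothetical helper name and is not part of ModelDoctor.

```python
from typing import Any, Callable, Dict


def diagnose_all(
    models: Dict[str, Any],
    X_train, y_train, X_test, y_test,
    diagnose: Callable[..., Any],
) -> Dict[str, Any]:
    """Fit each model on the shared training split and return {name: report}.

    `diagnose` is assumed to accept (model, X_train, y_train, X_test, y_test),
    matching the md.diagnose() call used in this guide.
    """
    reports = {}
    for name, model in models.items():
        # Every model is trained on exactly the same data.
        model.fit(X_train, y_train)
        reports[name] = diagnose(model, X_train, y_train, X_test, y_test)
    return reports
```

For example, `diagnose_all({"Logistic Regression": lr_model, "Random Forest": rf_model}, X_train, y_train, X_test, y_test, diagnose=md.diagnose)` returns a dictionary of reports keyed by model name.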
## Reading Both Reports
```python
def summarize(name, report):
    all_findings = [f for d in report.diagnoses for f in (d.findings or [])]
    flagged = [f for f in all_findings if f.severity.value in ("warning", "critical")]

    print(f"\n{name}")
    print(f" Health Score: {report.health_score.overall:.1f}/100 Grade: {report.health_score.grade}")
    if flagged:
        for f in flagged:
            print(f" [{f.severity.value.upper()}] {f.title}")
    else:
        print(" No issues detected.")


summarize("Logistic Regression", lr_report)
summarize("Random Forest (max_depth=None)", rf_report)
```
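Beyond printing each report, it can help to see which findings are unique to one model. The sketch below reuses the attribute access from `summarize()` (`report.diagnoses`, `d.findings`, `f.severity.value`, `f.title`). It groups the flagged finding titles into three sets: those raised only for the first model, only for the second, or for both. `flagged_titles` and `diff_findings` are hypothetical helpers. Matching findings by title assumes that the same issue gets the same title in both reports.

```python
from typing import Any, Dict, Set

FLAGGED_SEVERITIES = ("warning", "critical")


def flagged_titles(report: Any) -> Set[str]:
    """Titles of all warning/critical findings in a report."""
    return {
        f.title
        for d in report.diagnoses
        for f in (d.findings or [])  # some diagnoses may have no findings
        if f.severity.value in FLAGGED_SEVERITIES
    }


def diff_findings(report_a: Any, report_b: Any) -> Dict[str, Set[str]]:
    """Split flagged titles into only-in-A, only-in-B and shared sets."""
    a, b = flagged_titles(report_a), flagged_titles(report_b)
    return {"only_a": a - b, "only_b": b - a, "both": a & b}
```

For example, `diff_findings(lr_report, rf_report)["only_b"]` would list the issues flagged for the random forest but not for logistic regression.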
## Comparing Health Scores
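```python
lr_score = lr_report.health_score.overall
rf_score = rf_report.health_score.overall

print(f"\nLogistic Regression: {lr_score:.1f}/100")
print(f"Random Forest: {rf_score:.1f}/100")

if lr_score > rf_score:
    print("Logistic Regression is the healthier model for production.")
elif rf_score > lr_score:
    print("Random Forest is the healthier model for production.")
else:
    # Equal scores: do not silently favor either model.
    print("Both models have the same health score; compare their individual findings.")
```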
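With more than two candidates, a sorted ranking is easier to read than a chain of `if` statements. This sketch takes a dictionary of reports keyed by model name and reads `report.health_score.overall`, as in the snippet above. It returns the models sorted from healthiest to least healthy, plus every model that shares the top score. `rank_by_health` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple


def rank_by_health(reports: Dict[str, Any]) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Return (ranking, leaders).

    ranking: (name, score) pairs sorted by health score, highest first.
    leaders: every model that shares the top score (more than one on a tie).
    """
    ranking = sorted(
        ((name, r.health_score.overall) for name, r in reports.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    if not ranking:
        return [], []
    top = ranking[0][1]
    leaders = [name for name, score in ranking if score == top]
    return ranking, leaders
```

Calling `rank_by_health({"Logistic Regression": lr_report, "Random Forest": rf_report})` reproduces the two-model comparison and reports a tie explicitly instead of defaulting to one model.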
## Viewing Dashboards Side by Side
Open both dashboards in separate browser tabs for a visual comparison:
## What to Look For
When comparing models, pay particular attention to:
| Dimension | Why It Matters |
|---|---|
| Health Score | The single highest-level signal; higher is better. |
| OverfittingDoctor | Unconstrained trees (such as `max_depth=None`) often overfit significantly. |
| CalibrationDoctor | Probability reliability varies widely across model families. |
| ProductionDoctor | Ensemble models can be 10–100x larger than linear models. |
| GeneralizationDoctor | Stable cross-validation (CV) scores indicate that the model is robust across different data splits. |
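To sanity-check what the doctors report, you can approximate several of these dimensions directly with scikit-learn. The sketch below does not reproduce how ModelDoctor computes its scores. It uses rough proxies chosen for illustration:

- the train/test accuracy gap for overfitting
- the Brier score for calibration
- the standard deviation of 5-fold cross-validation accuracy for generalization
- the pickled size in bytes for production footprint

`rough_profile` is a hypothetical helper.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, brier_score_loss
from sklearn.model_selection import cross_val_score, train_test_split


def rough_profile(model, X_train, y_train, X_test, y_test):
    """Fit `model` and return simple proxies for the dimensions in the table."""
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    # Predicted probability of the positive class (binary task assumed).
    proba = model.predict_proba(X_test)[:, 1]
    # cross_val_score fits clones, so the fitted model above is left untouched.
    cv_scores = cross_val_score(model, X_train, y_train, cv=5)
    return {
        "overfit_gap": train_acc - test_acc,        # large gap suggests overfitting
        "brier": brier_score_loss(y_test, proba),   # lower means better calibrated
        "cv_std": float(cv_scores.std()),           # lower means more stable across folds
        "size_bytes": len(pickle.dumps(model)),     # serialized footprint
    }


if __name__ == "__main__":
    # Same shared dataset and split as the basic comparison.
    X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    models = {
        "Logistic Regression": LogisticRegression(max_iter=500, random_state=42),
        "Random Forest (max_depth=None)": RandomForestClassifier(
            n_estimators=100, max_depth=None, random_state=42
        ),
    }
    for name, model in models.items():
        p = rough_profile(model, X_train, y_train, X_test, y_test)
        print(f"{name}: gap={p['overfit_gap']:.3f} brier={p['brier']:.3f} "
              f"cv_std={p['cv_std']:.3f} size={p['size_bytes']:,} bytes")
```

These numbers are only a cross-check. If they point in a different direction from the doctors' findings, the findings themselves are the place to look for the reason.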