Real-World Example: Customer Churn
In this tutorial, you'll apply Pilz to a more complex real-world problem: predicting customer churn.
What You'll Learn
- Working with categorical and numerical features
- Understanding n_dims and n_cat
- Interpreting ROC curves
- Performance tuning basics
Dataset
We'll use the Telco Customer Churn dataset:
- Task: Predict whether a customer will churn (Yes/No)
- Features: 19 features (demographics, services, billing)
- Target: Churn (Yes/No)
Step 1: Get the Data
```
# Download from Kaggle or use sample data
# https://www.kaggle.com/datasets/blastchar/telco-customer-churn
```
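The DataCard in Step 3 expects two files, customer_train.csv and customer_test.csv, while the Kaggle download is a single CSV. Below is a minimal sketch of a random 80/20 split using only the Python standard library. It assumes the downloaded file is named WA_Fn-UseC_-Telco-Customer-Churn.csv (adjust SOURCE if yours differs) and drops rows with a blank TotalCharges, since the public file contains a few such rows and they would not parse as the float type declared later. The split ratio, the seed, and the helper names are illustrative choices, not part of Pilz.

```python
import csv
import random
from pathlib import Path

# Assumed name of the Kaggle download; change it if yours differs.
SOURCE = Path("WA_Fn-UseC_-Telco-Customer-Churn.csv")


def load_rows(path):
    """Return the CSV header and all rows as dicts."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        return reader.fieldnames, list(reader)


def drop_blank_totals(rows):
    """Keep only rows whose TotalCharges is non-blank, so it can be read as a float."""
    return [row for row in rows if row["TotalCharges"].strip()]


def split_rows(rows, test_frac=0.2, seed=42):
    """Shuffle a copy of the rows with a fixed seed and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def write_rows(path, fieldnames, rows):
    """Write rows back out as CSV, keeping the original header."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__" and SOURCE.exists():
    fieldnames, rows = load_rows(SOURCE)
    train, test = split_rows(drop_blank_totals(rows))
    write_rows("customer_train.csv", fieldnames, train)
    write_rows("customer_test.csv", fieldnames, test)
    print(f"train rows: {len(train)}, test rows: {len(test)}")
```

This split is purely random, not stratified by Churn. With a dataset of this size the churn ratio in both files should come out similar, but you can split each class separately if you want it to match exactly.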
Step 2: Examine the Data
A few sample rows:
```csv
customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0002-IDFVH,Male,0,Yes,Yes,2,No,No phone,DSL,No,No,No,No,No,No,Month-to-month,Yes,Electronic check,70.70,70.70,No
0003-PIFMY,Female,0,No,No,34,Yes,No,DSL,Yes,No,No,No,No,No,One year,No,Mailed check,90.10,3065.25,No
```
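Before writing the DataCard, it helps to know which columns hold numbers. The sketch below applies a rough heuristic that is not part of Pilz: a column is numerical if every non-blank value parses as a float, and categorial otherwise (using the spelling the DataCard below expects). customerID and the Churn target are skipped. On the two sample rows above it yields the same statistical types used in Step 3.

```python
import csv
import io

# The two sample rows shown above.
SAMPLE = """\
customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0002-IDFVH,Male,0,Yes,Yes,2,No,No phone,DSL,No,No,No,No,No,No,Month-to-month,Yes,Electronic check,70.70,70.70,No
0003-PIFMY,Female,0,No,No,34,Yes,No,DSL,Yes,No,No,No,No,No,One year,No,Mailed check,90.10,3065.25,No
"""

SKIP = {"customerID", "Churn"}  # identifier and target are not features


def is_number(text):
    """True if the string parses as a float."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def infer_statistical_types(rows, skip=SKIP):
    """Heuristic: numerical if every non-blank value is a number, else categorial."""
    types = {}
    for column in rows[0]:  # header order
        if column in skip:
            continue
        values = [row[column] for row in rows if row[column].strip()]
        numeric = bool(values) and all(is_number(v) for v in values)
        types[column] = "numerical" if numeric else "categorial"
    return types


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for name, kind in infer_statistical_types(rows).items():
    print(f"{name}: {kind}")
```

A heuristic like this cannot tell a genuine quantity from a numeric code. Here the 0/1 flag SeniorCitizen comes out as numerical, which happens to match the DataCard, but review the result before copying it into the YAML.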
Step 3: Create the DataCard
Edit the generated DataCard file to set each feature's statistical and data type:
```yaml
# churn_dc.yaml
features:
  - name: gender
    statistical: categorial
    type: string
  - name: SeniorCitizen
    statistical: numerical
    type: int
  - name: Partner
    statistical: categorial
    type: string
  - name: Dependents
    statistical: categorial
    type: string
  - name: tenure
    statistical: numerical
    type: int
  - name: PhoneService
    statistical: categorial
    type: string
  - name: MultipleLines
    statistical: categorial
    type: string
  - name: InternetService
    statistical: categorial
    type: string
  - name: OnlineSecurity
    statistical: categorial
    type: string
  - name: OnlineBackup
    statistical: categorial
    type: string
  - name: DeviceProtection
    statistical: categorial
    type: string
  - name: TechSupport
    statistical: categorial
    type: string
  - name: StreamingTV
    statistical: categorial
    type: string
  - name: StreamingMovies
    statistical: categorial
    type: string
  - name: Contract
    statistical: categorial
    type: string
  - name: PaperlessBilling
    statistical: categorial
    type: string
  - name: PaymentMethod
    statistical: categorial
    type: string
  - name: MonthlyCharges
    statistical: numerical
    type: float
  - name: TotalCharges
    statistical: numerical
    type: float
target:
  feature_name: Churn
  values:
    - "Yes"
    - "No"
train_files:
  - customer_train.csv
test_files:
  - customer_test.csv
```
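A DataCard that names a missing column, or declares a type the data does not satisfy, is easier to catch before training. Below is a pre-flight check sketched with the standard library only. The FEATURES dictionary is a hand-written mirror of the YAML above; it is not read from churn_dc.yaml, because parsing YAML would need a third-party package. The check_rows and check_csv helpers are hypothetical and not Pilz functions. The check confirms that every listed feature exists in the CSV header, that int and float columns parse (a blank TotalCharges fails here), and that the target only takes the listed values.

```python
import csv

# Hand-written mirror of churn_dc.yaml: feature name -> declared type.
FEATURES = {
    "gender": "string", "SeniorCitizen": "int", "Partner": "string",
    "Dependents": "string", "tenure": "int", "PhoneService": "string",
    "MultipleLines": "string", "InternetService": "string",
    "OnlineSecurity": "string", "OnlineBackup": "string",
    "DeviceProtection": "string", "TechSupport": "string",
    "StreamingTV": "string", "StreamingMovies": "string",
    "Contract": "string", "PaperlessBilling": "string",
    "PaymentMethod": "string", "MonthlyCharges": "float",
    "TotalCharges": "float",
}
TARGET = "Churn"
TARGET_VALUES = {"Yes", "No"}
PARSERS = {"int": int, "float": float}  # string columns need no parsing


def check_rows(header, rows):
    """Return a list of problems; an empty list means the rows fit the card."""
    problems = [f"missing column: {name}" for name in FEATURES if name not in header]
    if TARGET not in header:
        problems.append(f"missing target column: {TARGET}")
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        for name, declared in FEATURES.items():
            parse = PARSERS.get(declared)
            if parse is None or name not in row:
                continue
            try:
                parse(row[name])
            except ValueError:
                problems.append(f"line {line_no}: {name}={row[name]!r} is not {declared}")
        if row.get(TARGET) not in TARGET_VALUES:
            problems.append(f"line {line_no}: unexpected {TARGET} value {row.get(TARGET)!r}")
    return problems


def check_csv(path):
    """Run check_rows on a CSV file."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        return check_rows(reader.fieldnames, rows)
```

For example, `print(check_csv("customer_train.csv") or "DataCard matches the data")` prints either the list of problems or the confirmation message.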
Step 4: Training Settings
Create train_settings.yaml:
```yaml
# Start simple
n: 1
out_folder: churn_model
max_depth: 10
n_dims: 2        # Capture feature interactions
n_cat: 5         # 5 bins per feature
frac_eval_cat: 0.8
max_eval_fit: 50000
min_eval_fit: 100
```
Understanding the Settings
| Parameter | Value | Why |
|-----------|-------|-----|
| n | 1 | Start with one tree, increase later |
| max_depth | 10 | Enough for 19 features |
| n_dims | 2 | Capture tenure × contract interactions |
| n_cat | 5 | Good balance for mixed feature types |
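The comment in the settings file describes n_cat as the number of bins per feature. This tutorial does not show how Pilz places the bin edges, so the sketch below illustrates only one common approach: equal-frequency (quantile) cut points for a numerical column such as tenure. The function names, the edge-placement rule, and the toy tenure values are assumptions.

```python
import bisect
from collections import Counter


def quantile_edges(values, n_cat):
    """Cut points that split the sorted values into n_cat roughly equal-count bins.

    Duplicate cut points are merged, so heavily repeated values can yield fewer bins.
    """
    ordered = sorted(values)
    cuts = {ordered[k * len(ordered) // n_cat] for k in range(1, n_cat)}
    return sorted(cuts)


def assign_bin(value, edges):
    """Bin index in 0..len(edges): the number of cut points that are <= value."""
    return bisect.bisect_right(edges, value)


tenure = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy values, not from the dataset
edges = quantile_edges(tenure, n_cat=5)
print(edges)  # [3, 5, 7, 9]
print(Counter(assign_bin(t, edges) for t in tenure))
```

Fewer bins (a smaller n_cat) means each bin covers more customers, which is the trade-off behind the "fewer bins = more general" comment in the tuning step.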
Step 5: Train the Model
Train with the DataCard (churn_dc.yaml) and train_settings.yaml. Expected output:
```text
INFO: Training for target Yes, tree 0
INFO: Training for target No, tree 0
INFO: Models saved to churn_model/
```
Step 6: Evaluate
Create eval_settings.yaml, pointing in_folders at the trained model:
```yaml
# eval_settings.yaml
in_folders:
  - churn_model
out_folder: churn_eval
out_file: churn_predictions.csv
```
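Step 7: Interpret Results
ROC Curves
Open churn_eval/Yes_roc.html in your browser. An ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) as the score threshold is lowered: each point shows what fraction of churners the model catches (TPR) at the cost of wrongly flagging a given fraction of customers who stay (FPR). Example pairs: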
```mermaid
flowchart LR
    subgraph Data
        F1["FPR: 0.0"]
        F2["FPR: 0.1"]
        F3["FPR: 0.2"]
        F4["FPR: 0.5"]
        F5["FPR: 1.0"]
    end
    subgraph Model
        T1["TPR: 0.3"]
        T2["TPR: 0.7"]
        T3["TPR: 0.9"]
        T4["TPR: 0.98"]
        T5["TPR: 1.0"]
    end
    F1 --> T1
    F2 --> T2
    F3 --> T3
    F4 --> T4
    F5 --> T5
    style F1 fill:#ffcccc
    style T1 fill:#ccffcc
```
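A common single-number summary of an ROC curve is the area under it (AUC): 1.0 means the model ranks every churner above every non-churner, and 0.5 is no better than chance. The sketch below applies the trapezoidal rule to the example (FPR, TPR) pairs from the diagram above. This is a generic calculation, not something read from Pilz's output; a real curve would use every threshold point behind Yes_roc.html.

```python
def trapezoid_auc(points):
    """Area under a curve given (fpr, tpr) points, using the trapezoidal rule."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # area of one trapezoid
    return area


# Example pairs from the diagram above.
roc_points = [(0.0, 0.3), (0.1, 0.7), (0.2, 0.9), (0.5, 0.98), (1.0, 1.0)]
print(f"AUC = {trapezoid_auc(roc_points):.3f}")  # AUC = 0.907
```

The curve implicitly starts at (0, 0). The vertical segment from (0, 0) to (0, 0.3) has zero width, so leaving that point out does not change the area.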

The rules the model learned for target Yes (its "spores") look like this:

```json
{
  "spores": [
    {
      "cut": ["Contract = 'Month-to-month'", "tenure <= 12"],
      "score": 0.68,
      "depth": "rr"
    },
    {
      "cut": ["Contract = 'Month-to-month'", "tenure > 12", "InternetService = 'Fiber optic'"],
      "score": 0.55,
      "depth": "rnr"
    },
    {
      "cut": ["Contract = 'Two year'"],
      "score": 0.12,
      "depth": "l"
    }
  ],
  "target": "Yes"
}
```
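What the Model Learned
| Rule | Score | Interpretation |
|---|---|---|
| Month-to-month + tenure ≤ 12 | 68% | High churn risk |
| Month-to-month + tenure > 12 + fiber optic | 55% | Medium churn risk |
| Two-year contract | 12% | Low churn risk |
These rules match business intuition: new customers on flexible month-to-month contracts are the most likely to leave, while customers on two-year contracts rarely churn.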
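Turning the JSON above into the rule table can be automated. The sketch below assumes each cut is a string of the form `feature operator value`, where the value is either quoted text or a bare number. It handles the `=`, `<=` and `>` operators seen in the example, plus `<` and `>=`; anything else Pilz may emit is not covered. The parsing helpers are hypothetical, not part of Pilz. The sketch prints each rule with its score, then lists which rules a given customer satisfies. How Pilz itself combines scores into a prediction is not shown here.

```python
import json
import re

SPORES_JSON = """
{
  "spores": [
    {"cut": ["Contract = 'Month-to-month'", "tenure <= 12"], "score": 0.68, "depth": "rr"},
    {"cut": ["Contract = 'Month-to-month'", "tenure > 12", "InternetService = 'Fiber optic'"],
     "score": 0.55, "depth": "rnr"},
    {"cut": ["Contract = 'Two year'"], "score": 0.12, "depth": "l"}
  ],
  "target": "Yes"
}
"""

# feature, operator, then either 'quoted text' or a bare number
CUT_PATTERN = re.compile(r"^\s*(\w+)\s*(<=|>=|=|<|>)\s*(?:'([^']*)'|(\S+))\s*$")

OPS = {
    "=": lambda a, b: a == b,
    "<=": lambda a, b: a <= b,
    ">=": lambda a, b: a >= b,
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
}


def parse_cut(text):
    """Split "tenure <= 12" into ("tenure", "<=", 12.0); quoted values stay strings."""
    match = CUT_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognised cut: {text!r}")
    feature, op, quoted, number = match.groups()
    value = quoted if quoted is not None else float(number)
    return feature, op, value


def matches(customer, cuts):
    """True if the customer (a dict of CSV-style strings) satisfies every cut."""
    for feature, op, value in map(parse_cut, cuts):
        actual = customer[feature]
        if isinstance(value, float):
            actual = float(actual)  # numeric comparison for numeric cuts
        if not OPS[op](actual, value):
            return False
    return True


model = json.loads(SPORES_JSON)
for spore in model["spores"]:
    print(" AND ".join(spore["cut"]), f"-> {spore['score']:.0%}")

customer = {"Contract": "Month-to-month", "tenure": "5", "InternetService": "DSL"}
hits = [s["score"] for s in model["spores"] if matches(customer, s["cut"])]
print("matching rule scores:", hits)
```

The three example rules are mutually exclusive (contract type and tenure ≤ 12 versus > 12 separate them), so any customer matches at most one of them.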
Step 8: Tune for Better Performance
More Trees
```yaml
# train_settings.yaml - version 2
n: 5        # Increase trees
n_dims: 3   # Try feature triplets
n_cat: 4    # Fewer bins = more general
```
More Computation
```yaml
# train_settings.yaml - version 3
calcs_per_dim: 5000  # Evaluate more combinations
max_depth: 15        # Deeper trees
```
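Tuning incrementally is easier to track when each variant gets its own settings file and output folder. The sketch below writes one YAML file per version, starting from the Step 4 settings and applying only the overrides shown in versions 2 and 3. Two assumptions here: each version builds on the previous one, and files and folders follow a `_v1`, `_v2`, ... naming scheme. Training each variant is left to the same procedure as Step 5.

```python
from pathlib import Path

# Step 4 settings (version 1).
BASE = {
    "n": 1,
    "out_folder": "churn_model",
    "max_depth": 10,
    "n_dims": 2,
    "n_cat": 5,
    "frac_eval_cat": 0.8,
    "max_eval_fit": 50000,
    "min_eval_fit": 100,
}

# Overrides from the tuning steps; assumed cumulative (v3 builds on v2).
VERSIONS = [
    {},
    {"n": 5, "n_dims": 3, "n_cat": 4},
    {"calcs_per_dim": 5000, "max_depth": 15},
]


def build_variants(base, versions):
    """Accumulate overrides and give each variant its own out_folder."""
    settings = dict(base)
    variants = []
    for number, overrides in enumerate(versions, start=1):
        settings = {**settings, **overrides, "out_folder": f"{base['out_folder']}_v{number}"}
        variants.append(settings)
    return variants


def to_yaml(settings):
    """Flat `key: value` YAML, which is enough for these scalar settings."""
    return "".join(f"{key}: {value}\n" for key, value in settings.items())


if __name__ == "__main__":
    for number, settings in enumerate(build_variants(BASE, VERSIONS), start=1):
        path = Path(f"train_settings_v{number}.yaml")
        path.write_text(to_yaml(settings))
        print(f"wrote {path} -> {settings['out_folder']}")
```

Because each variant now saves to its own folder, point in_folders in eval_settings.yaml at the matching folder when you evaluate it.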
Summary
You just:
- ✅ Prepared a real-world dataset with mixed features
- ✅ Created a DataCard with correct types
- ✅ Trained with n_dims=2 to capture interactions
- ✅ Interpreted learned rules (contract × tenure)
- ✅ Evaluated with ROC curves
- ✅ Started tuning for better performance
Key Takeaways
- Categorical features: categories are grouped by target rate automatically
- n_dims=2: Captures interactions like tenure × contract
- Rules are readable: Convert directly to business insights
- Tune incrementally: Start simple, add complexity as needed
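The first takeaway says categorical features are grouped by target rate automatically. This tutorial does not show the exact procedure Pilz uses, so the following sketch only illustrates the idea. It computes the churn rate of each category, orders the categories by that rate, and cuts the ordered list into at most n_cat consecutive groups, so that merged categories behave similarly. The toy rows and the equal-size grouping rule are assumptions.

```python
from collections import defaultdict


def churn_rate_by_category(rows, feature, target="Churn", positive="Yes"):
    """Fraction of rows with target == positive, per category of `feature`."""
    counts = defaultdict(lambda: [0, 0])  # category -> [positives, total]
    for row in rows:
        stats = counts[row[feature]]
        if row[target] == positive:
            stats[0] += 1
        stats[1] += 1
    return {cat: pos / total for cat, (pos, total) in counts.items()}


def group_by_rate(rates, n_cat):
    """Sort categories by rate and split them into at most n_cat consecutive groups."""
    ordered = sorted(rates, key=rates.get)
    if not ordered:
        return []
    size = -(-len(ordered) // n_cat)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


# Toy rows, made up for illustration.
rows = [
    {"PaymentMethod": "Electronic check", "Churn": "Yes"},
    {"PaymentMethod": "Electronic check", "Churn": "Yes"},
    {"PaymentMethod": "Electronic check", "Churn": "No"},
    {"PaymentMethod": "Mailed check", "Churn": "No"},
    {"PaymentMethod": "Mailed check", "Churn": "Yes"},
    {"PaymentMethod": "Bank transfer (automatic)", "Churn": "No"},
    {"PaymentMethod": "Credit card (automatic)", "Churn": "No"},
]
rates = churn_rate_by_category(rows, "PaymentMethod")
print(rates)
print(group_by_rate(rates, n_cat=2))
```

Ordering by target rate first means a split only ever separates "low-churn" categories from "high-churn" ones. That property is what makes the learned cuts on categorical features readable.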
Next Steps
- Multi-Dimensional Splits - Deep dive into n_dims
- Training Internals - Full algorithm
- Best Practices - Tuning guide