Real World Example: Customer Churn

In this tutorial, you'll apply Pilz to a more complex real-world problem: predicting customer churn.

What You'll Learn

  • Working with categorical and numerical features

  • Understanding n_dims and n_cat

  • Interpreting ROC curves

  • Performance tuning basics

Dataset

We'll use the Telco Customer Churn dataset:

  • Task: Predict if a customer will churn (Yes/No)

  • Features: 19 features (demographics, services, billing)

  • Target: Churn (Yes/No)

Step 1: Get the Data

# Download from Kaggle or use sample data

# https://www.kaggle.com/datasets/blastchar/telco-customer-churn

Step 2: Examine the Data

head -3 customer_data.csv
customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn

0002-IDFVH,Male,0,Yes,Yes,2,No,No phone,DSL,No,No,No,No,No,No,Month-to-month,Yes,Electronic check,70.70,70.70,No

0003-PIFMY,Female,0,No,No,34,Yes,No,DSL,Yes,No,No,No,No,No,One year,No,Mailed check,90.10,3065.25,No
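Before editing the DataCard by hand, it can help to check which columns parse as numbers. The following is a minimal sketch, not part of Pilz: the helper names `is_number` and `infer_kinds` are hypothetical, and the sample is an abbreviated copy of the rows shown above. It uses the labels `numerical` and `categorial` as they appear in the DataCard in Step 3.

```python
import csv
import io

# Abbreviated copy of the sample rows above (only a few columns kept).
SAMPLE = """customerID,gender,SeniorCitizen,tenure,Contract,MonthlyCharges,TotalCharges,Churn
0002-IDFVH,Male,0,2,Month-to-month,70.70,70.70,No
0003-PIFMY,Female,0,34,One year,90.10,3065.25,No
"""


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def infer_kinds(csv_text):
    """Guess 'numerical' or 'categorial' for each column of a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    kinds = {}
    for name in reader.fieldnames:
        # Ignore empty cells so that missing values do not force 'categorial'.
        values = [row[name].strip() for row in rows if row[name].strip()]
        numeric = bool(values) and all(is_number(v) for v in values)
        kinds[name] = "numerical" if numeric else "categorial"
    return kinds


kinds = infer_kinds(SAMPLE)
for name, kind in kinds.items():
    print(f"{name}: {kind}")
```

Treat the result as a starting point only. `customerID` is an identifier rather than a feature, and a 0/1 column such as `SeniorCitizen` parses as a number even though it encodes a category.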

Step 3: Create DataCard

pilz create-dc --src customer_data.csv --out churn_dc.yaml

Edit the generated file to configure feature types:

# churn_dc.yaml

features:

  - name: gender

    statistical: categorial

    type: string

  - name: SeniorCitizen

    statistical: numerical

    type: int

  - name: Partner

    statistical: categorial

    type: string

  - name: Dependents

    statistical: categorial

    type: string

  - name: tenure

    statistical: numerical

    type: int

  - name: PhoneService

    statistical: categorial

    type: string

  - name: MultipleLines

    statistical: categorial

    type: string

  - name: InternetService

    statistical: categorial

    type: string

  - name: OnlineSecurity

    statistical: categorial

    type: string

  - name: OnlineBackup

    statistical: categorial

    type: string

  - name: DeviceProtection

    statistical: categorial

    type: string

  - name: TechSupport

    statistical: categorial

    type: string

  - name: StreamingTV

    statistical: categorial

    type: string

  - name: StreamingMovies

    statistical: categorial

    type: string

  - name: Contract

    statistical: categorial

    type: string

  - name: PaperlessBilling

    statistical: categorial

    type: string

  - name: PaymentMethod

    statistical: categorial

    type: string

  - name: MonthlyCharges

    statistical: numerical

    type: float

  - name: TotalCharges

    statistical: numerical

    type: float

target:

  feature_name: Churn

  values:

    - "Yes"

    - "No"

train_files:

  - customer_train.csv

test_files:

  - customer_test.csv
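The DataCard refers to customer_train.csv and customer_test.csv, while the download is a single customer_data.csv. One way to produce the two files is a seeded random split, sketched below. The 80/20 ratio, the seed, and the helper names `split_rows` and `split_csv` are assumptions for illustration, as is writing the header row into both files.

```python
import csv
import random


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and return (train_rows, test_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def split_csv(src, train_path, test_path, test_fraction=0.2, seed=42):
    """Split a CSV file into train and test files, repeating the header in both."""
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    train, test = split_rows(rows, test_fraction, seed)
    for path, part in ((train_path, train), (test_path, test)):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(part)
    return len(train), len(test)


# Usage, with the file names from the DataCard above:
# split_csv("customer_data.csv", "customer_train.csv", "customer_test.csv")
```

A fixed seed keeps the split reproducible, so later tuning runs are compared on the same test rows.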

Step 4: Training Settings

Create train_settings.yaml:

# Start simple

n: 1

out_folder: churn_model

max_depth: 10

n_dims: 2          # Capture feature interactions

n_cat: 5           # 5 bins per feature

frac_eval_cat: 0.8

max_eval_fit: 50000

min_eval_fit: 100

Understanding the Settings

| Parameter | Value | Why |
|-----------|-------|-----|
| n | 1 | Start with one tree, increase later |
| max_depth | 10 | Enough for 19 features |
| n_dims | 2 | Capture tenure × contract interactions |
| n_cat | 5 | Good balance for mixed feature types |

Step 5: Train the Model

pilz train --datacard churn_dc.yaml --trainsettings train_settings.yaml

Expected output:

INFO: Training for target Yes, tree 0

INFO: Training for target No, tree 0

INFO: Models saved to churn_model/

Step 6: Evaluate

# eval_settings.yaml

in_folders:

  - churn_model

out_folder: churn_eval

out_file: churn_predictions.csv

pilz eval --datacard churn_dc.yaml --evalsettings eval_settings.yaml

Step 7: Interpret Results

ROC Curves

Open churn_eval/Yes_roc.html in your browser.

```mermaid
flowchart LR
    subgraph Data
        F1["FPR: 0.0"]
        F2["FPR: 0.1"]
        F3["FPR: 0.2"]
        F4["FPR: 0.5"]
        F5["FPR: 1.0"]
    end

    subgraph Model
        T1["TPR: 0.3"]
        T2["TPR: 0.7"]
        T3["TPR: 0.9"]
        T4["TPR: 0.98"]
        T5["TPR: 1.0"]
    end

    F1 --> T1
    F2 --> T2
    F3 --> T3
    F4 --> T4
    F5 --> T5

    style F1 fill:#ffcccc
    style T1 fill:#ccffcc
```

![Diagram](images/real_world_example_1.svg)
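A ROC curve is often summarized by the area under it (AUC). As a rough sketch, the trapezoidal rule can be applied to the (FPR, TPR) pairs from the diagram above. Those five points are illustrative values from this tutorial, not measured results, and the helper name `trapezoid_auc` is hypothetical.

```python
def trapezoid_auc(points):
    """Approximate the area under a ROC curve from (fpr, tpr) points."""
    pts = sorted(points)
    area = 0.0
    # Sum the area of the trapezoid between each pair of neighbouring points.
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# (FPR, TPR) pairs read off the diagram above.
roc_points = [(0.0, 0.3), (0.1, 0.7), (0.2, 0.9), (0.5, 0.98), (1.0, 1.0)]
auc = trapezoid_auc(roc_points)
print(f"AUC ~ {auc:.3f}")
```

For these points the area comes out at roughly 0.907. A value of 0.5 corresponds to random guessing and 1.0 to a perfect ranking.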

```json
{
  "spores": [
    {
      "cut": ["Contract = 'Month-to-month'", "tenure <= 12"],
      "score": 0.68,
      "depth": "rr"
    },
    {
      "cut": ["Contract = 'Month-to-month'", "tenure > 12", "InternetService = 'Fiber optic'"],
      "score": 0.55,
      "depth": "rnr"
    },
    {
      "cut": ["Contract = 'Two year'"],
      "score": 0.12,
      "depth": "l"
    }
  ],
  "target": "Yes"
}
```

What the Model Learned

| Rule | Score | Interpretation |
|------|-------|----------------|
| Month-to-month + tenure ≤ 12 | 68% | High churn risk |
| Month-to-month + tenure > 12 + fiber optic | 55% | Medium risk |
| Two year contract | 12% | Low churn risk |

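These rules match business intuition: customers on flexible month-to-month contracts, especially new ones, leave far more often than customers committed to a two-year contract.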
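Rule JSON like the example above can be turned into a ranked summary with a few lines of Python. This is a minimal sketch: the field names `spores`, `cut`, and `score` are taken from the example, the JSON is embedded as a string because the location of the model file is not stated here, and `summarize_rules` is a hypothetical helper.

```python
import json

# The example rules from above, embedded so the snippet is self-contained.
MODEL_JSON = """
{
  "spores": [
    {"cut": ["Contract = 'Month-to-month'", "tenure <= 12"],
     "score": 0.68, "depth": "rr"},
    {"cut": ["Contract = 'Month-to-month'", "tenure > 12",
             "InternetService = 'Fiber optic'"],
     "score": 0.55, "depth": "rnr"},
    {"cut": ["Contract = 'Two year'"], "score": 0.12, "depth": "l"}
  ],
  "target": "Yes"
}
"""


def summarize_rules(model):
    """Return (rule text, score as percent) pairs, highest score first."""
    rules = sorted(model["spores"], key=lambda r: r["score"], reverse=True)
    return [(" AND ".join(r["cut"]), f"{r['score']:.0%}") for r in rules]


model = json.loads(MODEL_JSON)
summary = summarize_rules(model)
print(f"Rules for target {model['target']}:")
for rule, score in summary:
    print(f"{score:>4}  {rule}")
```

Joining the cuts with AND makes it explicit that every condition in a rule must hold for the score to apply.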

Step 8: Tune for Better Performance

More Trees

# train_settings.yaml - version 2
n: 5              # Increase trees
n_dims: 3         # Try feature triplets
n_cat: 4          # Fewer bins = more general
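To get a feel for how these settings change the amount of work, the sketch below counts feature combinations and bin cells. It assumes that n_dims is the number of features combined at once and that each feature is split into n_cat bins. How Pilz actually enumerates or samples combinations is not covered here, so read the numbers as an upper-bound illustration. `search_space` is a hypothetical helper.

```python
from math import comb


def search_space(n_features, n_dims, n_cat):
    """Return (number of feature combinations, bin cells per combination)."""
    combos = comb(n_features, n_dims)  # unordered feature subsets of size n_dims
    cells = n_cat ** n_dims            # one cell per combination of bins
    return combos, cells


# 19 features, as in the DataCard above.
for label, n_dims, n_cat in [("version 1", 2, 5), ("version 2", 3, 4)]:
    combos, cells = search_space(19, n_dims, n_cat)
    print(f"{label}: {combos} feature combinations, {cells} cells each")
```

Under these assumptions, moving from pairs to triplets raises the number of combinations from 171 to 969, which is one reason to reduce n_cat at the same time.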

More Computation

# train_settings.yaml - version 3  
calcs_per_dim: 5000  # Evaluate more combinations
max_depth: 15        # Deeper trees

Summary

You just:

  1. ✅ Prepared a real-world dataset with mixed features
  2. ✅ Created a DataCard with correct types
  3. ✅ Trained with n_dims=2 to capture interactions
  4. ✅ Interpreted learned rules (contract × tenure)
  5. ✅ Evaluated with ROC curves
  6. ✅ Started tuning for better performance

Key Takeaways

  • Categorical features: Pilz groups categories by target rate automatically
  • n_dims=2: Captures interactions like tenure × contract
  • Rules are readable: Convert directly to business insights
  • Tune incrementally: Start simple, add complexity as needed

Next Steps