# Multi-Dimensional Splits

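The n_dims parameter controls how many features Pilz combines when it searches for a split. Evaluating features jointly, rather than one at a time, is what lets Pilz pick up correlations between them, and it is the core idea behind Pilz.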

## What is n_dims?

n_dims defines how many features are combined in a single split:

```mermaid
flowchart TB
    subgraph n_dims_1
        D1["n_dims=1"] --> F1[X alone]
    end

    subgraph n_dims_2
        D2["n_dims=2"] --> P1["X AND Y<br/>pairs"]
    end

    subgraph n_dims_3
        D3["n_dims=3"] --> T1["X AND Y AND Z<br/>triplets"]
    end
```

![Diagram](images/multi_dimensional_splits_1.svg)

| n_dims | Combinations Evaluated |
|--------|------------------------|
| 1 | Single features only |
| 2 | Feature pairs |
| 3 | Feature triplets |
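To make the table concrete, the sketch below lists the candidate groups for a small feature set with Python's standard `itertools.combinations`. The feature names and the `candidate_combinations` helper are hypothetical and only illustrate the counting; they are not part of Pilz's API.

```python
from itertools import combinations

def candidate_combinations(features, n_dims):
    """Return every group of `n_dims` distinct features (order ignored)."""
    return list(combinations(features, n_dims))

# Hypothetical feature names, used for illustration only.
features = ["Contract", "TechSupport", "Tenure"]

singles = candidate_combinations(features, 1)
pairs = candidate_combinations(features, 2)
triplets = candidate_combinations(features, 3)

print(len(singles), len(pairs), len(triplets))
print(pairs)
```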

## Why Multi-Dimensional Matters

### The Correlation Problem

When features correlate, their combination is more predictive than either alone:


![Diagram](images/multi_dimensional_splits_2.svg)

```mermaid
flowchart LR
    subgraph "Individual Features"
        I1["Contract=Monthly<br/>churn=42%"] --> R1[Not enough]
        I2["TechSupport=No<br/>churn=38%"] --> R2[Not enough]
    end

    subgraph "Combined (Correlation)"
        C["Contract=Monthly<br/>TechSupport=No<br/>churn=85%"] --> R3[Strong signal]
    end

    I1 --> C
    I2 --> C

    style C fill:#ccffcc
    style R3 fill:#ccffcc
```
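The effect can be reproduced on a tiny dataset. The sketch below uses made-up rows (not the figures from the diagram) and a hypothetical `churn_rate` helper to compare the churn rate for each condition alone against the rate for both conditions together.

```python
# Hypothetical toy data: (contract, tech_support, churned)
rows = (
    [("Monthly", "No", 1)] * 4 + [("Monthly", "No", 0)] * 1
    + [("Monthly", "Yes", 1)] * 1 + [("Monthly", "Yes", 0)] * 4
    + [("Yearly", "No", 1)] * 1 + [("Yearly", "No", 0)] * 4
    + [("Yearly", "Yes", 0)] * 5
)

def churn_rate(rows, predicate):
    """Share of churned customers among the rows matching `predicate`."""
    matched = [churned for contract, support, churned in rows
               if predicate(contract, support)]
    return sum(matched) / len(matched)

monthly = churn_rate(rows, lambda c, s: c == "Monthly")
no_support = churn_rate(rows, lambda c, s: s == "No")
both = churn_rate(rows, lambda c, s: c == "Monthly" and s == "No")

print(f"Contract=Monthly:                    {monthly:.0%}")
print(f"TechSupport=No:                      {no_support:.0%}")
print(f"Contract=Monthly AND TechSupport=No: {both:.0%}")
```

In this toy data each condition alone matches a group with an even churn split, while the conjunction isolates a group that mostly churns.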

```mermaid
flowchart TD
    subgraph "Traditional Tree - Needs Multiple Splits"
        T1[Data] --> T2{Contract=Monthly?}
        T2 -->|Yes| T3{No TechSupport?}
        T3 -->|Yes| T4[High Churn]
        T3 -->|No| T5[Medium Churn]
        T2 -->|No| T6[Low Churn]
    end

    subgraph "Pilz - Single Multi-Dimensional Cut"
        P1[Data] --> P2{"Contract=Monthly<br/>AND TechSupport=No?"}
        P2 -->|Yes| P3["High Churn<br/>Score: 0.85"]
        P2 -->|No| P4["Low Churn<br/>Score: 0.15"]
    end

    style P2 fill:#ccffcc
```

![Diagram](images/multi_dimensional_splits_3.svg)

### Step 2: Score Each Combination

The score measures how well a combination separates target rows from non-target rows:

```
Score = |target_rate_in_bin - target_rate_in_other_bins|
```

For the n_dims=2 example with features X and Y, the target rate in each bin is:

- X=0, Y=0: 15% (low, likely non-target)
- X=0, Y=1: 45% (medium, neutral)
- X=1, Y=0: 55% (medium, neutral)
- X=1, Y=1: 85% (high, likely target)
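The scoring rule can be applied to these four rates directly. The source gives only rates, not row counts, so the sketch below assumes every bin holds the same number of rows, which makes the rate in the "other bins" a plain average. The `bin_scores` helper is hypothetical.

```python
def bin_scores(rates):
    """Score each bin as |rate in bin - mean rate of the other bins|.

    Assumption: all bins contain the same number of rows, so the target
    rate of the other bins is the unweighted mean of their rates.
    """
    scores = {}
    for key, rate in rates.items():
        others = [r for k, r in rates.items() if k != key]
        scores[key] = abs(rate - sum(others) / len(others))
    return scores

# Target rates per (X, Y) bin, taken from the list above.
rates = {(0, 0): 0.15, (0, 1): 0.45, (1, 0): 0.55, (1, 1): 0.85}
scores = bin_scores(rates)

for key, score in scores.items():
    print(key, round(score, 3))
```

Under this assumption the two extreme bins score about 0.47 and the two middle bins about 0.07, which matches the low / neutral / high reading of the list.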

### Step 3: Determine Split

```mermaid
flowchart TB
    subgraph "n_dims=1 (X only)"
        T1["X=0: T=30%, NT=70%"]
        T2["X=1: T=70%, NT=30%"]
    end

    subgraph "n_dims=2 (X and Y)"
        T3["X=0, Y=0: T=15%"]
        T4["X=0, Y=1: T=45%"]
        T5["X=1, Y=0: T=55%"]
        T6["X=1, Y=1: T=85%"]
    end
```

As n_dims grows, there are exponentially more combinations to evaluate:

```mermaid
flowchart TB
    subgraph "Combination Explosion"
        A[4 features] --> B["n_dims=1: 4"]
        A --> C["n_dims=2: 6"]
        A --> D["n_dims=3: 4"]
        A --> E["n_dims=4: 1"]
    end

    A --> F[20 features]
    F --> G["n_dims=1: 20"]
    F --> H["n_dims=2: 190"]
    F --> I["n_dims=3: 1140"]
    F --> J["n_dims=4: 4845"]
```

| Features | n_dims=1 | n_dims=2 | n_dims=3 | n_dims=4 |
|----------|----------|----------|----------|----------|
| 4 | 4 | 6 | 4 | 1 |
| 10 | 10 | 45 | 120 | 210 |
| 20 | 20 | 190 | 1,140 | 4,845 |
| 50 | 50 | 1,225 | 19,600 | 230,300 |
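Each cell in the table is the binomial coefficient "number of features choose n_dims". A short check with the standard library's `math.comb` reproduces the rows; the `combination_counts` helper name is only illustrative.

```python
from math import comb

def combination_counts(n_features, max_dims=4):
    """Number of feature groups of size 1..max_dims for `n_features` features."""
    return [comb(n_features, k) for k in range(1, max_dims + 1)]

for n in (4, 10, 20, 50):
    print(n, combination_counts(n))
```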

```mermaid
flowchart TB
    S[Score each bin] --> C[Compare]
    C -->|"Rate > threshold"| H[Right: High rate]
    C -->|"Rate < threshold"| L[Left: Low rate]
    C -->|"Rate ≈ 0.5"| N[Neutral: Uncertain]

    style H fill:#ccffcc
    style L fill:#ffcccc
    style N fill:#ffff99
```
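The routing in this diagram can be sketched as a small function. The source does not say how close to 0.5 a rate must be to count as neutral, so `threshold` and `neutral_band` below are hypothetical parameters chosen for illustration, not Pilz settings.

```python
def assign_side(rate, threshold=0.5, neutral_band=0.1):
    """Route a bin by its target rate.

    Assumption: bins whose rate lies within `neutral_band` of `threshold`
    are treated as uncertain; everything else goes left (low) or right (high).
    """
    if abs(rate - threshold) <= neutral_band:
        return "neutral"
    return "right" if rate > threshold else "left"

# Example target rates per (X, Y) bin.
rates = {(0, 0): 0.15, (0, 1): 0.45, (1, 0): 0.55, (1, 1): 0.85}
sides = {key: assign_side(rate) for key, rate in rates.items()}
print(sides)
```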

## Practical Guidelines

### When to Use Higher n_dims

| n_dims | Best For |
|--------|----------|
| 1 | Simple datasets, many features, baseline |
| 2 | Most cases - captures pairwise correlations |
| 3 | Complex interactions, fewer features |

### Recommendations

```
FUNCTION FIND_BEST_SPLIT(train_df, settings):
    scored = []

    # Step 1: Score individual features (n_dims=1)
    FOR feature IN train_df.features:
        table = BUILD_CORRELATION_TABLE([feature], train_df)
        score = CALCULATE_DISCRIMINATION(table)
        scored.append((feature, score))

    scored = SORT_BY(scored, DESC)
    best = scored[0]

    # Step 2: Try combinations for higher n_dims
    IF settings.n_dims >= 2:
        FOR dim IN 2..settings.n_dims:
            counter = 0

            FOR combo IN COMBINATIONS(scored, dim):
                counter += 1

                # Stop early once calcs_per_dim combinations have been evaluated
                IF counter > settings.calcs_per_dim:
                    BREAK

                # Build correlation table for combination
                table = BUILD_CORRELATION_TABLE(combo, train_df)
                score = CALCULATE_DISCRIMINATION(table)

                IF score > best.score:
                    best = (combo, score)

    RETURN best

FUNCTION BUILD_CORRELATION_TABLE(features, train_df):
    # Group by all feature bins
    grouped = train_df.group_by([f.bin_column for f in features]).agg(
        target = sum(target_weight),
        non_target = sum(weight) - sum(tar
```

The calcs_per_dim setting caps how many combinations are evaluated for each dimension:

```mermaid
flowchart LR
    C[Start] --> L{count < calcs_per_dim?}
    L -->|Yes| E[Evaluate next]
    L -->|No| S[Stop early]

    style L fill:#e0f0ff
    style S fill:#ffff99
```
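For readers who prefer runnable code, here is a minimal Python sketch of the same search. It is not Pilz's implementation: the row format (a list of dicts with pre-binned feature values and a `"target"` flag), the unweighted counts, the discrimination metric (largest gap between a bin's target rate and the rate of all other rows), and every function name are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

def build_correlation_table(features, rows):
    """Count target / non-target rows for each combination of bin values."""
    table = defaultdict(lambda: [0, 0])  # bin key -> [target, non_target]
    for row in rows:
        key = tuple(row[f] for f in features)
        table[key][0 if row["target"] else 1] += 1
    return dict(table)

def calculate_discrimination(table):
    """Largest |target rate in bin - target rate in all other bins| (assumed metric)."""
    total_t = sum(t for t, _ in table.values())
    total_n = sum(n for _, n in table.values())
    best = 0.0
    for t, n in table.values():
        rest = (total_t - t) + (total_n - n)
        if rest == 0:
            continue  # a single bin cannot separate anything
        rate_in = t / (t + n)
        rate_out = (total_t - t) / rest
        best = max(best, abs(rate_in - rate_out))
    return best

def find_best_split(rows, features, n_dims=2, calcs_per_dim=100):
    """Return (feature_tuple, score) for the best-scoring combination found."""
    # Step 1: score single features and rank them, best first.
    scored = sorted(
        (((f,), calculate_discrimination(build_correlation_table((f,), rows)))
         for f in features),
        key=lambda item: item[1],
        reverse=True,
    )
    best = scored[0]
    ranked = [combo[0] for combo, _ in scored]

    # Step 2: try larger combinations, at most calcs_per_dim per dimension.
    for dim in range(2, n_dims + 1):
        for count, combo in enumerate(combinations(ranked, dim), start=1):
            if count > calcs_per_dim:
                break  # budget for this dimension is used up
            score = calculate_discrimination(build_correlation_table(combo, rows))
            if score > best[1]:
                best = (combo, score)
    return best

# Toy data: the target fires only when both x and y are 1.
rows = [{"x": x, "y": y, "target": x & y}
        for x in (0, 1) for y in (0, 1) for _ in range(4)]
print(find_best_split(rows, ["x", "y"], n_dims=1))
print(find_best_split(rows, ["x", "y"], n_dims=2))
```

Because combinations are drawn from the ranked list, a small `calcs_per_dim` budget is spent on the features that scored best individually.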

```mermaid
flowchart TD
    START[Start] --> Q1{Feature correlations?}
    Q1 -->|Yes| Q2{How many features?}
    Q1 -->|"No"| A1["n_dims=1"]

    Q2 -->|"< 20"| A2["n_dims=2"]
    Q2 -->|"20-50"| A3["n_dims=3"]
    Q2 -->|"50+"| A4["n_dims=2, then experiment"]

    Q2 -->|"Simple"| A2

    style A2 fill:#ccffcc
    style A1 fill:#ffff99
    style A3 fill:#ffff99
    style A4 fill:#ffff99
```
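The decision flow above can be written as a small helper for picking a starting value. The function name is hypothetical, and because the diagram's "20-50" and "50+" branches overlap at exactly 50 features, this sketch arbitrarily assigns 50 to the "20-50" branch.

```python
def recommend_n_dims(has_correlations, n_features):
    """Suggest a starting n_dims following the decision flow above."""
    if not has_correlations:
        return 1
    if n_features < 20:
        return 2
    if n_features <= 50:  # assumption: exactly 50 falls in the "20-50" branch
        return 3
    return 2  # 50+ features: start at 2, then experiment

print(recommend_n_dims(True, 12))
print(recommend_n_dims(True, 30))
print(recommend_n_dims(True, 80))
```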

## Summary

| Parameter | Effect |
|-----------|--------|
| n_dims=1 | Single feature splits |
| n_dims=2 | Feature pairs - captures correlations |
| n_dims=3 | Feature triplets |
| calcs_per_dim | Limits computation |

## Next Steps