
# SQL Rules for Deployment

One of Pilz's most powerful features is generating SQL that can run directly in your database.

## From Model to SQL

### How It Works

```mermaid
flowchart LR
    subgraph Model
        J[Pilz model]
    end

    subgraph Transformation
        T[get_sql function]
    end

    subgraph SQL_Rules
        S[WHERE clauses]
    end

    subgraph Database
        D[Filtered rows]
    end

    J --> T
    T --> S
    S --> D

    style J fill:#e0f0ff
    style S fill:#ccffcc
    style D fill:#ffff99
```

![Diagram](images/sql_rules_1.svg)
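As a mental model for the transformation step (a sketch only, not Pilz's actual implementation: `rule_to_condition` and the triple-based rule format are hypothetical), each tree path becomes an AND-ed WHERE-style condition:

```python
def rule_to_condition(rule):
    # rule: list of (column, operator, value) triples along one tree path
    parts = []
    for col, op, val in rule:
        # Quote string values for SQL; leave numbers bare
        rendered = f"'{val}'" if isinstance(val, str) else str(val)
        parts.append(f'"{col}" {op} {rendered}')
    return " AND ".join(parts)

print(rule_to_condition([("tenure", "<=", 12), ("Contract", "=", "Month-to-month")]))
# → "tenure" <= 12 AND "Contract" = 'Month-to-month'
```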

### Multiple Trees (Pilze)

When you have multiple trees (ensemble), scores are averaged:

```sql
SELECT
    customer_id,
    (
        /* Tree 0 */
        CASE WHEN ... THEN ... END +
        /* Tree 1 */
        CASE WHEN ... THEN ... END +
        /* Tree 2 */
        CASE WHEN ... THEN ... END
    ) / 3.0 AS churn_score
FROM customers
```
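Mechanically, the ensemble expression is just the per-tree CASE expressions summed and divided by the tree count; a minimal sketch of that assembly (the helper `ensemble_sql` is hypothetical, not part of Pilz):

```python
def ensemble_sql(tree_cases, alias):
    # Sum each tree's CASE expression, then average over the tree count
    body = " +\n".join(f"({case})" for case in tree_cases)
    return f"(\n{body}\n) / {len(tree_cases)}.0 AS {alias}"

cases = ["CASE WHEN tenure <= 12 THEN 0.68 ELSE 0.35 END"] * 3
print(ensemble_sql(cases, "churn_score"))
```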

## Practical Example

### Load the Model

```python
from pilz.model import Pilz, Pilze

# Load a single tree
pilz = Pilz.model_validate_json(open("model/Yes/0.json").read())

# Get SQL
sql = pilz.get_sql("churn_score")
print(sql)
```

### Generated SQL

```sql
CASE
    WHEN "tenure" <= 12 AND "Contract" = 'Month-to-month' THEN 0.68
    WHEN "tenure" > 12 AND "Contract" = 'Month-to-month' AND "InternetService" = 'Fiber optic' THEN 0.55
    WHEN "Contract" = 'Two year' THEN 0.12
    WHEN "Contract" = 'One year' AND "tenure" > 24 THEN 0.18
    ELSE 0.35
END
```

### Execute in DuckDB

```python
import duckdb

# Create table
conn = duckdb.connect("customers.db")
conn.execute("CREATE TABLE customers AS SELECT * FROM 'customer_data.csv'")

# Run prediction
result = conn.execute("""
    SELECT
        customer_id,
        (
            (CASE WHEN tenure <= 12 AND Contract = 'Month-to-month' THEN 0.68 ELSE 0.35 END) +
            (CASE WHEN tenure <= 6 AND Contract = 'Month-to-month' THEN 0.72 ELSE 0.31 END) +
            (CASE WHEN tenure <= 9 AND Contract = 'Month-to-month' THEN 0.65 ELSE 0.38 END)
        ) / 3.0 AS predicted_churn
    FROM customers
""").fetchdf()

print(result.head())
```
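As a quick sanity check without a database: a month-to-month customer with `tenure = 5` hits the THEN branch of all three CASE expressions above, so the query should yield the plain average of 0.68, 0.72, and 0.65:

```python
# The three THEN values matched by a tenure-5, month-to-month customer
scores = [0.68, 0.72, 0.65]
predicted_churn = sum(scores) / len(scores)
print(round(predicted_churn, 4))  # → 0.6833
```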

## Handling Large Rule Sets

### The Problem

Some models have hundreds of rules, which can exceed SQL length limits:

```mermaid
flowchart LR
    subgraph Small_Model
        S1[10 rules] --> SQ[Simple SQL]
    end

    subgraph Large_Model
        S2[500 rules] --> P[Problem!]
        P --> B[Batching Required]
    end

    style S1 fill:#ccffcc
    style S2 fill:#ffcccc
    style P fill:#ffcccc
    style B fill:#ffff99
```

![Diagram](images/sql_rules_2.svg)

### The Solution

Configure a condition limit so the SQL is split into batches for large rule sets:

```yaml
# eval_settings.yaml
max_parallel_where: 1000  # Split if more than 1000 conditions
```

```mermaid
flowchart TB
    subgraph "Original (500 rules)"
        O[Full CASE statement]
    end

    subgraph "Batched (3 parts)"
        B1[Part 1: 170 rules]
        B2[Part 2: 170 rules]
        B3[Part 3: 160 rules]
    end

    subgraph "Combined Result"
        C[Sum all parts]
    end

    O --> B1 & B2 & B3
    B1 & B2 & B3 --> C

    style B1 fill:#ccffcc
    style B2 fill:#ccffcc
    style B3 fill:#ccffcc
```

![Diagram](images/sql_rules_3.svg)
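The splitting step itself is simple chunking; a sketch of the idea (the function name and the list-of-conditions representation are assumptions, not Pilz internals):

```python
def split_conditions(conditions, max_parallel_where=1000):
    # Chunk the condition list so each batch stays under the limit
    return [conditions[i:i + max_parallel_where]
            for i in range(0, len(conditions), max_parallel_where)]

batches = split_conditions([f"cond_{i}" for i in range(2500)], max_parallel_where=1000)
print([len(b) for b in batches])  # → [1000, 1000, 500]
```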

The generated batches are combined with CTEs:

```sql
-- Part 1: First set of conditions
WITH batch1 AS (
    SELECT
        customer_id,
        CASE
            WHEN condition_1 THEN score_1
            ...
            ELSE 0
        END AS score_1
    FROM customers
),

-- Part 2: Second set of conditions
batch2 AS (
    SELECT
        customer_id,
        CASE
            WHEN condition_170 THEN score_170
            ...
            ELSE 0
        END AS score_2
    FROM customers
),

-- Combine
final AS (
    SELECT
        customer_id,
        score_1 + score_2 AS total_score
    FROM batch1
    JOIN batch2 USING (customer_id)
)

SELECT
    customer_id,
    total_score
FROM final
```
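Generating that CTE skeleton from a list of batch CASE expressions can be sketched as follows (`batched_query` is a hypothetical helper, not a Pilz API):

```python
def batched_query(batch_cases, key="customer_id", table="customers"):
    # One CTE per batch, then join them on the key and sum the per-batch scores
    n = len(batch_cases)
    ctes = ", ".join(
        f"batch{i} AS (SELECT {key}, {case} AS score_{i} FROM {table})"
        for i, case in enumerate(batch_cases, start=1))
    total = " + ".join(f"score_{i}" for i in range(1, n + 1))
    joins = "batch1" + "".join(f" JOIN batch{i} USING ({key})" for i in range(2, n + 1))
    return f"WITH {ctes} SELECT {key}, {total} AS total_score FROM {joins}"

print(batched_query(["CASE WHEN tenure <= 12 THEN 0.5 ELSE 0 END",
                     "CASE WHEN tenure <= 6 THEN 0.25 ELSE 0 END"]))
```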

## Deployment Patterns

### Pattern 1: View Creation

```sql
-- Create a view for easy querying
CREATE VIEW customer_churn_scores AS
SELECT
    customer_id,
    first_name,
    last_name,
    CASE
        WHEN contract = 'Month-to-month' AND tenure <= 12 THEN 0.68
        ...
    END AS churn_probability
FROM customers;
```

### Pattern 2: Materialized Table

```sql
-- Update predictions nightly
CREATE TABLE churn_predictions AS
SELECT
    customer_id,
    run_prediction() AS churn_probability,
    CURRENT_TIMESTAMP AS predicted_at
FROM customers;
```
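A refresh job for this pattern can be as small as dropping and rebuilding the table. A sketch using a generic DB-API connection (`refresh_predictions` is hypothetical, and `model_sql` stands in for the generated CASE expression); shown with `sqlite3` so it runs anywhere, but the same calls work on a `duckdb` connection:

```python
import sqlite3

def refresh_predictions(conn, model_sql):
    # Rebuild the prediction table from the current customers snapshot
    conn.execute("DROP TABLE IF EXISTS churn_predictions")
    conn.execute(f"""
        CREATE TABLE churn_predictions AS
        SELECT customer_id,
               {model_sql} AS churn_probability,
               CURRENT_TIMESTAMP AS predicted_at
        FROM customers
    """)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, tenure INTEGER)")
conn.execute("INSERT INTO customers VALUES (1, 5)")
refresh_predictions(conn, "CASE WHEN tenure <= 12 THEN 0.68 ELSE 0.35 END")
print(conn.execute(
    "SELECT customer_id, churn_probability FROM churn_predictions").fetchall())
```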

### Pattern 3: Stored Procedure

```sql
-- Create a function
CREATE OR REPLACE FUNCTION predict_churn(p_customer_id INT)
RETURNS FLOAT
LANGUAGE SQL
AS $$
    SELECT CASE
        WHEN contract = 'Month-to-month' AND tenure <= 12 THEN 0.68
        WHEN contract = 'Two year' THEN 0.12
        ELSE 0.35
    END
    FROM customers
    WHERE id = p_customer_id
$$;
```

## SQL Generation in Code

### Direct SQL from Pilz

```python
from pilz.model import Pilz

# Load model
pilz = Pilz.model_validate_json(open("model/Yes/0.json").read())

# Get single SQL statement
sql = pilz.get_sql("score")
print(sql)

# Get multiple statements (for batching)
batch_sql = pilz.get_split_sql(max_conditions=100)
for i, sql in enumerate(batch_sql):
    print(f"Batch {i}: {sql}")
```

### From Pilze (ensemble)

```python
from pilz.model import Pilze

# Load all trees for a target
pilze = Pilze.load_folder("model/Yes")

# Get averaged SQL
sql = pilze.get_sql("score")
```

## Security Considerations

### Parameterized Queries

Pilz generates raw SQL; whenever user input enters a query, use parameterized queries in production:

```python
# Instead of string concatenation
sql = f"SELECT ... WHERE balance > {user_input}"  # Dangerous!

# Use parameterized queries
sql = "SELECT ... WHERE balance > ?"  # Safe
```
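A concrete example of the safe variant, using `sqlite3` for portability (the same `?` placeholder style works with DuckDB's Python API):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, balance REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, 500.0), (2, 50.0)])

threshold = 100.0  # pretend this arrived as untrusted user input
rows = conn.execute(
    "SELECT id FROM customers WHERE balance > ?", (threshold,)
).fetchall()
print(rows)  # → [(1,)]
```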

### Input Validation

```python
def safe_prediction(customer_data):
    # Validate inputs
    assert 0 <= customer_data['tenure'] <= 100
    assert customer_data['contract'] in ['Month-to-month', 'One year', 'Two year']

    # Now use in SQL
    ...
```

## Summary

| Feature | Description |
|---------|-------------|
| `get_sql()` | Generate CASE statement |
| `get_split_sql()` | Split large rules into batches |
| `Pilze.get_sql()` | Average multiple trees |
| DuckDB | Run predictions in-database |

## Next Steps