SQL Rules for Deployment¶
One of Pilz's most powerful features is generating plain SQL that runs directly in your database.
From Model to SQL¶
How It Works¶
```mermaid
flowchart LR
    subgraph JSON_Model
        J[Rule tree JSON]
    end
    subgraph Transformation
        T[get_sql function]
    end
    subgraph SQL_Rules
        S[WHERE clauses]
    end
    subgraph Database
        D[Filtered rows]
    end
    J --> T
    T --> S
    S --> D
    style J fill:#e0f0ff
    style S fill:#ccffcc
    style D fill:#ffff99
```
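To make the transformation concrete, here is a minimal sketch of how one root-to-leaf rule could be rendered as a WHERE clause. The rule format and the `rule_to_where` helper are assumptions for illustration, not Pilz's internal representation:

```python
# Hypothetical sketch: render one root-to-leaf rule as a SQL WHERE clause.
# The (column, operator, value) tuple format is an assumption, not Pilz's.

def rule_to_where(conditions):
    """conditions: list of (column, operator, value) tuples."""
    parts = []
    for column, op, value in conditions:
        if isinstance(value, str):
            value = "'" + value.replace("'", "''") + "'"  # escape single quotes
        parts.append(f'"{column}" {op} {value}')
    return " AND ".join(parts)

rule = [("tenure", "<=", 12), ("Contract", "=", "Month-to-month")]
print(rule_to_where(rule))
# "tenure" <= 12 AND "Contract" = 'Month-to-month'
```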
Multiple Trees (Pilze)¶
When you have multiple trees (ensemble), scores are averaged:
```sql
SELECT
customer_id,
(
/* Tree 0 */
CASE WHEN ... THEN ... END +
/* Tree 1 */
CASE WHEN ... THEN ... END +
/* Tree 2 */
CASE WHEN ... THEN ... END
) / 3.0 AS churn_score
FROM customers
```
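The assembly step can be sketched in a few lines of Python. The `average_trees_sql` helper below is illustrative only, showing how per-tree CASE expressions could be summed and divided by the tree count; it is not Pilz's actual implementation:

```python
# Illustrative sketch: combine per-tree CASE expressions into one averaged
# column expression (not Pilz's actual code).

def average_trees_sql(tree_sqls, alias):
    body = " +\n".join(f"({sql})" for sql in tree_sqls)
    return f"(\n{body}\n) / {len(tree_sqls)}.0 AS {alias}"

trees = [
    "CASE WHEN tenure <= 12 THEN 0.68 ELSE 0.35 END",
    "CASE WHEN tenure <= 6 THEN 0.72 ELSE 0.31 END",
]
print(average_trees_sql(trees, "churn_score"))
```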
Practical Example¶
Load the Model¶
```python
from pilz.model import Pilz

# Load a single tree
pilz = Pilz.model_validate_json(open("model/Yes/0.json").read())

# Get SQL
sql = pilz.get_sql("churn_score")
print(sql)
```
Generated SQL¶
```sql
CASE
  WHEN "tenure" <= 12 AND "Contract" = 'Month-to-month' THEN 0.68
  WHEN "tenure" > 12 AND "Contract" = 'Month-to-month' AND "InternetService" = 'Fiber optic' THEN 0.55
  WHEN "Contract" = 'Two year' THEN 0.12
  WHEN "Contract" = 'One year' AND "tenure" > 24 THEN 0.18
  ELSE 0.35
END
```
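The same rules can be evaluated in plain Python to spot-check the generated SQL. This re-implementation of the CASE statement above is purely illustrative and is not part of Pilz:

```python
# Illustrative Python mirror of the generated CASE statement above,
# useful for spot-checking a few predictions by hand.

def churn_score(tenure, contract, internet_service=None):
    if tenure <= 12 and contract == "Month-to-month":
        return 0.68
    if tenure > 12 and contract == "Month-to-month" and internet_service == "Fiber optic":
        return 0.55
    if contract == "Two year":
        return 0.12
    if contract == "One year" and tenure > 24:
        return 0.18
    return 0.35

print(churn_score(6, "Month-to-month"))  # 0.68
print(churn_score(36, "Two year"))       # 0.12
```

Branch order matters: like the SQL CASE, the first matching condition wins.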
Execute in DuckDB¶
```python
import duckdb

# Create table
conn = duckdb.connect("customers.db")
conn.execute("CREATE TABLE customers AS SELECT * FROM 'customer_data.csv'")

# Run prediction
result = conn.execute("""
    SELECT
        customer_id,
        (
            (CASE WHEN tenure <= 12 AND Contract = 'Month-to-month' THEN 0.68 ELSE 0.35 END) +
            (CASE WHEN tenure <= 6 AND Contract = 'Month-to-month' THEN 0.72 ELSE 0.31 END) +
            (CASE WHEN tenure <= 9 AND Contract = 'Month-to-month' THEN 0.65 ELSE 0.38 END)
        ) / 3.0 AS predicted_churn
    FROM customers
""").fetchdf()
print(result.head())
```
Handling Large Rule Sets¶
The Problem¶
Some models have hundreds of rules, which can exceed SQL length limits:
```mermaid
flowchart LR
    subgraph Small_Model
        S1[10 rules] --> SQ[Simple SQL]
    end
    subgraph Large_Model
        S2[500 rules] --> P[Problem!]
        P --> B[Batching Required]
    end
    style S1 fill:#ccffcc
    style S2 fill:#ffcccc
    style P fill:#ffcccc
    style B fill:#ffff99
```
The Solution¶
The `max_parallel_where` setting controls when Pilz splits large rule sets:
```yaml
# eval_settings.yaml
max_parallel_where: 1000 # Split if more than 1000 conditions
```
```mermaid
flowchart TB
    subgraph "Original (500 rules)"
        O[Full CASE statement]
    end
    subgraph "Batched (3 parts)"
        B1[Part 1: 170 rules]
        B2[Part 2: 170 rules]
        B3[Part 3: 160 rules]
    end
    subgraph "Combined Result"
        C[Sum all parts]
    end
    O --> B1 & B2 & B3
    B1 & B2 & B3 --> C
    style B1 fill:#ccffcc
    style B2 fill:#ccffcc
    style B3 fill:#ccffcc
```
The batched SQL sums the partial scores with CTEs:
```sql
-- Part 1: First set of conditions
WITH batch1 AS (
  SELECT
    customer_id,
    CASE
      WHEN condition_1 THEN score_1
      ...
      ELSE 0
    END AS score_1
  FROM customers
),
-- Part 2: Second set of conditions
batch2 AS (
  SELECT
    customer_id,
    CASE
      WHEN condition_170 THEN score_170
      ...
      ELSE 0
    END AS score_2
  FROM customers
),
-- Combine
final AS (
  SELECT
    customer_id,
    score_1 + score_2 AS total_score
  FROM batch1
  JOIN batch2 USING (customer_id)
)
SELECT
  customer_id,
  total_score
FROM final
```
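The batching idea itself fits in a few lines of Python. The `split_rules` function below mimics what a splitter like `get_split_sql` has to do, but it is a sketch under assumed inputs, not Pilz's actual code:

```python
# Illustrative sketch: split a flat list of (condition, score) rules into
# batches of at most max_conditions, one CASE statement per batch.
# This mimics the idea behind get_split_sql, not its implementation.

def split_rules(rules, max_conditions):
    batches = []
    for start in range(0, len(rules), max_conditions):
        chunk = rules[start:start + max_conditions]
        whens = "\n".join(
            f"    WHEN {cond} THEN {score}" for cond, score in chunk
        )
        batches.append(f"CASE\n{whens}\n    ELSE 0\nEND")
    return batches

rules = [(f"tenure = {i}", round(i * 0.1, 1)) for i in range(5)]
for sql in split_rules(rules, max_conditions=2):
    print(sql)
    print("---")
```

Because each batch defaults to `ELSE 0`, summing the batch columns reproduces the single-statement result.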
Deployment Patterns¶
Pattern 1: View Creation¶
```sql
-- Create a view for easy querying
CREATE VIEW customer_churn_scores AS
SELECT
  customer_id,
  first_name,
  last_name,
  CASE
    WHEN contract = 'Month-to-month' AND tenure <= 12 THEN 0.68
    ...
  END AS churn_probability
FROM customers;
```
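Creating such a view from Python is a one-liner once you have the generated CASE expression. The sketch below uses the standard-library `sqlite3` module purely to stay dependency-free; the same pattern works with DuckDB's Python API, and the table schema and sample rows are made up for illustration:

```python
import sqlite3

# Sketch: wrap a generated CASE expression in a view.
# sqlite3 is used here only to keep the example dependency-free;
# the same SQL pattern works in DuckDB.

case_sql = """CASE
    WHEN tenure <= 12 AND contract = 'Month-to-month' THEN 0.68
    WHEN contract = 'Two year' THEN 0.12
    ELSE 0.35
END"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INT, tenure INT, contract TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 6, 'Month-to-month'), (2, 30, 'Two year')")
conn.execute(
    f"CREATE VIEW customer_churn_scores AS "
    f"SELECT customer_id, {case_sql} AS churn_probability FROM customers"
)
for row in conn.execute("SELECT * FROM customer_churn_scores ORDER BY customer_id"):
    print(row)
# (1, 0.68)
# (2, 0.12)
```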
Pattern 2: Materialized Table¶
```sql
-- Update predictions nightly
CREATE TABLE churn_predictions AS
SELECT
  customer_id,
  run_prediction() AS churn_probability,
  CURRENT_TIMESTAMP AS predicted_at
FROM customers;
```
Pattern 3: Stored Procedure¶
```sql
-- Create a function
CREATE OR REPLACE FUNCTION predict_churn(p_customer_id INT)
RETURNS FLOAT
LANGUAGE SQL
AS $$
  SELECT CASE
    WHEN contract = 'Month-to-month' AND tenure <= 12 THEN 0.68
    WHEN contract = 'Two year' THEN 0.12
    ELSE 0.35
  END
  FROM customers
  WHERE id = p_customer_id
$$;
```
SQL Generation in Code¶
Direct SQL from Pilz¶
```python
from pilz.model import Pilz

# Load model
pilz = Pilz.model_validate_json(open("model/Yes/0.json").read())

# Get single SQL statement
sql = pilz.get_sql("score")
print(sql)

# Get multiple statements (for batching)
batch_sql = pilz.get_split_sql(max_conditions=100)
for i, sql in enumerate(batch_sql):
    print(f"Batch {i}: {sql}")
```
From Pilze (ensemble)¶
```python
from pilz.model import Pilze

# Load all trees for a target
pilze = Pilze.load_folder("model/Yes")

# Get averaged SQL
sql = pilze.get_sql("score")
```
Security Considerations¶
Parameterized Queries¶
While Pilz generates raw SQL, for production consider:
```python
# Instead of string concatenation
sql = f"SELECT ... WHERE balance > {user_input}"  # Dangerous!

# Use parameterized queries
sql = "SELECT ... WHERE balance > ?"  # Safe
```
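A runnable version of the safe pattern, using the standard-library `sqlite3` module with a made-up `accounts` table (DuckDB's Python API accepts the same `?` placeholders):

```python
import sqlite3

# Parameterized-query sketch: the user-supplied value never touches
# the SQL string; the driver binds it safely.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INT, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 50.0), (2, 500.0)])

user_input = 100  # arrives from the user; never interpolated into SQL
rows = conn.execute(
    "SELECT id FROM accounts WHERE balance > ?", (user_input,)
).fetchall()
print(rows)  # [(2,)]
```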
Input Validation¶
```python
def safe_prediction(customer_data):
    # Validate inputs
    assert 0 <= customer_data['tenure'] <= 100
    assert customer_data['contract'] in ['Month-to-month', 'One year', 'Two year']
    # Now use in SQL
    ...
```
Summary¶
| Feature | Description |
|---|---|
| `get_sql()` | Generate a CASE statement |
| `get_split_sql()` | Split large rule sets into batches |
| `Pilze.get_sql()` | Average multiple trees |
| DuckDB | Run predictions in-database |
Next Steps¶
- Settings Reference - All deployment settings
- Best Practices - Production tips
- Troubleshooting - Common issues