Frequently Asked Questions

General

What is Pilz?

Pilz is a machine learning library that generates interpretable SQL rules. It's designed for classification tasks where you need both accuracy and explainability.

When should I use Pilz?

  • When you need interpretable models
  • When predictions should run in SQL/DuckDB
  • For tabular data classification
  • For multi-class problems

When should I NOT use Pilz?

  • For image/audio/video (use neural networks)
  • For very large datasets (>1M rows, without tuning)
  • When model interpretability doesn't matter

How is Pilz different from an sklearn decision tree?

Aspect                sklearn        Pilz
Splits                Binary         Three-way (Left/Neutral/Right)
Output                Python model   SQL code
Feature combinations  No             Yes (n_dims)
Categorization        Manual         Automatic

What does "Pilz" mean?

"Pilz" is German for "mushroom". The project set out to be different from traditional approaches: a mushroom is neither plant nor animal, just as Pilz is neither a neural network nor a traditional decision tree.

Installation

What Python version is required?

Python 3.13 or higher.
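
To confirm that an interpreter meets this requirement before installing, a small check along these lines can help. The helper name `meets_requirement` is made up for this sketch and is not part of Pilz.

```python
import sys

# Minimum version stated in this FAQ.
REQUIRED = (3, 13)


def meets_requirement(version=None, required=REQUIRED):
    """Return True if `version` (default: the running interpreter) is new enough."""
    current = tuple((version or sys.version_info)[:2])
    return current >= required


if __name__ == "__main__":
    status = "OK" if meets_requirement() else "too old for Pilz"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```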

What platforms are supported?

  • macOS (Apple Silicon/ARM)
  • Linux (x86)

Windows support is planned.

How do I install?

uv init pilz-project
cd pilz-project
uv add pilz

Or with pip:

pip install pilz

Training

What do the settings mean?

See the Settings Reference for a detailed explanation.

How many trees should I use?

Start with n=1 for testing, then increase to 3-10 for production.

What is n_dims?

n_dims controls whether splits consider single features or combinations:

  • n_dims=1: Single features
  • n_dims=2: Feature pairs
  • n_dims=3: Feature triplets
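
To get a feel for how quickly the search space can grow with n_dims, the sketch below counts feature groups of a given size. It assumes that n_dims=k means every k-sized combination of features is a candidate, which may not match how Pilz actually prunes or samples candidates. The column names and the `candidate_groups` helper are hypothetical.

```python
from itertools import combinations
from math import comb

# Hypothetical feature columns, for illustration only.
features = ["age", "income", "tenure", "region"]


def candidate_groups(columns, n_dims):
    """List every group of exactly n_dims columns (assumed search space)."""
    return list(combinations(columns, n_dims))


for n_dims in (1, 2, 3):
    print(n_dims, len(candidate_groups(features, n_dims)))

# The same count for a wider table with 20 features, without listing the groups.
wide_counts = {k: comb(20, k) for k in (1, 2, 3)}
print(wide_counts)  # {1: 20, 2: 190, 3: 1140}
```

Under this assumption, moving from single features to triplets on 20 columns raises the number of candidates from 20 to 1140, which is one reason higher n_dims values can slow training.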

Why is training slow?

Common causes:

  • Too many features with high n_dims
  • Large max_eval_fit
  • Too many categories (n_cat)

See Troubleshooting for solutions.

What is the "neutral" branch?

Unlike traditional binary trees, Pilz has three branches:

  • Left: Likely negative class
  • Neutral: Uncertain, continues splitting
  • Right: Likely positive class
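
The idea of a three-way split can be illustrated with a toy routing function. This is only a sketch: the two-threshold rule, the `route` function, and the values of `low` and `high` are assumptions made for illustration and do not describe how Pilz actually chooses its splits.

```python
def route(score, low, high):
    """Send a row to one of three branches based on a numeric score.

    Hypothetical rule: below `low` goes left, above `high` goes right,
    and everything in between stays neutral for further splitting.
    """
    if score < low:
        return "left"
    if score > high:
        return "right"
    return "neutral"


rows = [0.05, 0.4, 0.95]
branches = [route(s, low=0.2, high=0.8) for s in rows]
print(branches)  # ['left', 'neutral', 'right']
```

Only the rows that land in the neutral branch would need another split, whereas a binary tree forces every row to one side at each node.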

Evaluation

What is AUC?

AUC (Area Under the ROC Curve) measures how well the model distinguishes classes. 1.0 is perfect, 0.5 is random.
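
For intuition, AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The function below is a minimal pure-Python sketch of that definition, not the evaluation code Pilz uses; for real workloads a library routine such as scikit-learn's `roc_auc_score` is the usual choice.

```python
def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A perfectly separating model scores 1.0, and a model that gives every row the same score lands at 0.5.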

What is a good AUC?

AUC      Interpretation
0.9-1.0  Excellent
0.8-0.9  Good
0.7-0.8  Fair
0.5-0.7  Poor

How do I improve accuracy?

  1. Increase n (more trees)
  2. Increase n_dims (feature combinations)
  3. Increase max_depth
  4. Adjust n_cat

Deployment

How do I use the SQL in production?

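from pathlib import Path

from pilz.model import Pilz

# Load the trained model from its JSON file (read_text closes the file for us)
pilz = Pilz.model_validate_json(Path("model.json").read_text())
# Generate the prediction SQL
sql = pilz.get_sql("score_column")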

Then execute the SQL in DuckDB or your database.

Can I use the model in other languages?

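Yes. Because the model is exported as SQL, any language with a database client can run it. The SQL can be adapted to other SQL databases, though some dialects may need small adjustments.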
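
As a rough illustration of running rule-style SQL outside DuckDB, the sketch below executes a hand-written CASE expression in SQLite through Python's standard library. The SQL is a stand-in written for this example and is not actual Pilz output; the `customers` table, its columns, and the thresholds are all hypothetical. Real generated SQL may use syntax that needs adjusting for a given database.

```python
import sqlite3

# Hand-written stand-in for rule-style SQL; NOT actual Pilz output.
RULE_SQL = """
SELECT id,
       CASE
           WHEN age < 30 AND income < 40000 THEN 0.2
           WHEN age >= 30 AND income >= 40000 THEN 0.8
           ELSE 0.5
       END AS score_column
FROM customers
ORDER BY id
"""

# Build a tiny in-memory table with hypothetical customer rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, age INTEGER, income INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 25, 30000), (2, 45, 60000), (3, 35, 20000)],
)

# Scoring is just a query: no Python model object is needed at prediction time.
scores = conn.execute(RULE_SQL).fetchall()
conn.close()
print(scores)  # [(1, 0.2), (2, 0.8), (3, 0.5)]
```

The same pattern applies from any language with a SQL client: send the query, read back the score column.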

Troubleshooting

See Troubleshooting for common issues and solutions.

Contributing

How can I contribute?

  1. Fork on GitLab
  2. Create a feature branch
  3. Add tests
  4. Submit a merge request

Where is the source code?

GitLab: https://gitlab.com/gwhe/pilz