Frequently Asked Questions
General
What is Pilz?
Pilz is a machine learning library that generates interpretable SQL rules. It's designed for classification tasks where you need both accuracy and explainability.
When should I use Pilz?
- When you need interpretable models
- When predictions should run in SQL/DuckDB
- For tabular data classification
- For multi-class problems
When should I NOT use Pilz?
- For image/audio/video (use neural networks)
- For very large datasets (>1M rows, without tuning)
- When model interpretability doesn't matter
What's the difference from sklearn DecisionTree?
| Aspect | sklearn | Pilz |
|---|---|---|
| Splits | Binary | Three-way (Left/Neutral/Right) |
| Output | Python model | SQL code |
| Feature combinations | No | Yes (n_dims) |
| Categorization | Manual | Automatic |
What does "Pilz" mean?
"Pilz" is German for "mushroom". The name reflects the project's aim to differ from traditional approaches: a mushroom is neither plant nor animal, just as Pilz is neither a neural network nor a traditional decision tree.
Installation
What Python version is required?
Python 3.13 or higher.
What platforms are supported?
- macOS (Apple Silicon/ARM)
- Linux (x86)
Windows support is planned.
How do I install?
Or with pip:
Training
What do the settings mean?
See the Settings Reference for a detailed explanation.
How many trees should I use?
Start with n=1 for testing, then increase to 3-10 for production.
What is n_dims?
n_dims controls whether splits consider single features or combinations:
- n_dims=1: Single features
- n_dims=2: Feature pairs
- n_dims=3: Feature triplets
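To make the three cases concrete, the sketch below enumerates the feature groups that each n_dims value refers to, using a made-up list of column names. It only illustrates the terminology (single features, pairs, triplets). `feature_groups` is a hypothetical helper, and how Pilz selects or evaluates these groups internally is not covered here.

```python
from itertools import combinations

def feature_groups(features, n_dims):
    """Return every group of exactly n_dims distinct features."""
    return list(combinations(features, n_dims))

# Hypothetical column names, for illustration only.
features = ["age", "income", "tenure"]

singles = feature_groups(features, 1)   # [("age",), ("income",), ("tenure",)]
pairs = feature_groups(features, 2)     # [("age", "income"), ("age", "tenure"), ("income", "tenure")]
triplets = feature_groups(features, 3)  # [("age", "income", "tenure")]
```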
Why is training slow?
Common causes:
- Too many features with high n_dims
- Large max_eval_fit
- Too many categories (n_cat)
See Troubleshooting for solutions.
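One way to see why many features combined with a high n_dims gets expensive: the number of possible feature groups grows quickly with the group size. The sketch assumes, purely for illustration, that every group of n_dims features is a candidate. Whether Pilz actually considers all of them is an assumption, and the feature count of 50 is made up.

```python
from math import comb

def group_count(n_features, n_dims):
    """Number of distinct groups of n_dims features out of n_features."""
    return comb(n_features, n_dims)

for n_dims in (1, 2, 3):
    print(n_dims, group_count(50, n_dims))
# prints:
# 1 50
# 2 1225
# 3 19600
```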
What is the "neutral" branch?
Unlike traditional binary trees, each Pilz split has three branches:
- Left: Likely negative class
- Neutral: Uncertain, continues splitting
- Right: Likely positive class
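The idea of a three-way split can be sketched in a few lines. This is a toy illustration, not Pilz's actual splitting rule: the two thresholds `low` and `high`, the `route` and `classify` helpers, and the example values are all hypothetical. A value below `low` goes left, a value above `high` goes right, and anything in between stays neutral and moves on to the next split.

```python
def route(value, low, high):
    """Send a value to one of three branches using two thresholds."""
    if value < low:
        return "left"     # likely negative class
    if value > high:
        return "right"    # likely positive class
    return "neutral"      # uncertain: keep splitting

def classify(row, splits):
    """Apply splits in order until one of them leaves the neutral branch.

    splits is a list of (feature, low, high) tuples.
    """
    for feature, low, high in splits:
        branch = route(row[feature], low, high)
        if branch != "neutral":
            return branch
    return "neutral"  # no split was decisive

# Hypothetical splits and rows, for illustration only.
splits = [("age", 25, 60), ("income", 30_000, 80_000)]
print(classify({"age": 40, "income": 95_000}, splits))  # right
print(classify({"age": 20, "income": 50_000}, splits))  # left
print(classify({"age": 40, "income": 50_000}, splits))  # neutral
```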
Evaluation
What is AUC?
AUC (Area Under the ROC Curve) measures how well the model distinguishes classes. 1.0 is perfect, 0.5 is random.
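For intuition, AUC equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The sketch below computes it that way in plain Python. It is a teaching aid rather than how Pilz computes the metric, and the pairwise loop is quadratic, so for real data a library routine such as scikit-learn's `roc_auc_score` is the better choice.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels are 0/1, scores are numbers; a tie counts as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))

labels = [0, 0, 1, 1]
print(auc(labels, [0.1, 0.2, 0.8, 0.9]))  # 1.0: every positive outranks every negative
print(auc(labels, [0.5, 0.5, 0.5, 0.5]))  # 0.5: the scores carry no information
print(auc(labels, [0.1, 0.8, 0.2, 0.9]))  # 0.75: three of four pairs ranked correctly
```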
What is a good AUC?
| AUC | Interpretation |
|---|---|
| 0.9-1.0 | Excellent |
| 0.8-0.9 | Good |
| 0.7-0.8 | Fair |
| 0.5-0.7 | Poor |
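If you want to attach these labels to results in a script, the table translates into a small lookup. Two details are assumptions, since the table does not settle them: boundary values go to the higher band (0.9 counts as "Excellent"), and values below 0.5, which the table does not list, get a separate label. `interpret_auc` is a hypothetical helper, not part of Pilz.

```python
def interpret_auc(auc):
    """Map an AUC value to the labels in the table above."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must be between 0 and 1")
    if auc >= 0.9:
        return "Excellent"
    if auc >= 0.8:
        return "Good"
    if auc >= 0.7:
        return "Fair"
    if auc >= 0.5:
        return "Poor"
    return "Worse than random"  # not in the table; assumption for values below 0.5

print(interpret_auc(0.93))  # Excellent
print(interpret_auc(0.75))  # Fair
```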
How do I improve accuracy?
- Increase n (more trees)
- Increase n_dims (feature combinations)
- Increase max_depth
- Adjust n_cat
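When trying several of these settings together, it can help to enumerate the combinations up front and evaluate them one by one. The sketch below only builds the list of candidate settings as plain dictionaries. It does not call Pilz, `candidate_settings` is a hypothetical helper, and the values in the grid are illustrative rather than recommended. See the Settings Reference for what each setting does.

```python
from itertools import product

def candidate_settings(grid):
    """Expand a dict of setting -> list of values into one dict per combination."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]

# Setting names come from the list above; the values are illustrative only.
grid = {
    "n": [1, 3, 10],
    "n_dims": [1, 2],
    "max_depth": [3, 5],
    "n_cat": [5, 10],
}

candidates = candidate_settings(grid)
print(len(candidates))  # 24
print(candidates[0])    # {'n': 1, 'n_dims': 1, 'max_depth': 3, 'n_cat': 5}
```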
Deployment
How do I use the SQL in production?
from pathlib import Path
from pilz.model import Pilz
pilz = Pilz.model_validate_json(Path("model.json").read_text())  # load the saved model
sql = pilz.get_sql("score_column")  # generate the scoring SQL
Then execute the SQL in DuckDB or your database.
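As a rough picture of that last step, the sketch below runs a rule-style query against an in-memory database. Everything in it is an assumption made for illustration: the `rule_sql` string is hand-written and is not Pilz output (the real shape of the generated SQL may differ), the `customers` table and its columns are made up, and Python's built-in `sqlite3` stands in for "your database" so the example runs without extra dependencies.

```python
import sqlite3

# Hand-written stand-in for generated SQL; real Pilz output will look different.
rule_sql = """
SELECT id,
       CASE
           WHEN age < 25 THEN 0.1
           WHEN income > 80000 THEN 0.9
           ELSE 0.5
       END AS score_column
FROM customers
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, age INTEGER, income INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 20, 50000), (2, 40, 95000), (3, 40, 50000)],
)
scores = conn.execute(rule_sql).fetchall()  # one (id, score) row per customer
conn.close()
print(scores)  # [(1, 0.1), (2, 0.9), (3, 0.5)]
```

Because the scoring happens inside the database, no Python model object is needed at prediction time.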
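Can I use the model in other languages?
Yes. The model is exported as SQL, so any language that can send a query to a SQL database can use it. The SQL is intended to be database-agnostic, though it may need adapting to your database's dialect.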
Troubleshooting
See Troubleshooting for common issues and solutions.
Contributing
How can I contribute?
- Fork on GitLab
- Create a feature branch
- Add tests
- Submit a merge request
Where is the source code?
GitLab: https://gitlab.com/gwhe/pilz