Settings Reference
Complete reference for all training and evaluation settings.
TrainSettings
Required Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| n | int | Number of trees to build per target class |
| out_folder | string | Directory to save trained models |
Optional Parameters
| Parameter | Default | Description |
|-----------|---------|-------------|
| max_depth | unlimited | Maximum tree depth; limiting it helps prevent overfitting |
| frac_eval_cat | 0.8 | Fraction of the data assigned to evaluation rather than categorization |
| max_eval_fit | 100000 | Maximum number of samples used for training |
| min_eval_fit | 10 | Minimum number of samples; splitting stops below this |
| n_dims | 1 | Number of features combined per evaluation (1 = single features, 2 = pairs, and so on) |
| n_cat | 5 | Number of bins per feature |
| calcs_per_dim | null | Maximum number of combinations to evaluate per dimension (null = no limit) |
| max_workers | 1 | Number of parallel threads |
| neutral_faktor | 0.2 | Threshold for the neutral branch |
| use_neutral_state | false | Enable three-way splits (adds a neutral branch) |
Example: TrainSettings

```yaml
# Minimum required
n: 5
out_folder: my_model
```

```yaml
# With all options
n: 10
out_folder: my_model
max_depth: 15
frac_eval_cat: 0.8
max_eval_fit: 50000
min_eval_fit: 50
n_dims: 3
n_cat: 5
calcs_per_dim: 5000
max_workers: 4
neutral_faktor: 0.2
use_neutral_state: true
```
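
To make the relationship between required and optional parameters concrete, here is a minimal Python sketch of how a settings mapping could be checked and completed with the defaults from the table above. The names `TRAIN_DEFAULTS`, `TRAIN_REQUIRED`, and `resolve_train_settings` are hypothetical and are not part of the tool itself; the tool's actual loading logic may differ.

```python
# Hypothetical helper: not the tool's own loader, only an illustration.
# Defaults are copied from the Optional Parameters table.
TRAIN_DEFAULTS = {
    "max_depth": None,        # "unlimited" in the table
    "frac_eval_cat": 0.8,
    "max_eval_fit": 100000,
    "min_eval_fit": 10,
    "n_dims": 1,
    "n_cat": 5,
    "calcs_per_dim": None,    # null = no limit
    "max_workers": 1,
    "neutral_faktor": 0.2,
    "use_neutral_state": False,
}
TRAIN_REQUIRED = ("n", "out_folder")


def resolve_train_settings(user):
    """Check required keys, reject unknown keys, and fill in defaults."""
    missing = [key for key in TRAIN_REQUIRED if key not in user]
    if missing:
        raise ValueError(f"missing required settings: {missing}")
    unknown = sorted(set(user) - set(TRAIN_DEFAULTS) - set(TRAIN_REQUIRED))
    if unknown:
        raise ValueError(f"unknown settings: {unknown}")
    # User-supplied values override the defaults.
    return {**TRAIN_DEFAULTS, **user}


resolved = resolve_train_settings({"n": 5, "out_folder": "my_model"})
```

With the minimal configuration, `resolved` contains `n` and `out_folder` as given and every optional parameter at its documented default.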
EvalSettings
Required Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| in_folders | list | Directories containing trained models |
| out_folder | string | Directory for evaluation results |
Optional Parameters
| Parameter | Default | Description |
|-----------|---------|-------------|
| out_file | null | Output file path (.csv or .parquet) |
| keep_cols | null | Columns to include in the output |
| max_parallel_where | 1000 | Maximum number of conditions per SQL query; queries with more conditions are split |
| max_workers | 1 | Number of parallel threads |
Example: EvalSettings

```yaml
# Minimum required
in_folders:
  - model
out_folder: results
```

```yaml
# With all options
in_folders:
  - model_v1
  - model_v2
out_folder: results
out_file: predictions.csv
keep_cols:
  - customer_id
  - date
max_parallel_where: 500
max_workers: 4
```
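
The effect of `max_parallel_where` can be pictured as batching. The sketch below is an assumption about the general idea, not the tool's actual query builder: the function `split_conditions`, the column name `leaf_id`, the table name `data`, and joining conditions with `OR` are all hypothetical.

```python
def split_conditions(conditions, max_parallel_where=1000):
    """Split a list of WHERE conditions into batches of at most max_parallel_where."""
    if max_parallel_where < 1:
        raise ValueError("max_parallel_where must be at least 1")
    return [
        conditions[start:start + max_parallel_where]
        for start in range(0, len(conditions), max_parallel_where)
    ]


# 1200 hypothetical conditions with a limit of 500 give three batches.
conditions = [f"leaf_id = {i}" for i in range(1200)]
batches = split_conditions(conditions, max_parallel_where=500)
batch_sizes = [len(batch) for batch in batches]  # [500, 500, 200]

# Each batch becomes its own (hypothetical) query.
queries = ["SELECT * FROM data WHERE " + " OR ".join(batch) for batch in batches]
```

A lower limit produces more, smaller queries; a higher limit produces fewer, larger ones.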
Parameter Guide
n: Number of Trees
| Value | Effect |
|-------|--------|
| 1 | Fast, baseline |
| 3-5 | Good balance |
| 10+ | More accurate, slower |
max_depth
| Value | Effect |
|-------|--------|
| 5-10 | Shallow, fast, general |
| 10-15 | Medium |
| 15+ | Deep, may overfit |
n_dims
| Value | Effect |
|-------|--------|
| 1 | Single features only |
| 2 | Feature pairs |
| 3+ | Complex interactions |
n_cat
| Value | Effect |
|-------|--------|
| 2-3 | Few bins, general |
| 5 | Default |
| 10+ | Many bins, specific |
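
To illustrate why fewer bins generalize and more bins are more specific, here is a small Python sketch that assigns values to `n_cat` bins. The source does not say how bin edges are chosen; equal-width bins over a fixed range are an assumption made only for this example, and `bin_index` is a hypothetical helper.

```python
def bin_index(value, lo, hi, n_cat=5):
    """Return the 0-based bin of value, using n_cat equal-width bins over [lo, hi]."""
    if n_cat < 1 or hi <= lo:
        raise ValueError("need n_cat >= 1 and hi > lo")
    width = (hi - lo) / n_cat
    idx = int((value - lo) // width)
    # Clamp so that values at or beyond the range edges land in the outer bins.
    return min(max(idx, 0), n_cat - 1)


values = [0.0, 3.0, 4.9, 10.0]
coarse = [bin_index(v, 0.0, 10.0, n_cat=2) for v in values]   # [0, 0, 0, 1]
default = [bin_index(v, 0.0, 10.0, n_cat=5) for v in values]  # [0, 1, 2, 4]
```

With two bins, the first three values are indistinguishable; with five bins, each value falls into its own bin.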
calcs_per_dim
| Value | Effect |
|-------|--------|
| null | No limit |
| 1000 | Quick |
| 10000 | Thorough |
| 100000+ | Exhaustive |
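
The interaction between `n_dims` and `calcs_per_dim` is easier to see with numbers. The sketch below counts feature combinations with the standard binomial coefficient. It assumes that `n_dims: 3` means combinations of 1, 2, and 3 features are all considered and that `calcs_per_dim` caps each of those levels separately; both are assumptions, and `planned_evaluations` is a hypothetical helper.

```python
from math import comb


def planned_evaluations(n_features, n_dims=1, calcs_per_dim=None):
    """Return {dimension: number of combinations evaluated} for dimensions 1..n_dims."""
    plan = {}
    for d in range(1, n_dims + 1):
        total = comb(n_features, d)  # number of distinct feature combinations of size d
        plan[d] = total if calcs_per_dim is None else min(total, calcs_per_dim)
    return plan


# 50 features, up to triples, with and without a cap of 5000 per dimension.
unlimited = planned_evaluations(50, n_dims=3)                    # {1: 50, 2: 1225, 3: 19600}
capped = planned_evaluations(50, n_dims=3, calcs_per_dim=5000)   # {1: 50, 2: 1225, 3: 5000}
```

The number of combinations grows quickly with `n_dims`, which is why a cap becomes more relevant at 3 and above.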
Quick Reference Table

```mermaid
flowchart LR
    subgraph "Simple Problem"
        S1[n: 1] --> S2[n_dims: 1] --> S3[n_cat: 3] --> S4[max_depth: 10]
    end
    subgraph "Normal Problem"
        N1[n: 3-5] --> N2[n_dims: 2] --> N3[n_cat: 5] --> N4[max_depth: 15]
    end
    subgraph "Complex Problem"
        C1[n: 10+] --> C2[n_dims: 3+] --> C3[n_cat: 8+] --> C4[max_depth: 20]
    end
```