
Settings Reference

Complete reference for all training and evaluation settings.

TrainSettings

Required Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| n | int | Number of trees to build per target class |
| out_folder | string | Directory to save trained models |

Optional Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| max_depth | unlimited | Maximum tree depth (prevents overfitting) |
| frac_eval_cat | 0.8 | Fraction of data for evaluation vs categorization |
| max_eval_fit | 100000 | Maximum samples to use for training |
| min_eval_fit | 10 | Minimum samples before stopping |
| n_dims | 1 | Feature combinations to evaluate (1=single, 2=pairs, etc.) |
| n_cat | 5 | Number of bins per feature |
| calcs_per_dim | null | Maximum combinations to evaluate per dimension |
| max_workers | 1 | Number of parallel threads |
| neutral_faktor | 0.2 | Threshold for neutral branch |
| use_neutral_state | false | Enable three-way splits |
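The defaults in the two tables above can be summarized in code. The following is a minimal sketch, not the library's own class: the name `TrainSettingsSketch` is hypothetical, and the sanity checks in `__post_init__` are assumptions about reasonable values rather than documented validation rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainSettingsSketch:
    """Hypothetical mirror of the TrainSettings tables; not the library's class."""

    # Required parameters
    n: int
    out_folder: str
    # Optional parameters, using the defaults listed in the table
    max_depth: Optional[int] = None      # None stands in for "unlimited"
    frac_eval_cat: float = 0.8
    max_eval_fit: int = 100_000
    min_eval_fit: int = 10
    n_dims: int = 1
    n_cat: int = 5
    calcs_per_dim: Optional[int] = None  # null in YAML
    max_workers: int = 1
    neutral_faktor: float = 0.2
    use_neutral_state: bool = False

    def __post_init__(self) -> None:
        # Assumed sanity checks for illustration only.
        if self.n < 1:
            raise ValueError("n must be at least 1")
        if not 0.0 <= self.frac_eval_cat <= 1.0:
            raise ValueError("frac_eval_cat must lie between 0 and 1")
        if self.n_dims < 1:
            raise ValueError("n_dims must be at least 1")


# Only the required parameters are given; everything else falls back to defaults.
minimal = TrainSettingsSketch(n=5, out_folder="my_model")
```

Keeping the defaults in one typed structure like this makes it easy to see which values a minimal configuration inherits.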

Example: TrainSettings

# Minimum required

n: 5

out_folder: my_model

# With all options

n: 10

out_folder: my_model

max_depth: 15

frac_eval_cat: 0.8

max_eval_fit: 50000

min_eval_fit: 50

n_dims: 3

n_cat: 5

calcs_per_dim: 5000

max_workers: 4

neutral_faktor: 0.2

use_neutral_state: true

EvalSettings

Required Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| in_folders | list | Directories containing trained models |
| out_folder | string | Directory for evaluation results |

Optional Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| out_file | null | Output file path (.csv or .parquet) |
| keep_cols | null | Columns to include in output |
| max_parallel_where | 1000 | Maximum number of conditions per SQL query; queries with more conditions are split |
| max_workers | 1 | Number of parallel threads |
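To illustrate what a limit like `max_parallel_where` implies, here is a hedged sketch of splitting a long list of conditions into batches. The function `split_conditions` and the placeholder conditions are hypothetical; how the library actually builds and splits its SQL is not described in this reference.

```python
from typing import Iterator, List


def split_conditions(conditions: List[str], max_parallel_where: int = 1000) -> Iterator[List[str]]:
    """Yield batches holding at most max_parallel_where conditions each."""
    if max_parallel_where < 1:
        raise ValueError("max_parallel_where must be at least 1")
    for start in range(0, len(conditions), max_parallel_where):
        yield conditions[start:start + max_parallel_where]


# Placeholder conditions, invented for the example.
conditions = [f"feature_{i} > 0" for i in range(1200)]
batches = list(split_conditions(conditions, max_parallel_where=500))
batch_sizes = [len(batch) for batch in batches]
```

With 1200 conditions and a limit of 500, the sketch produces three batches, the last one shorter than the others.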

Example: EvalSettings

# Minimum required

in_folders:

  - model

out_folder: results

# With all options

in_folders:

  - model_v1

  - model_v2

out_folder: results

out_file: predictions.csv

keep_cols:

  - customer_id

  - date

max_parallel_where: 500

max_workers: 4

Parameter Guide

n: Number of Trees

| Value | Effect |
|-------|--------|
| 1 | Fast, baseline |
| 3-5 | Good balance |
| 10+ | More accurate, slower |

max_depth

| Value | Effect |
|-------|--------|
| 5-10 | Shallow, fast, general |
| 10-15 | Medium |
| 15+ | Deep, may overfit |

n_dims

| Value | Effect |
|-------|--------|
| 1 | Single features only |
| 2 | Feature pairs |
| 3+ | Complex interactions |

n_cat

| Value | Effect |
|-------|--------|
| 2-3 | Few bins, general |
| 5 | Default |
| 10+ | Many bins, specific |

calcs_per_dim

| Value | Effect |
|-------|--------|
| null | No limit |
| 1000 | Quick |
| 10000 | Thorough |
| 100000+ | Exhaustive |
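The interplay between `n_dims` and `calcs_per_dim` is easier to judge with numbers. The sketch below counts how many feature combinations exist for each combination size and applies `calcs_per_dim` as a cap. It assumes that every size from 1 up to `n_dims` is evaluated and that the cap applies separately to each size; both are assumptions, and `combinations_per_dim` is a hypothetical helper, not part of the library.

```python
from math import comb
from typing import Dict, Optional


def combinations_per_dim(n_features: int, n_dims: int, calcs_per_dim: Optional[int] = None) -> Dict[int, int]:
    """Count feature combinations of each size from 1 to n_dims, capped by calcs_per_dim."""
    counts = {}
    for k in range(1, n_dims + 1):
        total = comb(n_features, k)  # number of distinct k-feature combinations
        counts[k] = total if calcs_per_dim is None else min(total, calcs_per_dim)
    return counts


# 20 features, up to triples: first without a limit, then capped at 1000.
uncapped = combinations_per_dim(n_features=20, n_dims=3)
capped = combinations_per_dim(n_features=20, n_dims=3, calcs_per_dim=1000)
```

For 20 features there are 20 single features, 190 pairs, and 1140 triples, so a cap of 1000 only affects the triples. The count grows quickly with both the number of features and `n_dims`, which is why a cap becomes more relevant at higher `n_dims`.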

Quick Reference Table

```mermaid
flowchart LR
    subgraph "Simple Problem"
        S1[n: 1] --> S2[n_dims: 1] --> S3[n_cat: 3] --> S4[max_depth: 10]
    end

    subgraph "Normal Problem"
        N1[n: 3-5] --> N2[n_dims: 2] --> N3[n_cat: 5] --> N4[max_depth: 15]
    end

    subgraph "Complex Problem"
        C1[n: 10+] --> C2[n_dims: 3+] --> C3[n_cat: 8+] --> C4[max_depth: 20]
    end
```
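The three profiles in the diagram can also be kept as reusable presets. This is a minimal sketch under stated assumptions: `PRESETS` and `build_train_settings` are hypothetical names, and where the diagram gives a range or an open-ended value (`3-5`, `10+`, `8+`), the lowest value was picked as a starting point.

```python
from typing import Any, Dict

# Values copied from the diagram; ranges and "N+" entries are reduced to
# their lowest value as an assumed starting point, not a recommendation.
PRESETS: Dict[str, Dict[str, int]] = {
    "simple": {"n": 1, "n_dims": 1, "n_cat": 3, "max_depth": 10},
    "normal": {"n": 3, "n_dims": 2, "n_cat": 5, "max_depth": 15},
    "complex": {"n": 10, "n_dims": 3, "n_cat": 8, "max_depth": 20},
}


def build_train_settings(preset: str, out_folder: str, **overrides: Any) -> Dict[str, Any]:
    """Merge a preset with the required out_folder and any explicit overrides."""
    settings: Dict[str, Any] = {**PRESETS[preset], "out_folder": out_folder}
    settings.update(overrides)  # explicit values win over preset values
    return settings


settings = build_train_settings("normal", "my_model", max_workers=4)
```

The resulting dictionary uses the same keys as the TrainSettings YAML examples, so it could be written out as a configuration file.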
