DataCard Reference
The DataCard is a YAML file that describes your dataset for Pilz.
Complete Example

    features:
      - name: age
        statistical: categorial
        type: int
        missing_value: 0
      - name: name
        statistical: categorial
        type: string
      - name: balance
        statistical: numerical
        type: float
        missing_value: 0.0
    target:
      feature_name: churn
      values:
        - 'Yes'
        - 'No'
    infos:
      source: https://example.com/data
      license: MIT
      date: '2024-01-01'
    train_files:
      - /path/to/train.csv
      - /path/to/train2.csv
    test_files:
      - /path/to/test.csv
Field Reference
features
List of feature definitions.
| Field | Type | Required | Description |
|---|---|---|---|
| `name` | string | Yes | Column name in CSV |
| `statistical` | string | Yes | "categorial" or "numerical" |
| `type` | string | Yes | "int", "float", or "string" |
| `missing_value` | any | No | Value to replace nulls |
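
The field table above can be mirrored in code. The following is a minimal sketch, assuming the YAML has already been parsed into plain Python dictionaries; `FeatureSpec` and `parse_features` are hypothetical names for illustration, not part of Pilz.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Allowed values, taken from the tables in this reference.
STATISTICAL_VALUES = {"categorial", "numerical"}
DATA_TYPES = {"int", "float", "string"}


@dataclass
class FeatureSpec:
    """Hypothetical in-memory form of one `features` entry (not Pilz's own class)."""

    name: str
    statistical: str
    type: str
    missing_value: Optional[Any] = None  # optional field; None means "not given"

    def __post_init__(self) -> None:
        # Reject values outside the documented sets, e.g. the "categorical" spelling.
        if self.statistical not in STATISTICAL_VALUES:
            raise ValueError(f"{self.name}: unknown statistical value {self.statistical!r}")
        if self.type not in DATA_TYPES:
            raise ValueError(f"{self.name}: unknown type {self.type!r}")


def parse_features(entries: list[dict]) -> list[FeatureSpec]:
    """Build FeatureSpec objects from already-parsed YAML entries."""
    return [FeatureSpec(**entry) for entry in entries]


features = parse_features([
    {"name": "age", "statistical": "categorial", "type": "int", "missing_value": 0},
    {"name": "name", "statistical": "categorial", "type": "string"},
])
```

Validating at construction time means a misspelled `statistical` value fails as soon as the entry is read, rather than later during training.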
target
Describes the target variable.
| Field | Type | Required | Description |
|---|---|---|---|
| `feature_name` | string | Yes | Column name of target |
| `values` | list | Yes | All possible class values |
infos (optional)
Optional dictionary of metadata kept for documentation, such as `source`, `license`, and `date`.
train_files
Paths to training data files.
test_files
Paths to test/evaluation data files.
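
Because `train_files` and `test_files` are lists, a consumer has to read several CSV files in sequence. The sketch below shows one way to do that with the standard library. It assumes all files share the same header and are simply concatenated, which this page does not confirm for Pilz; `read_rows` is a hypothetical helper.

```python
import csv
from typing import Iterable, Iterator


def read_rows(paths: Iterable[str]) -> Iterator[dict]:
    """Yield rows from each CSV file in order, as dicts keyed by column name."""
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            # DictReader uses the first line of each file as the header.
            yield from csv.DictReader(handle)
```

For example, `list(read_rows(card["train_files"]))` would collect every training row, where `card` is the parsed DataCard dictionary.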
FeatureType Values
| Value | Use For | Example |
|---|---|---|
| `categorial` | Discrete values | "yes", "no", "admin", "blue-collar" |
| `numerical` | Continuous values | 1.5, 100, -42 |
!> Important: Use `categorial` (not `categorical`)
DataType Values
| Value | Description |
|---|---|
| `int` | Integer numbers |
| `float` | Decimal numbers |
| `string` | Text values |
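
To illustrate how `type` and `missing_value` could work together, here is a minimal sketch that converts one raw CSV cell. It assumes an empty cell counts as a null, which this page does not specify; `CASTS` and `convert_cell` are hypothetical names, not Pilz's API.

```python
from typing import Any, Optional

# Map each documented DataType value to a Python constructor.
CASTS = {"int": int, "float": float, "string": str}


def convert_cell(raw: Optional[str], data_type: str, missing_value: Any = None) -> Any:
    """Cast a raw CSV cell to its declared type, substituting missing_value for nulls."""
    # Assumption: None or a blank string represents a null cell.
    if raw is None or raw.strip() == "":
        return missing_value
    return CASTS[data_type](raw)
```

With the `balance` feature from the complete example, `convert_cell("", "float", 0.0)` would yield the configured replacement value instead of raising a conversion error.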
Auto-Generation
Instead of writing manually, use:
This interactive command will prompt you for feature types and target selection.
Validation
Pilz validates the DataCard against the data files:
- Every feature name must exist as a column in the CSV
- The target column must exist
- All target values must exist in the target column
- All file paths must exist
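
The four checks above can be approximated before handing a DataCard to Pilz. This is a sketch of the idea rather than Pilz's actual validator. It assumes the card is already parsed into a dictionary, that each listed target value must occur in at least one file, and that CSV values are compared as strings. `validate_datacard` is a hypothetical name.

```python
import csv
import os


def validate_datacard(card: dict) -> list[str]:
    """Return a list of problems; an empty list means all four checks passed."""
    errors = []
    feature_names = [feature["name"] for feature in card["features"]]
    target = card["target"]["feature_name"]
    # CSV cells are read as strings, so compare target values as strings too.
    expected = {str(value) for value in card["target"]["values"]}
    seen = set()

    for path in card["train_files"] + card["test_files"]:
        if not os.path.isfile(path):
            errors.append(f"file not found: {path}")
            continue
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            columns = reader.fieldnames or []
            for name in feature_names:
                if name not in columns:
                    errors.append(f"{path}: missing feature column {name!r}")
            if target not in columns:
                errors.append(f"{path}: missing target column {target!r}")
                continue
            # Collect every target value that actually occurs in this file.
            seen.update(row[target] for row in reader)

    for value in sorted(expected - seen):
        errors.append(f"target value {value!r} not found in target column")
    return errors
```

Returning a list of messages instead of raising on the first failure lets all problems be reported in a single pass.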