DataCard Reference

The DataCard is a YAML file that describes your dataset for Pilz.

Complete Example

features:
  - name: age
    statistical: categorial
    type: int
    missing_value: 0
  - name: name
    statistical: categorial
    type: string
  - name: balance
    statistical: numerical
    type: float
    missing_value: 0.0

target:
  feature_name: churn
  values:
    - 'Yes'
    - 'No'

infos:
  source: https://example.com/data
  license: MIT
  date: '2024-01-01'

train_files:
  - /path/to/train.csv
  - /path/to/train2.csv

test_files:
  - /path/to/test.csv

Field Reference

features

List of feature definitions.

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Column name in CSV |
| statistical | string | Yes | "categorial" or "numerical" |
| type | string | Yes | "int", "float", or "string" |
| missing_value | any | No | Value to replace nulls |
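The field constraints above can be checked before a DataCard is handed to Pilz. The following is a minimal sketch, not Pilz's own code: `check_feature` is a hypothetical helper, and it assumes the feature entry has already been loaded from YAML into a plain Python dict.

```python
# Allowed values, taken from the field reference in this document.
ALLOWED_STATISTICAL = {"categorial", "numerical"}
ALLOWED_TYPES = {"int", "float", "string"}
REQUIRED_KEYS = ("name", "statistical", "type")


def check_feature(feature: dict) -> list:
    """Return a list of problems found in one feature entry (hypothetical helper)."""
    problems = []
    # Every required key must be present; missing_value is optional.
    for key in REQUIRED_KEYS:
        if key not in feature:
            problems.append(f"missing required field: {key}")
    stat = feature.get("statistical")
    if stat is not None and stat not in ALLOWED_STATISTICAL:
        problems.append(f"unknown statistical value: {stat!r}")
    dtype = feature.get("type")
    if dtype is not None and dtype not in ALLOWED_TYPES:
        problems.append(f"unknown type value: {dtype!r}")
    return problems


if __name__ == "__main__":
    # The common misspelling "categorical" is reported as unknown.
    print(check_feature({"name": "age", "statistical": "categorical", "type": "int"}))
```

A check like this catches the "categorical" misspelling early, before any data file is read.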

target

Describes the target variable.

| Field | Type | Required | Description |
|---|---|---|---|
| feature_name | string | Yes | Column name of target |
| values | list | Yes | All possible class values |

infos (optional)

Metadata dictionary for documentation.

infos:
  source: https://example.com
  license: MIT
  description: Customer churn dataset
  version: '1.0'

train_files

Paths to training data files.

train_files:
  - /path/to/train.csv
  - /path/to/train2.csv  # Multiple files supported

test_files

Paths to test/evaluation data files.

test_files:
  - /path/to/test.csv

FeatureType Values

| Value | Use For | Example |
|---|---|---|
| categorial | Discrete values | "yes", "no", "admin", "blue-collar" |
| numerical | Continuous values | 1.5, 100, -42 |

!> Important: Use categorial (not categorical)

DataType Values

| Value | Description |
|---|---|
| int | Integer numbers |
| float | Decimal numbers |
| string | Text values |
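To illustrate how a declared `type` and an optional `missing_value` might interact when a CSV cell is read, here is a rough sketch. It is an assumption about the intended semantics, not Pilz's implementation: `cast_value` is a hypothetical helper, and treating an empty string as a null is this example's choice.

```python
# Map each DataType value from the table above to a Python constructor.
CASTERS = {"int": int, "float": float, "string": str}


def cast_value(raw, dtype, missing_value=None):
    """Convert one raw CSV cell to the declared type (hypothetical helper).

    Assumption: None and the empty string both count as null and are
    replaced by missing_value, which may itself be None.
    """
    if raw is None or raw == "":
        return missing_value
    return CASTERS[dtype](raw)


if __name__ == "__main__":
    print(cast_value("42", "int"))
    print(cast_value("", "float", missing_value=0.0))
    print(cast_value("blue-collar", "string"))
```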

Auto-Generation

Instead of writing the DataCard by hand, you can generate it:

pilz create-dc --src data.csv --out datacard.yaml

This interactive command prompts you for the type of each feature and for the target column.

Validation

Pilz validates the DataCard against the following rules:

  • All feature names must exist in the CSV
  • The target column must exist
  • All target values must exist in the target column
  • All file paths must exist
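The rules above can be approximated with the standard library alone. This is a sketch under stated assumptions, not Pilz's actual validator: `validate_datacard` is a hypothetical function, the DataCard is assumed to be already loaded into a dict, the files are assumed to be comma-separated with a header row, and "target values must exist" is read as "each listed value appears at least once across all existing files".

```python
import csv
import os


def validate_datacard(card: dict) -> list:
    """Return a list of rule violations for a loaded DataCard (hypothetical)."""
    errors = []
    target_name = card["target"]["feature_name"]
    feature_names = [feature["name"] for feature in card["features"]]
    seen_targets = set()

    for path in card.get("train_files", []) + card.get("test_files", []):
        # Rule: file paths must exist.
        if not os.path.exists(path):
            errors.append(f"file not found: {path}")
            continue
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle)
            columns = reader.fieldnames or []
            # Rule: all feature names must exist in the CSV.
            for name in feature_names:
                if name not in columns:
                    errors.append(f"{path}: missing feature column {name!r}")
            # Rule: the target column must exist.
            if target_name not in columns:
                errors.append(f"{path}: missing target column {target_name!r}")
                continue
            seen_targets.update(row[target_name] for row in reader)

    # Rule: all target values must exist in the target column.
    for value in card["target"]["values"]:
        if str(value) not in seen_targets:
            errors.append(f"target value {value!r} not found in data")
    return errors
```

An empty list means the card passed every check; otherwise each entry names the file and the rule that failed.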