{% from 'partials/hero_cta.html' import hero_cta %}
{% call hero_cta(
theme='cyan',
icon_class='fa-database',
icon_shape='rounded',
title='Dataset Configuration',
description='Configure the training data that SimpleTuner will use. If you started from an example configuration, you already have a demo dataset — use the wizard to replace it with your own images.',
features=[
{'icon': 'fa-images', 'color': 'text-info', 'label': 'Image & Video'},
{'icon': 'fa-comment-alt', 'color': 'text-warning', 'label': 'Captions'},
{'icon': 'fa-crop-alt', 'color': 'text-success', 'label': 'Cropping & Resolution'},
],
show_condition='showHeroCTA()',
dismiss_method='dismissHeroCTA()',
cta_primary={'label': 'Launch Dataset Wizard', 'icon': 'fa-magic', 'action': 'launchWizardFromHero()'},
cta_secondary={'label': 'Dismiss', 'icon': 'fa-times', 'action': 'dismissHeroCTA()'},
tip_text='The wizard walks you through adding local folders, cloud storage, or HuggingFace datasets with proper caption and resolution settings.'
) %}
Dataset Types
Train on images, videos, or audio. Each dataset needs a corresponding
text embeds cache for caption processing.
Caption Strategies
Use textfile (a caption in image.txt next to image.png), filename,
or a single fixed instance prompt applied to every image.
Resolution & Cropping
Set a target resolution (e.g., 1024px), a minimum size to exclude tiny images,
and a crop style (center, random, face-aware).
Need Help?
This section can be overwhelming! We recommend opening the
Dataloader Documentation
in another window while exploring this page. If you get stuck, join our Discord community — the invite link is in the
project README.
{% endcall %}
Dataset Configuration
Configure datasets and data loaders for training
{% if data_backend_choices %}
{% for option in data_backend_choices %}
{% endfor %}
{% endif %}
{% include 'trainer_dataloader_section.html' %}
{% include 'partials/dataset_wizard_modal.html' %}