{% from 'partials/hero_cta.html' import hero_cta %}
{% call hero_cta( theme='cyan', icon_class='fa-database', icon_shape='rounded', title='Dataset Configuration', description='Configure the training data that SimpleTuner will use. If you started from an example configuration, you already have a demo dataset — use the wizard to replace it with your own images.', features=[ {'icon': 'fa-images', 'color': 'text-info', 'label': 'Image & Video'}, {'icon': 'fa-comment-alt', 'color': 'text-warning', 'label': 'Captions'}, {'icon': 'fa-crop-alt', 'color': 'text-success', 'label': 'Cropping & Resolution'}, ], show_condition='showHeroCTA()', dismiss_method='dismissHeroCTA()', cta_primary={'label': 'Launch Dataset Wizard', 'icon': 'fa-magic', 'action': 'launchWizardFromHero()'}, cta_secondary={'label': 'Dismiss', 'icon': 'fa-times', 'action': 'dismissHeroCTA()'}, tip_text='The wizard walks you through adding local folders, cloud storage, or HuggingFace datasets with proper caption and resolution settings.' ) %}
Dataset Types

Train on images, videos, or audio. Each dataset needs a corresponding text embeds cache for caption processing.

Caption Strategies

Use textfile (image.txt next to image.png), filename, or a single fixed instance prompt applied to every image.

Resolution & Cropping

Set the target resolution (e.g., 1024px), a minimum size to exclude tiny images, and the crop style (center, random, face-aware).

Need Help?

This section can be overwhelming! We recommend opening the Dataloader Documentation in another window while exploring this page. If you get stuck, join our Discord community; the invite link is in the project README.

{% endcall %}
Dataset Configuration

Configure datasets and data loaders for training

{% if data_backend_choices %}
{% endif %}
{% include 'trainer_dataloader_section.html' %}
{% include 'partials/dataset_wizard_modal.html' %}
{% include 'dataset_viewer_tab.html' %}