{% from 'partials/hero_cta.html' import hero_cta %}
Train on images, videos, or audio. Each dataset needs a corresponding text embeds cache for caption processing.
Choose how captions are sourced: a textfile (image.txt next to image.png), the filename, or a fixed instance prompt applied to all images.
Set a target resolution (e.g., 1024px), a minimum size to exclude tiny images, and a crop style (center, random, or face-aware).
This section can be overwhelming! We recommend opening the Dataloader Documentation in another window while exploring this page. If you get stuck, join our Discord community; the invite link is in the project README.
Configure datasets and data loaders for training