When enabled, audio is automatically extracted from video files and used for audio conditioning.
A separate audio dataset is created automatically.
Audio Format
A sample rate of 16000 Hz is recommended for Wav2Vec2
Bucket rounding (seconds)
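One way to read the bucket rounding setting is sketched below. This is an illustration only, not the tool's actual implementation: it assumes clip durations are rounded down to a multiple of the rounding step so that clips sharing a bucket have the same length. The names `bucket_duration` and `group_by_bucket` are hypothetical.

```python
import math
from collections import defaultdict


def bucket_duration(duration_s: float, rounding_s: float) -> float:
    """Round a duration down to a multiple of rounding_s (assumed behavior)."""
    if rounding_s <= 0:
        raise ValueError("rounding_s must be positive")
    # The small epsilon guards against float error when the duration is
    # already (almost exactly) a multiple of the step.
    steps = math.floor(duration_s / rounding_s + 1e-9)
    return steps * rounding_s


def group_by_bucket(durations: dict, rounding_s: float) -> dict:
    """Group clip names by their rounded duration."""
    buckets = defaultdict(list)
    for name, duration_s in durations.items():
        buckets[bucket_duration(duration_s, rounding_s)].append(name)
    return dict(buckets)


clips = {"a.wav": 7.3, "b.wav": 7.1, "c.wav": 2.0}
print(group_by_bucket(clips, rounding_s=0.5))
```

With a 0.5 second step, the 7.3 s and 7.1 s clips both fall into the 7.0 s bucket, while the 2.0 s clip stays in its own bucket. A larger step gives fewer, fuller buckets at the cost of discarding more audio per clip.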
Handling Options
Which part of the audio to keep when truncating
Generate zero-filled (silent) audio for videos without an audio track
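The two handling options can be illustrated with a minimal sketch. It is not the tool's own code: the mode names `"beginning"`, `"middle"` and `"end"`, and the helpers `truncate_audio` and `silent_audio`, are assumptions made for this example, and audio is modeled as a plain list of float samples.

```python
def truncate_audio(samples: list, max_len: int, keep: str = "beginning") -> list:
    """Keep max_len samples from the chosen part of the clip."""
    if max_len < 0:
        raise ValueError("max_len must be non-negative")
    if len(samples) <= max_len:
        return list(samples)  # nothing to cut
    excess = len(samples) - max_len
    if keep == "beginning":
        start = 0
    elif keep == "end":
        start = excess
    elif keep == "middle":
        start = excess // 2
    else:
        raise ValueError(f"unknown truncation mode: {keep!r}")
    return list(samples[start:start + max_len])


def silent_audio(duration_s: float, sample_rate: int = 16000) -> list:
    """Build a zero-filled clip for a video that has no audio track."""
    return [0.0] * round(duration_s * sample_rate)


clip = [float(i) for i in range(10)]
print(truncate_audio(clip, 4, keep="middle"))
print(len(silent_audio(0.5)))
```

Zero-filled audio keeps every video paired with an audio sample of matching length, so videos without sound do not have to be dropped from the dataset.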