Kokoro TTS Requirements

The aquarion-libtts project documents its requirements using Behaviour Driven Development (BDD) and automated acceptance tests. The following are the BDD-style requirements for aquarion-libtts’s Kokoro TTS backend.

@kokoro
Feature: Kokoro TTS
    As a software developer
    I want to use the Kokoro TTS Model to generate speech from text
    So that I can provide text-to-speech functionality in my application

    Background:
        Given I have a TTSBackendRegistry
        And I have loaded all available plugins
        And I am using the 'kokoro_v1' plugin

    @gpu
    Scenario: Using an NVIDIA GPU
        When I make settings with 'device' set to 'cuda'
        And I make the backend using the settings
        And I measure baseline GPU memory usage
        And I start the backend
        Then the model should be loaded in the GPU

    Scenario: Using the CPU
        When I make settings with 'device' set to 'cpu'
        And I make the backend using the settings
        And I measure baseline GPU memory usage
        And I start the backend
        Then the model should be loaded in the CPU

    Scenario Outline: Changing the Locale and/or Voice
        When I make the default settings for the backend
        And I make the backend using the settings
        And I start the backend
        And I make new settings using <locale> and <voice>
        And I update the backend with the new settings
        Then the backend should use the new settings

        Examples:
            | locale | voice    |
            | en_US  | af_heart |
            | en_GB  | bf_emma  |
            | fr_FR  | ff_siwis |

    Scenario: Changing the Speed
        When I make settings with 'speed' set to '1.0'
        And I make the backend using the settings
        And I start the backend
        And I make new settings with 'speed' set to '0.5'
        And I update the backend with the new settings
        Then the backend should use the new settings

    Scenario: Converting Text to Speech
        When I make the default settings for the backend
        And I make the backend using the settings
        And I start the backend
        And I convert 'Hi there!' to speech
        Then the audio output should be as expected

    @gpu
    Scenario: Checking for GPU Memory Leaks
        When I make settings with 'device' set to 'cuda'
        And I make the backend using the settings
        And I measure baseline GPU memory usage
        And I start the backend
        And I convert text to speech '30' times in a row
        Then GPU memory usage should remain consistent

    Scenario: Working Offline with No Network
        When I make settings with paths to pre-existing local files
        And I make the backend using the settings
        And I start the backend
        Then no network downloading occurs
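
To illustrate how step text like the above can be bound to automated acceptance tests, here is a minimal, standard-library-only sketch of a step runner applied to the "Changing the Speed" scenario. It is not the project's actual test code: `FakeBackend`, `step`, `run_scenario`, and the `update_settings` method are hypothetical names invented for this example, and the real suite presumably uses a BDD framework and the real backend API, neither of which is shown here.

```python
import re
from dataclasses import dataclass


@dataclass
class FakeBackend:
    """Hypothetical stand-in for a TTS backend; NOT the aquarion-libtts API."""

    settings: dict
    started: bool = False

    def start(self) -> None:
        self.started = True

    def update_settings(self, new_settings: dict) -> None:
        self.settings = dict(new_settings)


STEPS = []  # (compiled pattern, handler) pairs, in registration order


def step(pattern):
    """Register a function as the handler for step text matching `pattern`."""
    def decorator(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return decorator


@step(r"I make settings with '(\w+)' set to '([^']+)'")
def make_settings(ctx, key, value):
    ctx["settings"] = {key: value}


@step(r"I make the backend using the settings")
def make_backend(ctx):
    ctx["backend"] = FakeBackend(settings=dict(ctx["settings"]))


@step(r"I start the backend")
def start_backend(ctx):
    ctx["backend"].start()


@step(r"I make new settings with '(\w+)' set to '([^']+)'")
def make_new_settings(ctx, key, value):
    ctx["new_settings"] = {key: value}


@step(r"I update the backend with the new settings")
def update_backend(ctx):
    ctx["backend"].update_settings(ctx["new_settings"])


@step(r"the backend should use the new settings")
def check_new_settings(ctx):
    assert ctx["backend"].settings == ctx["new_settings"]


def run_scenario(text):
    """Run each step line against the registry and return the shared context."""
    ctx = {}
    for line in text.strip().splitlines():
        # Drop the leading Gherkin keyword (Given/When/And/Then).
        body = line.strip().split(" ", 1)[1]
        for pattern, func in STEPS:
            match = pattern.fullmatch(body)
            if match:
                func(ctx, *match.groups())
                break
        else:
            raise LookupError(f"No step definition for: {body}")
    return ctx


CHANGING_THE_SPEED = """
    When I make settings with 'speed' set to '1.0'
    And I make the backend using the settings
    And I start the backend
    And I make new settings with 'speed' set to '0.5'
    And I update the backend with the new settings
    Then the backend should use the new settings
"""

if __name__ == "__main__":
    result = run_scenario(CHANGING_THE_SPEED)
    print(result["backend"].settings)
```

Because each step is matched against the full line, the exact wording of a step in the feature file matters: rewording a step without updating its definition raises `LookupError` in this sketch, much as an unmatched step would be reported by a real BDD runner.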