Skip to content

Getting Started

Installation

aquarion-libtts comes in several different flavours, depending on your needs. These variations are handled by specifying extras when installing.

First, there are extras for supporting various GPU platforms:

  • cpu: Include PyTorch, but only support CPU and not any GPUs.
  • cu128: Include PyTorch with CUDA 12.8 support for Nvidia GPUs.
  • cu129: Include PyTorch with CUDA 12.9 support for Nvidia GPUs.

Second, each built-in TTS backend has it’s own extra so that only the dependencies of the TTS plugins you want to use will be included:

  • kokoro: Include the required dependencies for Kokoro TTS.

So, to install only the base package, without support for any of the built-in TTS backends, you can run something like:

pip install aquarion-libtts

However, in order to use at least one TTS backend, you will probably want to include some extras like this, for example:

pip install aquarion-libtts[cu129,kokoro]

Or:

pip install aquarion-libtts[cpu,kokoro]

Built-In TTS Plugins

aquarion-libtts provides (or will provide) built-in support for several TTS backends. They are accessed through the same plugin API as any third-party TTS backend you might also use.

The following TTS backends currently have built-in support:

Plugin ID TTS Backend
kokoro_v1 Kokoro TTS ⧉

Basic Usage

Key Concepts

  1. The library uses a plugin system to managed multiple TTS backends.
  2. All access to the plugins, their backends and their settings are handled through the api package.
  3. There is a plugin registry that provides access to everything else.
  4. All TTS backends provide the same interface so that they can be used interchangeably.
  5. Each TTS backend can have different configuration settings, however.

Step 1: Instantiate the Registry

from aquarion.libs.libtts import api

registry = api.TTSPluginRegistry()

Step 2: Load All Plugins

registry.load_plugins()

Step 3: Enable Desired Plugins

All loaded plugins are disabled by default. This means that they will not show up in the list of available plugins. Plugins that you want to use should be enabled like so:

registry.enable("kokoro_v1")

Ideally, plugins should be versioned to allow different implementations over time.

Step 4: Instantiate a Plugin

plugin = registry.get_plugin("kokoro_v1")

Plugins are containers that provide access to their TTS backends and backend-appropriate settings, as well as methods for describing the backend and it’s settings in multiple languages. See the below for more details of the descriptive capabilities of plugins.

Step 5: Instantiate Settings

Each backend is expected to support fully functional default settings, in addition to customized settings. To instantiate default settings, do this:

settings = plugin.make_settings()

Or, to instantiate more customized settings, do something like this:

settings = plugin.make_settings(from_dict={
    "voice": "af_bella"
})

Settings are only ever set using a dictionary, not through setting attributes directly. Also, settings objects are immutable once created, so changing settings requires creating a whole new settings instance. This is meant to facilitate the saving and loading of backend settings in a consistent way, as well as make it easier for dynamic settings UIs to be created.

Step 6: Instantiate the TTS Backend

backend = plugin.make_backend(settings)

TTS backends always require a settings object, even if it is the default settings. Also, changing settings in an existing backend requires providing a whole new complete settings instance, since settings are immutable.

Step 7: Start the Backend

Now that we finally have our TTS backend, we need to start it:

backend.start()

Depending on the specific backend, this could start other threads or processes, or access external APIs. It could also download other resources it might need.

Step 8: Convert Text to Speech

When converting text to speech, the results are provided in chunks of audio via an iterator. This better supports streaming and real-time applications. E.g:

import wave

with wave.open("play_me.wav", "wb") as wave_file:
    wave_file.setnchannels(backend.audio_spec.num_channels)
    wave_file.setsampwidth(backend.audio_spec.sample_width // 8)
    wave_file.setframerate(backend.audio_spec.sample_rate)
    for audio_chunk in backend.convert(
        "Hi there from aquarion-libtts.  This is the kokoro backend."
    ):
        # Convert to little endian byte order
        audio_chunk_le = b"".join(
            audio_chunk[i : i + 2][::-1] for i in range(0, len(audio_chunk), 2)
        )
        wave_file.writeframes(audio_chunk_le)

As you can see, the TTS backend also provides information about the returned audio format in it’s .audio_spec attribute.

Note

The Kokoro TTS backend outputs it’s audio in big endian byte order to conform with the RFC 4856 audio/L16 ⧉ MIME type. Wave files require little endian byte order, however, so in the example above a byte swapping step is required.

Step 9: Stop the Backend

When shutting down or switching TTS backends, it is important to always stop the backend to allow it to clean up after itself.

backend.stop()

Best practice would be to wrap your code in a try ... finally block to ensure the stop method is always called, even in the case of an error.

Example

See the examples ⧉ sub-directory for examples of how to use this project.

Beyond the Basics

In addition to the above core functionality, more is provided:

  • The plugin registry also includes methods for listing plugins, enabled or otherwise, as well as disabling plugins, and checking if a plugin is already enabled.

  • Each plugin also includes methods for getting it’s display name in multiple languages, as well as getting details about each specific setting so that a settings UI can be constructed, also in multiple languages.

(Which languages are supported depends on the plugin.)

  • Each settings object also include a method to export the settings as a JSON-compatible dict for storage, editing, etc.

  • Each backend also includes details about the audio format it emits, as well as a check for whether or not it is currently started.

To learn more about these extra capabilities, please see the API Reference documentation.

Creating Your Own TTS Backends

To learn about creating plugins for your own TTS backend for this project, the see plugins documentation.