
api

Public API for aquarion-libtts.

All interaction with aquarion-libtts is generally expected to go through this API package.

Example
registry = TTSPluginRegistry()
registry.load_plugins()
registry.enable("kokoro_v1")
plugin = registry.get_plugin("kokoro_v1")
settings = plugin.make_settings()
backend = plugin.make_backend(settings)
try:
    backend.start()
    audio_chunks = []
    for audio_chunk in backend.convert("Hello world!"):
        audio_chunks.append(audio_chunk)
finally:
    backend.stop()

Type Aliases:

Classes:

Functions:

  • load_language

    Return a gettext _() function and a *Translations instance.

  • tts_hookimpl

    Decorate a function to mark it as a TTS plugin registration hook.

JSONSerializableTypes

JSONSerializableTypes = str | int | float | bool | None | Sequence[JSONSerializableTypes] | Mapping[str, JSONSerializableTypes]

Basic Python types that are easily serializable to JSON.

TTSSettingsSpecEntryTypes

TTSSettingsSpecEntryTypes = str | int | float

Valid types for a settings spec entry.

TTSSettingsSpecEntry types must be one of these types.

TTSSettingsSpecType

The type of the TTS settings spec mapping.

ITTSPlugin.make_spec returns this.

HashablePathLike

Bases: Hashable, PathLike[str]

PathLikes are hashable, but this makes it explicit for the type checker.

HashableTraversable

Bases: Hashable, Traversable

Traversables are hashable, but this makes it explicit for the type checker.

ITTSBackend

Bases: ITTSSettingsHolder, Protocol

Common interface for all TTS backends.

An ITTSBackend is responsible for converting text into a stream of speech audio chunks. To do this, it should first be started with the start method, then the convert method can be used to do any number of text to speech conversions, and finally it should be shut down with the stop method when no longer needed.

An ITTSBackend is also responsible for reporting the kind of audio that it produces (e.g. raw PCM, WAVE, MP3, OGG, VP8, stereo, mono, 8-bit, 16-bit, etc.). This is reported via the audio_spec attribute.

Lastly, since each ITTSBackend is also an ITTSSettingsHolder, it must also accept configuration settings. These are commonly provided at instantiation, but that is not strictly required to conform to the ITTSSettingsHolder protocol.
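The lifecycle described above can be sketched with a toy backend. EchoBackend below is hypothetical and not part of aquarion-libtts. It only mimics the start / convert / stop contract and yields the UTF-8 bytes of the text in place of real audio. Raising RuntimeError when convert is used before start is an assumption of this sketch, not documented behaviour.

```python
from collections.abc import Iterator


class EchoBackend:
    """Hypothetical stand-in that follows the start/convert/stop contract."""

    def __init__(self, chunk_size: int = 4) -> None:
        self._chunk_size = chunk_size
        self._started = False

    @property
    def is_started(self) -> bool:
        return self._started

    def start(self) -> None:
        # Idempotent: starting an already started backend changes nothing.
        self._started = True

    def stop(self) -> None:
        # Idempotent: stopping an already stopped backend changes nothing.
        self._started = False

    def convert(self, text: str) -> Iterator[bytes]:
        # Assumption of this sketch: converting requires a started backend.
        if not self._started:
            raise RuntimeError("Backend must be started before converting.")
        data = text.encode("utf-8")  # Placeholder for real speech audio.
        for offset in range(0, len(data), self._chunk_size):
            yield data[offset : offset + self._chunk_size]


backend = EchoBackend()
try:
    backend.start()
    audio = b"".join(backend.convert("Hello world"))
finally:
    backend.stop()
```

Wrapping the conversion in try / finally ensures the backend is stopped even if a conversion fails part way through.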

Methods:

  • convert

    Return speech audio for the given text as one or more binary chunks.

  • get_settings

    Return the current settings in use.

  • start

    Start the TTS backend.

  • stop

    Stop the TTS backend.

  • update_settings

    Update to the new given settings.

Attributes:

audio_spec property

audio_spec: TTSAudioSpec

Metadata about the speech audio format.

E.g. Mono 16-bit big endian linear PCM audio at 24 kHz with a MIME type of audio/L16.

Returns:

Note

This should be a read-only property.

is_started property

is_started: bool

Whether or not the backend is already started.

Returns:

Note

This should be a read-only property.

convert

convert(text: str) -> Iterator[bytes]

Return speech audio for the given text as one or more binary chunks.

Parameters:

  • text

    (str) –

    The text to convert into speech.

Returns:

get_settings

get_settings() -> ITTSSettings

Return the current settings in use.

Returns:

Note

The reason the settings are not just direct attributes is because they are to be treated as an all-or-nothing collection. I.e. individual settings attributes should not be individually modified directly on an ITTSSettingsHolder, but rather the whole settings object should be replaced with a new one.

start

start() -> None

Start the TTS backend.

Note

If the backend is already started, this method should be idempotent and do nothing.

stop

stop() -> None

Stop the TTS backend.

Note

If the backend is already stopped, this method should be idempotent and do nothing.

update_settings

update_settings(new_settings: ITTSSettings) -> None

Update to the new given settings.

Parameters:

  • new_settings

    (ITTSSettings) –

    The new complete set of settings to start using immediately.

Raises:

  • TypeError

    Implementations of this interface should check that they are only getting the correct concrete settings class and raise an exception if any other kind of ITTSSettings is given.

Note

The reason the settings are not just direct attributes is because they are to be treated as an all-or-nothing collection. I.e. individual settings attributes should not be individually modified directly on an ITTSSettingsHolder, but rather the whole settings object should be replaced with a new one.

ITTSPlugin

Bases: Protocol

Common interface for all TTS Plugins.

Methods:

  • get_display_name

    Return the display name for the plugin, appropriate for the given locale.

  • get_setting_description

    Return the given setting’s description, appropriate for the given locale.

  • get_setting_display_name

    Return the given setting’s display name, appropriate for the given locale.

  • get_settings_spec

    Return a specification that describes all the backend’s settings.

  • get_supported_locales

    Return the set of speech locales supported by the TTS backend.

  • make_backend

    Create and return a TTS backend instance.

  • make_settings

    Create and return an appropriate settings object for the TTS backend.

Attributes:

  • id (str) –

    A unique identifier for the plugin.

id property

id: str

A unique identifier for the plugin.

The ID must be unique across all aquarion-libtts plugins. Also, it is recommended to include at least a major version number as a suffix so that multiple versions / implementations of a plugin can be installed and supported simultaneously. E.g. for backwards compatibility.

Returns:

  • str

    The unique identifier for the plugin.

Example

kokoro_v1

Note

This should be a read-only property.

get_display_name

get_display_name(locale: str) -> str

Return the display name for the plugin, appropriate for the given locale.

A display name is one that is human-friendly as opposed to any kind of unique key that code would care about.

Parameters:

  • locale

    (str) –

    The locale should be a POSIX-compliant (i.e. using underscores) or CLDR-compliant (i.e. using hyphens) locale string like en_CA, zh-Hant, ca-ES-valencia, or even de_DE.UTF-8@euro. It can be as general as fr or as specific as language_territory_script_variant@modifier.

    Plugins are expected to do their best to accommodate the given locale, but can fall back to a more general language variant if that is all they support. E.g. from en_CA to just en.

Returns:

  • str

    The display name of the plugin in a language appropriate for the given locale. If the given locale is not supported at all, then the plugin is expected to return a display name in its default language, or English if that is preferred.
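The fallback from a specific locale to a more general one can be sketched as follows. The helper names fallback_chain and pick_locale are hypothetical and not part of aquarion-libtts. The sketch assumes the supported locales are stored in underscore form and that codeset and modifier suffixes can be ignored for matching.

```python
def fallback_chain(locale: str) -> list[str]:
    """Return the locale followed by progressively more general variants."""
    # Drop any modifier ("@euro") and codeset (".UTF-8") suffixes.
    base = locale.split("@", 1)[0].split(".", 1)[0]
    # Treat CLDR hyphens and POSIX underscores alike.
    parts = base.replace("-", "_").split("_")
    return ["_".join(parts[:end]) for end in range(len(parts), 0, -1)]


def pick_locale(requested: str, supported: set[str], default: str = "en") -> str:
    """Pick the most specific supported locale, else the default."""
    for candidate in fallback_chain(requested):
        if candidate in supported:
            return candidate
    return default


chosen = pick_locale("en-CA", {"en", "fr"})
```

A plugin that only ships en and fr translations would answer a request for en-CA with its en strings, and a request for an unsupported language with its default.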

get_setting_description

get_setting_description(setting_name: str, locale: str) -> str

Return the given setting’s description, appropriate for the given locale.

Parameters:

  • setting_name

    (str) –

    The name of the setting as returned from get_settings_spec mapping keys.

  • locale

    (str) –

    The locale should be a POSIX-compliant (i.e. using underscores) or CLDR-compliant (i.e. using hyphens) locale string like en_CA, zh-Hant, ca-ES-valencia, or even de_DE.UTF-8@euro. It can be as general as fr or as specific as language_territory_script_variant@modifier.

    Plugins are expected to do their best to accommodate the given locale, but can fall back to a more general language variant if that is all they support. E.g. from en_CA to just en.

Returns:

  • str

    The description of the setting in a language appropriate for the given locale. If the given locale is not supported at all, then the plugin is expected to return a description in its default language, or English if that is preferred.

Raises:

get_setting_display_name

get_setting_display_name(setting_name: str, locale: str) -> str

Return the given setting’s display name, appropriate for the given locale.

A display name is one that is human-friendly as opposed to any kind of unique key that code would care about.

Parameters:

  • setting_name

    (str) –

    The name of the setting as returned from get_settings_spec mapping keys.

  • locale

    (str) –

    The locale should be a POSIX-compliant (i.e. using underscores) or CLDR-compliant (i.e. using hyphens) locale string like en_CA, zh-Hant, ca-ES-valencia, or even de_DE.UTF-8@euro. It can be as general as fr or as specific as language_territory_script_variant@modifier.

    Plugins are expected to do their best to accommodate the given locale, but can fall back to a more general language variant if that is all they support. E.g. from en_CA to just en.

Returns:

  • str

    The display name of the setting in a language appropriate for the given locale. If the given locale is not supported at all, then the plugin is expected to return a display name in its default language, or English if that is preferred.

Raises:

get_settings_spec

get_settings_spec() -> TTSSettingsSpecType

Return a specification that describes all the backend’s settings.

Returns:

get_supported_locales

get_supported_locales() -> Set[str]

Return the set of speech locales supported by the TTS backend.

This should also be the locales that the plugin supports for display names, setting names, setting descriptions, etc.

Locales can be in either POSIX-compliant (i.e. using underscores) or CLDR-compliant (i.e. using hyphens) formats, and client applications are expected to support both.

Returns:

  • Set[str]

    An immutable set of locale strings.

Example
frozenset({"fr_CA", "ca-ES-valencia", "zh-Hant"})
Note

The set of locales should be as specific as is directly supported and should not include broader / more general or approximate catch-all locales unless they are also explicitly supported, or nothing more specific is supported. I.e. en_CA is good, en is bad, unless en is the most specific the TTS backend supports. Or, if ca-ES-valencia is supported, then that is preferred over ca-ES. … In short, be as precise and honest as you can.

make_backend

make_backend(settings: ITTSSettings) -> ITTSBackend

Create and return a TTS backend instance.

This is a factory method.

Parameters:

  • settings

    (ITTSSettings) –

    Custom or default settings must be provided to configure the TTS backend. See make_settings for details.

Returns:

  • ITTSBackend

    A configured TTS backend, ready to use.

Raises:

  • TypeError

    Implementations of this interface must check that they are getting their own ITTSSettings implementation and should raise an exception if any other plugin’s settings object is given instead.

make_settings

make_settings(from_dict: Mapping[str, JSONSerializableTypes] | None = None) -> ITTSSettings

Create and return an appropriate settings object for the TTS backend.

This is a factory method.

Parameters:

  • from_dict

    (Mapping[str, JSONSerializableTypes] | None, default: None ) –

    If it is not None, then the given values should be used to initialize the settings.

    If it is None, then default values for all settings should be used.

Note

If from_dict is provided, then each setting value in it is expected to be validated by this method.

Returns:

  • ITTSSettings

    A compatible settings instance with all settings values valid for immediate use.
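A minimal sketch of a make_settings style factory that validates its input is shown below. MySettings, its two fields, the allowed speed range and the exception types are all assumptions made for illustration. The interface itself does not prescribe them.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class MySettings:
    """Hypothetical settings with two illustrative fields."""

    locale: str = "en"
    speed: float = 1.0


def make_settings(from_dict: Mapping[str, object] | None = None) -> MySettings:
    """Build settings from defaults or from a validated mapping."""
    if from_dict is None:
        return MySettings()  # Defaults for all settings.
    unknown = set(from_dict) - {"locale", "speed"}
    if unknown:
        raise KeyError(f"Unknown settings: {sorted(unknown)}")
    locale = from_dict.get("locale", "en")
    speed = from_dict.get("speed", 1.0)
    if not isinstance(locale, str) or not locale:
        raise ValueError("locale must be a non-empty string")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(speed, bool) or not isinstance(speed, (int, float)):
        raise ValueError("speed must be a number")
    if not 0.1 <= speed <= 1.0:
        raise ValueError("speed must be between 0.1 and 1.0")
    return MySettings(locale=locale, speed=float(speed))


defaults = make_settings()
custom = make_settings({"locale": "fr_CA", "speed": 0.5})
```

Validating every value in the factory means any settings object that exists is already safe to hand to a backend.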

Raises:

ITTSSettings

Bases: Protocol

Common interface for all TTS backend settings.

Implementations of this interface are expected to add their own setting attributes for the specific ITTSBackend implementation they go with.

Note

There is no expectation that ITTSSettings implementations be immutable or hashable, but it’s probably a good idea since changes to settings should be done by calling the ITTSPlugin.make_settings method with a changed settings dictionary.

Example
class MySettings:
    locale: str = "en"
    voice: str = "bella"
    speed: float = 1.0
    api_key: str
    cache_path: Path

    def __eq__(self, other: object) -> bool:
        # Your implementation here
        ...

    def to_dict(self) -> dict[str, JSONSerializableTypes]:
        # Your implementation here
        ...

Methods:

  • to_dict

    Export all settings as a dictionary of only JSON-serializable types.

Attributes:

  • locale (str) –

    The locale for spoken audio language.

locale instance-attribute

locale: str

The locale for spoken audio language.

The locale should be a POSIX-compliant (i.e. using underscores) or CLDR-compliant (i.e. using hyphens) locale string like en_CA, zh-Hant, ca-ES-valencia, or even de_DE.UTF-8@euro. It can be as general as fr or as specific as language_territory_script_variant@modifier.

to_dict

Export all settings as a dictionary of only JSON-serializable types.

Returns:

  • dict[str, JSONSerializableTypes]

    A dictionary where the keys are the setting names and the values are the setting values converted as necessary to simple base JSON-compatible types.

Example

{
    "locale": "en",
    "voice": "bella",
    "speed": 1.0,
    "api_key": "Your API key here",
    "cache_path": "Cache path converted to a basic string"
}
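One possible shape for to_dict is sketched below, assuming a hypothetical MySettings dataclass with a pathlib.Path field. The point is only that non-JSON types are converted to basic types before export, so that the result can be passed straight to json.dumps.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class MySettings:
    """Hypothetical settings class used only for this illustration."""

    locale: str = "en"
    voice: str = "bella"
    speed: float = 1.0
    cache_path: Path = Path("cache")

    def to_dict(self) -> dict[str, object]:
        # Convert non-JSON types (here, Path) to basic types.
        return {
            "locale": self.locale,
            "voice": self.voice,
            "speed": self.speed,
            "cache_path": self.cache_path.as_posix(),
        }


exported = MySettings().to_dict()
encoded = json.dumps(exported)
```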
 

ITTSSettingsHolder

Bases: Protocol

Common interface for objects that accept and contain settings.

Methods:

get_settings

get_settings() -> ITTSSettings

Return the current settings in use.

Returns:

Note

The reason the settings are not just direct attributes is because they are to be treated as an all-or-nothing collection. I.e. individual settings attributes should not be individually modified directly on an ITTSSettingsHolder, but rather the whole settings object should be replaced with a new one.

update_settings

update_settings(new_settings: ITTSSettings) -> None

Update to the new given settings.

Parameters:

  • new_settings

    (ITTSSettings) –

    The new complete set of settings to start using immediately.

Raises:

  • TypeError

    Implementations of this interface should check that they are only getting the correct concrete settings class and raise an exception if any other kind of ITTSSettings is given.

Note

The reason the settings are not just direct attributes is because they are to be treated as an all-or-nothing collection. I.e. individual settings attributes should not be individually modified directly on an ITTSSettingsHolder, but rather the whole settings object should be replaced with a new one.
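The all-or-nothing replacement and the type check described above can be sketched as follows. MySettings, OtherSettings and MyHolder are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MySettings:
    """Hypothetical concrete settings class."""

    locale: str = "en"
    speed: float = 1.0


@dataclass(frozen=True)
class OtherSettings:
    """Hypothetical settings class belonging to a different plugin."""

    locale: str = "en"


class MyHolder:
    """Hypothetical settings holder that replaces settings as a whole."""

    def __init__(self, settings: MySettings) -> None:
        self._settings = settings

    def get_settings(self) -> MySettings:
        return self._settings

    def update_settings(self, new_settings: object) -> None:
        # Accept only this holder's own concrete settings class.
        if not isinstance(new_settings, MySettings):
            raise TypeError(
                f"Expected MySettings, got {type(new_settings).__name__}"
            )
        # Replace the whole object instead of mutating single attributes.
        self._settings = new_settings


holder = MyHolder(MySettings())
holder.update_settings(MySettings(locale="fr_CA", speed=0.9))
```

Making the settings class frozen is a design choice of this sketch. It prevents accidental per-attribute edits and so reinforces the replace-the-whole-object rule.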

TTSAudioSpec dataclass

TTSAudioSpec(*, mime_type: str, sample_rate: int, sample_type: TTSSampleTypes, sample_width: int, byte_order: TTSSampleByteOrders, num_channels: int)

Metadata about an audio format.

This describes the audio format that an ITTSBackend returns.

Note

Instances of this class are immutable once created.

Attributes:

byte_order instance-attribute

byte_order: TTSSampleByteOrders

E.g. Little Endian or Big Endian.

mime_type instance-attribute

mime_type: str

E.g. “audio/L16”, “audio/webm”, “audio/mpeg”, etc.

Note

Probably raw Linear PCM audio is preferred so that client applications can do with it what they want, but any format is allowed.

num_channels instance-attribute

num_channels: int

E.g. 1 for mono, 2 for stereo, etc.

sample_rate instance-attribute

sample_rate: int

E.g. 8000, 24000, 48000, etc.

sample_type instance-attribute

sample_type: TTSSampleTypes

E.g. Signed Integer, Unsigned Integer or Floating Point.

sample_width instance-attribute

sample_width: int

E.g. 8 for 8-bit, 12 for 12-bit, 16 for 16-bit, etc.
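As an illustration of how a client might use these fields, the sketch below computes the playing time of a block of uncompressed PCM audio. AudioSpec is a hypothetical stand-in carrying only three of the fields above, and the calculation assumes the sample width is a whole number of bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSpec:
    """Hypothetical stand-in with a subset of the TTSAudioSpec fields."""

    sample_rate: int  # Samples per second, per channel.
    sample_width: int  # Bits per sample.
    num_channels: int


def pcm_duration_seconds(spec: AudioSpec, num_bytes: int) -> float:
    """Return the duration of num_bytes of uncompressed PCM audio."""
    # Assumes sample_width is a multiple of 8 (e.g. 8, 16, 24 or 32).
    bytes_per_frame = (spec.sample_width // 8) * spec.num_channels
    return num_bytes / (bytes_per_frame * spec.sample_rate)


spec = AudioSpec(sample_rate=24000, sample_width=16, num_channels=1)
duration = pcm_duration_seconds(spec, num_bytes=96000)
```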

TTSPluginRegistry

TTSPluginRegistry()

Registry of all aquarion-libtts backend plugins.

TTS backends and everything related to them are created / accessed through ITTSPlugin instances. The plugin registry is responsible for finding, loading, listing, enabling, disabling and giving access to those plugins.

Methods:

  • disable

    Disable a TTS plugin for exclusion from the enabled plugins list.

  • enable

    Enable a TTS plugin for inclusion in the enabled plugins list.

  • get_plugin

    Return the plugin for the given ID.

  • is_enabled

    Return whether or not the requested plugin is enabled.

  • list_plugin_ids

    Return the set of plugin IDs.

  • load_plugins

    Load all aquarion-libtts backend plugins.

disable

disable(plugin_id: str) -> None

Disable a TTS plugin for exclusion from the enabled plugins list.

Parameters:

  • plugin_id

    (str) –

    The ID of the desired plugin.

Raises:

  • ValueError

    If the given ID does not match any registered plugin.

Note

Disabling a plugin does not affect any existing instances of that plugin in any way. So, proper TTS backend instance management and stopping must still be handled separately.

enable

enable(plugin_id: str) -> None

Enable a TTS plugin for inclusion in the enabled plugins list.

The idea behind enabled vs disabled plugins is that it allows one to manage which plugins are listed / displayed to a user, independently of all the plugins that are installed / loaded. I.e. It allows for filtering which plugins one wants exposed and which should be kept hidden. E.g. Some plugins might not be supported by your application, even though they were installed along with some other dependency.

Parameters:

  • plugin_id

    (str) –

    The ID of the desired plugin.

Raises:

  • ValueError

    If the given ID does not match any registered plugin.
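The enabled / disabled bookkeeping can be pictured with a toy registry. ToyRegistry below is hypothetical and is not how TTSPluginRegistry is implemented. It only mirrors the documented behaviour that everything starts disabled and that unknown IDs raise ValueError.

```python
class ToyRegistry:
    """Hypothetical registry that tracks enabled plugin IDs only."""

    def __init__(self, plugin_ids: set[str]) -> None:
        self._all = set(plugin_ids)
        self._enabled: set[str] = set()  # Everything starts disabled.

    def _check(self, plugin_id: str) -> None:
        if plugin_id not in self._all:
            raise ValueError(f"Unknown plugin ID: {plugin_id}")

    def enable(self, plugin_id: str) -> None:
        self._check(plugin_id)
        self._enabled.add(plugin_id)

    def disable(self, plugin_id: str) -> None:
        self._check(plugin_id)
        self._enabled.discard(plugin_id)

    def is_enabled(self, plugin_id: str) -> bool:
        return plugin_id in self._enabled

    def enabled_ids(self) -> set[str]:
        # Return a copy so callers cannot alter the registry's state.
        return set(self._enabled)


registry = ToyRegistry({"kokoro_v1", "other_v2"})
registry.enable("kokoro_v1")
```

An application would typically enable only the plugins it has tested, and show the user just those.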

get_plugin

get_plugin(id_: str) -> ITTSPlugin

Return the plugin for the given ID.

Parameters:

  • id_

    (str) –

    The ID of the desired, already loaded, plugin. E.g. kokoro_v1.

Returns:

Raises:

  • ValueError

    If the given ID does not match any registered plugin.

is_enabled

is_enabled(plugin_id: str) -> bool

Return whether or not the requested plugin is enabled.

Parameters:

  • plugin_id

    (str) –

    The ID of the plugin to check.

Returns:

list_plugin_ids

list_plugin_ids(*, only_disabled: bool = False, list_all: bool = False) -> set[str]

Return the set of plugin IDs.

By default, only enabled plugins are listed.

Parameters:

  • only_disabled

    (bool, default: False ) –

    If True, then only the disabled plugins are listed.

  • list_all

    (bool, default: False ) –

    If True, then all plugins are listed, regardless of their enabled / disabled status.

Returns:

  • set[str]

    The set of plugin IDs.

Raises:

load_plugins

load_plugins(*, validate: bool = True) -> None

Load all aquarion-libtts backend plugins.

Plugins are discovered by searching for pyproject.toml entry points named aquarion-libtts, then searching those entry points for hook functions decorated with tts_hookimpl, and finally calling those hook functions. The plugins returned by those hook functions are then stored in the plugin registry and made accessible.

Note

All plugins are disabled by default. Use the enable method to enable a plugin.

Parameters:

  • validate

    (bool, default: True ) –

    If True, then an exception is raised if any hook functions do not conform to the expected hook specification.

    If False, then this check is bypassed.

Raises:

Examples:

pyproject.toml
[project.entry-points.'aquarion-libtts']
my_plugin_v1 = "package.hook"
myhookmodule.py
@tts_hookimpl
def register_my_tts_plugin() -> ITTSPlugin | None:
    from package.plugin import MyTTSPlugin
    return MyTTSPlugin()

TTSSampleByteOrders

Bases: StrEnum

The byte order for multi-byte audio samples.

Note

The string values of these types match FFmpeg’s format descriptions, in case that is ever useful.

Attributes:

  • BIG_ENDIAN

    The most significant byte is stored first, then the least significant byte.

  • LITTLE_ENDIAN

    The least significant byte is stored first, then the most significant byte.

  • NOT_APPLICABLE

    This should only be used for 8-bit (i.e. single byte) samples.

BIG_ENDIAN class-attribute instance-attribute

BIG_ENDIAN = 'be'

The most significant byte is stored first, then the least significant byte.

LITTLE_ENDIAN class-attribute instance-attribute

LITTLE_ENDIAN = 'le'

The least significant byte is stored first, then the most significant byte.

NOT_APPLICABLE class-attribute instance-attribute

NOT_APPLICABLE = ''

This should only be used for 8-bit (i.e. single byte) samples.

TTSSampleTypes

Bases: StrEnum

The data type of a single audio sample.

Note

The string values of these types match FFmpeg’s format descriptions, in case that is ever useful.

Attributes:

  • FLOAT

    Floating point samples.

  • NOT_APPLICABLE

    This should only be used for compressed or variable bit streams.

  • SIGNED_INT

    Signed integer samples. (I.e. positive and negative numbers allowed.)

  • UNSIGNED_INT

    Unsigned integer samples. (I.e. only positive numbers, but with more values.)

FLOAT class-attribute instance-attribute

FLOAT = 'f'

Floating point samples.

NOT_APPLICABLE class-attribute instance-attribute

NOT_APPLICABLE = ''

This should only be used for compressed or variable bit streams.

SIGNED_INT class-attribute instance-attribute

SIGNED_INT = 's'

Signed integer samples. (I.e. positive and negative numbers allowed.)

UNSIGNED_INT class-attribute instance-attribute

UNSIGNED_INT = 'u'

Unsigned integer samples. (I.e. only positive numbers, but with more values.)
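One way the FFmpeg correspondence might be put to use is sketched below. SampleType and ByteOrder are hypothetical mirrors of the enum values documented here, and joining type letter, bit width and byte order into a raw format name such as s16le is an assumption of this sketch, not something the library is documented to do.

```python
from enum import Enum


class SampleType(str, Enum):
    """Hypothetical mirror of the TTSSampleTypes values."""

    FLOAT = "f"
    SIGNED_INT = "s"
    UNSIGNED_INT = "u"


class ByteOrder(str, Enum):
    """Hypothetical mirror of the TTSSampleByteOrders values."""

    BIG_ENDIAN = "be"
    LITTLE_ENDIAN = "le"
    NOT_APPLICABLE = ""


def ffmpeg_format(sample_type: SampleType, width: int, order: ByteOrder) -> str:
    """Join type letter, bit width and byte order, e.g. into "s16le"."""
    return f"{sample_type.value}{width}{order.value}"


pcm_name = ffmpeg_format(SampleType.SIGNED_INT, 16, ByteOrder.LITTLE_ENDIAN)
```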

TTSSettingsSpecEntry dataclass

TTSSettingsSpecEntry(*, type: type[T], min: int | float | None = None, max: int | float | None = None, values: frozenset[T] | None = None)

A specification entry describing one setting of a settings object.

Since ITTSSettings can contain custom TTS backend specific setting attributes, there is a need for a way to describe those setting attributes in a standardized way so that settings UIs can be constructed dynamically in applications that use aquarion-libtts. Instances of this class, in a dictionary, for example, can provide a specification for how to render settings fields in a UI.

Example
spec = {
    "locale": TTSSettingsSpecEntry(
        type=str, min=2, values=frozenset({"en", "fr"})
    ),
    "voice": TTSSettingsSpecEntry(type=str),
    "speed": TTSSettingsSpecEntry(type=float, min=0.1, max=1.0),
    "api_key": TTSSettingsSpecEntry(type=str),
    "cache_path": TTSSettingsSpecEntry(type=str),
}

With the example above, one could imagine a UI with multiple text box fields. locale could be a dropdown or a set of radio buttons, there could be validation for valid ranges, speed could have up and down arrow buttons to increase and decrease the value, and / or react to a mouse’s scroll wheel, etc.

Note

Instances of this class are immutable once created, as are all the attribute values as well.

Attributes:

  • max (int | float | None) –

    The maximum allowed value or maximum allowed length.

  • min (int | float | None) –

    The minimum allowed value or minimum allowed length.

  • type (type[T]) –

    The type of setting it is.

  • values (frozenset[T] | None) –

    The set of specific allowed values.

max class-attribute instance-attribute

max: int | float | None = None

The maximum allowed value or maximum allowed length.

This is optional.

For strings this is the maximum allowed length of the string.

For numeric types, this is the highest allowed value.

min class-attribute instance-attribute

min: int | float | None = None

The minimum allowed value or minimum allowed length.

This is optional.

For strings this is the minimum allowed length of the string.

For numeric types, this is the lowest allowed value.

type instance-attribute

type: type[T]

The type of setting it is.

This is required.

Valid types are defined in TTSSettingsSpecEntryTypes.

Notes
  • This should be set to the actual type class, not a string name of a type.

  • Also, only Python basic types should be used. I.e. not classes like pathlib.Path or decimal.Decimal, etc.

values class-attribute instance-attribute

values: frozenset[T] | None = None

The set of specific allowed values.

This is optional.

Some fields might only accept a restricted set of specific valid values. Think enumerations. Acceptable values can be specified with this attribute.
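The min, max and values semantics described above could be checked by a client along these lines. SpecEntry is a hypothetical stand-in with the same field names as TTSSettingsSpecEntry, and is_valid is an illustrative helper, not part of the library.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SpecEntry:
    """Hypothetical stand-in mirroring the TTSSettingsSpecEntry fields."""

    type: type
    min: int | float | None = None
    max: int | float | None = None
    values: frozenset | None = None


def is_valid(entry: SpecEntry, value: object) -> bool:
    """Check a candidate value against one spec entry."""
    if not isinstance(value, entry.type):
        return False
    # Strings are measured by length, numbers by value.
    size = len(value) if isinstance(value, str) else value
    if entry.min is not None and size < entry.min:
        return False
    if entry.max is not None and size > entry.max:
        return False
    return entry.values is None or value in entry.values


speed_entry = SpecEntry(type=float, min=0.1, max=1.0)
locale_entry = SpecEntry(type=str, min=2, values=frozenset({"en", "fr"}))
speed_ok = is_valid(speed_entry, 0.5)
```

A settings UI could run such a check on each field before building the dictionary it passes to make_settings.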

load_language cached

Return a gettext _() function and a *Translations instance.

Parameters:

  • locale

    (str) –

    The desired locale to find and load. E.g. en_CA or fr, etc.

    locale must be parsable by the Babel package and will be normalized by it as well.

    locale is generally expected to be in POSIX format (i.e. using underscores) but CLDR format (i.e. using hyphens) is also supported and will be converted to POSIX format automatically for the purpose of finding translation catalogues.

    If an exact match on locale cannot be found, less specific fallback locales will be used instead. E.g. if kk_Cyrl_KZ is not found, then kk_Cyrl will be tried, and then just kk.

    If no matching locale is found, then the gettext methods will just return the hard coded strings from the source file.

  • domain

    (str) –

    A name unique to your app / project. This domain name becomes the file name of your message catalogues and templates. For example, you could use your project’s name or your root package’s name. E.g. my-cool-project.

    Attention: Do not use aquarion-libtts as your domain name. That is reserved for this project.

  • locale_path

    (HashablePathLike | HashableTraversable | str) –

    The base path where your language files can be found. This can be a regular path (as a str or a pathlib.Path) or this could be some path inside your own Python package, retrieved with the help of importlib.resources.files, for example.

    Note: It is recommended that third-party TTS plugins keep their translation files inside their package (i.e. wheel) by using importlib.resources.files to access a locale directory.

Returns:

  • tuple[Callable[[str], str], NullTranslations]

    A tuple of a gettext callable and an instance of a NullTranslations sub-class (e.g. GNUTranslations).

    The gettext callable is provided for easy use of the more common action.

    The *Translations instance provides access to all the other, less common translation capabilities one might need, e.g. ngettext, pgettext, etc.

    Attention: It is common practice to name the gettext callable _ so that extracting and retrieving translated messages is as easy as _("text to be translated"). In fact, Babel expects this name by default when it searches for translatable strings.

Raises:

Example
from importlib.resources import files
from typing import cast

from aquarion.libs.libtts.api import HashableTraversable, load_language

locale_path = cast(HashableTraversable, files(__name__) / "locale")
_, t = load_language("fr_CA", domain="my-cool-project", locale_path=locale_path)
print(_("I will be translated"))
Note

Once loaded, the language translations are cached for the duration of the process.
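The behaviour when no matching catalogue exists can be reproduced with the standard library alone. The sketch below uses gettext.translation with fallback=True against an empty directory. It illustrates the described outcome (the source strings come back unchanged) and is not a description of how load_language is implemented.

```python
import gettext
import tempfile

with tempfile.TemporaryDirectory() as empty_locale_dir:
    # No catalogues exist here, so fallback=True yields NullTranslations.
    translations = gettext.translation(
        "my-cool-project",
        localedir=empty_locale_dir,
        languages=["kk_Cyrl_KZ"],
        fallback=True,
    )

_ = translations.gettext
message = _("I will be translated")
```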

tts_hookimpl

tts_hookimpl(**kwargs: Any) -> Callable[[], ITTSPlugin | None]

Decorate a function to mark it as a TTS plugin registration hook.

This is a decorator.

The decorated function is expected to accept no arguments and to return an ITTSPlugin, or None if no plugin is to be registered. E.g. Missing dependencies, incompatible hardware, etc.

For more detailed usage options, see the Pluggy package.

Parameters:

  • kwargs

    (Any, default: {} ) –

    Any keyword arguments supported by Pluggy.

Returns:

  • Callable[[], ITTSPlugin | None]

    The decorated function, but marked as a TTS plugin registration hook.

Example
@tts_hookimpl
def register_my_tts_plugin() -> ITTSPlugin | None:
    # NOTE: It is important that we do not import our plugin class or
    #       related packages at module import time.
    #       This hook needs to be able to run even when our required
    #       dependencies, etc. are not installed.
    try:
        import dependency
    except ModuleNotFoundError:
        return None
    from package.plugin import MyTTSPlugin

    return MyTTSPlugin()