api
¶
Public API for aquarion-libtts.
All interaction with aquarion-libtts is generally expected to go through this API package.
Example
Type Aliases:
-
JSONSerializableTypes–Basic Python types that are easily serializable to JSON.
-
TTSSettingsSpecEntryTypes–Valid types for a settings spec entry.
-
TTSSettingsSpecType–The type of the TTS settings spec mapping.
Classes:
-
HashablePathLike–PathLikes are hashable, but this makes it explicit for the type checker.
-
HashableTraversable–Traversables are hashable, but this makes it explicit for the type checker.
-
ITTSBackend–Common interface for all TTS backends.
-
ITTSPlugin–Common interface for all TTS Plugins.
-
ITTSSettings–Common interface for all TTS backend settings.
-
ITTSSettingsHolder–Common interface for objects that accept and contain settings.
-
TTSAudioSpec–Metadata about an audio format.
-
TTSPluginRegistry–Registry of all aquarion-libtts backend plugins.
-
TTSSampleByteOrders–The byte order for multi-byte audio samples.
-
TTSSampleTypes–The data type of a single audio sample.
-
TTSSettingsSpecEntry–A specification entry describing one setting of a settings object.
Functions:
-
load_language–Return a gettext _() function and a *Translations instance.
-
tts_hookimpl–Decorate a function to mark it as a TTS plugin registration hook.
JSONSerializableTypes
¶
JSONSerializableTypes = str | int | float | bool | None | Sequence[JSONSerializableTypes] | Mapping[str, JSONSerializableTypes]
Basic Python types that are easily serializable to JSON.
TTSSettingsSpecEntryTypes
¶
Valid types for a settings spec entry.
TTSSettingsSpecEntry types must be one of these types.
TTSSettingsSpecType
¶
TTSSettingsSpecType = Mapping[str, TTSSettingsSpecEntry[TTSSettingsSpecEntryTypes]]
The type of the TTS settings spec mapping.
ITTSPlugin.get_settings_spec returns this.
HashablePathLike
¶
HashableTraversable
¶
Bases: Hashable, Traversable
Traversables are hashable, but this makes it explicit for the type checker.
ITTSBackend
¶
Bases: ITTSSettingsHolder, Protocol
Common interface for all TTS backends.
An ITTSBackend is responsible for converting text into a stream of speech audio chunks. To do this, it should first be started with the start method, then the convert method can be used to do any number of text-to-speech conversions, and finally it should be shut down with the stop method when no longer needed.
An ITTSBackend is also responsible for reporting the kind of audio that it
produces (e.g. raw PCM, WAVE, MP3, OGG, VP8, stereo, mono, 8-bit, 16-bit, etc.).
This is reported via the
audio_spec attribute.
Lastly, since each ITTSBackend is also an ITTSSettingsHolder, it must also accept configuration settings. These are commonly provided at instantiation, but that is not strictly required to conform to the ITTSSettingsHolder protocol.
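The start / convert / stop lifecycle described above can be sketched with a toy backend. This is only an illustration under stated assumptions: EchoBackend is a hypothetical class, the convert signature (a str in, an iterator of bytes out) is assumed because it is not shown on this page, and raising RuntimeError when the backend is not started is a choice made for this sketch. It emits UTF-8 bytes rather than real audio.

```python
from collections.abc import Iterator


class EchoBackend:
    """Toy stand-in for an ITTSBackend; it emits UTF-8 bytes, not real audio."""

    def __init__(self) -> None:
        self._started = False

    @property
    def is_started(self) -> bool:
        return self._started

    def start(self) -> None:
        # Idempotent: starting an already started backend changes nothing.
        self._started = True

    def stop(self) -> None:
        # Idempotent: stopping an already stopped backend changes nothing.
        self._started = False

    def convert(self, text: str) -> Iterator[bytes]:
        # Assumed signature: yield the "audio" in chunks of at most 4 bytes.
        if not self._started:
            raise RuntimeError("The backend must be started before converting.")
        data = text.encode("utf-8")
        for offset in range(0, len(data), 4):
            yield data[offset : offset + 4]


backend = EchoBackend()
backend.start()
try:
    # Any number of conversions can happen between start and stop.
    chunks = list(backend.convert("Hello world"))
finally:
    backend.stop()
```

Wrapping the conversions in try / finally is one way to make sure the backend is stopped even if a conversion fails.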
Methods:
-
convert–Return speech audio for the given text as one or more binary chunks.
-
get_settings–Return the current settings in use.
-
start–Start the TTS backend.
-
stop–Stop the TTS backend.
-
update_settings–Update to the new given settings.
Attributes:
-
audio_spec(TTSAudioSpec) –Metadata about the speech audio format.
-
is_started(bool) –Whether or not the backend is already started.
audio_spec
property
¶
audio_spec: TTSAudioSpec
Metadata about the speech audio format.
E.g. mono 16-bit big-endian linear PCM audio at 24 kHz with a MIME type of audio/L16.
Returns:
-
TTSAudioSpec–The audio output format emitted by the convert method.
Note
This should be a read-only property.
convert
¶
get_settings
¶
get_settings() -> ITTSSettings
Return the current settings in use.
Returns:
-
ITTSSettings–The current settings in use.
Note
The reason the settings are not just direct attributes is because they are to be treated as an all-or-nothing collection. I.e. individual settings attributes should not be individually modified directly on an ITTSSettingsHolder, but rather the whole settings object should be replaced with a new one.
start
¶
Start the TTS backend.
Note
If the backend is already started, this method should be idempotent and do nothing.
stop
¶
Stop the TTS backend.
Note
If the backend is already stopped, this method should be idempotent and do nothing.
update_settings
¶
update_settings(new_settings: ITTSSettings) -> None
Update to the new given settings.
Parameters:
-
new_settings(ITTSSettings) –The new complete set of settings to start using immediately.
Raises:
-
TypeError–Implementations of this interface should check that they are only getting the correct concrete settings class and raise an exception if any other kind of ITTSSettings is given.
Note
The reason the settings are not just direct attributes is because they are to be treated as an all-or-nothing collection. I.e. individual settings attributes should not be individually modified directly on an ITTSSettingsHolder, but rather the whole settings object should be replaced with a new one.
ITTSPlugin
¶
Bases: Protocol
Common interface for all TTS Plugins.
Methods:
-
get_display_name–Return the display name for the plugin, appropriate for the given locale.
-
get_setting_description–Return the given setting’s description, appropriate for the given locale.
-
get_setting_display_name–Return the given setting’s display name, appropriate for the given locale.
-
get_settings_spec–Return a specification that describes all the backend’s settings.
-
get_supported_locales–Return the set of speech locales supported by the TTS backend.
-
make_backend–Create and return a TTS backend instance.
-
make_settings–Create and return an appropriate settings object for the TTS backend.
Attributes:
id
property
¶
id: str
A unique identifier for the plugin.
The ID must be unique across all aquarion-libtts plugins. It is also recommended to include at least a major version number as a suffix, so that multiple versions / implementations of a plugin can be installed and supported simultaneously, e.g. for backwards compatibility.
Returns:
-
str–The unique identifier for the plugin.
Example
kokoro_v1
Note
This should be a read-only property.
get_display_name
¶
Return the display name for the plugin, appropriate for the given locale.
A display name is one that is human-friendly as opposed to any kind of unique key that code would care about.
Parameters:
-
locale(str) –The locale should be a POSIX-compliant (i.e. using underscores) or CLDR-compliant (i.e. using hyphens) locale string like en_CA, zh-Hant, ca-ES-valencia, or even de_DE.UTF-8@euro. It can be as general as fr or as specific as language_territory_script_variant@modifier.
Plugins are expected to do their best to accommodate the given locale, but can fall back to a more general language variant if that is all they support. E.g. from en_CA to just en.
Returns:
-
str–The display name of the plugin in a language appropriate for the given locale. If the given locale is not supported at all, then the plugin is expected to return a display name in its default language, or English if that is preferred.
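The fallback behaviour described above (e.g. from en_CA to just en) could be sketched as follows. The helper names locale_candidates and pick_locale are hypothetical and not part of aquarion-libtts, and the sketch assumes the supported locales are stored in POSIX form (with underscores).

```python
def locale_candidates(locale: str) -> list[str]:
    """Return the locale followed by progressively more general fallbacks."""
    # Drop any codeset or modifier, e.g. "de_DE.UTF-8@euro" -> "de_DE".
    base = locale.split("@", 1)[0].split(".", 1)[0]
    # Accept both CLDR hyphens and POSIX underscores.
    parts = base.replace("-", "_").split("_")
    return ["_".join(parts[:end]) for end in range(len(parts), 0, -1)]


def pick_locale(requested: str, supported: set[str], default: str = "en") -> str:
    """Return the most specific supported locale, or the default language."""
    for candidate in locale_candidates(requested):
        if candidate in supported:
            return candidate
    return default


best = pick_locale("en_CA", {"en", "fr"})
```

A plugin could use the selected locale to look up its display name, and return its default-language name when nothing matches.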
get_setting_description
¶
get_setting_description(setting_name: str, locale: str) -> str
Return the given setting’s description, appropriate for the given locale.
Parameters:
-
setting_name(str) –The name of the setting, as found in the keys of the mapping returned by get_settings_spec.
-
locale(str) –The locale should be a POSIX-compliant (i.e. using underscores) or CLDR-compliant (i.e. using hyphens) locale string like en_CA, zh-Hant, ca-ES-valencia, or even de_DE.UTF-8@euro. It can be as general as fr or as specific as language_territory_script_variant@modifier.
Plugins are expected to do their best to accommodate the given locale, but can fall back to a more general language variant if that is all they support. E.g. from en_CA to just en.
Returns:
-
str–The description of the setting in a language appropriate for the given locale. If the given locale is not supported at all, then the plugin is expected to return a description in its default language, or English if that is preferred.
Raises:
-
KeyError or AttributeError–If the given setting name is not a recognized setting.
get_setting_display_name
¶
get_setting_display_name(setting_name: str, locale: str) -> str
Return the given setting’s display name, appropriate for the given locale.
A display name is one that is human-friendly as opposed to any kind of unique key that code would care about.
Parameters:
-
setting_name(str) –The name of the setting, as found in the keys of the mapping returned by get_settings_spec.
-
locale(str) –The locale should be a POSIX-compliant (i.e. using underscores) or CLDR-compliant (i.e. using hyphens) locale string like en_CA, zh-Hant, ca-ES-valencia, or even de_DE.UTF-8@euro. It can be as general as fr or as specific as language_territory_script_variant@modifier.
Plugins are expected to do their best to accommodate the given locale, but can fall back to a more general language variant if that is all they support. E.g. from en_CA to just en.
Returns:
-
str–The display name of the setting in a language appropriate for the given locale. If the given locale is not supported at all, then the plugin is expected to return a display name in its default language, or English if that is preferred.
Raises:
-
KeyError or AttributeError–If the given setting name is not a recognized setting.
get_settings_spec
¶
get_settings_spec() -> TTSSettingsSpecType
Return a specification that describes all the backend’s settings.
Returns:
-
TTSSettingsSpecType–An immutable mapping from setting attribute names to TTSSettingsSpecEntry instances.
Implementations should probably return a types.MappingProxyType to achieve the immutability.
get_supported_locales
¶
Return the set of speech locales supported by the TTS backend.
These should also be the locales that the plugin supports for display names, setting names, setting descriptions, etc.
Locales can be in either POSIX-compliant (i.e. using underscores) or CLDR-compliant (i.e. using hyphens) formats, and client applications are expected to support both.
Returns:
Note
The set of locales should be as specific as is directly supported and should
not include broader / more general or approximate catch-all locales unless
they are also explicitly supported, or nothing more specific is supported.
I.e. en_CA is good, en is bad, unless en is the most specific the TTS
backend supports. Or, if ca-ES-valencia is supported, then that is
preferred over ca-ES. … In short, be as precise and honest as you can.
make_backend
¶
make_backend(settings: ITTSSettings) -> ITTSBackend
Create and return a TTS backend instance.
This is a factory method.
Parameters:
-
settings(ITTSSettings) –Custom or default settings must be provided to configure the TTS backend. See make_settings for details.
Returns:
-
ITTSBackend–A configured TTS backend, ready to use.
Raises:
-
TypeError–Implementations of this interface must check that they are getting their own ITTSSettings implementation and should raise an exception if any other plugin’s settings object is given instead.
make_settings
¶
make_settings(from_dict: Mapping[str, JSONSerializableTypes] | None = None) -> ITTSSettings
Create and return an appropriate settings object for the TTS backend.
This is a factory method.
Parameters:
-
from_dict(Mapping[str, JSONSerializableTypes] | None, default:None) –
Note
If from_dict is provided, then each setting value in it is expected to be
validated by this method.
Returns:
-
ITTSSettings–A compatible settings instance with all settings values valid for immediate use.
Raises:
-
(KeyError, ValueError or TypeError)–If any setting value is invalid for the concrete implementation of ITTSSettings that this factory will create, then an exception should be raised.
ITTSSettings
¶
Bases: Protocol
Common interface for all TTS backend settings.
Implementations of this interface are expected to add their own setting attributes for the specific ITTSBackend implementation they go with.
Note
There is no expectation that ITTSSettings implementations be immutable or hashable, but it’s probably a good idea since changes to settings should be done by calling the ITTSPlugin.make_settings method with a changed settings dictionary.
Example
Methods:
-
to_dict–Export all settings as a dictionary of only JSON-serializable types.
Attributes:
locale
instance-attribute
¶
locale: str
The locale for spoken audio language.
The locale should be a POSIX-compliant (i.e. using underscores) or CLDR-compliant
(i.e. using hyphens) locale string like en_CA, zh-Hant, ca-ES-valencia, or
even de_DE.UTF-8@euro. It can be as general as fr or as specific as
language_territory_script_variant@modifier.
to_dict
¶
to_dict() -> dict[str, JSONSerializableTypes]
Export all settings as a dictionary of only JSON-serializable types.
Returns:
-
dict[str, JSONSerializableTypes]–A dictionary where the keys are the setting names and the values are the setting values converted as necessary to simple base JSON-compatible types.
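To illustrate the conversion to simple JSON-compatible types, here is a small sketch. DemoSettings and its cache_path setting are hypothetical; the point is only that a value such as a pathlib.Path is converted to a str before export.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class DemoSettings:
    """Hypothetical settings class with one value that is not JSON-friendly."""

    locale: str
    speed: float
    cache_path: Path

    def to_dict(self) -> dict[str, object]:
        data = asdict(self)
        # Path is not JSON-serializable, so export it as a plain string.
        data["cache_path"] = str(self.cache_path)
        return data


settings = DemoSettings(locale="en_CA", speed=0.8, cache_path=Path("cache"))
payload = json.dumps(settings.to_dict(), sort_keys=True)
```

The resulting dictionary can be stored as JSON and later handed back to a settings factory such as ITTSPlugin.make_settings.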
ITTSSettingsHolder
¶
Bases: Protocol
Common interface for objects that accept and contain settings.
Methods:
-
get_settings–Return the current settings in use.
-
update_settings–Update to the new given settings.
get_settings
¶
get_settings() -> ITTSSettings
Return the current settings in use.
Returns:
-
ITTSSettings–The current settings in use.
Note
The reason the settings are not just direct attributes is because they are to be treated as an all-or-nothing collection. I.e. individual settings attributes should not be individually modified directly on an ITTSSettingsHolder, but rather the whole settings object should be replaced with a new one.
update_settings
¶
update_settings(new_settings: ITTSSettings) -> None
Update to the new given settings.
Parameters:
-
new_settings(ITTSSettings) –The new complete set of settings to start using immediately.
Raises:
-
TypeError–Implementations of this interface should check that they are only getting the correct concrete settings class and raise an exception if any other kind of ITTSSettings is given.
Note
The reason the settings are not just direct attributes is because they are to be treated as an all-or-nothing collection. I.e. individual settings attributes should not be individually modified directly on an ITTSSettingsHolder, but rather the whole settings object should be replaced with a new one.
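The all-or-nothing replacement of settings, together with the TypeError check, could be sketched like this. DemoSettings, OtherSettings and DemoHolder are hypothetical classes. The sketch uses dataclasses.replace to build the new settings object; going through ITTSPlugin.make_settings with a changed dictionary would be the other route.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DemoSettings:
    """Hypothetical concrete settings class accepted by DemoHolder."""

    locale: str = "en_CA"
    speed: float = 1.0


@dataclass(frozen=True)
class OtherSettings:
    """Hypothetical settings class that belongs to some other plugin."""

    locale: str = "fr"


class DemoHolder:
    """Toy object following the ITTSSettingsHolder idea."""

    def __init__(self, settings: DemoSettings) -> None:
        self._settings = settings

    def get_settings(self) -> DemoSettings:
        return self._settings

    def update_settings(self, new_settings: object) -> None:
        # Only the matching concrete settings class is accepted.
        if not isinstance(new_settings, DemoSettings):
            raise TypeError(f"Expected DemoSettings, got {type(new_settings).__name__}")
        # The whole settings object is swapped; nothing is mutated in place.
        self._settings = new_settings


holder = DemoHolder(DemoSettings())
holder.update_settings(replace(holder.get_settings(), speed=0.8))
```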
TTSAudioSpec
dataclass
¶
TTSAudioSpec(*, mime_type: str, sample_rate: int, sample_type: TTSSampleTypes, sample_width: int, byte_order: TTSSampleByteOrders, num_channels: int)
Metadata about an audio format.
This describes the audio format that an ITTSBackend returns.
Note
Instances of this class are immutable once created.
Attributes:
-
byte_order(TTSSampleByteOrders) –E.g. Little Endian or Big Endian.
-
mime_type(str) –E.g. “audio/L16”, “audio/webm”, “audio/mpeg”, etc.
-
num_channels(int) –E.g. 1 for mono, 2 for stereo, etc.
-
sample_rate(int) –E.g. 8000, 24000, 48000, etc.
-
sample_type(TTSSampleTypes) –E.g. Signed Integer, Unsigned Integer or Floating Point.
-
sample_width(int) –E.g. 8 for 8-bit, 12 for 12-bit, 16 for 16-bit, etc.
mime_type
instance-attribute
¶
mime_type: str
E.g. “audio/L16”, “audio/webm”, “audio/mpeg”, etc.
Note
Probably raw Linear PCM audio is preferred so that client applications can do with it what they want, but any format is allowed.
sample_type
instance-attribute
¶
sample_type: TTSSampleTypes
E.g. Signed Integer, Unsigned Integer or Floating Point.
TTSPluginRegistry
¶
Registry of all aquarion-libtts backend plugins.
TTS backends and everything related to them are created / accessed through ITTSPlugin instances. The plugin registry is responsible for finding, loading, listing, enabling, disabling and giving access to those plugins.
Methods:
-
disable–Disable a TTS plugin for exclusion from the enabled plugins list.
-
enable–Enable a TTS plugin for inclusion in the enabled plugins list.
-
get_plugin–Return the plugin for the given ID.
-
is_enabled–Return whether or not the requested plugin is enabled.
-
list_plugin_ids–Return the set of plugin IDs.
-
load_plugins–Load all aquarion-libtts backend plugins.
disable
¶
Disable a TTS plugin for exclusion from the enabled plugins list.
Parameters:
Raises:
-
ValueError–If the given ID does not match any registered plugin.
Note
Disabling a plugin does not affect any existing instances of that plugin in any way. So, proper TTS backend instance management and stopping must still be handled separately.
enable
¶
Enable a TTS plugin for inclusion in the enabled plugins list.
The idea behind enabled vs disabled plugins is that it allows one to manage which plugins are listed / displayed to a user, independently of all the plugins that are installed / loaded. I.e. it allows for filtering which plugins one wants exposed and which should be kept hidden. E.g. some plugins might not be supported by your application, even though they got installed with some other dependency.
Parameters:
Raises:
-
ValueError–If the given ID does not match any registered plugin.
get_plugin
¶
get_plugin(id_: str) -> ITTSPlugin
Return the plugin for the given ID.
Parameters:
Returns:
-
ITTSPlugin–The requested plugin object.
Raises:
-
ValueError–If the given ID does not match any registered plugin.
is_enabled
¶
list_plugin_ids
¶
Return the set of plugin IDs.
By default, only enabled plugins are listed.
Parameters:
-
only_disabled(bool, default:False) –If True, then only the disabled plugins are listed.
-
list_all(bool, default:False) –If True, then all plugins are listed, regardless of their enabled / disabled status.
Returns:
Raises:
-
ValueError–If both arguments are True.
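The enabled / disabled semantics described for the registry can be modelled with a small toy class. DemoRegistry is not the real TTSPluginRegistry: it only tracks IDs, the keyword-only arguments are an assumption because the real signatures are not shown on this page, and the ID demo_v1 is invented for illustration.

```python
class DemoRegistry:
    """Toy model of the registry's enabled / disabled bookkeeping."""

    def __init__(self, plugin_ids: set[str]) -> None:
        self._all = set(plugin_ids)
        self._enabled: set[str] = set()  # All plugins start out disabled.

    def _check(self, id_: str) -> None:
        if id_ not in self._all:
            raise ValueError(f"No plugin registered with ID {id_!r}")

    def enable(self, id_: str) -> None:
        self._check(id_)
        self._enabled.add(id_)

    def disable(self, id_: str) -> None:
        self._check(id_)
        self._enabled.discard(id_)

    def is_enabled(self, id_: str) -> bool:
        self._check(id_)
        return id_ in self._enabled

    def list_plugin_ids(
        self, *, only_disabled: bool = False, list_all: bool = False
    ) -> set[str]:
        if only_disabled and list_all:
            raise ValueError("only_disabled and list_all cannot both be True")
        if list_all:
            return set(self._all)
        if only_disabled:
            return self._all - self._enabled
        return set(self._enabled)


registry = DemoRegistry({"kokoro_v1", "demo_v1"})
registry.enable("kokoro_v1")
enabled = registry.list_plugin_ids()
disabled = registry.list_plugin_ids(only_disabled=True)
```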
load_plugins
¶
Load all aquarion-libtts backend plugins.
Plugins are discovered by searching for pyproject.toml entry points named aquarion-libtts, then searching those entry points for hook functions decorated with tts_hookimpl, and finally calling those hook functions. The plugins returned by those hook functions are then stored in the plugin registry and made accessible.
Note
All plugins are disabled by default. Use the enable method to enable a plugin.
Parameters:
Raises:
-
PluginValidationError–If validate is True and a hook function does not conform to the expected specification.
Examples:
TTSSampleByteOrders
¶
Bases: StrEnum
The byte order for multi-byte audio samples.
Note
The string values of these types match FFmpeg’s format descriptions, in case that is ever useful.
Attributes:
-
BIG_ENDIAN–The most significant byte is stored first, then the least significant byte.
-
LITTLE_ENDIAN–The least significant byte is stored first, then the most significant byte.
-
NOT_APPLICABLE–This should only be used for 8-bit (i.e. single byte) samples.
BIG_ENDIAN
class-attribute
instance-attribute
¶
The most significant byte is stored first, then the least significant byte.
LITTLE_ENDIAN
class-attribute
instance-attribute
¶
The least significant byte is stored first, then the most significant byte.
NOT_APPLICABLE
class-attribute
instance-attribute
¶
This should only be used for 8-bit (i.e. single byte) samples.
TTSSampleTypes
¶
Bases: StrEnum
The data type of a single audio sample.
Note
The string values of these types match FFmpeg’s format descriptions, in case that is ever useful.
Attributes:
-
FLOAT–Floating point samples.
-
NOT_APPLICABLE–This should only be used for compressed or variable bit streams.
-
SIGNED_INT–Signed integer samples. (I.e. positive and negative numbers allowed.)
-
UNSIGNED_INT–Unsigned integer samples. (I.e. only positive numbers, but with more values.)
NOT_APPLICABLE
class-attribute
instance-attribute
¶
This should only be used for compressed or variable bit streams.
SIGNED_INT
class-attribute
instance-attribute
¶
Signed integer samples. (I.e. positive and negative numbers allowed.)
UNSIGNED_INT
class-attribute
instance-attribute
¶
Unsigned integer samples. (I.e. only positive numbers, but with more values.)
TTSSettingsSpecEntry
dataclass
¶
TTSSettingsSpecEntry(*, type: type[T], min: int | float | None = None, max: int | float | None = None, values: frozenset[T] | None = None)
A specification entry describing one setting of a settings object.
Since ITTSSettings can contain custom TTS backend specific setting attributes, there is a need for a way to describe those setting attributes in a standardized way so that settings UIs can be constructed dynamically in applications that use aquarion-libtts. Instances of this class, in a dictionary, for example, can provide a specification for how to render settings fields in a UI.
Example
spec = {
    "locale": TTSSettingsSpecEntry(
        # frozenset takes a single iterable, so the values go in one set.
        type=str, min=2, values=frozenset({"en", "fr"})
    ),
    "voice": TTSSettingsSpecEntry(type=str),
    "speed": TTSSettingsSpecEntry(type=float, min=0.1, max=1.0),
    "api_key": TTSSettingsSpecEntry(type=str),
    "cache_path": TTSSettingsSpecEntry(type=str),
}
With the example above, one could imagine a UI with multiple text box fields.
locale could be a dropdown or a set of radio buttons, there could be
validation for valid ranges, speed could have up and down arrow buttons to
increase and decrease the value, and / or react to a mouse’s scroll wheel, etc.
Note
Instances of this class are immutable once created, as are all of their attribute values.
Attributes:
-
max(int | float | None) –The maximum allowed value or maximum allowed length.
-
min(int | float | None) –The minimum allowed value or minimum allowed length.
-
type(type[T]) –The type of setting it is.
-
values(frozenset[T] | None) –The set of specific allowed values.
max
class-attribute
instance-attribute
¶
The maximum allowed value or maximum allowed length.
This is optional.
For strings this is the maximum allowed length of the string.
For numeric types, this is the highest allowed value.
min
class-attribute
instance-attribute
¶
The minimum allowed value or minimum allowed length.
This is optional.
For strings this is the minimum allowed length of the string.
For numeric types, this is the lowest allowed value.
type
instance-attribute
¶
type: type[T]
The type of setting it is.
This is required.
Valid types are defined in TTSSettingsSpecEntryTypes.
Notes
-
This should be set to the actual type class, not a string name of a type.
-
Also, only Python basic types should be used. I.e. not classes like pathlib.Path or decimal.Decimal, etc.
load_language
cached
¶
load_language(locale: str, domain: str, locale_path: HashablePathLike | HashableTraversable | str) -> tuple[Callable[[str], str], NullTranslations]
Return a gettext _() function and a *Translations instance.
Parameters:
-
locale(str) –The desired locale to find and load. E.g. en_CA or fr, etc.
locale must be parsable by the Babel package and will be normalized by it as well.
locale is generally expected to be in POSIX format (i.e. using underscores), but CLDR format (i.e. using hyphens) is also supported and will be converted to POSIX format automatically for the purpose of finding translation catalogues.
If an exact match on locale cannot be found, less specific fallback locales will be used instead. E.g. if kk_Cyrl_KZ is not found, then kk_Cyrl will be tried, and then just kk.
If no matching locale is found, then the gettext methods will just return the hard-coded strings from the source file.
-
domain(str) –A name unique to your app / project. This domain name becomes the file name of your message catalogues and templates. For example, you could use your project’s name or your root package’s name. E.g. my-cool-project.
Attention: Do not use aquarion-libtts as your domain name. That is reserved for this project.
-
locale_path(HashablePathLike | HashableTraversable | str) –The base path where your language files can be found. This can be a regular path (as a str or a pathlib.Path), or it could be some path inside your own Python package, retrieved with the help of importlib.resources.files, for example.
Note: It is recommended that third-party TTS plugins keep their translation files inside their package (i.e. wheel) by using importlib.resources.files to access a locale directory.
Returns:
-
tuple[Callable[[str], str], NullTranslations]–A tuple of a gettext callable and an instance of a NullTranslations sub-class (e.g. GNUTranslations).
The gettext callable is provided for easy use of the more common action.
The *Translations instance provides access to all the other, less common translation capabilities one might need, e.g. ngettext, pgettext, etc.
Attention: It is common practice to name the gettext callable _ so that extracting and retrieving translated messages is as easy as _("text to be translated"). In fact, if you use Babel, this is the name it expects by default when searching for translatable strings.
Raises:
-
various–If an invalid locale is given, various exceptions can be raised. See the Babel package’s babel.core.Locale.parse for details.
Example
Note
Once loaded, the language translations are cached for the duration of the process.
tts_hookimpl
¶
tts_hookimpl(**kwargs: Any) -> Callable[[], ITTSPlugin | None]
Decorate a function to mark it as a TTS plugin registration hook.
This is a decorator.
The decorated function is expected to accept no arguments and to return an ITTSPlugin, or None if no plugin is to be registered, e.g. because of missing dependencies, incompatible hardware, etc.
For more detailed usage options, see the Pluggy package.
Parameters:
Returns:
-
Callable[[], ITTSPlugin | None]–The decorated function, but marked as a TTS plugin registration hook.