Metadata-Version: 2.4
Name: sinapsis-parakeet-tdt
Version: 0.1.5
Summary: Speech to text using Parakeet TDT model
Author-email: SinapsisAI <dev@sinapsis.tech>
Project-URL: Homepage, https://sinapsis.tech
Project-URL: Documentation, https://docs.sinapsis.tech/docs
Project-URL: Tutorials, https://docs.sinapsis.tech/tutorials
Project-URL: Repository, https://github.com/Sinapsis-AI/sinapsis-speech.git
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: cuda-python>=12.9.0
Requires-Dist: nemo-toolkit[asr]>=2.3.0
Requires-Dist: sinapsis>=0.2.3
Provides-Extra: data-tools
Requires-Dist: sinapsis-data-readers>=0.1.2; extra == "data-tools"
Provides-Extra: all
Requires-Dist: sinapsis-parakeet-tdt[data-tools]; extra == "all"
Dynamic: license-file

<h1 align="center">
<br>
<a href="https://sinapsis.tech/">
  <img
    src="https://github.com/Sinapsis-AI/brand-resources/blob/main/sinapsis_logo/4x/logo.png?raw=true"
    alt="" width="300">
</a><br>
Sinapsis Parakeet TDT
<br>
</h1>

<h4 align="center">Templates for advanced speech-to-text transcription with NVIDIA Parakeet TDT</h4>

<p align="center">
<a href="#installation">🐍 Installation</a> •
<a href="#features"> 🚀 Features</a> •
<a href="#example"> 📚 Usage example</a> •
<a href="#webapp">🌐 Webapp</a> •
<a href="#documentation">📙 Documentation</a> •
<a href="#license">🔍 License</a>
</p>

This **Sinapsis Parakeet TDT** package provides a template for seamlessly integrating, configuring, and running **speech-to-text (STT)** functionalities powered by [NVIDIA's Parakeet TDT model](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2).

<h2 id="installation">🐍 Installation</h2>

Install using your favourite package manager. We strongly encourage the use of <code>uv</code>, although any other package manager should work too.
If you need to install <code>uv</code> please see the [official documentation](https://docs.astral.sh/uv/getting-started/installation/#installation-methods).

Example with <code>uv</code>:
```bash
  uv pip install sinapsis-parkeet-tdt --extra-index-url https://pypi.sinapsis.tech
```
 or with raw <code>pip</code>:
```bash
  pip install sinapsis-parkeet-tdt --extra-index-url https://pypi.sinapsis.tech
```

> [!IMPORTANT]
> Templates in each package may require extra dependencies. For development, we recommend installing the package with all the optional dependencies:
>
with <code>uv</code>:

```bash
  uv pip install sinapsis-parkeet-tdt[all] --extra-index-url https://pypi.sinapsis.tech
```
 or with raw <code>pip</code>:
```bash
  pip install sinapsis-parkeet-tdt[all] --extra-index-url https://pypi.sinapsis.tech
```

<h2 id="features">🚀 Features</h2>

<h3>Templates Supported</h3>

This module includes a template for speech-to-text transcription using the Parakeet TDT model:

**ParakeetTDTInference**: Advanced speech-to-text transcription template powered by NVIDIA's Parakeet TDT 0.6B model, designed for high-quality English transcription with exceptional accuracy across a wide array of accents and dialects, vocal ranges, and diverse domains and noise conditions. The template features comprehensive support for punctuation, capitalization, and accurate timestamp prediction, ensuring professional-grade transcription results. It seamlessly processes audio packets from the input container or specified file paths.

<details>
<summary>Attributes</summary>

- `model_name (str)`: Name or path of the Parakeet TDT model. Defaults to "nvidia/parakeet-tdt-0.6b-v2".
- `audio_paths (list[str] | None)`: Optional list of audio file paths to transcribe. If None, audio will be taken from the AudioPackets in the DataContainer. Defaults to None.
- `enable_timestamps (bool)`: Whether to generate timestamps for the transcription. Defaults to False.
- `timestamp_level (Literal["char", "word", "segment"])`: Level of timestamp detail. Defaults to "word".
- `device (Literal["cpu", "cuda"])`: Device to run the model on. Defaults to "cuda".
- `refresh_cache (bool)`: Whether to refresh the cache when downloading the model. Defaults to False.
</details>

> [!TIP]
> Use CLI command ```sinapsis info --example-template-config TEMPLATE_NAME``` to produce an example Agent config for the Template specified in ***TEMPLATE_NAME***.

For example, for ***ParakeetTDTInference*** use ```sinapsis info --example-template-config ParakeetTDTInference``` to produce an example config like:

```yaml
agent:
  name: my_test_agent
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: ParakeetTDTInference
  class_name: ParakeetTDTInference
  template_input: InputTemplate
  attributes:
    model_name: "nvidia/parakeet-tdt-0.6b-v2"
    audio_paths: []
    enable_timestamps: false
    timestamp_level: "word"
    device: "cuda"
    refresh_cache: false
```

<h2 id='example'>📚 Usage example</h2>

This example illustrates how to use the **ParakeetTDTInference** template for speech-to-text transcription. It converts audio input into text using NVIDIA's Parakeet TDT model.

<details>
<summary><strong><span style="font-size: 1.4em;">Config</span></strong></summary>

```yaml
agent:
  name: parakeet_tdt_agent
  description: "Agent that transcribes speech to text using the NVIDIA Parakeet TDT model."

templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}

- template_name: AudioReaderSoundfile
  class_name: AudioReaderSoundfile
  template_input: InputTemplate
  attributes:
    audio_file_path: "artifacts/sample.wav"
    source: "artifacts/sample.wav"

- template_name: ParakeetTDTInference
  class_name: ParakeetTDTInference
  template_input: AudioReaderSoundfile
  attributes:
    model_name: "nvidia/parakeet-tdt-0.6b-v2"
    enable_timestamps: true
    timestamp_level: "word"
    device: "cuda"
```
</details>

This configuration defines a complete pipeline for speech-to-text transcription:

1. First, an audio file is read using the AudioReaderSoundfile template
2. The audio is then processed by the ParakeetTDTInference template, which transcribes it to text
3. The transcription is saved to a text file using the TextWriter template

> [!IMPORTANT]
> The AudioReaderSoundfile and TextWriter templates correspond to [sinapsis-data-readers](https://github.com/Sinapsis-AI/sinapsis-data-tools/tree/main/packages/sinapsis_data_readers). If you want to use the example, please make sure you install these packages.
>

To run the config, use the CLI:
```bash
sinapsis run name_of_config.yml
```

<h2 id="webapp">🌐 Webapp</h2>
The webapp included in this project showcases the capabilities of the Parakeet TDT model for speech recognition tasks.

> [!IMPORTANT]
> To run the app you first need to clone this repository:

```bash
git clone git@github.com:Sinapsis-ai/sinapsis-speech.git
cd sinapsis-speech
```

> [!NOTE]
> If you'd like to enable external app sharing in Gradio, `export GRADIO_SHARE_APP=True`

<details>
<summary id="docker"><strong><span style="font-size: 1.4em;">🐳 Docker</span></strong></summary>

**IMPORTANT** This docker image depends on the sinapsis-nvidia:base image. Please refer to the official [sinapsis](https://github.com/Sinapsis-ai/sinapsis?tab=readme-ov-file#docker) instructions to Build with Docker.

1. **Build the sinapsis-speech image**:
```bash
docker compose -f docker/compose.yaml build
```

2. **Start the app container**:
```bash
docker compose -f docker/compose_apps.yaml up -d sinapsis-parakeet-tdt
```

3. **Check the logs**
```bash
docker logs -f sinapsis-parakeet-tdt
```

4. **The logs will display the URL to access the webapp, e.g.,:**:
```bash
Running on local URL:  http://127.0.0.1:7860
```

**NOTE**: The url may be different, check the output of logs.

5. **To stop the app**:
```bash
docker compose -f docker/compose_apps.yaml down
```
</details>

<details>
<summary id="virtual-environment"><strong><span style="font-size: 1.4em;">💻 UV</span></strong></summary>

To run the webapp using the <code>uv</code> package manager, follow these steps:

1. **Sync the virtual environment**:

```bash
uv sync --frozen
```
2. **Install the wheel**:

```bash
uv pip install sinapsis-speech[all] --extra-index-url https://pypi.sinapsis.tech
```
3. **Run the webapp**:

```bash
uv run webapps/speech_to_text_apps/parakeet_tdt_app.py
```
4. **The terminal will display the URL to access the webapp (e.g.)**:
```bash
Running on local URL:  http://127.0.0.1:7860
```
**NOTE**: The URL may vary; check the terminal output for the correct address.

</details>

<h2 id="documentation">📙 Documentation</h2>

Documentation is available on the [sinapsis website](https://docs.sinapsis.tech/docs)

Tutorials for different projects within sinapsis are available at [sinapsis tutorials page](https://docs.sinapsis.tech/tutorials)

<h2 id="license">🔍 License</h2>

This project is licensed under the AGPLv3 license, which encourages open collaboration and sharing. For more details, please refer to the [LICENSE](LICENSE) file.

For commercial use, please refer to our [official Sinapsis website](https://sinapsis.tech) for information on obtaining a commercial license.
