Metadata-Version: 2.4
Name: audio2sub
Version: 0.1.0
Summary: Transcribe media files to SRT subtitles.
Home-page: https://github.com/Xavier-Lam/audio2sub
Author: Xavier-Lam
Author-email: xavierlam7@hotmail.com
License: MIT
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Multimedia :: Sound/Audio
Classifier: Topic :: Multimedia :: Video
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.1.0
Requires-Dist: torchaudio>=2.1.0
Requires-Dist: ffmpeg-python>=0.2.0
Requires-Dist: pysrt>=1.1.2
Requires-Dist: tqdm
Requires-Dist: onnxruntime<2,>=1.14
Requires-Dist: numpy
Provides-Extra: faster-whisper
Requires-Dist: faster-whisper>=1.0.1; extra == "faster-whisper"
Provides-Extra: whisper
Requires-Dist: openai-whisper>=20231117; extra == "whisper"
Provides-Extra: gemini
Requires-Dist: google-genai>=1.0.0; extra == "gemini"
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: openai-whisper>=20231117; extra == "dev"
Requires-Dist: faster-whisper>=1.0.1; extra == "dev"
Requires-Dist: google-genai>=1.0.0; extra == "dev"
Provides-Extra: all
Requires-Dist: openai-whisper>=20231117; extra == "all"
Requires-Dist: faster-whisper>=1.0.1; extra == "all"
Requires-Dist: google-genai>=1.0.0; extra == "all"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Audio2Sub

**Audio2Sub** is a command-line tool that automatically transcribes audio from video or audio files and generates subtitles in the `.srt` format. It uses FFmpeg for media handling, [Silero VAD](https://github.com/snakers4/silero-vad) for precise voice activity detection, and supports multiple transcription backends to convert speech to text.

## Installation

Before installing, you must have [FFmpeg](https://ffmpeg.org/download.html) installed and available in your system's PATH.
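To confirm that FFmpeg is reachable before installing, you can run a check along these lines in a POSIX-compatible shell such as bash. `has_ffmpeg` is a hypothetical helper name for this sketch, not something Audio2Sub provides:

```bash
# Hypothetical helper: succeeds (exit 0) only if an ffmpeg executable is on PATH.
has_ffmpeg() {
  command -v ffmpeg >/dev/null 2>&1
}

if has_ffmpeg; then
  ffmpeg -version | head -n 1   # the first line shows the installed version
else
  echo "ffmpeg was not found on PATH; install it before using audio2sub" >&2
fi
```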

You can install Audio2Sub using `pip`. The base package does not include a transcription backend, so install it together with the `faster_whisper` backend, which is the default transcriber:

```bash
pip install "audio2sub[faster_whisper]"
```

To install a different backend, see the table in the [Backends](#backends) section below.

## Usage
### Basic Example

```bash
audio2sub my_video.mp4 -o my_video.srt --lang en
```

This command transcribes the audio in `my_video.mp4` as English (`--lang en`) and saves the subtitles to `my_video.srt`.
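To subtitle several files, the same command can be run in a shell loop. This is a sketch assuming a POSIX-compatible shell such as bash and the same flags as the example above; `srt_path` is a hypothetical helper, not an Audio2Sub feature:

```bash
# Hypothetical helper: replace a media file's extension with ".srt".
srt_path() {
  printf '%s\n' "${1%.*}.srt"
}

# Transcribe every .mp4 file in the current directory.
for f in *.mp4; do
  [ -e "$f" ] || continue   # with no matches the glob stays literal, so skip it
  audio2sub "$f" -o "$(srt_path "$f")" --lang en
done
```

Each `my_video.mp4` would then get a `my_video.srt` next to it.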

**Notes:**
*   **First-Time Use**: The first time you run the program, it will download the necessary transcription models. This may take some time and require significant disk space.
*   **CUDA**: Performance is significantly degraded without CUDA when using the local Whisper-based backends (`faster_whisper` and `whisper`). The program emits a warning at startup if CUDA is not available. If your system has a compatible NVIDIA GPU, install the [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit-archive) first. If you are sure CUDA is installed correctly and still see the warning, you may need to [manually reinstall a CUDA-enabled PyTorch build](https://pytorch.org/get-started/locally/) that matches your setup. Installing a different PyTorch version than the one you currently have may break other packages that depend on it; if that happens, reinstall them as indicated by the warnings shown.
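To check whether PyTorch itself can use your GPU, independently of Audio2Sub, you can query it directly. This sketch assumes a `python3` interpreter from the same environment Audio2Sub is installed in and uses PyTorch's `torch.cuda.is_available()` and `torch.version.cuda`; `check_cuda` is a hypothetical helper name:

```bash
# Hypothetical helper: report whether the installed PyTorch can use CUDA.
check_cuda() {
  python3 - <<'EOF'
try:
    import torch
except ImportError:
    print("PyTorch is not installed in this environment")
else:
    print("CUDA available:", torch.cuda.is_available())
    # torch.version.cuda is None for CPU-only builds of PyTorch
    print("PyTorch CUDA version:", torch.version.cuda)
EOF
}

check_cuda
```

If this reports `CUDA available: False` on a machine with a supported GPU and the CUDA version prints as `None`, the installed PyTorch is likely a CPU-only build, which is the case the reinstall advice above addresses.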

### Using a Different Transcriber

Use the `-t` or `--transcriber` flag to select a different backend.

```bash
audio2sub my_audio.wav -o my_audio.srt --lang en -t whisper --model medium
```

Each transcriber has its own options. To list them, pass `--help` together with the transcriber:

```bash
audio2sub -t faster_whisper --help
```

## Backends

Audio2Sub supports the following transcription backends.

| Backend | Description |
| --- | --- |
| `faster_whisper` | A faster reimplementation of Whisper using CTranslate2. See [Faster Whisper](https://github.com/guillaumekln/faster-whisper). This is the default backend. |
| `whisper` | The original speech recognition model by OpenAI. See [OpenAI Whisper](https://github.com/openai/whisper). |
| `gemini` | Google's Gemini models, accessed through the Gemini API. Requires a `GEMINI_API_KEY` environment variable or the `--gemini-api-key` argument. |

Install the backend you want with `pip install "audio2sub[<backend>]"` (or `pip install "audio2sub[all]"` for every backend), then select it with the `-t` flag.
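For the `gemini` backend, a small wrapper can fail early when no API key is configured. This is a sketch for bash; `transcribe_gemini` is a hypothetical name, and it assumes the `-o` and `--lang` flags from the basic example work the same way with `-t gemini`:

```bash
# Hypothetical wrapper around the gemini backend.
# Usage: transcribe_gemini INPUT [OUTPUT] [LANG]
transcribe_gemini() {
  local input="$1"
  local output="${2:-${input%.*}.srt}"   # default: INPUT with its extension replaced by .srt
  local lang="${3:-en}"
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "GEMINI_API_KEY is not set; export it or pass --gemini-api-key" >&2
    return 1
  fi
  audio2sub "$input" -o "$output" --lang "$lang" -t gemini
}
```

For example, after `export GEMINI_API_KEY="your-api-key"` (a placeholder), `transcribe_gemini my_video.mp4` would write `my_video.srt`.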

## Contributing

Contributions are welcome! Please open an issue or submit a pull request on the [GitHub repository](https://github.com/Xavier-Lam/audio2sub).
