Metadata-Version: 2.4
Name: asub
Version: 1.0.1
Summary: Generate and translate subtitles from audio/video files using Whisper.
Author: asub contributors
License-Expression: MIT
Project-URL: Homepage, https://github.com/simoneraffaelli/subtitle-generator
Project-URL: Repository, https://github.com/simoneraffaelli/subtitle-generator
Project-URL: Issues, https://github.com/simoneraffaelli/subtitle-generator/issues
Keywords: subtitles,whisper,transcription,translation,srt,vtt
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: End Users/Desktop
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Multimedia :: Video
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: faster-whisper<2.0.0,>=1.0.0
Requires-Dist: deep-translator<2.0.0,>=1.11.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: ruff>=0.4.0; extra == "dev"
Requires-Dist: pyinstaller>=6.0; extra == "dev"
Dynamic: license-file

# asub

Generate and translate subtitles from audio or video files — one by one or in folders — powered by
[faster-whisper](https://github.com/SYSTRAN/faster-whisper) and
[deep-translator](https://github.com/nidhaloff/deep-translator).

## Features

- **Fast transcription** — up to 4× faster than OpenAI Whisper with the same
  accuracy, using CTranslate2.
- **Automatic language detection** — or specify the source language manually.
- **Folder batch processing** — process every supported media file in a folder
  while loading the Whisper model only once.
- **Translation** — translate subtitles to 100+ languages via Google Translate
  (free, no API key).
- **Multiple output formats** — SRT and WebVTT.
- **VAD filtering** — Silero VAD removes silence and reduces hallucination.
- **Model choice** — from `tiny` (fast, less accurate) to `large-v3`
  (slow, most accurate).
- **CPU & GPU** — works on both, with int8 quantisation for low-memory setups.
- **Packageable as .exe** — single-file Windows executable via PyInstaller.
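
The two output formats differ mainly in the timestamp separator: SRT uses a comma before the milliseconds, WebVTT a dot. A minimal sketch of that formatting (illustrative only, not asub's internal code):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp: HH:MM:SS.mmm."""
    return srt_timestamp(seconds).replace(",", ".")
```

For example, `srt_timestamp(3661.5)` returns `"01:01:01,500"`, and the WebVTT variant of the same time is `"01:01:01.500"`.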

## Installation

### From source (recommended for development)

```bash
git clone https://github.com/simoneraffaelli/subtitle-generator.git
cd subtitle-generator
pip install -e ".[dev]"
```

### From PyPI (once published)

```bash
pip install asub
```

## Quick start

```bash
# Transcribe a video and generate subtitles (auto-detect language)
asub video.mp4

# Process every supported media file in a folder
asub recordings/

# Use a specific model and output format
asub video.mp4 -m large-v3 -f vtt

# Transcribe and translate to Italian
asub video.mp4 -t it

# Batch-process a folder and write all subtitles into one output directory
asub recordings/ -o subtitles/ -t de

# Specify source language, translate to German, verbose output
asub podcast.mp3 -l en -t de -v

# Use CPU with int8 quantisation
asub interview.wav --device cpu --compute-type int8
```

## Folder input

When `input` points to a folder, asub switches to **batch mode**.

- Only the **top level** of the folder is scanned. Nested subfolders are not
  processed.
- Supported input extensions in batch mode are:
  `.aac`, `.aiff`, `.avi`, `.flac`, `.m4a`, `.m4v`, `.mkv`, `.mov`, `.mp3`,
  `.mp4`, `.mpeg`, `.mpg`, `.oga`, `.ogg`, `.opus`, `.wav`, `.webm`, `.wma`.
- Without `-o/--output`, subtitle files are written next to each media file.
- With `-o/--output`, the value is treated as an **output directory**, not a
  single subtitle file path.
- The Whisper model is loaded once and reused across the whole batch.
- If `-l/--language` is omitted, language detection happens **per file**.
  Mixed-language folders are supported, and translation uses each file's
  detected source language.
- If a file's detected language already matches `-t/--translate`, translation
  is skipped for that file.
- If one file fails, asub continues with the rest, then prints a summary. The
  process exits with code `1` if any file failed.
- If multiple input files would produce the same subtitle path (for example
  `clip.mp3` and `clip.wav`), asub stops before processing and asks you to
  resolve the naming collision.
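
The collision check can be sketched as follows. This is illustrative only: `MEDIA_EXTS` here is a small subset of the supported extensions, and asub's internal helpers may be structured differently.

```python
from collections import defaultdict
from pathlib import Path

MEDIA_EXTS = {".flac", ".mkv", ".mp3", ".mp4", ".wav"}  # subset for illustration


def find_collisions(folder: str, suffix: str = ".srt") -> dict[str, list[str]]:
    """Map each would-be subtitle name to the input files that produce it,
    keeping only names claimed by more than one input."""
    outputs: dict[str, list[str]] = defaultdict(list)
    for f in sorted(Path(folder).iterdir()):  # top level only, like batch mode
        if f.is_file() and f.suffix.lower() in MEDIA_EXTS:
            outputs[f.with_suffix(suffix).name].append(f.name)
    return {name: srcs for name, srcs in outputs.items() if len(srcs) > 1}
```

With `clip.mp3` and `clip.wav` in the same folder, this returns `{"clip.srt": ["clip.mp3", "clip.wav"]}`, which is exactly the situation asub refuses to process.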

## CLI reference

```
usage: asub [-h] [-o OUTPUT] [-f {srt,vtt}] [-m MODEL] [--device {auto,cpu,cuda}]
            [--compute-type TYPE] [-l LANG] [--no-vad] [-t LANG] [-v] [--version]
            [--list-languages]
            input

positional arguments:
  input                 Path to an audio/video file, or a folder containing media files.

options:
  -o, --output          Output subtitle file path for a single input file, or an output directory when the input is a folder.
  -f, --format          Subtitle format: srt, vtt
  -v, --verbose         Increase verbosity (-v INFO, -vv DEBUG)
  --version             Show version and exit
  --list-languages      Print supported translation languages and exit

transcription:
  -m, --model           Whisper model size (default: medium)
  --device              auto | cpu | cuda (default: auto)
  --compute-type        Quantisation type (auto-selected if omitted)
  -l, --language        Source language code (auto-detected if omitted)
  --no-vad              Disable Voice Activity Detection

translation:
  -t, --translate LANG  Translate subtitles to this language code
```

## Python API

```python
from asub.transcriber import load_model, transcribe
from asub.translator import translate_segments
from asub.subtitle import write_subtitle_file, SubtitleFormat

# 1. Transcribe
model = load_model("medium", device="auto")
result = transcribe(model, "video.mp4")

# 2. Translate (optional)
translated = translate_segments(result.segments, source=result.language, target="it")

# 3. Write subtitle file
write_subtitle_file(translated, "video_it.srt")
```

## Building a Windows .exe

```bash
pip install ".[dev]"
pyinstaller asub.spec
```

The executable will be in `dist/asub.exe`.

> **Note:** The .exe does not bundle Whisper model weights. Models are downloaded
> on first run and cached in the default Hugging Face cache directory.

## Hugging Face token (optional)

On first run, Whisper model weights are downloaded from the Hugging Face Hub.
Without authentication you may see this warning:

> You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN
> to enable higher rate limits and faster downloads

This is **not an error** — the download still works, just at lower rate limits.
To silence the warning and get faster downloads:

1. Create a free account at <https://huggingface.co>.
2. Go to **Settings → Access Tokens** and generate a token.
3. Set the token before running asub:

```bash
# Linux / macOS
export HF_TOKEN="hf_your_token_here"

# Windows PowerShell
$env:HF_TOKEN = "hf_your_token_here"
```

To make this permanent, add the variable to your shell profile or set it via
**System → Environment Variables** on Windows.

## Available models

| Model            | Parameters | Relative speed | VRAM   |
| ---------------- | ---------- | -------------- | ------ |
| `tiny`           | 39 M       | ~10×           | ~1 GB  |
| `base`           | 74 M       | ~7×            | ~1 GB  |
| `small`          | 244 M      | ~4×            | ~2 GB  |
| `medium`         | 769 M      | ~2×            | ~5 GB  |
| `large-v3`       | 1550 M     | 1×             | ~10 GB |
| `turbo`          | 809 M      | ~8×            | ~6 GB  |
| `distil-large-v3`| 756 M      | ~6×            | ~6 GB  |

### Choosing the right model

Not every model is the best choice for every situation. Here's a breakdown to
help you pick:

- **`tiny`** — Fastest model by far. Good for quick previews or testing your
  pipeline. Accuracy is noticeably lower, especially on non-English audio or
  noisy recordings. Use it when speed matters more than quality.
- **`base`** — A small step up from `tiny`. Slightly more accurate, still very
  fast. Suitable for clear speech in common languages.
- **`small`** — A solid mid-range option. Handles most languages well and runs
  comfortably on CPU. Good balance for everyday use when you don't have a GPU.
- **`medium`** — The default. Significantly more accurate than `small`,
  especially for accented speech, niche languages, and overlapping speakers.
  Slower on CPU, but a great choice with a GPU.
- **`large-v3`** — The most accurate model. Best for professional-quality
  subtitles, rare languages, or heavily accented audio. Requires a CUDA GPU
  with at least 10 GB VRAM for practical use.
- **`turbo`** — Near `large-v3` accuracy at roughly 8× the speed. This is the
  best "quality per second" option if you have a GPU with ≥6 GB VRAM.
- **`distil-large-v3`** — A distilled version of `large-v3`. Similar accuracy
  on English, slightly worse on other languages. Fast and memory-efficient.
  Best for English-heavy workloads on a GPU.

### Recommended commands

**Fastest result** — use `tiny` when you just need a rough draft quickly:

```bash
asub video.mp4 -m tiny
```

**Best result** — use `large-v3` (GPU required) for maximum accuracy:

```bash
asub video.mp4 -m large-v3
```

**Best compromise** — use `turbo` on GPU for near-best accuracy at high speed,
or `small` on CPU for a good quality-to-speed ratio:

```bash
# With a CUDA GPU (recommended)
asub video.mp4 -m turbo

# CPU only
asub video.mp4 -m small
```

> **Tip:** The device and compute type are auto-detected. If you have a CUDA
> GPU, asub will use it with `float16` automatically. On CPU it falls back
> to `int8` quantisation.

## Batch-mode notes

- Batch mode is **sequential** by design. This keeps GPU/CPU memory use stable
  and makes per-file progress easier to understand.
- In mixed-language folders, auto-detection may produce different source
  languages across files. If you need consistent source-language handling, pass
  `-l/--language` explicitly.
- Translation uses Google Translate through `deep-translator`, so large batches
  can still hit network or rate-limit issues. Failures are reported per file in
  the final summary.
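
A simple way to soften transient network or rate-limit failures is to retry with exponential backoff. The sketch below wraps any translate callable; it is illustrative, not asub's actual code:

```python
import time


def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Call fn(), retrying with exponential backoff; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)


# Example use with deep-translator (requires network access):
# from deep_translator import GoogleTranslator
# text = with_retries(
#     lambda: GoogleTranslator(source="en", target="it").translate("hello")
# )
```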

## Upgrading dependencies

```bash
pip install --upgrade faster-whisper deep-translator
```

## Contributing

1. Fork the repo and create a feature branch.
2. Install dev dependencies: `pip install -e ".[dev]"`
3. Run tests: `python -m pytest`
4. Lint: `ruff check src/ tests/`
5. Open a pull request.

## License

[MIT](LICENSE)

## Acknowledgements

Built with great help from [Claude Opus 4.6](https://www.anthropic.com/) by Anthropic.
