Metadata-Version: 2.4
Name: subtutu
Version: 0.1.0
Summary: Generate SRT subtitles from video/audio files using Whisper
Author-email: Akshay Gupta <hi@akshaygpt.com>
License: MIT
Project-URL: Homepage, https://github.com/akshaygpt/subtutu
Keywords: subtitles,srt,whisper,transcription,video
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Topic :: Multimedia :: Video
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Environment :: Console
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: faster-whisper>=1.0.0
Requires-Dist: certifi
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"

# subtutu

**subtutu** is a command-line tool that automatically generates SRT subtitle files from any video or audio file. It uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) — a high-performance reimplementation of OpenAI Whisper — to transcribe spoken audio into accurate, timestamped subtitles up to **4x faster** than the original Whisper on CPU.

No API key required. Everything runs locally on your machine.

```bash
subtutu lecture.mp4
# → lecture.srt
```
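Under the hood, generating an SRT file comes down to transcribing with faster-whisper and formatting each segment's timestamps. Here is a minimal sketch of that idea — the helper names are illustrative, not subtutu's actual internals:

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render segments (objects with .start, .end, .text) as numbered SRT blocks."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(blocks)

# Typical use (requires faster-whisper and an input file):
#   from faster_whisper import WhisperModel
#   model = WhisperModel("small", compute_type="int8")
#   segments, _info = model.transcribe("lecture.mp4")
#   with open("lecture.srt", "w", encoding="utf-8") as f:
#       f.write(segments_to_srt(segments))
```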

---

## Who is this for?

- Content creators who want subtitles for YouTube videos, reels, or podcasts
- Developers building subtitle pipelines
- Researchers transcribing interviews or recordings
- Anyone who needs fast, offline, accurate subtitles from a video file

---

## Features

- Generates standard `.srt` subtitle files ready for use in any video editor or player
- Powered by faster-whisper (CTranslate2) — **up to 4x faster than openai-whisper on CPU**, and up to 12x on GPU
- Shows real-time transcription progress as segments are decoded
- Auto-selects the best model for your hardware (RAM + VRAM aware)
- Shows estimated processing time and accuracy for each model before starting
- Supports 99+ languages with automatic language detection
- Handles MP4, MOV, MKV, AVI, MP3, WAV, M4A, and any format ffmpeg can read
- Clear error messages for common problems (missing ffmpeg, no audio track, silent video, etc.)

---

## Requirements

- Python 3.9+
- [ffmpeg](https://ffmpeg.org/) — required for audio decoding

Install ffmpeg on macOS:
```bash
brew install ffmpeg
```

Install ffmpeg on Ubuntu/Debian:
```bash
sudo apt install ffmpeg
```

---

## Installation

```bash
pip install subtutu
```

No separate PyTorch install needed — subtutu uses CTranslate2 for inference.

---

## Usage

```bash
subtutu <video_or_audio_file> [options]
```

The subtitle file is written to the same directory as the input file by default. If a file with the target `.srt` name already exists, a numbered alternative is created automatically (`video_1.srt`, `video_2.srt`, etc.) so nothing is overwritten.

```bash
subtutu video.mp4
# Output: video.srt
```

### Options

| Flag | Default | Description |
|------|---------|-------------|
| `--model` | `auto` | Model: `tiny`, `base`, `small`, `medium`, `large-v3`, `turbo`, or `auto` to pick based on hardware |
| `--language` | `en` | Language code (e.g. `en`, `fr`, `de`, `ja`, `zh`). Use `auto` to detect automatically |
| `--output` | alongside input | Output `.srt` path or directory |
| `--device` | `auto` | Force compute device: `cpu` or `cuda` |

### Examples

```bash
# Subtitle an English video (default)
subtutu interview.mp4

# Use a more accurate model
subtutu documentary.mp4 --model medium

# Auto-detect the spoken language
subtutu foreign_film.mp4 --language auto

# Subtitle a French video
subtutu podcast.mp3 --language fr

# Save the subtitle file to a specific location
subtutu recording.mov --output ~/Desktop/recording.srt
```

---

## Choosing a model

When `--model auto` is used (the default), subtutu checks your available RAM and GPU memory, then shows a table like this before loading anything:

```
  Model        Accuracy    Est. time
  ────────────   ────────   ──────────
   tiny            60%         1m 2s
   base            75%         2m 5s
▶  small           85%         5m 33s
   medium          93%        16m 40s
   turbo           90%         4m 10s
   large-v3        97%        33m 20s

  Recommended: small
  Press Enter to use 'small', or type a model name:
```

Press Enter to accept, or type a different model name to switch.

| Model | Size (int8) | CPU Speed | Accuracy |
|-------|-------------|-----------|----------|
| `tiny` | ~75 MB | ~120x real-time | 60% |
| `base` | ~145 MB | ~60x real-time | 75% |
| `small` | ~490 MB | ~24x real-time | 85% |
| `medium` | ~1.5 GB | ~8x real-time | 93% |
| `turbo` | ~810 MB | ~30x real-time | 90% |
| `large-v3` | ~3 GB | ~4x real-time | 97% |

Models are downloaded on first use and cached in `~/.cache/huggingface/hub/`.
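The auto-selection can be approximated with a simple memory check: pick the largest model whose footprint, with some safety headroom, fits in available RAM. A sketch of that idea — the footprints come from the table above, but the headroom factor and the selection logic are illustrative guesses, not subtutu's actual heuristic:

```python
# Approximate int8 model footprints (GB), smallest to largest.
MODEL_SIZES_GB = [
    ("tiny", 0.075),
    ("base", 0.145),
    ("small", 0.49),
    ("turbo", 0.81),
    ("medium", 1.5),
    ("large-v3", 3.0),
]

def recommend_model(available_ram_gb: float, headroom: float = 4.0) -> str:
    """Pick the largest model whose footprint times a safety headroom fits in RAM."""
    best = "tiny"  # always fall back to the smallest model
    for name, size in MODEL_SIZES_GB:
        if size * headroom <= available_ram_gb:
            best = name
    return best
```

For example, a machine with 4 GB of free RAM would get `turbo` under these assumptions, while 16 GB would comfortably fit `large-v3`.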

---

## Supported file formats

Any format that ffmpeg can decode, including:

`mp4` `mov` `mkv` `avi` `webm` `flv` `m4v` `mp3` `wav` `m4a` `aac` `ogg` `flac` `wma`

---

## Troubleshooting

**`ffmpeg not found`**
Install ffmpeg — see Requirements above.

**`No speech detected`**
Try `--language auto` if the video is not in English. Check that the video actually has an audio track.

**`Not enough memory to load the model`**
Switch to a smaller model: `--model small` or `--model tiny`.

**`Permission denied` reading a file on macOS**
Terminal may need Full Disk Access: System Settings > Privacy & Security > Full Disk Access.

---

## License

MIT

---

## Acknowledgements

Built on [faster-whisper](https://github.com/SYSTRAN/faster-whisper) by SYSTRAN. Whisper models by [OpenAI](https://github.com/openai/whisper). Audio decoding by [ffmpeg](https://ffmpeg.org/).
