Metadata-Version: 2.4
Name: audio-transcribe-cli
Version: 0.1.0
Summary: Local CLI for audio/video transcription using Soniox API — generates SRT subtitles with an HTML viewer, no server required
Project-URL: Homepage, https://github.com/sun-asterisk-research/audio-transcribe-cli
Project-URL: Repository, https://github.com/sun-asterisk-research/audio-transcribe-cli
Project-URL: Issues, https://github.com/sun-asterisk-research/audio-transcribe-cli/issues
Author-email: Sun Asterisk <pham.van.toan@sun-asterisk.com>
License: MIT
Keywords: audio,cli,soniox,speech-to-text,srt,subtitles,transcription,video
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Utilities
Requires-Python: >=3.11
Requires-Dist: httpx>=0.27
Requires-Dist: pypdf>=4.0
Requires-Dist: python-dotenv>=1.0
Requires-Dist: rich>=13.0
Requires-Dist: typer>=0.12
Requires-Dist: websockets>=13.0
Description-Content-Type: text/markdown

# audio-transcribe-cli

Local CLI for audio/video transcription using the [Soniox](https://soniox.com) API. Generates SRT subtitles and a self-contained HTML viewer with subtitle editing. No server, database, or cloud storage is required.

## Prerequisites

- Python 3.11+
- [ffmpeg](https://ffmpeg.org/download.html) available on `PATH`
- A [Soniox API key](https://console.soniox.com)

## Install

```bash
cd audio-transcribe-cli
pip install .
```

For an editable / development install:

```bash
pip install -e .
```

## Environment Setup

```bash
cp .env.example .env
# Edit .env and set SONIOX_API_KEY
```

Or export directly:

```bash
export SONIOX_API_KEY=your_key_here
```

## Usage

### `transcribe` — Transcribe an audio or video file

```bash
atcli transcribe path/to/recording.mp4
```

With reference PDF documents (used to build Soniox context for better accuracy):

```bash
atcli transcribe recording.mp4 --context slides.pdf --context notes.pdf
```

With free-text hints and explicit language hints:

```bash
atcli transcribe recording.mp4 --hints "Technical meeting about Kubernetes" --language en,ja
```

Specify output path and skip the HTML viewer:

```bash
atcli transcribe recording.mp4 --output /tmp/output.srt --no-view
```

Use LLM-powered keyword extraction from reference docs:

```bash
atcli transcribe recording.mp4 \
  --context reference.pdf \
  --llm-api-key "$OPENAI_API_KEY" \
  --llm-endpoint https://api.openai.com/v1
```

**Options:**

| Flag | Default | Description |
|------|---------|-------------|
| `--context / -c` | — | PDF reference doc(s); repeatable |
| `--hints` | `""` | Free-text hint added to the Soniox context |
| `--language` | auto | Comma-separated language hints (`en,ja,vi`) |
| `--output / -o` | `<input>.srt` | SRT output path |
| `--view / --no-view` | `--view` | Open HTML viewer after transcription |
| `--api-key` | `$SONIOX_API_KEY` | Soniox API key |
| `--llm-api-key` | `$OPENAI_API_KEY` | LLM API key for keyword extraction |
| `--llm-endpoint` | `$OPENAI_ENDPOINT` | LLM endpoint URL |

### `view` — Open an existing SRT in the HTML viewer

```bash
atcli view path/to/subtitles.srt
```

This generates `subtitles_viewer.html` next to the SRT and opens it in your browser. Use the file picker in the viewer to load the corresponding media file.

## HTML Viewer Features

- Two-panel layout: editable subtitle table (left) + media player (right)
- Click any row to seek the player to that position
- Current subtitle highlighted during playback with on-video overlay
- Editable index, start/end timing, and text fields
- Add / delete subtitle rows
- Export SRT button — downloads the current edited state as a `.srt` file
- Fully self-contained (no internet required, opens via `file://`)

## How It Works

1. Extracts audio stream if input is a video file (ffmpeg, no re-encoding)
2. Builds a Soniox context dict from PDF reference docs (keyword extraction)
3. Splits audio into 30–60 s segments at silence points
4. Uploads and transcribes each segment via the Soniox async API
5. Merges all tokens with correct time offsets, groups into subtitle blocks
6. Resolves `<split:N>` tags, breaking long sentences into separate cues
7. Writes the final SRT file
8. Generates a self-contained HTML viewer and opens it in your browser
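
Steps 5 and 7 above can be illustrated with a small sketch. It assumes each token is a dict with `text`, `start_ms`, and `end_ms` keys whose times are relative to the start of its own segment; that shape is an assumption for this example, as are the function names. Grouping tokens into subtitle blocks and resolving `<split:N>` tags are left out.

```python
def format_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def merge_segments(segments: list[tuple[int, list[dict]]]) -> list[dict]:
    """Shift segment-relative token times onto the full recording's timeline.

    `segments` is a list of (offset_ms, tokens) pairs, where offset_ms is
    the position of the segment's start within the original audio.
    """
    merged = []
    for offset_ms, tokens in segments:
        for tok in tokens:
            merged.append({
                "text": tok["text"],
                "start_ms": tok["start_ms"] + offset_ms,
                "end_ms": tok["end_ms"] + offset_ms,
            })
    return merged


def to_srt(blocks: list[dict]) -> str:
    """Render subtitle blocks as SRT text, numbering cues from 1."""
    cues = []
    for index, block in enumerate(blocks, start=1):
        timing = (f"{format_timestamp(block['start_ms'])} --> "
                  f"{format_timestamp(block['end_ms'])}")
        cues.append(f"{index}\n{timing}\n{block['text']}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

Applying the offset before grouping means every later stage works on a single timeline and never needs to know where the segment boundaries were.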

## Project Structure

```
src/atcli/
├── cli.py                 # Typer entry point
├── core/
│   ├── soniox_client.py   # Soniox HTTP client (upload, transcribe, cleanup)
│   ├── audio_segmenter.py # ffmpeg-based silence-aware segmentation
│   ├── context_extractor.py  # PDF → keywords → Soniox context dict
│   ├── token_assembler.py # Soniox tokens → subtitle blocks
│   ├── subtitle_generator.py # blocks → SRT/VTT text + split-tag resolver
│   └── html_viewer.py     # Self-contained HTML editor generator
└── utils/
    └── logger.py          # Rich-based logger
```
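
As an illustration of what silence-aware segmentation in `audio_segmenter.py` might involve, the sketch below picks cut points so that each segment is 30 to 60 s long whenever a silence falls in that window. It is one possible approach, not the module's documented algorithm: `choose_cut_points` is a hypothetical name, the silence timestamps are assumed to be already available (for example, parsed from ffmpeg's `silencedetect` filter output), and preferring the latest silence in the window is a design choice made here to keep the segment count low.

```python
def choose_cut_points(silences: list[float], duration: float,
                      min_len: float = 30.0, max_len: float = 60.0) -> list[float]:
    """Return cut times (seconds) that split the audio at silence points.

    Each cut is the latest silence lying between min_len and max_len after
    the previous cut. If no silence falls in that window, the audio is cut
    at max_len regardless. The final segment may be shorter than min_len.
    """
    cuts = []
    start = 0.0
    while duration - start > max_len:
        window = [s for s in silences if start + min_len <= s <= start + max_len]
        cut = max(window) if window else start + max_len
        cuts.append(cut)
        start = cut
    return cuts
```

For example, with silences at 12.0, 41.5, 58.0, 95.0, and 130.0 s in a 150 s recording, the cuts land at 58.0 and 95.0 s, giving segments of 58, 37, and 55 s.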
