Metadata-Version: 2.4
Name: audium-md
Version: 0.1.3
Summary: Audio-to-Markdown transcription optimized for AI consumption
Author: tamukj
License: MIT
Requires-Python: >=3.10
Requires-Dist: click>=8.1.0
Requires-Dist: faster-whisper>=1.2.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0.0
Provides-Extra: dev
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Description-Content-Type: text/markdown

<p align="center">
  <a href="https://github.com/Tamukj/Audium">
    <img src="assets/logo.svg" width="180" alt="Audium logo">
  </a>
</p>

<h1 align="center">Audium</h1>

<p align="center">
  <strong>🎧 Audio → AI‑optimized Markdown</strong>
  <br>
  <sub>Transcribe MP3/WAV/FLAC into clean, token‑efficient Markdown — ready for any LLM.</sub>
</p>

<p align="center">
  <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.10%2B-blue?style=flat&logo=python&logoColor=white" alt="Python 3.10+"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green?style=flat" alt="MIT License"></a>
  <a href="https://pypi.org/project/audium-md/"><img src="https://img.shields.io/badge/pypi-v0.1.3-blue?style=flat&logo=pypi&logoColor=white" alt="PyPI version"></a>
  <a href="https://github.com/SYSTRAN/faster-whisper"><img src="https://img.shields.io/badge/backend-faster--whisper-8A2BE2?style=flat" alt="faster-whisper"></a>
  <a href="https://github.com/Tamukj/Audium"><img src="https://img.shields.io/badge/platform-linux%20%7C%20macOS%20%7C%20windows-lightgrey?style=flat" alt="Platform"></a>
</p>

<p align="center">
  <a href="README.md">English</a> ·
  <a href="README.ru.md">Русский</a> ·
  <a href="README.zh-CN.md">中文</a>
</p>

---

<h2 align="center">✨ Why Audium?</h2>

Feed audio to an LLM. Get answers. Simple.

But raw transcripts burn tokens on noise: long timestamps, filler words,
silent segments, markup that adds nothing.

Audium turns speech into **the minimum viable Markdown**: every character
counts, nothing wasted.

<div align="center">

| 🎯 | ⚡ | 🪙 | 👁️ | 🌍 |
|---|---|---|---|---|
| **3 formats** | **GPU‑accelerated** | **Token‑aware** | **Watch mode** | **~97 languages** |
| compact, minimal, structured | 2–10× real‑time on CUDA | `[MM:SS]` + VAD + filler‑strip | drop files → auto‑transcribe | tiny to large‑v3 |

</div>

---

<h2 align="center">📦 Install</h2>

**Requires `ffmpeg`:** `sudo apt install ffmpeg` / `brew install ffmpeg`

### Recommended: pipx (isolated, no conflicts)

```bash
pipx install audium-md
```
> `pipx` creates its own virtual environment — works on Ubuntu/Debian without PEP 668 errors.
> Install pipx first: `sudo apt install pipx` or `python3 -m pip install --user pipx`

### Alternative: uv tool (fastest)

```bash
uv tool install audium-md
```

### Fallback: pip with override

```bash
pip install audium-md --break-system-packages
```

### Local development

```bash
git clone https://github.com/Tamukj/Audium.git
cd Audium
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
```

---

<h2 align="center">🚀 Quick Start</h2>

```bash
# Process a folder
audium run ./my-recordings/

# Single file
audium run lecture.mp3

# Watch folder — auto‑transcribe new files
audium watch ./incoming/

# See what you've transcribed
audium list

# Change model
audium config set model large-v3
```

---

<h2 align="center">📝 Formats</h2>

### compact *(default)*

```
# lecture.mp3 (01:23:45)

[00:00] Neural networks learn hierarchical representations
[00:04] Each layer detects increasingly abstract features
[00:08] Early layers find edges and textures
[00:12] Later layers detect objects and scenes
```
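
The compact layout above is simple enough to reproduce by hand. The following is a minimal sketch, not Audium's actual implementation: `format_timestamp` and `render_compact` are hypothetical names, and segments are assumed to arrive as `(start_seconds, text)` pairs. How Audium renders offsets past 59:59 is not stated here, so this sketch simply lets the minutes keep counting.

```python
def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as MM:SS (assumption: minutes are not wrapped at 60)."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def render_compact(filename: str, duration: str, segments: list[tuple[float, str]]) -> str:
    """Build a compact transcript: title line, blank line, then one line per segment."""
    lines = [f"# {filename} ({duration})", ""]
    for start, text in segments:
        lines.append(f"[{format_timestamp(start)}] {text.strip()}")
    return "\n".join(lines)


# Hypothetical segments, as a transcription backend might return them.
segments = [
    (0.0, "Neural networks learn hierarchical representations"),
    (4.2, "Each layer detects increasingly abstract features"),
]
print(render_compact("lecture.mp3", "01:23:45", segments))
```

Dropping the `[MM:SS]` prefix and the title line from this sketch gives the `minimal` layout.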

### minimal

```
Neural networks learn hierarchical representations
Each layer detects increasingly abstract features
Early layers find edges and textures
Later layers detect objects and scenes
```

### structured *(requires speaker diarization)*

```
# interview.mp3 (00:45:12)

## Alice [00:00-00:30]
Neural networks are a powerful tool. It's important to understand their limitations.

## Bob [00:30-01:15]
I completely agree. Let me walk through an example to make this concrete.
```

---

<h2 align="center">⚙️ Commands</h2>

| Command | Description |
|---------|-------------|
| `audium run <path>` | Transcribe audio files or folders |
| `audium watch <path>` | Watch folder and auto‑process new files |
| `audium list [dir]` | Show processed transcripts with file sizes |
| `audium config` | Show current configuration |
| `audium config set <key> <value>` | Change a setting |
| `audium config reset` | Reset to factory defaults |
| `audium config path` | Show config file location |

### Common flags for `run` and `watch`

| Flag | Default | Description |
|------|---------|-------------|
| `-o, --output-dir` | `./transcripts` | Where to save .md files |
| `-f, --format` | `compact` | `compact` / `minimal` / `structured` |
| `-r, --recursive` | off | Search subdirectories |
| `--model` | `small` | `tiny` / `base` / `small` / `medium` / `large-v3` / `turbo` |
| `--language` | `auto` | Force language code: `ru`, `en`, `zh`, ... |
| `--strip-fillers` | off | Remove "um", "uh", "like", "мм", "ээ", etc. |
| `--no-vad` | off | Disable voice activity detection |
| `--no-progress` | off | Hide the progress bar |

---

<h2 align="center">🔧 Configuration</h2>

Settings are merged in order of precedence, highest first: **CLI flags > `.audium.yaml` (project) > `~/.config/audium/config.yaml` > defaults**

```bash
# Show current config
audium config

# Set a value
audium config set model large-v3
audium config set strip_fillers true
audium config set output_dir ~/Documents/transcripts

# Shorthand without `set` is also supported:
audium config model large-v3

# Reset to factory defaults
audium config reset

# Show config file path
audium config path
```
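
To make the precedence order concrete, here is a rough sketch of how such a layered merge could work. It is an illustration under assumptions, not Audium's code: `resolve_settings` is a hypothetical helper, each layer is assumed to be already loaded into a plain dict, and a CLI value of `None` is assumed to mean "flag not given".

```python
from collections import ChainMap

# Illustrative subset of the defaults listed in the settings reference.
DEFAULTS = {"model": "small", "format": "compact", "language": "auto", "strip_fillers": False}


def resolve_settings(cli: dict, project: dict, user: dict, defaults: dict = DEFAULTS) -> dict:
    """Merge config layers; layers listed first win over later ones."""
    # Drop CLI flags that were not passed so they do not mask lower layers.
    cli_given = {key: value for key, value in cli.items() if value is not None}
    return dict(ChainMap(cli_given, project, user, defaults))


settings = resolve_settings(
    cli={"model": "large-v3", "language": None},    # --model large-v3, no --language
    project={"model": "medium", "language": "ru"},  # .audium.yaml
    user={"format": "minimal"},                     # ~/.config/audium/config.yaml
)
print(settings["model"], settings["language"], settings["format"])
# prints: large-v3 ru minimal
```

The CLI flag wins for `model`, the project file supplies `language`, the user file supplies `format`, and everything else falls back to the defaults.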

### All Settings

```bash
audium config
```

The output shows the current values, with accepted options in parentheses:

```
  beam_size: 5             (integer 1-20)
  compute_type: auto       (auto, float16, int8_float16, int8)
  device: cuda             (auto, cuda, cpu)
  format: compact          (compact, minimal, structured)
  language: auto           (e.g. auto, ru, en, zh, ...)
  min_segment_duration: 0.0  (float, seconds)
  model: small             (tiny, base, small, medium, large-v3, turbo)
  output_dir: ./transcripts  (path)
  recursive: false         (true / false)
  strip_fillers: false     (true / false)
  vad_filter: true         (true / false)
```

### Setting Reference

| Key | Default | Description | Options |
|-----|---------|-------------|---------|
| `model` | `small` | Whisper model size | tiny, base, small, medium, large-v3, turbo |
| `device` | `auto` | Computation device (auto-detect) | auto, cuda, cpu |
| `compute_type` | `auto` | Numeric precision for inference | auto, float16, int8_float16, int8 |
| `format` | `compact` | Output Markdown format | compact, minimal, structured |
| `language` | `auto` | Source language | auto, or any ISO code (ru, en, zh, ...) |
| `beam_size` | `5` | Beam search width | integer (1-20) |
| `output_dir` | `./transcripts` | Where .md files are saved | any path |
| `strip_fillers` | `false` | Remove filler words | true / false |
| `vad_filter` | `true` | Voice Activity Detection | true / false |
| `min_segment_duration` | `0.0` | Skip segments shorter than N seconds | float |
| `recursive` | `false` | Scan subdirectories | true / false |

> **compute_type auto-detection:** On GPUs with compute capability ≥ 7.0 (Volta+), `float16` is used for best performance. On Pascal GPUs (GTX 10xx), `int8_float16` is used. On CPU, `int8` is used.
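
The rule in the note above can be expressed in a few lines. This is a sketch of the described behaviour only: `pick_compute_type` is a hypothetical name, and falling back to `int8_float16` when the GPU's compute capability is unknown is an assumption, not something this README specifies.

```python
from __future__ import annotations


def pick_compute_type(device: str, compute_capability: tuple[int, int] | None = None) -> str:
    """Choose a precision from the device and its CUDA compute capability (major, minor)."""
    if device == "cpu":
        return "int8"
    # Volta and newer (compute capability >= 7.0) run float16 efficiently.
    if compute_capability is not None and compute_capability >= (7, 0):
        return "float16"
    # Older GPUs such as Pascal (6.x); also used here when capability is unknown (assumption).
    return "int8_float16"


print(pick_compute_type("cuda", (8, 6)))  # prints: float16
print(pick_compute_type("cuda", (6, 1)))  # prints: int8_float16
print(pick_compute_type("cpu"))           # prints: int8
```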

### Local config file

Create `.audium.yaml` in your project root to override defaults per-project:

```yaml
model: medium
language: ru
format: minimal
output_dir: ./transcripts
```

### First run — model download

On first use, Audium downloads the Whisper model from HuggingFace Hub (~500 MB for `small`).  
The model is **cached locally** in `~/.cache/huggingface/hub/`, so subsequent runs skip the download and work fully offline.

**Why HuggingFace?** The models are too large (~500 MB–3 GB) to bundle in a pip package or GitHub repo. They're downloaded once, then cached forever.
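
If you want to check what is already in that cache, a small script can list it. This sketch relies on the Hugging Face hub cache convention of naming folders `models--<org>--<name>`; `cached_models` is a hypothetical helper, and the exact repository Audium downloads from is not stated here (faster-whisper models are commonly published under names like `Systran/faster-whisper-small`, which is an assumption in this context).

```python
from pathlib import Path


def cached_models(hub_dir: Path) -> list[str]:
    """Return repo ids ('org/name') for model folders found in a Hugging Face hub cache."""
    if not hub_dir.is_dir():
        return []
    repos = []
    for entry in sorted(hub_dir.iterdir()):
        # Model folders follow the pattern models--<org>--<name>.
        if entry.is_dir() and entry.name.startswith("models--"):
            repos.append(entry.name[len("models--"):].replace("--", "/"))
    return repos


# Default cache location mentioned above; prints an empty list before the first download.
print(cached_models(Path.home() / ".cache" / "huggingface" / "hub"))
```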

**Is this legal?** Yes. All components are MIT licensed: [Whisper (OpenAI)](https://github.com/openai/whisper/blob/main/LICENSE), [faster-whisper](https://github.com/SYSTRAN/faster-whisper/blob/master/LICENSE), [CTranslate2](https://github.com/OpenNMT/CTranslate2/blob/master/LICENSE). Free for personal and commercial use.

---

<h2 align="center">🪙 Token Optimization</h2>

Audium is built to minimize LLM token cost:

| Technique | Savings |
|-----------|---------|
| `[MM:SS]` instead of `[HH:MM:SS.mmm]` | ~30% on timestamps |
| VAD filtering (skip silence) | 15–40% on meeting recordings |
| Filler‑word stripping | 5–10% on conversational speech |
| `min_segment_duration` threshold | skip noise fragments |
| One line per segment, no blank lines | ~8% vs paragraph output |
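
Two of the techniques in the table, filler stripping and the `min_segment_duration` threshold, are easy to picture in code. The following is a simplified sketch rather than Audium's implementation: `strip_fillers` and `keep_segment` are hypothetical names, and the filler list is a deliberately small subset (words like "like" need more care because they are often legitimate).

```python
import re

# Illustrative subset only; the real filler list is not reproduced here.
FILLERS = ("um", "uh")

# Match a filler as a whole word, plus any trailing comma/period and whitespace.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, FILLERS)) + r")\b[,.]?\s*",
    re.IGNORECASE,
)


def strip_fillers(text: str) -> str:
    """Remove filler words and collapse any leftover double spaces."""
    cleaned = _FILLER_RE.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


def keep_segment(start: float, end: float, min_duration: float) -> bool:
    """Keep a segment only if it lasts at least min_duration seconds."""
    return (end - start) >= min_duration


print(strip_fillers("Um, so the uh model works"))  # prints: so the model works
print(keep_segment(12.0, 12.3, min_duration=0.5))  # prints: False
```

The word-boundary anchors keep words such as "umbrella" or "sum" intact.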

---

<h2 align="center">📊 Model Sizes</h2>

| Model | Parameters | Speed (GPU) | Best for |
|-------|-----------|-------------|----------|
| tiny | 39M | ~32× real‑time | Quick drafts, low‑resource |
| base | 74M | ~16× real‑time | Dictation, clean audio |
| small | 244M | ~6× real‑time | **General purpose** |
| medium | 769M | ~2× real‑time | Accents, noisy audio |
| large‑v3 | 1.5B | ~1× real‑time | Maximum accuracy |

> All multilingual models support the same ~97 languages. Larger models are more accurate but slower.
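
As a rough planning aid, the speed column translates directly into wall-clock estimates: processing time is roughly audio length divided by the real-time factor. The sketch below uses the approximate GPU figures from the table; `estimate_minutes` is a hypothetical helper, and real throughput will vary with hardware and audio.

```python
# Approximate GPU real-time factors, copied from the table above.
SPEED_FACTORS = {"tiny": 32, "base": 16, "small": 6, "medium": 2, "large-v3": 1}


def estimate_minutes(audio_seconds: float, model: str) -> float:
    """Estimate processing time in minutes for a recording of the given length."""
    return audio_seconds / SPEED_FACTORS[model] / 60


lecture = 1 * 3600 + 23 * 60 + 45  # a 01:23:45 recording, 5025 seconds
for model in SPEED_FACTORS:
    print(f"{model:>8}: ~{estimate_minutes(lecture, model):.1f} min")
```

For that lecture, `small` works out to about 14 minutes and `large-v3` to about 84 minutes.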

---

<h2 align="center">🖥️ GPU Support</h2>

Audium automatically detects your GPU and configures itself:

| Hardware | Detection | Backend |
|----------|-----------|---------|
| **NVIDIA** (all) | `nvidia-smi` | CUDA — best performance |
| **AMD** (ROCm) | `/dev/kfd` + `rocm-smi` | ROCm / HIP |
| **Intel** (Arc, Iris) | `xpu-smi` / drm | oneAPI / SYCL |
| **CPU only** | fallback | int8 quantized |

No manual configuration needed. Run `audium run ./audio/` and it just works.
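
For the curious, detection along the lines of the table can be approximated with standard-library calls. This is a sketch under assumptions, not Audium's detection code: `detect_backend` and its return labels are hypothetical, the Intel drm check is omitted, and the probe functions are injectable so the logic can be exercised without the hardware.

```python
import os
import shutil


def detect_backend(which=shutil.which, exists=os.path.exists) -> str:
    """Guess a compute backend by probing for vendor tools, in priority order."""
    if which("nvidia-smi"):
        return "cuda"
    # AMD: the ROCm kernel driver exposes /dev/kfd, and rocm-smi ships with ROCm.
    if exists("/dev/kfd") and which("rocm-smi"):
        return "rocm"
    if which("xpu-smi"):
        return "xpu"
    return "cpu"


print(detect_backend())
```

Finding a vendor tool only shows that a driver stack is installed. Whether the transcription backend can actually use that device is a separate question.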

---

<h2 align="center">🔄 Updating</h2>

```bash
pip install --upgrade audium-md
# or: pipx upgrade audium-md
# or: uv tool upgrade audium-md
```

Check your current version:

```bash
audium --version
```

---

<h2 align="center">📄 License</h2>

<p align="center">
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue?style=for-the-badge" alt="MIT License"></a>
</p>

<p align="center">
  MIT — do whatever you want. Attribution appreciated.
</p>
