Metadata-Version: 2.4
Name: audium-md
Version: 0.1.0
Summary: Audio-to-Markdown transcription optimized for AI consumption
Author: tamukj
License: MIT
Requires-Python: >=3.10
Requires-Dist: click>=8.1.0
Requires-Dist: faster-whisper>=1.2.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0.0
Provides-Extra: dev
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Description-Content-Type: text/markdown

<p align="center">
  <a href="https://github.com/Tamukj/Audium">
    <img src="assets/logo.svg" width="180" alt="Audium logo">
  </a>
</p>

<h1 align="center">Audium</h1>

<p align="center">
  <strong>🎧 Audio → AI‑optimized Markdown</strong>
  <br>
  <sub>Transcribe MP3/WAV/FLAC into clean, token‑efficient Markdown — ready for any LLM.</sub>
</p>

<p align="center">
  <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.10%2B-blue?style=flat&logo=python&logoColor=white" alt="Python 3.10+"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green?style=flat" alt="MIT License"></a>
  <a href="https://pypi.org/project/audium-md/"><img src="https://img.shields.io/badge/pypi-v0.1.0-blue?style=flat&logo=pypi&logoColor=white" alt="PyPI version"></a>
  <a href="https://github.com/SYSTRAN/faster-whisper"><img src="https://img.shields.io/badge/backend-faster--whisper-8A2BE2?style=flat" alt="faster-whisper"></a>
  <a href="https://github.com/Tamukj/Audium"><img src="https://img.shields.io/badge/platform-linux%20%7C%20macOS%20%7C%20windows-lightgrey?style=flat" alt="Platform"></a>
</p>

<p align="center">
  <a href="README.md">English</a> ·
  <a href="README.ru.md">Русский</a> ·
  <a href="README.zh-CN.md">中文</a>
</p>

---

<h2 align="center">✨ Why Audium?</h2>

Feed audio to an LLM. Get answers. Simple.

But raw transcripts burn tokens on noise: long timestamps, filler words,
silent segments, markup that adds nothing.

Audium turns speech into **the minimum viable Markdown**: every character
counts, nothing wasted.

<div align="center">

| 🎯 | ⚡ | 🪙 | 👁️ | 🌍 |
|---|---|---|---|---|
| **3 formats** | **GPU‑accelerated** | **Token‑aware** | **Watch mode** | **~97 languages** |
| compact, minimal, structured | 2–10× real‑time on CUDA | `[MM:SS]` + VAD + filler‑strip | drop files → auto‑transcribe | tiny to large‑v3 |

</div>

---

<h2 align="center">📦 Install</h2>

```bash
pip install audium-md
```

> Requires `ffmpeg` on your system: `sudo apt install ffmpeg` (Debian/Ubuntu) or `brew install ffmpeg` (macOS).

---

<h2 align="center">🚀 Quick Start</h2>

```bash
# Process a folder
audium run ./my-recordings/

# Single file
audium run lecture.mp3

# Watch folder — auto‑transcribe new files
audium watch ./incoming/

# See what you've transcribed
audium list

# Change model
audium config set model large-v3
```

---

<h2 align="center">📝 Formats</h2>

### compact *(default)*

```
# lecture.mp3 (01:23:45)

[00:00] Neural networks learn hierarchical representations
[00:04] Each layer detects increasingly abstract features
[00:08] Early layers find edges and textures
[00:12] Later layers detect objects and scenes
```

### minimal

```
Neural networks learn hierarchical representations
Each layer detects increasingly abstract features
Early layers find edges and textures
Later layers detect objects and scenes
```
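The compact and minimal layouts above can be reproduced in a few lines of Python. This is a minimal sketch, not Audium's code. The `Segment` class, `render()`, and the choice to let minutes keep counting past 59 in `[MM:SS]` (so 1h23m45s prints as `83:45`) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical stand-in for a transcribed segment: start/end in seconds plus text.
    start: float
    end: float
    text: str


def mmss(seconds: float) -> str:
    """Format seconds as MM:SS; minutes keep counting past 59 (e.g. 83:45)."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def hhmmss(seconds: float) -> str:
    """Format the total duration as HH:MM:SS for the header line."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def render(name: str, duration: float, segments: list[Segment], fmt: str = "compact") -> str:
    # Drop segments that are empty once surrounding whitespace is removed.
    kept = [s for s in segments if s.text.strip()]
    if fmt == "minimal":
        # minimal: bare text, one segment per line, no header or timestamps
        return "\n".join(s.text.strip() for s in kept)
    if fmt == "compact":
        # compact: "# name (HH:MM:SS)", a blank line, then "[MM:SS] text" lines
        header = f"# {name} ({hhmmss(duration)})"
        body = [f"[{mmss(s.start)}] {s.text.strip()}" for s in kept]
        return "\n".join([header, "", *body])
    raise ValueError(f"unsupported format: {fmt}")


segs = [
    Segment(0.0, 4.1, " Neural networks learn hierarchical representations"),
    Segment(4.1, 8.0, " Each layer detects increasingly abstract features"),
]
print(render("lecture.mp3", 5025.0, segs))
```

Keeping one segment per line with no blank lines between them is what makes both outputs cheap to feed to an LLM.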

### structured *(requires speaker diarization)*

```
# interview.mp3 (00:45:12)

## Alice [00:00-00:30]
Neural networks are a powerful tool. It's important to understand their limitations.

## Bob [00:30-01:15]
I completely agree. Let me walk through an example to make this concrete.
```
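Once every segment carries a speaker label, the structured layout is essentially a "group consecutive segments by speaker" step. The sketch below assumes diarization has already run and produced those labels. `LabeledSegment` and `render_structured()` are hypothetical names, and the `# file (duration)` header is left out for brevity.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class LabeledSegment:
    # Hypothetical: a segment whose speaker was assigned by a diarization step (not shown).
    speaker: str
    start: float
    end: float
    text: str


def mmss(seconds: float) -> str:
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def render_structured(segments: list[LabeledSegment]) -> str:
    blocks = []
    # groupby merges only *consecutive* segments with the same speaker,
    # so a speaker who talks twice in a conversation gets two headings.
    for speaker, turn in groupby(segments, key=lambda s: s.speaker):
        items = list(turn)
        text = " ".join(s.text.strip() for s in items)
        span = f"{mmss(items[0].start)}-{mmss(items[-1].end)}"
        blocks.append(f"## {speaker} [{span}]\n{text}")
    return "\n\n".join(blocks)


segs = [
    LabeledSegment("Alice", 0, 12, " Neural networks are a powerful tool."),
    LabeledSegment("Alice", 12, 30, " It's important to understand their limitations."),
    LabeledSegment("Bob", 30, 75, " I completely agree."),
]
print(render_structured(segs))
```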

---

<h2 align="center">⚙️ Commands</h2>

| Command | Description |
|---------|-------------|
| `audium run <path>` | Transcribe audio files or folders |
| `audium watch <path>` | Watch folder and auto‑process new files |
| `audium list [dir]` | Show processed transcripts with file sizes |
| `audium config` | Show current configuration |
| `audium config set <key> <value>` | Change a setting |
| `audium config reset` | Reset to factory defaults |
| `audium config path` | Show config file location |
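The idea behind `audium watch` can be sketched with plain polling. This is an illustration only: Audium may use filesystem events instead. The extension set, `new_audio_files()`, `watch()`, and the 2-second interval are all assumptions.

```python
import time
from collections.abc import Callable
from pathlib import Path

# Assumed extension set, based on the formats named at the top of this README.
AUDIO_EXTS = {".mp3", ".wav", ".flac"}


def new_audio_files(folder: Path, seen: set[Path]) -> list[Path]:
    """Return audio files in `folder` that are not in `seen`, then mark them as seen."""
    fresh = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTS and p not in seen
    )
    seen.update(fresh)
    return fresh


def watch(folder: Path, handle: Callable[[Path], None], interval: float = 2.0) -> None:
    """Poll `folder` forever, calling `handle(path)` once for each new audio file."""
    seen: set[Path] = set()
    while True:
        for path in new_audio_files(folder, seen):
            handle(path)
        time.sleep(interval)
```

A naive poller like this can pick up a file that is still being copied. A real implementation would typically wait until the file size stops changing before transcribing it.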

### Common flags for `run` and `watch`

| Flag | Default | Description |
|------|---------|-------------|
| `-o, --output-dir` | `./transcripts` | Where to save .md files |
| `-f, --format` | `compact` | `compact` / `minimal` / `structured` |
| `-r, --recursive` | off | Search subdirectories |
| `--model` | `small` | `tiny` / `base` / `small` / `medium` / `large-v3` |
| `--language` | `auto` | Force language code: `ru`, `en`, `zh`, ... |
| `--strip-fillers` | off | Remove filler words such as "um", "uh", "like" (English) and "мм", "ээ" (Russian) |
| `--no-vad` | off | Disable voice activity detection |
| `--no-progress` | off | Hide the progress bar |
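Filler stripping is essentially a word-boundary regex pass. The minimal sketch below assumes a short hypothetical filler list, and Audium's real list and rules may differ. Unlike the table above, it deliberately leaves out "like", because that word often carries meaning ("I like it").

```python
import re

# Hypothetical filler list: English "um"/"uh" plus Russian "мм"/"ээ".
FILLERS = ["um", "uh", "мм", "ээ"]

# \b keeps words such as "hum" intact. Python 3 str patterns are Unicode-aware,
# so \b also works next to Cyrillic letters. An optional trailing comma and the
# whitespace after the filler are removed along with it.
FILLER_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, FILLERS)) + r")\b,?\s*",
    re.IGNORECASE,
)


def strip_fillers(text: str) -> str:
    cleaned = FILLER_RE.sub("", text)
    # Collapse any double spaces left behind and trim the ends.
    return " ".join(cleaned.split())


print(strip_fillers("Um, so uh we ship it"))
```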

---

<h2 align="center">🔧 Configuration</h2>

Settings are merged in order of precedence, highest first: **CLI flags > `.audium.yaml` (project) > `~/.config/audium/config.yaml` > defaults**

```bash
# Set default model
audium config set model large-v3

# Always strip filler words
audium config set strip_fillers true

# Custom output folder
audium config set output_dir ~/Documents/transcripts

# See what you changed
audium config
```

```yaml
# Example .audium.yaml (place in project root)
model: medium
language: ru
format: minimal
output_dir: ./transcripts
```
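The precedence rule maps neatly onto `collections.ChainMap`, which returns the value from the first mapping that has the key. This is a minimal sketch, not Audium's loader. `merge_settings()` is a hypothetical name, and the defaults are copied from the flags table. In practice the `project` and `user` dicts would come from parsing the two YAML files (for example `yaml.safe_load(f) or {}`, since `safe_load` returns `None` for an empty file).

```python
from collections import ChainMap

# Defaults taken from the flags table in this README.
DEFAULTS = {
    "model": "small",
    "format": "compact",
    "language": "auto",
    "strip_fillers": False,
    "output_dir": "./transcripts",
}


def merge_settings(cli: dict, project: dict, user: dict) -> ChainMap:
    """Resolve settings with precedence CLI > project > user > defaults."""
    # A CLI value of None means "flag not given", so it must not shadow lower layers.
    cli_set = {k: v for k, v in cli.items() if v is not None}
    return ChainMap(cli_set, project, user, DEFAULTS)


settings = merge_settings(
    cli={"model": None, "format": "structured"},
    project={"model": "medium", "language": "ru"},
    user={"model": "large-v3", "strip_fillers": True},
)
print(dict(settings))
```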

---

<h2 align="center">🪙 Token Optimization</h2>

Audium is built to minimize LLM token cost:

| Technique | Savings |
|-----------|---------|
| `[MM:SS]` instead of `[HH:MM:SS.mmm]` | ~30% on timestamps |
| VAD filtering (skip silence) | 15–40% on meeting recordings |
| Filler‑word stripping | 5–10% on conversational speech |
| `min_segment_duration` threshold | skip noise fragments |
| One line per segment, no blank lines | ~8% vs paragraph output |
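Two rows of this table are easy to check by hand. The sketch compares the two timestamp styles and drops very short segments. Character counts are only a rough proxy for tokens, so the ~30% figure above depends on the tokenizer and is not reproduced here. The helper names and the 0.5 s `min_segment_duration` value are assumptions for illustration.

```python
def short_ts(seconds: float) -> str:
    # [MM:SS]: whole seconds only
    total = int(seconds)
    return f"[{total // 60:02d}:{total % 60:02d}]"


def long_ts(seconds: float) -> str:
    # [HH:MM:SS.mmm]: the longer style this table compares against
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"[{h:02d}:{m:02d}:{s:02d}.{ms:03d}]"


def drop_short(segments, min_segment_duration=0.5):
    """Keep (start, end, text) tuples lasting at least min_segment_duration seconds."""
    return [seg for seg in segments if seg[1] - seg[0] >= min_segment_duration]


print(short_ts(65.25), len(short_ts(65.25)))  # 7 characters
print(long_ts(65.25), len(long_ts(65.25)))    # 14 characters
print(drop_short([(0.0, 0.2, "hm"), (0.2, 3.0, "Hello there")]))
```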

---

<h2 align="center">📊 Model Sizes</h2>

| Model | Parameters | Relative speed (vs. `large-v3`) | Best for |
|-------|-----------|-------------|----------|
| tiny | 39M | ~32× | Quick drafts, low‑resource |
| base | 74M | ~16× | Dictation, clean audio |
| small | 244M | ~6× | **General purpose** |
| medium | 769M | ~2× | Accents, noisy audio |
| large-v3 | 1.5B | 1× (baseline) | Maximum accuracy |

> All multilingual models support the same ~97 languages. Larger models are more accurate but slower.

---

<h2 align="center">📄 License</h2>

<p align="center">
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue?style=for-the-badge" alt="MIT License"></a>
</p>

<p align="center">
  MIT — do whatever you want. Attribution appreciated.
</p>
