Metadata-Version: 2.4
Name: omniscribe
Version: 0.0.1
Summary: SoberMind Offline Session Transcriber with Speaker Diarization
Requires-Python: >=3.7
Description-Content-Type: text/markdown

# Omniscribe

An offline-first, private speech-to-text tool that uses **OpenAI's Whisper** models for local transcription, with optional **pyannote.audio** integration for speaker diarization (labelling each part of a recording by who is speaking).

---

## 1. System Requirements & Setup

Omniscribe processes audio entirely on your own machine, so your recordings never leave it.

### Step A: Install FFmpeg
The transcription backend requires `ffmpeg` to decode audio files:
*   **Windows:** Install it with Chocolatey (`choco install ffmpeg`), or download a build from the official FFmpeg website and add its `bin` directory to your system `PATH`.
*   **macOS:** `brew install ffmpeg`
*   **Linux (Debian/Ubuntu):** `sudo apt install ffmpeg`
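
Before moving on, it can help to confirm that `ffmpeg` is actually reachable on your `PATH`. The following is a minimal sketch using only the Python standard library; the helper names `find_ffmpeg` and `ffmpeg_version_line` are illustrative and are not part of Omniscribe.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line() -> Optional[str]:
    """Return the first line printed by `ffmpeg -version`, or None if unavailable."""
    path = find_ffmpeg()
    if path is None:
        return None
    # check=False: a non-zero exit code is reported as missing output, not an exception.
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    line = ffmpeg_version_line()
    print(line or "ffmpeg was not found on PATH; revisit Step A.")
```

If the script reports that `ffmpeg` is missing right after installation, opening a new terminal usually picks up the updated `PATH`.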

### Step B: Install Omniscribe
You can install `omniscribe` directly from PyPI:
```bash
pip install omniscribe
```

---

## 2. Multi-Speaker Diarization (Optional)
To separate speakers (e.g. distinguishing between `Speaker 0` and `Speaker 1`):
1.  Install the diarization dependencies:
    ```bash
    pip install pyannote.audio
    ```
2.  Go to Hugging Face and accept the user agreements for these models (requires creating a free account):
    *   [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1)
    *   [pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0)
3.  Generate a User Access Token (Read Permission) on your [Hugging Face Settings Page](https://huggingface.co/settings/tokens).

---

## 3. Usage Reference

### Standard Transcription (No Speaker Separation)
Runs fully offline; no Hugging Face token is needed:
```bash
omniscribe path/to/session.mp3
```

### Transcribe with Multi-Speaker Diarization
Pass your Hugging Face token to have each conversation segment labelled with its speaker:
```bash
omniscribe path/to/session.mp3 --hf-token "YOUR_HF_TOKEN"
```
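
To illustrate what splitting segments by speaker involves, here is one common way to combine a transcript with diarization output: give each transcribed segment the speaker whose turns overlap it the most. This is a sketch of the general technique, not necessarily how Omniscribe implements it; the tuple layouts, the function names, and the `UNKNOWN` fallback label are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Assumed layouts for this sketch (not Omniscribe's internal types):
Segment = Tuple[float, float, str]  # (start_seconds, end_seconds, text)
Turn = Tuple[float, float, str]     # (start_seconds, end_seconds, speaker_label)


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Segment], turns: List[Turn], unknown: str = "UNKNOWN"
) -> List[Tuple[str, float, float, str]]:
    """Label each segment with the speaker who talks longest during it."""
    labelled = []
    for start, end, text in segments:
        totals: Dict[str, float] = {}
        for t_start, t_end, speaker in turns:
            shared = overlap(start, end, t_start, t_end)
            if shared > 0:
                totals[speaker] = totals.get(speaker, 0.0) + shared
        # Fall back to a placeholder when no diarization turn covers the segment.
        speaker = max(totals, key=totals.get) if totals else unknown
        labelled.append((speaker, start, end, text))
    return labelled


if __name__ == "__main__":
    segments = [(0.0, 2.0, "Hello there."), (2.0, 5.0, "Hi, how are you?")]
    turns = [(0.0, 2.2, "SPEAKER_00"), (2.2, 5.0, "SPEAKER_01")]
    for speaker, start, end, text in assign_speakers(segments, turns):
        print(f"[{start:.1f}-{end:.1f}] {speaker}: {text}")
```

Summing overlap per speaker, rather than taking the first overlapping turn, keeps a segment from being mislabelled when a second speaker interjects briefly.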

### Options
*   `--model`: Size of the Whisper model to load (`tiny`, `base`, `small`, `medium`, `large`). Defaults to `base`, which balances speed and accuracy on standard laptops.
*   `--output`: Base name for the output files.

Each run produces output in two formats:
*   `.md`: A structured Markdown dialogue format.
*   `.txt`: A timestamped plaintext dialogue transcript.
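
When transcribing from a script, the documented flags can be assembled programmatically. The sketch below builds the argument list and reads the token from an environment variable so that it does not end up in shell history. The flag names are the ones listed above; the helper names, the use of an `HF_TOKEN` environment variable, and the assumption that all three flags may be combined in a single invocation belong to this example only.

```python
import os
import subprocess
from typing import List, Optional


def build_command(
    audio_path: str,
    model: str = "base",
    output: Optional[str] = None,
    hf_token: Optional[str] = None,
) -> List[str]:
    """Assemble an omniscribe command line from the documented options."""
    cmd = ["omniscribe", audio_path, "--model", model]
    if output:
        cmd += ["--output", output]
    if hf_token:
        # Supplying a token is what enables speaker diarization.
        cmd += ["--hf-token", hf_token]
    return cmd


def transcribe(audio_path: str, **options) -> subprocess.CompletedProcess:
    """Run omniscribe, taking the token from HF_TOKEN (a name chosen for this sketch)."""
    token = os.environ.get("HF_TOKEN")
    cmd = build_command(audio_path, hf_token=token, **options)
    # check=True raises CalledProcessError if omniscribe exits with an error.
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    print(build_command("path/to/session.mp3", model="small", output="session"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.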

---

## 4. Web-Based GUI Dashboard

To review and edit transcripts interactively, launch the local GUI server:

```bash
python gui_server.py [port]
```

*   **Default Port:** `8080` (used when `[port]` is omitted)
*   **Local Address:** `http://localhost:8080` (substitute your own port if you passed one)

### GUI Features
1.  **Drag-and-Drop Form:** Drop in your audio file, enter your Hugging Face token, and choose a Whisper model size.
2.  **Live Console Log:** Follow status updates and model download progress in a scrollable log panel.
3.  **Dialogue Workspace:**
    *   Edit transcribed text blocks on the fly.
    *   **Speaker Renamer:** Rename default speaker codes (e.g. `SPEAKER_00` to `Me`, `SPEAKER_01` to `Dr. Jameson`) and instantly replace them across the entire dialogue history.
    *   **Export Controls:** Copy the formatted Markdown dialogue with one click, or download the transcript as JSON.
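
The same kind of speaker renaming can be applied to an exported transcript outside the GUI. The sketch below replaces whole speaker labels in a single pass; the function name and the sample dialogue layout are assumptions for illustration and do not describe the GUI's internals.

```python
import re
from typing import Dict


def rename_speakers(text: str, mapping: Dict[str, str]) -> str:
    """Replace every whole-word occurrence of each key in `mapping` with its value."""
    if not mapping:
        return text
    # Longest labels first, so a shorter label cannot pre-empt a longer one.
    labels = sorted(mapping, key=len, reverse=True)
    alternation = "|".join(re.escape(label) for label in labels)
    # Lookarounds keep SPEAKER_00 from matching inside SPEAKER_001.
    pattern = re.compile(r"(?<!\w)(" + alternation + r")(?!\w)")
    # One pass over the text, so swapping two names does not cascade.
    return pattern.sub(lambda match: mapping[match.group(1)], text)


if __name__ == "__main__":
    dialogue = "**SPEAKER_00:** Hello.\n**SPEAKER_01:** Hi.\n**SPEAKER_00:** Bye."
    names = {"SPEAKER_00": "Me", "SPEAKER_01": "Dr. Jameson"}
    print(rename_speakers(dialogue, names))
```

Doing all replacements in one regular-expression pass means that exchanging two labels (for example `SPEAKER_00` and `SPEAKER_01`) works without a temporary placeholder.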
