Metadata-Version: 2.4
Name: omi-med-stt
Version: 0.1.8
Summary: CLI for Omi Med STT v1 medical speech-to-text
Author: Omi Health
License: MIT
License-File: LICENSE
License-File: NOTICE.md
Requires-Python: >=3.10
Requires-Dist: huggingface-hub<1.0,>=0.23
Requires-Dist: numpy<2.3,>=1.24
Requires-Dist: soundfile>=0.12
Provides-Extra: cpp
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == 'dev'
Provides-Extra: mlx
Requires-Dist: mlx-audio; extra == 'mlx'
Requires-Dist: parakeet-mlx; extra == 'mlx'
Provides-Extra: nemo
Requires-Dist: nemo-toolkit[asr]; extra == 'nemo'
Description-Content-Type: text/markdown

# Omi Med STT Runtime

Runtime CLI for **Omi Med STT v1**, an English medical speech-to-text model.

This repository contains runtime code only. It does **not** contain model
weights, private benchmark data, or training data.

## Runtimes

`omi-med-stt` supports three runtime paths:

| Runtime | Best for | Artifact |
| --- | --- | --- |
| `mlx` | Apple Silicon Macs | `omi-health/omi-med-stt-v1-mlx` |
| `cpp` | Windows, Intel Macs, Linux CPU, and ggml GPU backends | `omi-health/omi-med-stt-v1-gguf` |
| `nemo` | NVIDIA CUDA servers and canonical NeMo checkpoint use | `omi-health/omi-med-stt-v1` |

The NeMo checkpoint is the source of truth. The MLX and GGUF artifacts are
exports of that checkpoint for their respective runtimes.
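
As a rough illustration of the table above, the sketch below suggests a runtime from the operating system, CPU architecture, and CUDA availability. `pick_runtime` is a hypothetical helper, not part of `omi-med-stt`, and the `nvidia-smi` lookup is only a heuristic for detecting an NVIDIA GPU. It does not describe how the CLI itself selects a runtime.

```bash
# Hypothetical helper: map OS, CPU architecture, and CUDA availability to a
# runtime name. It mirrors the table above and is not the CLI's own logic.
pick_runtime() {
  os="$1"; arch="$2"; has_cuda="${3:-no}"
  if [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]; then
    echo mlx
  elif [ "$os" = "Linux" ] && [ "$has_cuda" = "yes" ]; then
    echo nemo
  else
    echo cpp
  fi
}

# Assumption: nvidia-smi on PATH is treated as "CUDA is available".
if command -v nvidia-smi >/dev/null 2>&1; then cuda=yes; else cuda=no; fi
runtime="$(pick_runtime "$(uname -s)" "$(uname -m)" "$cuda")"
echo "suggested runtime: $runtime"
```

The result can then be passed along as `--runtime "$runtime"`. On Windows, where `uname` is usually unavailable, the table points to `cpp`.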

## Install

From PyPI:

```bash
pip install -U omi-med-stt
```

From this repository:

```bash
pip install git+https://github.com/Omi-Health/omi-med-stt-runtime.git
```

For Apple Silicon / MLX:

```bash
pip install "omi-med-stt[mlx] @ git+https://github.com/Omi-Health/omi-med-stt-runtime.git"
```

For CUDA/Linux NeMo:

```bash
pip install "omi-med-stt[nemo] @ git+https://github.com/Omi-Health/omi-med-stt-runtime.git"
```

## Basic Usage

Simple path:

```bash
omi-med-stt audio.wav
```

Explicit MLX:

```bash
omi-med-stt audio.wav --runtime mlx
```

Explicit NeMo:

```bash
omi-med-stt audio.wav --runtime nemo
```

Explicit parakeet.cpp / GGUF:

```bash
omi-med-stt audio.wav --runtime cpp
```

JSON output:

```bash
omi-med-stt audio.wav --json
```
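
For batches, a small loop can write one JSON file per recording. This is a minimal sketch that assumes `--json` prints the result to standard output, which this README does not state explicitly. `transcribe_dir` and the `STT_CMD` override are hypothetical names introduced here so the loop can be dry-run with another command.

```bash
# Transcribe every .wav file in a directory, writing a .json file next to each.
# Assumption: `omi-med-stt <file> --json` prints JSON to stdout.
# STT_CMD is a hypothetical override, useful for dry runs (e.g. STT_CMD=echo).
transcribe_dir() {
  dir="$1"
  for f in "$dir"/*.wav; do
    [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
    ${STT_CMD:-omi-med-stt} "$f" --json > "${f%.wav}.json"
  done
}
```

For example, `transcribe_dir ./recordings` would produce `./recordings/visit1.json` from `./recordings/visit1.wav`.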

Dependency/runtime check:

```bash
omi-med-stt check
```

## parakeet.cpp / GGUF Runtime

The `cpp` runtime is powered by
[`parakeet.cpp`](https://github.com/mudler/parakeet.cpp), a C++/ggml inference
engine for NVIDIA Parakeet ASR models.

Omi Med STT v1 includes a post-Conformer medical adapter. Until this adapter
extension is upstreamed, `omi-med-stt` builds `parakeet.cpp` with the adapter
patch included in this repository and caches the resulting `parakeet-cli`.

Normal use:

```bash
omi-med-stt audio.wav --runtime cpp
```

Pre-build the C++ runtime explicitly:

```bash
omi-med-stt install-cpp
```

Choose a backend:

```bash
omi-med-stt install-cpp --cpp-backend cpu
omi-med-stt install-cpp --cpp-backend metal
omi-med-stt install-cpp --cpp-backend cuda
```

Manual override remains available for developers:

```bash
omi-med-stt audio.wav --runtime cpp --parakeet-cli /path/to/parakeet-cli
```

The cached runtime is built from `parakeet.cpp` with
`parakeet-cpp-omi-adapter.patch` applied. The first build requires `git` and
`cmake` on your `PATH`.
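
A quick pre-flight check for those two tools might look like the sketch below. `missing_tools` is a hypothetical helper, not a subcommand of `omi-med-stt`. It only checks that the tools are on `PATH`, not their versions.

```bash
# Print the names of any listed tools that are not found on PATH.
missing_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  echo "${missing# }"   # drop the leading space
}

missing="$(missing_tools git cmake)"
if [ -n "$missing" ]; then
  echo "install before building: $missing" >&2
else
  echo "git and cmake found"
fi
```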

The `cpp` runtime downloads only the selected GGUF file. It does not download
the NeMo `.nemo` checkpoint or the MLX `model.safetensors`.

By default, the `cpp` runtime is pinned to a specific Hugging Face repository
revision and verifies the SHA256 checksum of the official f16/q8_0 GGUF files.
For experiments, you can select a different revision or skip checksum
verification:

```bash
omi-med-stt audio.wav --runtime cpp --revision main
omi-med-stt audio.wav --runtime cpp --no-verify-checksum
```
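
To check a downloaded file by hand, a SHA256 comparison can be sketched as follows. `sha256_of` and `verify_sha256` are hypothetical helpers. The expected digest must come from the model repository, and this is not necessarily how the CLI performs its own check.

```bash
# Print the SHA256 hex digest of a file, using whichever tool is installed
# (sha256sum on most Linux systems, shasum on macOS).
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# Succeed only if the file's digest equals the expected lowercase hex digest.
verify_sha256() {
  [ "$(sha256_of "$1")" = "$2" ]
}
```

Usage, with placeholders for the file path and digest: `verify_sha256 /path/to/model.gguf "<expected-digest>" && echo ok`.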

## Long Audio

Omi Med STT v1 is based on Parakeet and can handle long-form audio. Start with
the simple path:

```bash
omi-med-stt consult.wav
```

An explicit chunked path is still available for constrained environments:

```bash
omi-med-stt transcribe-long consult.wav --chunk-seconds 25 --overlap 3
```
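
To get a feel for what those two numbers imply, the sketch below prints the windows that a simple overlapping scheme would produce. It assumes each chunk starts `chunk - overlap` seconds after the previous one and that both values are in seconds. The exact scheme used by `transcribe-long` is not documented here, and `chunk_windows` is a hypothetical helper.

```bash
# Print "start end" (in seconds) for each window over a recording.
# Arguments: total length, chunk length, overlap (all whole seconds).
chunk_windows() {
  total="$1"; chunk="$2"; overlap="$3"
  step=$((chunk - overlap))
  [ "$step" -gt 0 ] || { echo "overlap must be smaller than chunk length" >&2; return 1; }
  start=0
  while [ "$start" -lt "$total" ]; do
    end=$((start + chunk))
    [ "$end" -gt "$total" ] && end="$total"   # clamp the final window
    echo "$start $end"
    [ "$end" -ge "$total" ] && break
    start=$((start + step))
  done
}

chunk_windows 60 25 3
# 0 25
# 22 47
# 44 60
```

Under these assumptions, a 60-second recording is covered by three windows, each sharing 3 seconds with its neighbor.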

## Model Access

The runtime defaults to these Hugging Face model repositories:

- `omi-health/omi-med-stt-v1`
- `omi-health/omi-med-stt-v1-mlx`
- `omi-health/omi-med-stt-v1-gguf`

If the model repositories are still private (for example, before launch),
authenticate with Hugging Face first:

```bash
huggingface-cli login
```

## Attribution

This runtime uses or interoperates with:

- NVIDIA NeMo / Parakeet, for the base ASR architecture.
- `parakeet-mlx`, for Apple Silicon MLX inference.
- `parakeet.cpp`, for GGUF / C++ / ggml inference.

See [NOTICE.md](NOTICE.md).

## License

Runtime code in this repository is MIT licensed.

Model weights are licensed separately, under the terms stated in the model
repositories. Omi Med STT v1 is a derivative of `nvidia/parakeet-tdt-0.6b-v2`,
whose model weights are licensed under CC-BY-4.0.
