Metadata-Version: 2.4
Name: omi-med-stt
Version: 0.1.12
Summary: CLI for Omi Med STT v1 medical speech-to-text
Author: Omi Health
License: MIT
License-File: LICENSE
License-File: NOTICE.md
Requires-Python: >=3.10
Requires-Dist: huggingface-hub<2.0,>=0.23
Requires-Dist: numpy<2.3,>=1.24
Requires-Dist: soundfile>=0.12
Provides-Extra: cpp
Provides-Extra: dev
Requires-Dist: build>=1.2; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Provides-Extra: mlx
Requires-Dist: mlx-audio; extra == 'mlx'
Requires-Dist: parakeet-mlx; extra == 'mlx'
Provides-Extra: nemo
Requires-Dist: nemo-toolkit[asr]; extra == 'nemo'
Description-Content-Type: text/markdown

# Omi Med STT Runtime

Runtime CLI for **Omi Med STT v1**, an English medical speech-to-text model.

This repository contains runtime code only. It does **not** contain model
weights, private benchmark data, or training data.

## Runtimes

`omi-med-stt` supports three runtime paths:

| Runtime | Best for | Artifact |
| --- | --- | --- |
| `mlx` | Apple Silicon Macs | `omi-health/omi-med-stt-v1-mlx` |
| `cpp` | Linux and Windows CPU fallback | `omi-health/omi-med-stt-v1-gguf` |
| `nemo` | NVIDIA CUDA servers and canonical NeMo checkpoint use | `omi-health/omi-med-stt-v1` |

The source-of-truth model is the NeMo checkpoint. MLX and GGUF are runtime
exports.
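
As a rough illustration of the table above, the following sketch maps a host description to a runtime name. The helper `pick_runtime` and its `has_cuda` parameter are hypothetical and not part of `omi-med-stt`; how the CLI itself chooses a default runtime is not described here.

```python
import platform


def pick_runtime(system: str, machine: str, has_cuda: bool = False) -> str:
    """Map a host description to one of the three runtime names in the table.

    Hypothetical helper for illustration only.
    """
    if system == "Darwin" and machine == "arm64":
        return "mlx"   # Apple Silicon Macs
    if system == "Linux" and has_cuda:
        return "nemo"  # NVIDIA CUDA servers
    # Linux and Windows CPU fallback. Other hosts (for example Intel Macs)
    # also land here, which is an assumption of this sketch.
    return "cpp"


if __name__ == "__main__":
    # CUDA detection is outside the scope of this sketch, so has_cuda stays False.
    print(pick_runtime(platform.system(), platform.machine()))
```

The result can then be passed to the CLI as `--runtime <name>`.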

## Install

From PyPI:

```bash
pip install -U omi-med-stt
```

From this repository:

```bash
pip install git+https://github.com/Omi-Health/omi-med-stt-runtime.git
```

For Apple Silicon / MLX:

```bash
pip install "omi-med-stt[mlx] @ git+https://github.com/Omi-Health/omi-med-stt-runtime.git"
```

For CUDA/Linux NeMo:

```bash
pip install "omi-med-stt[nemo] @ git+https://github.com/Omi-Health/omi-med-stt-runtime.git"
```

## Basic Usage

Simple path:

```bash
omi-med-stt audio.wav
```

Explicit MLX:

```bash
omi-med-stt audio.wav --runtime mlx
```

Explicit NeMo:

```bash
omi-med-stt audio.wav --runtime nemo
```

Explicit parakeet.cpp / GGUF:

```bash
omi-med-stt audio.wav --runtime cpp
```

JSON output:

```bash
omi-med-stt audio.wav --json
```

Dependency/runtime check:

```bash
omi-med-stt check
```

## parakeet.cpp / GGUF Runtime

The `cpp` runtime is powered by
[`parakeet.cpp`](https://github.com/mudler/parakeet.cpp), a C++/ggml inference
engine for NVIDIA Parakeet ASR models.

Omi Med STT v1 includes a post-Conformer medical adapter. Until this adapter
extension is upstreamed, `omi-med-stt` builds `parakeet.cpp` with the adapter
patch included in this repository and caches the resulting `libparakeet` shared
library plus `parakeet-cli`.

The default `cpp` path uses the `parakeet.cpp` C API directly. For long audio it
keeps the model loaded once and sends in-memory 16 kHz PCM chunks to
`libparakeet`, avoiding repeated model setup and temporary chunk WAV reads.
`parakeet-cli` remains available as a fallback and for developer debugging.

Normal use:

```bash
omi-med-stt audio.wav --runtime cpp
```

Windows CPU is a first-class target. Install once, then transcribe:

```powershell
pip install -U omi-med-stt
omi-med-stt install-cpp --cpp-backend cpu
omi-med-stt audio.wav --runtime cpp --cpp-backend cpu
```

On Windows CPU, the runtime caps the default thread count at
`min(os.cpu_count(), 8)` rather than using the upstream engine default. This is
more stable on small machines (for example, 4 vCPUs). Override it with
`--cpp-threads` or the `OMI_MED_STT_CPP_THREADS` environment variable:

```powershell
# override for a single invocation
omi-med-stt audio.wav --runtime cpp --cpp-backend cpu --cpp-threads 4

# or override for the current PowerShell session
$Env:OMI_MED_STT_CPP_THREADS = "4"
omi-med-stt audio.wav --runtime cpp --cpp-backend cpu
```
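
The thread-count rule can be sketched in a few lines of Python. The function `default_cpp_threads` is a hypothetical illustration, not the runtime's actual code, and the precedence it uses (flag first, then environment variable, then the capped default) is an assumption.

```python
import os


def default_cpp_threads(cli_threads=None, env=None, cpu_count=None, cap=8):
    """Resolve a thread count: flag, then env var, then min(cpu_count, cap).

    The flag-over-environment precedence is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    if cli_threads is not None:
        return max(1, int(cli_threads))
    env_value = env.get("OMI_MED_STT_CPP_THREADS")
    if env_value:
        return max(1, int(env_value))
    # os.cpu_count() can return None, so fall back to a single thread.
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return min(cores, cap)
```

With no overrides, a 16-core host resolves to 8 threads and a 4-vCPU host to 4.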

Pre-install the C++ runtime explicitly:

```bash
omi-med-stt install-cpp
```

Choose a backend:

```bash
omi-med-stt install-cpp --cpp-backend cpu
```

Manual override remains available for developers:

```bash
omi-med-stt audio.wav --runtime cpp --parakeet-cli /path/to/parakeet-cli
```

Runtime toggles:

```bash
# force the old subprocess path
OMI_MED_STT_CPP_DISABLE_CAPI=1 omi-med-stt audio.wav --runtime cpp

# require C API and fail instead of falling back
OMI_MED_STT_CPP_REQUIRE_CAPI=1 omi-med-stt audio.wav --runtime cpp

# portable build instead of native CPU flags
OMI_MED_STT_GGML_NATIVE=OFF omi-med-stt install-cpp --force --cpp-backend cpu
```
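
One way to read the first two toggles is as a small decision function. The sketch below is an interpretation, not the runtime's implementation: `choose_cpp_path` is a hypothetical name, and both the `"1"` comparison and the handling of conflicting toggles are assumptions.

```python
def choose_cpp_path(env: dict, capi_available: bool) -> str:
    """Return "capi" or "cli" based on the two C API toggles.

    Hypothetical sketch; the real runtime may treat edge cases differently.
    """
    disable = env.get("OMI_MED_STT_CPP_DISABLE_CAPI") == "1"
    require = env.get("OMI_MED_STT_CPP_REQUIRE_CAPI") == "1"
    if disable and require:
        # Assumption: conflicting toggles are treated as a configuration error.
        raise ValueError("DISABLE_CAPI and REQUIRE_CAPI are both set")
    if disable:
        return "cli"   # force the old subprocess path
    if capi_available:
        return "capi"  # default: use libparakeet directly
    if require:
        raise RuntimeError("C API required but libparakeet is not available")
    return "cli"       # fall back to parakeet-cli
```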

For Linux and Windows CPU, `install-cpp` first downloads a pinned prebuilt
runtime bundle from the Omi Med STT runtime GitHub release. If no bundle is
available for the host, it falls back to building from `parakeet.cpp` with
`parakeet-cpp-omi-adapter.patch`. The source-build fallback requires `git` and
`cmake`. On Windows, normal use requires only the shared `parakeet.dll` runtime;
it does not require a separate `parakeet-cli.exe`.
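
Before relying on the source-build fallback, it can help to confirm that both tools are on `PATH`. A minimal sketch, with `missing_build_tools` as a hypothetical helper name:

```python
import shutil


def missing_build_tools(which=shutil.which):
    """Return the source-build prerequisites that are not found on PATH."""
    return [tool for tool in ("git", "cmake") if which(tool) is None]


if __name__ == "__main__":
    missing = missing_build_tools()
    if missing:
        print("Missing tools for a source build:", ", ".join(missing))
    else:
        print("git and cmake found")
```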

The `cpp` runtime downloads only the selected GGUF file. It does not download
the NeMo `.nemo` checkpoint or the MLX `model.safetensors`.

By default, the `cpp` runtime is pinned to a specific Hugging Face repository
revision and verifies the SHA256 checksum for the official f16/q8_0 GGUF files.
You can override this for experiments:

```bash
omi-med-stt audio.wav --runtime cpp --revision main
omi-med-stt audio.wav --runtime cpp --no-verify-checksum
```
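
For readers who want to check a downloaded file by hand, SHA256 verification can be sketched with the standard library. The helper names are hypothetical, and the expected digest must come from a source you trust; this sketch does not know the official checksums.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, block_size=1 << 20):
    """Hash a file in 1 MiB blocks so large GGUF files are not read at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_checksum(path, expected_hex):
    """Return True when the file's SHA256 matches the expected hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```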

## Long Audio

Omi Med STT v1 is based on Parakeet and can handle long-form audio. Start with
the simple path:

```bash
omi-med-stt consult.wav
```

An explicit chunked path is still available for constrained environments:

```bash
omi-med-stt transcribe-long consult.wav --chunk-seconds 25 --overlap 3
```
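
To see what `--chunk-seconds 25 --overlap 3` implies, the sketch below computes overlapping windows over a recording. It assumes both values are in seconds and that each window starts `chunk_seconds - overlap` after the previous one; how the runtime merges text across overlaps is not covered. `chunk_spans` is a hypothetical helper.

```python
def chunk_spans(total_seconds, chunk_seconds=25.0, overlap=3.0):
    """Return (start, end) windows in seconds covering the whole recording."""
    if chunk_seconds <= 0 or not 0 <= overlap < chunk_seconds:
        raise ValueError("need chunk_seconds > 0 and 0 <= overlap < chunk_seconds")
    step = chunk_seconds - overlap  # each window starts this far after the last
    spans = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        spans.append((start, end))
        if end >= total_seconds:
            break  # the last window reached the end of the audio
        start += step
    return spans


if __name__ == "__main__":
    for start, end in chunk_spans(60.0):
        print(f"{start:.0f}s to {end:.0f}s")
```

For a 60-second file this yields three windows: 0 to 25, 22 to 47, and 44 to 60 seconds.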

## Model Access

The runtime defaults to these Hugging Face model repositories:

- `omi-health/omi-med-stt-v1`
- `omi-health/omi-med-stt-v1-mlx`
- `omi-health/omi-med-stt-v1-gguf`

If the model repositories are private before launch, authenticate first:

```bash
huggingface-cli login
```

## Pre-Publish Checks

Run these checks before pushing runtime changes to GitHub or publishing a new
PyPI version. The checked-in unit tests do not download models and do not
require audio files:

```bash
pip install -e ".[dev]"
python scripts/prepublish_check.py --skip-build
```

For real model smoke tests, put a small permitted audio file under
`local_smoke/` or pass any private local path. `local_smoke/` and audio/model
artifacts are git-ignored.

Apple Silicon Mac:

```bash
python scripts/prepublish_check.py --audio local_smoke/sample.wav --runtime mlx --skip-build
```

Linux with NVIDIA GPU:

```bash
python scripts/prepublish_check.py --audio local_smoke/sample.wav --runtime nemo --skip-build
python scripts/prepublish_check.py --audio local_smoke/sample.wav --runtime cpp:cuda --skip-build
```

Linux CPU:

```bash
python scripts/prepublish_check.py --audio local_smoke/sample.wav --runtime cpp:cpu --skip-build
```

Windows CPU, from PowerShell:

```powershell
py -m scripts.prepublish_check --audio local_smoke\sample.wav --runtime cpp:cpu --skip-build
py -m scripts.prepublish_check --audio local_smoke\consult.wav --runtime cpp:cpu --long --skip-build
```

Long-audio smoke test, using the runtime's chunked path:

```bash
python scripts/prepublish_check.py --audio local_smoke/consult.wav --runtime cpp:cpu --long --skip-build
```

The Windows NPU check is currently an expected skip: there is no NPU backend
for Omi Med STT yet. Use `cpp` CPU on Windows, MLX on Apple Silicon, or NeMo on
a CUDA Linux machine.

The script writes the latest local report to
`local_smoke/prepublish_last.json`.

## Attribution

This runtime uses or interoperates with:

- NVIDIA NeMo / Parakeet, for the base ASR architecture.
- `parakeet-mlx`, for Apple Silicon MLX inference.
- `parakeet.cpp`, for GGUF / C++ / ggml inference.

See [NOTICE.md](NOTICE.md).

## License

Runtime code in this repository is MIT licensed.

Model weights are governed separately by the model repositories. Omi Med STT v1
is a derivative of `nvidia/parakeet-tdt-0.6b-v2`, whose model weights are
licensed under CC-BY-4.0.
