Metadata-Version: 2.4
Name: pipecat-moonshine
Version: 0.1.0
Summary: Moonshine ASR (speech-to-text) community integration for Pipecat
Author: pipecat-moonshine contributors
License: BSD-2-Clause
Project-URL: Homepage, https://github.com/ubopod/pipecat-moonshine
Project-URL: Source, https://github.com/ubopod/pipecat-moonshine
Project-URL: Issues, https://github.com/ubopod/pipecat-moonshine/issues
Project-URL: Changelog, https://github.com/ubopod/pipecat-moonshine/blob/main/CHANGELOG.md
Keywords: pipecat,moonshine,asr,stt,speech-to-text,voice,ai
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pipecat-ai[websockets-base]>=1.2.1
Requires-Dist: useful-moonshine-onnx>=20251121
Requires-Dist: numpy<3,>=1.26.4
Requires-Dist: loguru<1,>=0.7.0
Provides-Extra: examples
Requires-Dist: python-dotenv<2,>=1.0.0; extra == "examples"
Requires-Dist: pyaudio~=0.2.14; extra == "examples"
Provides-Extra: dev
Requires-Dist: ruff<1,>=0.12.11; extra == "dev"
Requires-Dist: pytest<10,>=9.0.0; extra == "dev"
Requires-Dist: pytest-asyncio<2,>=1.0.0; extra == "dev"
Dynamic: license-file

# pipecat-moonshine

[Moonshine ASR](https://github.com/moonshine-ai/moonshine) speech-to-text
integration for [Pipecat](https://github.com/pipecat-ai/pipecat).

Moonshine is a family of small, fast automatic-speech-recognition models
optimized for resource-constrained devices. The Tiny English model is roughly
26 M parameters, the Base English model roughly 58 M — both run on CPU via
ONNX Runtime with no GPU required. That makes Moonshine an attractive choice
for low-latency, on-device transcription in Pipecat pipelines that already
handle VAD upstream.

This package provides `MoonshineSTTService`, a `SegmentedSTTService`
subclass that plugs straight into any Pipecat pipeline.

## Status

Community-maintained integration. See
[Pipecat's Community Integrations guide](https://github.com/pipecat-ai/pipecat/blob/main/COMMUNITY_INTEGRATIONS.md)
for what that means — in short, the Pipecat team does not maintain or
support this package; please file issues here.

**Tested with Pipecat v1.2.1.**

## Installation

```bash
pip install pipecat-moonshine
```

This pulls in [`useful-moonshine-onnx`](https://pypi.org/project/useful-moonshine-onnx/)
and `pipecat-ai` automatically. The first time you instantiate the service,
the chosen model weights are downloaded from Hugging Face
(`UsefulSensors/moonshine`) and cached locally.

For the included foundational example you also need the local-audio extras:

```bash
pip install 'pipecat-moonshine[examples]'
```

## Usage

```python
from pipecat.pipeline.pipeline import Pipeline
from pipecat_moonshine import MoonshineSTTService, Model

stt = MoonshineSTTService(model=Model.TINY_EN)

pipeline = Pipeline([
    transport.input(),
    vad_processor,     # MUST run upstream of the STT — see below
    stt,
    # ... downstream processors
])
```

`MoonshineSTTService` subclasses `SegmentedSTTService`, so a VAD-driven
processor (e.g. `VADProcessor` with `SileroVADAnalyzer`) must produce
`VADUserStartedSpeakingFrame` / `VADUserStoppedSpeakingFrame` upstream of
it. Each detected speech segment is decoded into a single final
`TranscriptionFrame` — Moonshine does not emit interim results.
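
The segment-at-a-time behaviour described above can be pictured with a toy model. This is a deliberately simplified sketch, not the code of `SegmentedSTTService` or of this package: `ToySegmenter` and its method names are hypothetical, and the real base class may behave differently (for example in how it treats audio that arrives before speech starts).

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToySegmenter:
    """Hypothetical stand-in for VAD-driven, segment-at-a-time transcription."""

    # Any callable that turns one audio segment into text.
    transcribe: Callable[[bytes], str]
    _buffer: bytearray = field(default_factory=bytearray, init=False)
    _speaking: bool = field(default=False, init=False)

    def on_speech_started(self) -> None:
        # Corresponds to the "user started speaking" VAD event.
        self._buffer.clear()
        self._speaking = True

    def on_audio(self, chunk: bytes) -> None:
        # This toy only keeps audio that arrives while speech is active.
        if self._speaking:
            self._buffer.extend(chunk)

    def on_speech_stopped(self) -> Optional[str]:
        # Corresponds to the "user stopped speaking" VAD event:
        # one final result per segment, no interim results.
        self._speaking = False
        segment = bytes(self._buffer)
        self._buffer.clear()
        return self.transcribe(segment) if segment else None
```

Without the start/stop events nothing is ever transcribed, which is why a VAD processor has to sit upstream of the service.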

### Configuring the model at runtime

Pass an explicit model or a fully-built settings object:

```python
# Convenience kwarg
stt = MoonshineSTTService(model=Model.BASE_EN)

# Or via Settings (e.g. when you want to update at runtime)
stt = MoonshineSTTService(
    settings=MoonshineSTTService.Settings(model="moonshine/base"),
)
```

## Running the example

```bash
git clone https://github.com/ubopod/pipecat-moonshine
cd pipecat-moonshine
pip install -e '.[examples]'
python examples/transcription-moonshine.py
```

Speak into your default mic; lines like `Transcription: hello world` will be
printed for each detected utterance.

## Audio format requirements

Moonshine expects **16 kHz, mono, 16-bit signed PCM** input. Pipecat's
default `LocalAudioTransport` and most WebRTC transports already provide
this. If your pipeline runs at a different sample rate, the service logs a
warning on the first segment and transcription quality may degrade. If the
rest of your pipeline needs a different rate, add a resampler upstream of
the service so that it still receives 16 kHz audio.
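
As a rough illustration of what resampling to 16 kHz involves, here is a minimal standard-library sketch that linearly interpolates mono 16-bit PCM. The function name `resample_pcm16_mono` is hypothetical and not part of this package or of Pipecat. Linear interpolation applies no low-pass filter, so downsampling this way can alias; treat it as a demonstration and prefer a properly filtered resampler in a real pipeline.

```python
from array import array

TARGET_RATE = 16_000  # Hz, the rate Moonshine expects


def resample_pcm16_mono(pcm: bytes, src_rate: int, dst_rate: int = TARGET_RATE) -> bytes:
    """Linearly interpolate mono 16-bit signed PCM from src_rate to dst_rate."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")

    # Assumption: samples are in the machine's native byte order
    # (little-endian on most platforms).
    samples = array("h")
    samples.frombytes(pcm)
    if src_rate == dst_rate or len(samples) == 0:
        return pcm

    out_len = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # input samples advanced per output sample
    last = len(samples) - 1
    out = array("h")
    for i in range(out_len):
        pos = i * step
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        # Weighted average of the two neighbouring input samples.
        out.append(round(samples[lo] * (1 - frac) + samples[hi] * frac))
    return out.tobytes()
```

For example, 10 ms of 48 kHz audio (480 samples) becomes 160 samples, that is 320 bytes, at 16 kHz.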

Moonshine also enforces a per-segment duration window: speech segments
shorter than 0.1 s or longer than 64 s are dropped without emitting a
`TranscriptionFrame`. The only trace is a `DEBUG`-level log message.
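
To check ahead of time whether a segment falls inside that window, you can derive its duration from the byte length. The sketch below assumes 16 kHz mono 16-bit PCM and treats both bounds as inclusive. The helper names are hypothetical, and this is not the check the service itself performs.

```python
BYTES_PER_SAMPLE = 2  # 16-bit signed PCM
MIN_SECONDS = 0.1     # lower bound from the text above
MAX_SECONDS = 64.0    # upper bound from the text above


def segment_duration_seconds(pcm: bytes, sample_rate: int = 16_000, channels: int = 1) -> float:
    """Duration of a raw PCM buffer, derived from its length in bytes."""
    return len(pcm) / (sample_rate * channels * BYTES_PER_SAMPLE)


def is_within_window(pcm: bytes, sample_rate: int = 16_000, channels: int = 1) -> bool:
    """True if the segment is neither too short nor too long."""
    duration = segment_duration_seconds(pcm, sample_rate, channels)
    return MIN_SECONDS <= duration <= MAX_SECONDS
```

At 16 kHz mono, one second of audio is 32,000 bytes, so the window corresponds to buffers between 3,200 bytes and 2,048,000 bytes.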

## Supported models

| Constant         | Model name         | Params | Notes                                |
| ---------------- | ------------------ | ------ | ------------------------------------ |
| `Model.TINY_EN`  | `moonshine/tiny`   | 26 M   | English-only, MIT-licensed weights.  |
| `Model.BASE_EN`  | `moonshine/base`   | 58 M   | English-only, MIT-licensed weights.  |

### Multilingual models — important license note

Moonshine also publishes multilingual checkpoints (Spanish, Japanese,
Arabic, Korean, Mandarin, Vietnamese, Ukrainian, …). Those weights are
released under the **Moonshine Community License**, which is *non-commercial*.

For that reason they are intentionally **not** enumerated in the `Model`
enum. If you want to use one you must:

1. Read and accept the upstream Moonshine Community License.
2. Pass the model name as a string explicitly, e.g.:

   ```python
   stt = MoonshineSTTService(model="moonshine/base")
   # then load the appropriate language checkpoint via your own download flow
   ```

This package does not bundle, mirror, or auto-download non-commercial
weights, and the maintainers make no representation that doing so complies
with your use case.

## Frames

| In                                  | Out                                                        |
| ----------------------------------- | ---------------------------------------------------------- |
| `VADUserStartedSpeakingFrame`       | (no output — buffers audio internally)                     |
| `VADUserStoppedSpeakingFrame`       | one `TranscriptionFrame` per segment (final), or nothing   |
| Any non-VAD audio                   | buffered/forwarded according to `SegmentedSTTService`      |

Errors during transcription are pushed as `ErrorFrame`s; the pipeline is
not torn down so other services can continue.

## Metrics

`can_generate_metrics()` returns `True`. Per-segment processing time is
recorded via `start_processing_metrics` / `stop_processing_metrics`, so
enabling metrics on your `PipelineTask` will surface Moonshine latency
alongside the rest of your pipeline.

## Maintainer

Community-maintained. Not affiliated with Moonshine AI or Daily.

## License

BSD-2-Clause; see [LICENSE](./LICENSE). Note that the Moonshine model
*weights* are governed by their own licenses (MIT for the English models,
Moonshine Community License for the multilingual ones); see the
"Supported models" section above.

## Versioning and changelog

See [CHANGELOG.md](./CHANGELOG.md). This package follows semantic
versioning.

## Getting help

- Pipecat Discord: <https://discord.gg/pipecat> (`#community-integrations`)
- Pipecat changelog (track upstream changes that may affect this integration):
  <https://github.com/pipecat-ai/pipecat/blob/main/CHANGELOG.md>
- Issues for this integration: file them in this repo.
