Metadata-Version: 2.4
Name: wyoming-whisper.cpp
Version: 2.6.1
Summary: Wyoming Server for whisper.cpp
Author-email: Laurent Debacker <deback@gmail.com>
License: MIT
Project-URL: Homepage, http://github.com/debackerl/wyoming-whisper.cpp
Keywords: wyoming,whisper,stt
Platform: any
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: End Users/Desktop
Classifier: Topic :: Text Processing :: Linguistic
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.8.1
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: wyoming>=1.5.3
Requires-Dist: pywhispercpp<2,>=1.3.3
Requires-Dist: soxr<1,>=0.3.1
Provides-Extra: dev
Requires-Dist: build==1.3.0; extra == "dev"
Requires-Dist: twine==6.1.0; extra == "dev"
Requires-Dist: black==22.12.0; extra == "dev"
Requires-Dist: flake8==6.0.0; extra == "dev"
Requires-Dist: isort==5.11.3; extra == "dev"
Requires-Dist: mypy==0.991; extra == "dev"
Requires-Dist: pylint==2.15.9; extra == "dev"
Requires-Dist: pytest==7.4.4; extra == "dev"
Requires-Dist: pytest-asyncio==0.23.3; extra == "dev"
Dynamic: license-file

# Wyoming Whisper.cpp

[Wyoming protocol](https://github.com/rhasspy/wyoming) server for the [whisper.cpp](https://github.com/ggml-org/whisper.cpp) speech-to-text system.

This project is based on wyoming-faster-whisper. I wanted to adopt whisper.cpp instead to gain access to more backends. In particular, since mid-2025 the Vulkan
backend has improved significantly, and it offers excellent performance on all modern GPUs without requiring many dependencies. Try it for yourself,
and decide what is most performant on your hardware.

While not yet integrated, whisper.cpp has experimental support for detecting speaker turns. This could allow us to return only the transcript of the first
speaker, assuming that what follows is background chatter. Alternatively, we could extract the timestamp of each speaker turn, split the audio into segments,
and run a diarization model to keep only the segments matching the first speaker.
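The timestamp-splitting idea could look roughly like the sketch below. Nothing here exists in wyoming-whisper.cpp yet: the function names, the `turn_times` input, and the per-segment speaker labels (which would come from a separate diarization model) are all hypothetical.

```python
# Hypothetical sketch: split mono PCM samples at speaker-turn timestamps,
# then keep only the segments attributed to the same speaker as the first
# segment. Not part of this project; for illustration only.

def split_at_turns(samples, turn_times, rate=16000):
    """Split a sample sequence at the given turn timestamps (in seconds)."""
    cuts = [int(t * rate) for t in turn_times]
    bounds = [0] + cuts + [len(samples)]
    return [samples[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]

def first_speaker_only(segments, speaker_ids):
    """Keep segments whose (hypothetical) diarization label matches segment 0."""
    target = speaker_ids[0]
    return [seg for seg, spk in zip(segments, speaker_ids) if spk == target]
```

Only the segments labeled like the first one would then be passed on to whisper.cpp for transcription.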

## Local Install

Clone the repository and set up a Python virtual environment:

```sh
git clone https://github.com/debackerl/wyoming-whisper.cpp.git
cd wyoming-whisper.cpp
script/setup
```

Run a server anyone can connect to:

```sh
script/run --model tiny-int8 --language en --uri 'tcp://0.0.0.0:10300' --data-dir /data --download-dir /data
```
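On the client side, Wyoming STT servers expect audio to be streamed as small PCM chunks. A minimal sketch of preparing that stream from a WAV file is below; the chunk size is illustrative, and actually sending the chunks over the Wyoming protocol (e.g. with the `wyoming` Python package) is left out.

```python
# Hypothetical sketch: read a 16-bit mono WAV file and yield raw PCM
# chunks, the shape in which Wyoming clients typically stream audio.
# Only standard-library code; not part of this project.
import wave

def pcm_chunks(wav_path, samples_per_chunk=1024):
    """Yield raw PCM chunks from a 16-bit mono WAV file."""
    with wave.open(wav_path, "rb") as wav:
        assert wav.getsampwidth() == 2 and wav.getnchannels() == 1
        while True:
            frames = wav.readframes(samples_per_chunk)
            if not frames:
                break
            yield frames
```

Each chunk would then be wrapped in a Wyoming audio event and written to the TCP connection opened against the `--uri` configured above.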

See [available models](https://absadiki.github.io/pywhispercpp/#pywhispercpp.constants.AVAILABLE_MODELS).
