Metadata-Version: 2.4
Name: realtime-whisper
Version: 0.1.0
Summary: Async Python client for OpenAI Realtime transcription with microphone input, streamed transcript deltas, speech boundary events, and WebSocket lifecycle management.
Requires-Python: >=3.14
Requires-Dist: openai>=2.38.0
Requires-Dist: pydantic>=2.13.4
Requires-Dist: python-dotenv>=1.2.2
Requires-Dist: websockets>=16.0
Provides-Extra: audio
Requires-Dist: sounddevice>=0.5.1; extra == 'audio'
Description-Content-Type: text/markdown

# realtime-whisper

Small async Python client for OpenAI Realtime transcription with microphone input,
streamed transcript deltas, completed transcript events, speech boundary events, and
manual buffer flush support.

## Requirements

- Python 3.14 or newer
- An OpenAI API key, or Azure OpenAI Realtime credentials
- A working audio input device when using the default microphone input

## Installation

**With uv** (recommended):

```bash
uv add realtime-whisper

# Include microphone support
uv add "realtime-whisper[audio]"
```

**With pip**:

```bash
pip install realtime-whisper

# Include microphone support
pip install "realtime-whisper[audio]"
```

**From source** (this repository):

```bash
uv sync --extra audio
# or
pip install -e ".[audio]"
```

The `audio` extra installs `sounddevice`, which is required by the default
`MicrophoneInput`. If you provide your own audio input implementation, the base
dependencies are enough.

Set your OpenAI API key before running the examples:

```bash
export OPENAI_API_KEY="your-api-key"
```

On PowerShell:

```powershell
$env:OPENAI_API_KEY = "your-api-key"
```
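
A missing key may only surface later as a connection or authentication failure, so a script can check for it up front. The following is a minimal sketch using only the standard library; `require_api_key` is a hypothetical helper for illustration and is not part of `realtime_whisper`.

```python
import os
from collections.abc import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return OPENAI_API_KEY from env, or raise a readable error.

    Hypothetical helper for illustration; not part of realtime_whisper.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running.")
    return key
```

Passing the environment as a parameter keeps the check easy to test with a plain dictionary.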

## Quick Start

```python
import asyncio

from realtime_whisper import RealtimeTranscriber, TranscriptCompleted, TranscriptDelta


async def main() -> None:
    transcriber = RealtimeTranscriber(language="en")

    async for event in transcriber.stream():
        match event:
            case TranscriptDelta(delta=delta):
                print(delta, end="", flush=True)
            case TranscriptCompleted(transcript=transcript):
                print(f"\n>>> {transcript}\n")


asyncio.run(main())
```

Run the included examples:

```bash
# Continuous transcription
uv run python -m examples.transcribe_console

# Push-to-talk (press Enter to flush the buffer)
uv run python -m examples.transcribe_push_to_talk
```

## API Overview

### RealtimeTranscriber

Basic streaming — reads from your default microphone and prints every delta and
completed transcript segment:

```python
import asyncio

from realtime_whisper import (
    NoiseReduction,
    RealtimeTranscriber,
    TranscriptionDelay,
    TranscriptCompleted,
    TranscriptDelta,
)


async def main() -> None:
    transcriber = RealtimeTranscriber(
        language="en",                             # BCP-47 tag, or None for auto-detect
        delay=TranscriptionDelay.MEDIUM,           # latency vs. completeness trade-off
        noise_reduction=NoiseReduction.FAR_FIELD,  # FAR_FIELD or NEAR_FIELD
        include_logprobs=False,                    # True → per-token log-probabilities
    )

    async for event in transcriber.stream():
        match event:
            case TranscriptDelta(delta=delta):
                print(delta, end="", flush=True)
            case TranscriptCompleted(transcript=transcript):
                print(f"\n>>> {transcript}\n")


asyncio.run(main())
```

> See [`examples/transcribe_console.py`](examples/transcribe_console.py) for the
> full runnable version of this pattern.

**Push-to-talk** — call `flush()` to commit the audio buffer and trigger
transcription on demand (e.g. when the user releases a key):

```python
import asyncio

from realtime_whisper import RealtimeTranscriber, TranscriptCompleted, TranscriptDelta


async def read_enter_loop(transcriber: RealtimeTranscriber) -> None:
    loop = asyncio.get_running_loop()
    while True:
        await loop.run_in_executor(None, input)  # blocks until Enter is pressed
        await transcriber.flush()


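async def main() -> None:
    transcriber = RealtimeTranscriber(language="en")
    # Keep a reference so the background task is not garbage-collected mid-run.
    flush_task = asyncio.create_task(read_enter_loop(transcriber))

    try:
        async for event in transcriber.stream():
            match event:
                case TranscriptDelta(delta=delta):
                    print(delta, end="", flush=True)
                case TranscriptCompleted(transcript=transcript):
                    print(f"\n>>> {transcript}\n")
    finally:
        flush_task.cancel()  # stop waiting for Enter once streaming ends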


asyncio.run(main())
```

> See [`examples/transcribe_push_to_talk.py`](examples/transcribe_push_to_talk.py)
> for the full runnable version of this pattern.

**As an async context manager** — `stop()` is called automatically on exit:

```python
async with RealtimeTranscriber(language="en") as transcriber:
    async for event in transcriber.stream():
        ...
```

### Events

The public event types are exported from `realtime_whisper`:

- `SessionConnected`
- `TranscriptDelta`
- `TranscriptCompleted`
- `SpeechStarted`
- `SpeechStopped`
- `TranscriberError`

### Options

Use `TranscriptionDelay` to control latency versus completeness:

- `TranscriptionDelay.MINIMAL`
- `TranscriptionDelay.LOW`
- `TranscriptionDelay.MEDIUM`
- `TranscriptionDelay.HIGH`
- `TranscriptionDelay.XHIGH`

Use `NoiseReduction` for input noise reduction:

- `NoiseReduction.NEAR_FIELD`
- `NoiseReduction.FAR_FIELD`

Example:

```python
from realtime_whisper import NoiseReduction, RealtimeTranscriber, TranscriptionDelay

transcriber = RealtimeTranscriber(
    language="de",
    delay=TranscriptionDelay.LOW,
    noise_reduction=NoiseReduction.NEAR_FIELD,
)
```

## Providers

By default, `RealtimeTranscriber` uses `OpenAIProvider` and reads
`OPENAI_API_KEY` from the environment. You can also pass `api_key` directly:

```python
transcriber = RealtimeTranscriber(api_key="your-api-key")
```

For Azure OpenAI, pass an `AzureOpenAIProvider`:

```python
from realtime_whisper import AzureOpenAIProvider, RealtimeTranscriber

provider = AzureOpenAIProvider(
    resource="my-resource",
    deployment="my-realtime-deployment",
    api_key="my-api-key",
)

transcriber = RealtimeTranscriber(provider=provider)
```

The Azure provider can also read these environment variables:

- `AZURE_OPENAI_RESOURCE`
- `AZURE_OPENAI_DEPLOYMENT`
- `AZURE_OPENAI_API_KEY`

## Custom Audio Input

Pass an object implementing `AudioInputDevice` to use a custom audio source.
Audio chunks must be raw 24 kHz mono PCM bytes unless you also change the session
settings in the package internals.

```python
from collections.abc import AsyncIterator

from realtime_whisper.audio import AudioInputDevice


class MyAudioInput(AudioInputDevice):
    async def start(self) -> None:
        ...

    async def stop(self) -> None:
        ...

    async def stream_chunks(self) -> AsyncIterator[bytes]:
        ...

    @property
    def is_active(self) -> bool:
        ...
```
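
As a rough illustration of what `stream_chunks` might yield, the sketch below slices a buffer of raw PCM into fixed-duration chunks. The 24 kHz mono figure comes from the requirement above; the 16-bit sample width (2 bytes per sample) is an assumption that this README does not state. `iter_pcm_chunks` and the chunk lengths are hypothetical choices, and the function is standalone rather than a real `AudioInputDevice` subclass.

```python
import asyncio
from collections.abc import AsyncIterator

SAMPLE_RATE = 24_000   # Hz, from the requirement above
BYTES_PER_SAMPLE = 2   # assumption: 16-bit PCM
CHANNELS = 1           # mono


async def iter_pcm_chunks(pcm: bytes, chunk_ms: int = 100) -> AsyncIterator[bytes]:
    """Yield successive chunk_ms-long slices of a raw PCM buffer."""
    chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS * chunk_ms // 1000
    for offset in range(0, len(pcm), chunk_bytes):
        yield pcm[offset : offset + chunk_bytes]
        await asyncio.sleep(0)  # give other tasks a chance to run


async def _demo() -> list[int]:
    one_second = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)  # silence
    return [len(chunk) async for chunk in iter_pcm_chunks(one_second, chunk_ms=300)]


chunk_sizes = asyncio.run(_demo())
print(chunk_sizes)  # three full 300 ms chunks, then the 100 ms remainder
```

Because the function body contains `yield`, it is an async generator and can be consumed directly with `async for`, which is the shape the `AsyncIterator[bytes]` return annotation suggests.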

## Development

```bash
uv sync --extra audio --group dev
uv run ruff check .
uv run pytest
```
