Metadata-Version: 2.4
Name: converse-framework
Version: 0.2.2
Summary: Provider-agnostic speech stack for speech-to-speech applications
License: MIT
License-File: LICENSE
Requires-Python: >=3.11
Requires-Dist: numpy>=2.0
Provides-Extra: all
Requires-Dist: faster-whisper>=1.2; extra == 'all'
Requires-Dist: httpx>=0.28; extra == 'all'
Requires-Dist: kokoro-onnx>=0.5; (python_version < '3.14') and extra == 'all'
Requires-Dist: misaki>=0.7; extra == 'all'
Requires-Dist: nvidia-cublas-cu12; (platform_system == 'Windows') and extra == 'all'
Requires-Dist: onnxruntime>=1.20; extra == 'all'
Requires-Dist: pocket-tts>=2.1; extra == 'all'
Requires-Dist: silero-vad>=6.0; extra == 'all'
Provides-Extra: all-asr
Requires-Dist: faster-whisper>=1.2; extra == 'all-asr'
Requires-Dist: httpx>=0.28; extra == 'all-asr'
Requires-Dist: nvidia-cublas-cu12; (platform_system == 'Windows') and extra == 'all-asr'
Provides-Extra: all-llm
Requires-Dist: httpx>=0.28; extra == 'all-llm'
Provides-Extra: all-tts
Requires-Dist: kokoro-onnx>=0.5; (python_version < '3.14') and extra == 'all-tts'
Requires-Dist: misaki>=0.7; extra == 'all-tts'
Requires-Dist: pocket-tts>=2.1; extra == 'all-tts'
Provides-Extra: all-vad
Requires-Dist: onnxruntime>=1.20; extra == 'all-vad'
Requires-Dist: silero-vad>=6.0; extra == 'all-vad'
Provides-Extra: faster-whisper
Requires-Dist: faster-whisper>=1.2; extra == 'faster-whisper'
Requires-Dist: nvidia-cublas-cu12; (platform_system == 'Windows') and extra == 'faster-whisper'
Provides-Extra: kokoro
Requires-Dist: kokoro-onnx>=0.5; (python_version < '3.14') and extra == 'kokoro'
Requires-Dist: misaki>=0.7; extra == 'kokoro'
Provides-Extra: llamacpp
Requires-Dist: httpx>=0.28; extra == 'llamacpp'
Provides-Extra: pocket-tts
Requires-Dist: pocket-tts>=2.1; extra == 'pocket-tts'
Provides-Extra: silero
Requires-Dist: onnxruntime>=1.20; extra == 'silero'
Requires-Dist: silero-vad>=6.0; extra == 'silero'
Provides-Extra: whisper-cpp
Requires-Dist: httpx>=0.28; extra == 'whisper-cpp'
Description-Content-Type: text/markdown

# Converse Framework

Provider-agnostic speech stack for speech-to-speech applications.

## Table of Contents

- [Install](#install)
  - [Missing dependency behavior](#missing-dependency-behavior)
  - [Python version compatibility](#python-version-compatibility)
- [Quick Start](#quick-start)
  - [Provider status semantics](#provider-status-semantics)
- [Recipes](#recipes)
  - [Minimal mock text pipeline](#minimal-mock-text-pipeline)
  - [Audio frame to utterance collector to pipeline](#audio-frame-to-utterance-collector-to-pipeline)
  - [Custom provider registration](#custom-provider-registration)
  - [Custom event sink](#custom-event-sink)
  - [Browser playback](#browser-playback-js-reference-client)
  - [Browser microphone capture](#browser-microphone-capture-js-reference-client)
  - [Mobile browser microphone testing](#mobile-browser-microphone-testing)
  - [Wrap an external CLI as a provider](#wrap-an-external-cli-as-a-provider)
  - [Pocket TTS voice listing and configuration](#pocket-tts-voice-listing-and-configuration)
  - [CUDA DLL helper](#cuda-dll-helper-windows)
- [Runtime Provider Updates](#runtime-provider-updates)
  - [ProviderBundle.replace()](#providerbundlereplace)
  - [ProviderBundle.unload_replaced()](#providerbundleunload_replaced)
  - [SpeechPipeline.update_providers()](#speechpipelineupdate_providers)
  - [AudioUtteranceCollector.update_vad_provider()](#audioutterancecollectorupdate_vad_provider)
  - [End-to-end pattern](#end-to-end-pattern)
- [WebSocket Session Helper](#websocket-session-helper)
- [Examples](#examples)
  - [Text chat](#text-chat-automated-test-covered)
  - [Voice chat](#voice-chat-manual)
- [Framework / App Boundary](#framework--app-boundary)
  - [Transport boundary](#transport-boundary)
- [Status](#status)

## Install

```bash
pip install converse-framework
```

The base install pulls in only `numpy`. Real VAD / ASR / LLM / TTS
providers live behind optional extras:

```bash
pip install converse-framework[silero]          # Silero VAD
pip install converse-framework[faster-whisper]  # faster-whisper ASR
pip install converse-framework[whisper-cpp]     # whisper.cpp HTTP ASR
pip install converse-framework[llamacpp]        # llama.cpp HTTP LLM
pip install converse-framework[kokoro]          # Kokoro ONNX TTS
pip install converse-framework[pocket-tts]      # Pocket TTS
pip install converse-framework[all]             # everything
```

### Missing dependency behavior

If a config requests a provider whose heavy backend is not installed,
`build_provider` (and therefore `build_provider_bundle`) returns an
`UnavailableProvider` sentinel for that slot instead of raising a bare
`ImportError`. The sentinel's `status.message` always names the provider
that was missing and includes the `pip install` extra to fix it. The
mapping is owned by `converse_framework.providers.unavailable.EXTRA_HINTS`
and exposed as `extra_hint_for(kind, name)`, which returns the extra
name (e.g. `"converse-framework[silero]"`) when one is known and `None`
otherwise.

```python
from converse_framework import extra_hint_for
from converse_framework.providers.unavailable import UnavailableProvider

print(extra_hint_for("vad", "silero"))          # converse-framework[silero]
print(extra_hint_for("asr", "faster-whisper"))  # converse-framework[faster-whisper]
print(extra_hint_for("vad", "made-up"))         # None

p = UnavailableProvider("vad", "silero")
print(p.status.message)
# Provider 'silero' (vad) is not available. Install the required extra
# with `pip install converse-framework[silero]`.
```

`is_provider_available(kind, name)` is the companion check: it returns
`True` only when the provider's heavy dependency is importable, so you
can fail fast before handing the config to a pipeline. `UnavailableProvider`
is a real implementation of all four provider protocols, so the rest of
the pipeline keeps running (turns fail with a clear `RuntimeError` when
the broken provider is actually invoked) and the consumer can decide
whether to prompt for the install or fall back to a different provider.
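
As a rough illustration of the kind of importability check described above, the sketch below uses only the standard library. It is not the framework's implementation: `BACKEND_MODULES`, `module_importable`, and `backend_importable` are hypothetical names for this example, and the module names in the mapping are assumptions about what each backend installs.

```python
import importlib.util

# Hypothetical mapping from (kind, provider name) to a backing module.
# The framework keeps its own table; this one is only illustrative.
BACKEND_MODULES = {
    ("vad", "silero"): "silero_vad",
    ("asr", "faster-whisper"): "faster_whisper",
}


def module_importable(module_name: str) -> bool:
    """Return True when a top-level module can be located without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises for dotted names with a missing parent package
        # and for modules whose __spec__ is unset.
        return False


def backend_importable(kind: str, name: str) -> bool:
    """Check the backing module for a (kind, name) pair; unknown pairs are False."""
    module_name = BACKEND_MODULES.get((kind, name))
    return module_name is not None and module_importable(module_name)


for key in BACKEND_MODULES:
    print(key, backend_importable(*key))
```

A check like this only proves that the module can be found; a backend can still fail later when it loads a model or a native library.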

### Python version compatibility

The base package supports Python 3.11 and newer. Each extra has its
own constraints (the table below mirrors the markers in
`pyproject.toml`):

| Extra | Python | Notes |
|---|---|---|
| base | 3.11+ | `numpy>=2.0` is the only required runtime dependency. |
| `silero` | 3.11+ | `silero-vad` + `onnxruntime`. No known upper bound. |
| `faster-whisper` | 3.11+ | `nvidia-cublas-cu12` is installed on Windows only. |
| `llamacpp` | 3.11+ | `httpx` itself supports 3.9+, so 3.11+ is the only constraint. |
| `whisper-cpp` | 3.11+ | Only needs `httpx`, which supports 3.9+. |
| `kokoro` | 3.11 to <3.14 | `kokoro-onnx` 0.5.0 requires Python <3.14. The wheel build fails fast on 3.14+. |
| `pocket-tts` | 3.11+ | No known upper bound. |

The `kokoro` extra is the only one with an upper-bound marker today.
If you are on Python 3.14+ and need a TTS provider, use `pocket-tts`
or a mock provider. New providers should add their own
`python_version` markers in `pyproject.toml` when their backend has a
known limit.
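
One way to act on the upper bound is to pick the TTS provider from the running interpreter version. The sketch below is a minimal example under that assumption; `pick_tts_provider` is a hypothetical helper, not a framework function, and the chosen extra still has to be installed.

```python
import sys


def pick_tts_provider(version_info=sys.version_info) -> str:
    """Prefer Kokoro where its marker allows it, else fall back to Pocket TTS."""
    return "kokoro" if tuple(version_info[:2]) < (3, 14) else "pocket-tts"


# Same config shape as the Quick Start example; only the TTS slot varies.
config = {
    "vad": {"provider": "mock"},
    "asr": {"provider": "mock"},
    "llm": {"provider": "mock"},
    "tts": {"provider": pick_tts_provider()},
}
print(config["tts"])
```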

## Quick Start

```python
from converse_framework import build_provider_bundle

config = {
    "vad": {"provider": "mock"},
    "asr": {"provider": "mock"},
    "llm": {"provider": "mock"},
    "tts": {"provider": "mock"},
}

bundle = build_provider_bundle(config)
print(bundle.statuses())
```

`import converse_framework` only needs `numpy` to be installed — heavy
provider backends are loaded lazily through the registry.

### Provider status semantics

Every provider exposes a ``status`` property (cached state, no I/O), a
lightweight ``probe_status()`` method (import checks, HTTP reachability
— does **not** load models), and a ``load_status()`` method (may load
or initialise heavy resources before returning).

Call ``probe_status()`` to check readiness without side effects — it
is safe for status screens and health checks:

```python
import asyncio

# Probe without loading models
results = asyncio.run(bundle.probe_statuses())
for kind, status in results.items():
    print(f"{kind}: ready={status.ready} level={status.status_level}")
    if status.voices:
        print(f"  voices={[v.id for v in status.voices]}")
```

Call ``load_status()`` when you need the definitive picture — it may
trigger model downloads or initialise GPU resources:

```python
results = asyncio.run(bundle.load_statuses())
```

The ``status_level`` field distinguishes ``"ready"``, ``"configured"``,
``"loading"``, ``"error"``, and ``"unavailable"``. The old
``check_status()`` is kept for backward compatibility and behaves
the same as ``probe_status()`` for providers that implement it.

## Recipes

The recipes below are short, self-contained scripts that exercise the
public API. They all run with the base install (`numpy` + the framework)
unless a snippet is explicitly marked as requiring an extra.

### Minimal mock text pipeline

`build_provider_bundle` returns a fully-mock provider bundle and
`SpeechPipeline` runs an end-to-end text turn against it. `QueueEventSink`
captures every event the pipeline emits so the script can assert or
print them.

```python
import asyncio

from converse_framework import (
    PipelineConfig,
    QueueEventSink,
    SpeechPipeline,
    build_provider_bundle,
)


async def main():
    queue: asyncio.Queue = asyncio.Queue()
    sink = QueueEventSink(queue)
    pipeline = SpeechPipeline(
        providers=build_provider_bundle(
            {
                "vad": {"provider": "mock"},
                "asr": {"provider": "mock"},
                "llm": {"provider": "mock"},
                "tts": {"provider": "mock"},
            }
        ),
        sink=sink,
        config=PipelineConfig(tts_chunk_chars=80),
    )

    await pipeline.handle_text_turn("Hello, mock pipeline.")
    # Let the TTS streaming task finish, then drain the captured events.
    await asyncio.sleep(0.5)
    types = [queue.get_nowait()["type"] for _ in range(queue.qsize())]
    print(types)


asyncio.run(main())
```
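
The drain step at the end of the recipe can be factored into a small helper. The sketch below runs on the standard library alone; `drain` is a hypothetical name, and the two events are stand-ins that reuse event type names mentioned elsewhere in this document.

```python
import asyncio


def drain(queue: asyncio.Queue) -> list:
    """Remove and return every item currently in the queue, without waiting."""
    items = []
    while True:
        try:
            items.append(queue.get_nowait())
        except asyncio.QueueEmpty:
            return items


async def demo() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    # Stand-in events: dicts with a "type" key, as the recipe above reads them.
    for event_type in ("vad.speech_start", "tts.audio"):
        queue.put_nowait({"type": event_type})
    return [event["type"] for event in drain(queue)]


print(asyncio.run(demo()))
```

Looping until `QueueEmpty` avoids relying on `qsize()` staying accurate while items are being removed.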

### Audio frame to utterance collector to pipeline

`parse_audio_frame` validates a wire payload and turns it into an
`AudioFrame`. `AudioUtteranceCollector` runs VAD on the frame, applies
the rejection gates, and on `vad.speech_end` hands the assembled PCM
bytes to its `utterance_callback`. The recipe wires that callback into
`SpeechPipeline.handle_audio_turn`. The in-process VAD below fires
`vad.speech_start` on the first frame and `vad.speech_end` on the third
so the collector has something to dispatch — the framework's own
`MockVADProvider` returns no events and is not useful for this path.

```python
import asyncio
import base64

from converse_framework.audio_utils import AudioFrameStats, parse_audio_frame
from converse_framework.events import QueueEventSink
from converse_framework.pipeline import PipelineConfig, SpeechPipeline
from converse_framework.protocols import (
    ProviderCapabilities,
    ProviderStatus,
    VADEvent,
)
from converse_framework.registry import build_provider_bundle
from converse_framework.utterance_collector import (
    AudioUtteranceCollector,
    UtteranceCollectorConfig,
)


class ScriptedVAD:
    """A tiny in-process VAD: start on frame 0, end on frame 2."""

    def __init__(self) -> None:
        self._count = 0

    @property
    def status(self) -> ProviderStatus:
        return ProviderStatus(
            name="scripted",
            kind="vad",
            ready=True,
            message="Scripted VAD fires start at frame 0 and end at frame 2.",
            capabilities=ProviderCapabilities(),
        )

    async def check_status(self) -> ProviderStatus:
        return self.status

    async def process_frame(self, frame):
        self._count += 1
        events: list[VADEvent] = []
        if self._count == 1:
            events.append(VADEvent(type="vad.speech_start", probability=1.0, audio_ms=30))
        if self._count == 3:
            events.append(VADEvent(type="vad.speech_end", probability=1.0, audio_ms=90))
        return events


async def main():
    queue: asyncio.Queue = asyncio.Queue()
    sink = QueueEventSink(queue)
    bundle = build_provider_bundle(
        {
            "vad": {"provider": "mock"},
            "asr": {"provider": "mock"},
            "llm": {"provider": "mock"},
            "tts": {"provider": "mock"},
        }
    )
    pipeline = SpeechPipeline(providers=bundle, sink=sink, config=PipelineConfig(tts_chunk_chars=80))

    cfg = UtteranceCollectorConfig(
        sample_rate=16000,
        channels=1,
        frame_ms=30,
        # Disable the rejection gates -- this recipe shows the wiring
        # from frame to pipeline, not the collector's silence handling.
        min_speech_duration_ms=0,
        reject_low_energy_rms=0,
        reject_utterance_rms=0,
        trim_silence_rms=0,
    )
    stats = AudioFrameStats(
        expected_sample_rate=16000,
        expected_channels=1,
        expected_frame_ms=30,
    )

    async def on_utterance(pcm: bytes, sample_rate: int, mode: str) -> None:
        await pipeline.handle_audio_turn(pcm, sample_rate, mode=mode)

    collector = AudioUtteranceCollector(
        vad_provider=ScriptedVAD(),
        event_sink=sink,
        utterance_callback=on_utterance,
        config=cfg,
    )

    # Three 30 ms frames of silence (16 kHz mono -> 480 samples -> 960 bytes).
    silence = base64.b64encode(b"\x00\x00" * 480).decode("ascii")
    for seq in range(3):
        frame = parse_audio_frame(
            {
                "data": silence,
                "sample_rate": 16000,
                "channels": 1,
                "frame_ms": 30,
                "sequence": seq,
                "encoding": "pcm_s16le",
            },
            stats,
        )
        await collector.ingest_frame(frame)

    await pipeline.cancel_tts("done")
    await asyncio.sleep(0.3)
    types = [queue.get_nowait()["type"] for _ in range(queue.qsize())]
    print(types)


asyncio.run(main())
```
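
The frame size arithmetic in the recipe (16 kHz mono, 30 ms, 960 bytes) generalizes to other rates and channel counts. The sketch below shows the calculation and builds a payload with the same keys the recipe passes to `parse_audio_frame`; `frame_byte_count` and `silent_frame_payload` are hypothetical helpers for this example.

```python
import base64

BYTES_PER_SAMPLE = 2  # pcm_s16le: one signed 16-bit integer per sample


def frame_byte_count(sample_rate: int, channels: int, frame_ms: int) -> int:
    """Number of PCM bytes in one frame of the given duration."""
    samples_per_channel = sample_rate * frame_ms // 1000
    return samples_per_channel * channels * BYTES_PER_SAMPLE


def silent_frame_payload(sequence: int, sample_rate: int = 16000,
                         channels: int = 1, frame_ms: int = 30) -> dict:
    """Build a wire payload for one frame of digital silence."""
    pcm = b"\x00" * frame_byte_count(sample_rate, channels, frame_ms)
    return {
        "data": base64.b64encode(pcm).decode("ascii"),
        "sample_rate": sample_rate,
        "channels": channels,
        "frame_ms": frame_ms,
        "sequence": sequence,
        "encoding": "pcm_s16le",
    }


payload = silent_frame_payload(0)
print(frame_byte_count(16000, 1, 30), len(payload["data"]))
```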

### Custom provider registration

`register_provider` adds a new (kind, name) pair to the registry by
import string. `build_provider_bundle` then resolves the name on demand
and instantiates the class. `is_provider_available` is the companion
probe — it returns `True` only when the underlying module can be
imported, which is the safe check before handing the config to a
pipeline. The recipe points the new name at the framework's own mock
VAD so it runs against the base install; replace the import string
with your own `my_pkg.providers:MyVADProvider` to register a real
implementation.

```python
from converse_framework.registry import (
    build_provider_bundle,
    is_provider_available,
    register_provider,
)

# Register a custom VAD name. Replace the import string with your own
# `my_pkg.providers:MyVADProvider` to wire up a real implementation.
register_provider(
    "vad",
    "my-vad",
    "converse_framework.providers.mock:MockVADProvider",
)

bundle = build_provider_bundle(
    {
        "vad": {"provider": "my-vad"},
        "asr": {"provider": "mock"},
        "llm": {"provider": "mock"},
        "tts": {"provider": "mock"},
    }
)
print(bundle.vad.status.name)               # "mock" (the registered class)
print(is_provider_available("vad", "my-vad"))  # True
```
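
For readers unfamiliar with `module:Class` import strings, the sketch below shows one common way such a string can be resolved lazily with `importlib`. It is an illustration of the general technique, not the registry's actual code; `resolve_import_string` is a hypothetical name.

```python
import importlib


def resolve_import_string(spec: str):
    """Import the module before the colon and return the attribute after it."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    # The import happens here, so nothing heavy loads until a name is used.
    module = importlib.import_module(module_name)
    return getattr(module, attr)


cls = resolve_import_string("collections:OrderedDict")
print(cls)
```

Deferring the import to resolution time is what lets a registry list providers whose backends are not installed.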

### Custom event sink

`SpeechPipeline` accepts any `EventSink` subclass. The recipe prints
each event as it fires, which is handy when you are wiring up a new
transport and want to see the wire shape without standing up a queue.

```python
import asyncio

from converse_framework import (
    EventSink,
    PipelineConfig,
    SpeechPipeline,
    build_provider_bundle,
)


class PrintSink(EventSink):
    """Minimal sink that prints each event as it fires."""

    async def emit(self, event_type, **payload):
        keys = ", ".join(payload) or "-"
        print(f"[event] {event_type} ({keys})")


async def main():
    sink = PrintSink()
    pipeline = SpeechPipeline(
        providers=build_provider_bundle(
            {
                "vad": {"provider": "mock"},
                "asr": {"provider": "mock"},
                "llm": {"provider": "mock"},
                "tts": {"provider": "mock"},
            }
        ),
        sink=sink,
        config=PipelineConfig(tts_chunk_chars=80),
    )
    await pipeline.handle_text_turn("Hello, custom sink.")
    # Let the TTS streaming task finish before the loop exits.
    await asyncio.sleep(0.5)


asyncio.run(main())
```

### Browser playback (JS reference client)

The framework ships a vanilla JavaScript / Web Audio reference client at
`converse_framework/js/tts-audio-player.js` that turns the framework's
`tts.audio` events into sound without bundling a build step. It builds
`AudioBuffer`s directly from PCM s16le bytes (avoiding
`decodeAudioData` on tiny chunks) and coalesces consecutive events
within a short window before scheduling, which is the same fix that
resolved Pocket TTS choppiness in the reference harness.

```html
<script src="converse_framework/js/tts-audio-player.js"></script>
<script>
  const ws = new WebSocket("ws://localhost:8000/ws");
  const player = new TtsAudioPlayer({ coalesceMs: 80 });
  ws.addEventListener('message', (ev) => {
    const event = JSON.parse(ev.data);
    if (event.type === 'tts.audio') player.onEvent(event);
  });
  // Release the player when the conversation ends.
  ws.addEventListener('close', () => player.close());
</script>
```

The reference client handles the most common case (mono / stereo PCM
s16le with explicit sample rate, channels, and `final` flag) and
ignores anything that is not `pcm_s16le` with a console warning. Drop
the file into your static assets directory; no npm / bundler required.
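
To make the PCM handling concrete, the sketch below converts PCM s16le bytes into floating-point samples, which is the kind of conversion needed before raw samples can fill an audio buffer. It is written in Python for consistency with the other examples; `pcm_s16le_to_floats` is a hypothetical helper, and dividing by 32768 is a common convention that the reference client may or may not share.

```python
import struct


def pcm_s16le_to_floats(pcm: bytes) -> list[float]:
    """Decode little-endian signed 16-bit samples into floats in [-1.0, 1.0)."""
    if len(pcm) % 2:
        raise ValueError("pcm_s16le payload must contain whole 16-bit samples")
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm)
    return [sample / 32768.0 for sample in samples]


# Three samples: zero, the largest positive value, the largest negative value.
print(pcm_s16le_to_floats(b"\x00\x00\xff\x7f\x00\x80"))
```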

### Browser microphone capture (JS reference client)

The framework ships a vanilla JavaScript microphone capture class at
`converse_framework/js/mic-frame-sender.js`. It uses `getUserMedia` and
an `AudioWorkletNode` (with inline blob-URL processor, falling back to
`ScriptProcessorNode`) to deliver 16-bit PCM s16le frames at a
configurable interval:

```html
<script src="converse_framework/js/mic-frame-sender.js"></script>
<script>
  const ws = new WebSocket("ws://localhost:8000/ws");
  const mic = new MicFrameSender({
    webSocket: ws,
    sampleRate: 16000,
    channels: 1,
    frameMs: 30,
    onLevel: (db) => console.log("mic level", db.toFixed(1)),
  });
  mic.start(); // begins capture after user gesture
</script>
```

A composed client at `converse_framework/js/browser-voice-client.js`
combines `MicFrameSender`, `TtsAudioPlayer`, and an optional
`SpeakerEchoGuard` (see `converse_framework/js/speaker-echo-guard.js`)
into a single class with automatic WebSocket event dispatch.

Mobile microphone access requires additional HTTPS / tunnel setup
(see next section).

### Mobile browser microphone testing

Browser microphone capture (via `getUserMedia`) requires a **secure
context** — HTTPS, `localhost`, or `127.0.0.1`.  This is not a
framework limitation; it is a browser security requirement.

**Local desktop development** — `localhost` is always considered
secure.  A plain `ws://localhost:8000/ws` works with no extra setup.

**Same-LAN testing (desktop)**: also works. Desktop browsers accept
`ws://<lan-ip>:8000/ws`, because it is the `getUserMedia` call that
checks the page context, not the WebSocket itself.  Serve the HTML page
itself via HTTPS to keep mobile browsers happy (see below).

**Mobile device on same LAN** — a plain `http://<lan-ip>` page will
be rejected by mobile browsers when calling `getUserMedia`.  You need
either a tunnel that provides HTTPS or a local trusted certificate.

---

**Option 1 — Cloudflare Tunnel (recommended for testing)**

1. Install `cloudflared` (`winget install cloudflare.cloudflared` on
   Windows, `brew install cloudflare/cloudflare/cloudflared` on macOS,
   or download from the Cloudflare Zero Trust dashboard).
2. Start your server on port 8000:
   ```bash
   uvicorn converse_framework.examples.websocket_voice_chat:create_app --factory
   ```
3. Run the tunnel:
   ```bash
   cloudflared tunnel --url http://localhost:8000
   ```
4. Cloudflare prints a public `https://<random>.trycloudflare.com` URL.
5. Open that URL on your mobile device.  Change the WebSocket URL in
   your client to `wss://<random>.trycloudflare.com/ws`.

---

**Option 2 — ngrok**

1. Install ngrok from https://ngrok.com/download.
2. Start your server on port 8000.
3. Tunnel:
   ```bash
   ngrok http 8000
   ```
4. Use the generated `https://<random>.ngrok-free.app` URL.
5. WebSocket URL: `wss://<random>.ngrok-free.app/ws`.

---

**Option 3 — Local trusted certificate (advanced)**

Use `mkcert` to create a locally trusted certificate for your LAN IP:

```bash
# Install mkcert once
brew install mkcert  # macOS
winget install mkcert  # Windows (or scoop install mkcert)
mkcert -install

# Create a cert for your LAN IP, e.g. 192.168.1.42
mkcert 192.168.1.42 localhost 127.0.0.1

# Run uvicorn with the generated key/cert files
uvicorn converse_framework.examples.websocket_voice_chat:create_app --factory \
    --ssl-keyfile ./192.168.1.42-key.pem \
    --ssl-certfile ./192.168.1.42.pem
```

The page and WebSocket are now served over `https://192.168.1.42:8000`
and `wss://192.168.1.42:8000/ws` respectively.  The `mkcert` root CA
must be installed on the mobile device (see the `mkcert` docs for
Android/iOS instructions).

---

**Summary of WebSocket URL forms**

| Scenario | Page URL | WebSocket URL |
|---|---|---|
| Desktop localhost | `http://localhost:8000` | `ws://localhost:8000/ws` |
| Desktop same LAN | `http://<lan-ip>:8000` | `ws://<lan-ip>:8000/ws` |
| Mobile via tunnel | `https://<tunnel>/` | `wss://<tunnel>/ws` |
| Mobile via local cert | `https://<lan-ip>:8000` | `wss://<lan-ip>:8000/ws` |
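
The table follows one rule: the WebSocket scheme tracks the page scheme (`http` to `ws`, `https` to `wss`) on the same host and port. The sketch below applies that rule with `urllib.parse`; `websocket_url_for` is a hypothetical helper, and the `/ws` path is taken from the examples above.

```python
from urllib.parse import urlsplit, urlunsplit


def websocket_url_for(page_url: str, path: str = "/ws") -> str:
    """Derive the WebSocket URL that matches a page URL's scheme and host."""
    parts = urlsplit(page_url)
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme)
    if scheme is None:
        raise ValueError(f"unsupported page scheme: {parts.scheme!r}")
    # Keep host and port, replace the path, drop query and fragment.
    return urlunsplit((scheme, parts.netloc, path, "", ""))


for page in ("http://localhost:8000", "https://192.168.1.42:8000"):
    print(page, "->", websocket_url_for(page))
```
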

### Wrap an external CLI as a provider

When the engine you want to use is only available as a CLI binary
(`whisper-cli`, `whisper.cpp/main`, the Vosk CLI, …), the framework's
`converse_framework.examples.subprocess_provider` shows the pattern.
The class shells out to a configured binary, writes a WAV header
followed by the caller's PCM s16le body to the subprocess's stdin,
and yields the subprocess's stdout as a single final transcript
event.

```python
from converse_framework.examples.subprocess_provider import (
    SubprocessASRProvider,
)
from converse_framework.registry import build_provider_bundle

provider = SubprocessASRProvider({
    "binary": "whisper-cli",
    "model": "ggml-small.en.bin",
    "command_template": ["-m", "{model}", "-f", "-"],
    "timeout_s": 120,
})

# The recipe class is not registered under a provider name, so build a
# mock bundle first and swap the instance into the ASR slot.
bundle = build_provider_bundle(
    {
        "vad": {"provider": "mock"},
        "asr": {"provider": "mock"},
        "llm": {"provider": "mock"},
        "tts": {"provider": "mock"},
    },
).replace(asr=provider)
```

`SubprocessASRProvider` is shipped as a recipe (not a registered
provider) because it is generic: copy the class, point it at your
binary of choice, and register it with `register_provider("asr",
"my-name", "my.module:MySubprocessProvider")`. The example also
ships a fake-echo script (`--use-fake-echo`) that lets the driver
run end-to-end in CI without installing any real ASR.
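
The WAV-over-stdin step can be sketched with the standard library alone. The example below is not the shipped `SubprocessASRProvider`; `pcm_to_wav_bytes` and `run_cli_transcriber` are hypothetical helpers that show one way to wrap PCM s16le in a WAV container and pipe it to a command.

```python
import io
import subprocess
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int, channels: int = 1) -> bytes:
    """Prefix raw PCM s16le data with a WAV header and return the whole file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


def run_cli_transcriber(argv: list[str], pcm: bytes, sample_rate: int,
                        timeout_s: float = 120.0) -> str:
    """Send the WAV bytes to the command's stdin and return its stdout as text."""
    completed = subprocess.run(
        argv,
        input=pcm_to_wav_bytes(pcm, sample_rate),
        capture_output=True,
        timeout=timeout_s,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout.decode("utf-8", errors="replace").strip()


wav_bytes = pcm_to_wav_bytes(b"\x00\x00" * 480, 16000)
print(len(wav_bytes))
```

An async provider would more likely use `asyncio.create_subprocess_exec` so the event loop is not blocked while the binary runs.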

### Pocket TTS voice listing and configuration

Pocket TTS supports listing available voices and changing voice or
other options at runtime via `TTSProvider.configure` (introduced
in protocol v0.2).  Every `configure` call returns a
`ProviderConfigResult` with ``changed`` and ``requires_reload`` flags.

List voices without importing the heavy ONNX backend:

```python
from converse_framework.providers.pocket_tts import PocketTTSProvider

provider = PocketTTSProvider({"voice": "azelma"})
voices = provider.list_voices()
for v in voices:
    print(f"{v.id}: {v.name} ({v.gender}, {v.language})")
    # e.g. "azelma: Azelma (Female, en)"
```

Change voice (clears only the voice cache, preserves the loaded model):

```python
result = provider.configure(voice="anna")
print(result.changed, result.requires_reload)
# True, False — model stays, voice state reloaded
```

Change quantization or temperature (clears both model and voice,
requiring a full reload on next synthesis):

```python
result = provider.configure(quantize=True)
print(result.requires_reload)
# True — both _model and _voice_state cleared
```

Change ``max_tokens`` or ``coalesce_ms`` without unloading:

```python
result = provider.configure(max_tokens=250, coalesce_ms=120)
print(result.requires_reload)
# False — values stored, no cache invalidated
```

``ProviderBundle.replace()`` and ``pipeline.update_providers()``
(see the Runtime Provider Updates section) work with any TTS
provider including Pocket TTS.
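
The ``changed`` / ``requires_reload`` semantics described above can be modeled with a toy class. The sketch below is not `PocketTTSProvider`: `ToyTTS`, `ConfigResult`, and `RELOAD_KEYS` are hypothetical names, and the set of options that force a reload simply mirrors the text of this section.

```python
from dataclasses import dataclass

# Options that invalidate the loaded model in this toy model.
RELOAD_KEYS = {"quantize", "temperature"}


@dataclass(frozen=True)
class ConfigResult:
    changed: bool
    requires_reload: bool


class ToyTTS:
    """Tracks options and reports what a configure() call changed."""

    def __init__(self, **options):
        self.options = dict(options)

    def configure(self, **updates) -> ConfigResult:
        # Only keys whose value actually differs count as a change.
        changed_keys = {k for k, v in updates.items() if self.options.get(k) != v}
        self.options.update(updates)
        return ConfigResult(
            changed=bool(changed_keys),
            requires_reload=bool(changed_keys & RELOAD_KEYS),
        )


tts = ToyTTS(voice="azelma")
print(tts.configure(voice="anna"))
print(tts.configure(quantize=True))
print(tts.configure(max_tokens=250, coalesce_ms=120))
```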

### CUDA DLL helper (Windows)

On Windows, NVIDIA wheel packages like ``nvidia-cublas-cu12`` install
DLLs under ``site-packages/nvidia/<package>/bin/``, but C extension
libraries such as CTranslate2 may not search those directories
automatically.  The framework ships a CUDA DLL discovery helper at
``converse_framework/cuda_utils.py`` that finds them and adds them to
the DLL search path.

```python
from converse_framework.cuda_utils import (
    add_nvidia_dll_directories,
    discover_nvidia_dll_dirs,
    format_nvidia_dll_diagnostic,
)

# Add all discovered NVIDIA DLL directories to the search path.
# Keep the handles alive for the lifetime of the process.
dll_handles = add_nvidia_dll_directories()

# Print a diagnostic string for debugging:
print(format_nvidia_dll_diagnostic())
```

The helper searches ``nvidia/cublas/bin``, ``nvidia/cudnn/bin``,
``nvidia/cusparse/bin``, ``nvidia/cusolver/bin``, and
``nvidia/curand/bin`` inside site-packages.  It is Windows-only
(no-op on other platforms) and best-effort — failures are logged,
not raised.

``FasterWhisperASRProvider`` calls ``add_nvidia_dll_directories()``
automatically inside ``_ensure_model()`` when the config option
``auto_cuda_dll_dirs`` is ``True`` (the default).  Disable with:

```python
provider = FasterWhisperASRProvider({
    "model": "large-v3-turbo",
    "device": "cuda",
    "auto_cuda_dll_dirs": False,  # disable auto-discovery
})
```
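
For readers who want to see the mechanism, the sketch below approximates the discovery step with `pathlib` and registers directories with `os.add_dll_directory`. It is a simplified stand-in for the shipped helper, not its source; `find_nvidia_bin_dirs` and `add_dll_dirs` are hypothetical names, and the demo runs against a temporary directory instead of a real site-packages.

```python
import os
import sys
import tempfile
from pathlib import Path


def find_nvidia_bin_dirs(site_packages: Path) -> list[Path]:
    """Return every nvidia/<package>/bin directory under one site-packages root."""
    return sorted(p for p in site_packages.glob("nvidia/*/bin") if p.is_dir())


def add_dll_dirs(directories: list[Path]) -> list:
    """Register directories with the Windows DLL loader; no-op elsewhere."""
    if sys.platform != "win32":
        return []
    # Keep the returned handles; calling close() on one removes its directory.
    return [os.add_dll_directory(str(d)) for d in directories]


# Demonstrate discovery against a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "nvidia" / "cublas" / "bin").mkdir(parents=True)
    (Path(root) / "nvidia" / "cudnn" / "lib").mkdir(parents=True)
    found = [p.relative_to(root).as_posix() for p in find_nvidia_bin_dirs(Path(root))]

print(found)
```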

## Runtime Provider Updates

The framework supports swapping providers at runtime without
recreating the pipeline or collector. This is useful for settings
UIs that let users change TTS voice, VAD model, or ASR backend
without restarting the conversation.

### ProviderBundle.replace()

`ProviderBundle.replace` creates a new bundle with specific
providers swapped out by keyword argument, inheriting the rest
from the original bundle. It has no side effects and copies no
providers, so the caller still owns the lifecycle of the old ones.

```python
from converse_framework import build_provider_bundle, build_provider

bundle = build_provider_bundle({
    "vad": {"provider": "mock"},
    "asr": {"provider": "mock"},
    "llm": {"provider": "mock"},
    "tts": {"provider": "mock"},
})

new_tts = build_provider("tts", "mock", {"first_chunk_delay_ms": 500})
new_bundle = bundle.replace(tts=new_tts)
# new_bundle.tts is the new provider; vad/asr/llm are unchanged.
# bundle is unaffected.
```

Multiple providers can be replaced at once:

```python
replaced = bundle.replace(vad=new_vad, tts=new_tts)
```

### ProviderBundle.unload_replaced()

`ProviderBundle.unload_replaced` compares two bundles by
identity and calls ``unload()`` on every provider that differs.
Providers with the same identity reference are left untouched.

```python
old_bundle = build_provider_bundle(config)
new_bundle = old_bundle.replace(tts=new_tts)
await ProviderBundle.unload_replaced(old_bundle, new_bundle)
```
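
Comparing by identity means `is`, not `==`: a slot counts as replaced only when the two bundles hold different objects. The sketch below shows that comparison on a toy bundle; `ToyBundle` and `replaced_kinds` are hypothetical names and do not reflect how `ProviderBundle` is implemented.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ToyBundle:
    vad: object
    asr: object
    llm: object
    tts: object


def replaced_kinds(old: ToyBundle, new: ToyBundle) -> list[str]:
    """List the slots whose provider object differs by identity."""
    return [
        f.name
        for f in fields(old)
        if getattr(old, f.name) is not getattr(new, f.name)
    ]


old = ToyBundle(vad=object(), asr=object(), llm=object(), tts=object())
new = replace(old, tts=object())  # new bundle, three slots shared with old
print(replaced_kinds(old, new))
```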

### SpeechPipeline.update_providers()

`SpeechPipeline.update_providers` is the safe way to swap
providers on an active pipeline. It cancels in-flight TTS
synthesis by default (so the next turn picks up the new
provider), swaps the bundle, and emits a ``providers.updated``
event with the serialized statuses of the new bundle.
Conversation history is **not** cleared.

```python
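import asyncio

from converse_framework import (
    PipelineConfig, QueueEventSink, SpeechPipeline,
    build_provider_bundle,
)

# `initial_config` and `updated_config` are provider config dicts like the
# one in Quick Start; the `await` below must run inside a coroutine.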

queue = asyncio.Queue()
pipeline = SpeechPipeline(
    providers=build_provider_bundle(initial_config),
    sink=QueueEventSink(queue),
    config=PipelineConfig(),
)

new_bundle = build_provider_bundle(updated_config)
await pipeline.update_providers(new_bundle, reason="settings_change")
# pipeline.providers is now new_bundle
# TTS was cancelled if it was playing
# providers.updated event was emitted
```

### AudioUtteranceCollector.update_vad_provider()

`AudioUtteranceCollector.update_vad_provider` swaps the VAD
provider that drives utterance boundary detection. It raises
`RuntimeError` if the collector is currently recording an
utterance, to avoid corrupting in-flight VAD state. The
pre-speech buffer is cleared on swap so stale audio from the old
VAD is not passed to the new one.

```python
new_vad = SileroVADProvider({"speech_threshold": 0.6})
collector.update_vad_provider(new_vad)
```

### End-to-end pattern

A typical settings-update flow combines all the pieces:

```python
# 1. Build the new bundle
new_bundle = bundle.replace(tts=new_tts)

# 2. Probe without loading models
probe_results = await new_bundle.probe_statuses()

# 3. On user confirmation, swap in the pipeline
await pipeline.update_providers(new_bundle)

# 4. Swap the VAD in the collector (separate because the
#    collector and pipeline are independent components).
#    `updated` stands for the set of provider kinds the user changed.
if "vad" in updated:
    collector.update_vad_provider(new_bundle.vad)

# 5. Old providers are unloaded in the background by
#    pipeline.update_providers().
```

## WebSocket Session Helper

The framework provides a reusable `WebSocketSession` that
handles the common message-dispatch loop for browser-based voice apps.
It owns the transport, sink, provider bundle, pipeline, collector, and
frame stats, and routes seven built-in message types without requiring
the application to copy the recipe state machine.

Built-in message types:

* ``audio.frame`` — validated PCM frame forwarded to the utterance
  collector.
* ``text.turn`` — text conversation turn.
* ``conversation.clear`` — clears per-mode conversation history.
* ``tts.cancel`` — cancels in-flight TTS synthesis.
* ``status.request`` — emits probe/check/load status (kind selected
  by the ``probe`` / ``check`` / ``load`` flag in the payload).
* ``settings.update`` — delegated to an optional
  `WebSocketSessionHooks` callback.
* ``providers.reload`` — swaps the provider bundle and optionally
  reloads the VAD provider, with ``before`` / ``after`` hooks.

Unknown message types fall through to the optional
``on_unknown_message`` hook or emit a ``turn.error`` event.
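
The routing rule above (known type to handler, unknown type to hook, otherwise a ``turn.error`` event) can be sketched as a small dispatch table. The example below is a toy stand-in, not `WebSocketSession`: `ToyDispatcher`, its handler names, the `"handled"` marker event, and the message shape (a dict with a `"type"` key) are all assumptions made for illustration.

```python
import asyncio


class ToyDispatcher:
    """Route messages by their "type" field."""

    def __init__(self, on_unknown_message=None):
        self.events: list[dict] = []
        self._on_unknown_message = on_unknown_message
        self._handlers = {
            "text.turn": self._on_text_turn,
            "tts.cancel": self._on_tts_cancel,
        }

    async def handle_message(self, message: dict) -> None:
        message_type = message.get("type")
        handler = self._handlers.get(message_type)
        if handler is not None:
            await handler(message)
        elif self._on_unknown_message is not None:
            await self._on_unknown_message(message)
        else:
            self.events.append(
                {"type": "turn.error", "message": f"unknown message type {message_type!r}"}
            )

    async def _on_text_turn(self, message: dict) -> None:
        self.events.append({"type": "handled", "message_type": "text.turn"})

    async def _on_tts_cancel(self, message: dict) -> None:
        self.events.append({"type": "handled", "message_type": "tts.cancel"})


async def demo() -> list[str]:
    dispatcher = ToyDispatcher()
    for message in ({"type": "text.turn", "text": "hi"}, {"type": "made.up"}):
        await dispatcher.handle_message(message)
    return [event["type"] for event in dispatcher.events]


print(asyncio.run(demo()))
```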

Configuration and hooks are supplied via:

* `WebSocketSessionConfig`: provider config, collector config,
  pipeline config, default mode, auto-probe on reload.
* `WebSocketSessionHooks`: optional async callbacks for
  unknown messages, settings updates, status requests, provider reload
  lifecycle, and event monitoring.

The session class lives at ``converse_framework.session`` and is **not**
imported from the top-level ``__init__.py`` to keep lightweight imports
for apps that do not use it.

Usage sketch:

```python
from converse_framework.session import (
    WebSocketSession,
    WebSocketSessionConfig,
    WebSocketSessionHooks,
)

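# Hooks are async callbacks, so define coroutines instead of lambdas.
async def on_settings_update(cfg):
    print("settings updated", cfg)


async def on_event(ev):
    print("event", ev.type)


hooks = WebSocketSessionHooks(
    on_settings_update=on_settings_update,
    on_event=on_event,
)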
session = WebSocketSession(
    transport=your_transport,
    config=WebSocketSessionConfig(
        provider_config={"vad": {"provider": "mock"}, ...},
    ),
    hooks=hooks,
)

async for message in your_websocket:
    await session.handle_message(message)
```

## Examples

### Text chat (automated-test covered)

Run a real text conversation against `SpeechPipeline` using only the
framework's public API. No FastAPI, no WebSocket, no profile files.

```bash
python -m converse_framework.examples.text_chat
```

Try a real provider by passing overrides (the matching extra must be
installed):

```bash
python -m converse_framework.examples.text_chat \
    --provider asr=faster-whisper \
    --provider llm=llamacpp \
    --provider tts=kokoro
```

The driver behind the CLI is `converse_framework.examples.text_chat.run_text_chat`,
which is what the test suite exercises.

### Voice chat (manual)

The voice example wires an `AudioUtteranceCollector` to the pipeline
and feeds it PCM frames. It is a **manual** example — you supply a
WAV file (or replace the source with a microphone capture) and the
script drives the conversation. It is intentionally not covered by
the automated tests because it depends on platform audio I/O.

```bash
# With real providers installed
python -m converse_framework.examples.voice_chat --input path/to/16k_mono.wav

# Or run the same flow with mock providers to validate the path
python -m converse_framework.examples.voice_chat --mock --input path/to/16k_mono.wav
```

## Framework / App Boundary

The framework owns the **provider-agnostic speech stack**:

* Provider protocols (`VADProvider`, `ASRProvider`, `LLMProvider`, `TTSProvider`).
* Audio frame parsing, PCM conversion, metering, and silence trimming.
* Event sink API and the wire shape used by the browser UI.
* `SpeechPipeline` turn orchestration (ASR → LLM → TTS, streaming
  chunks, cancellation, barge-in).
* `AudioUtteranceCollector` (VAD-driven utterance collection).
* A lazy provider registry and the optional concrete providers
  behind extras.
* `WebSocketSession` (optional reusable message-dispatch loop).
* Browser JS helpers (`mic-frame-sender.js`, `speaker-echo-guard.js`,
  `browser-voice-client.js`, `tts-audio-player.js`).
* CUDA DLL discovery helper (`cuda_utils`).

As of v0.2 the framework also provides safe provider-swap mechanics
(``ProviderBundle.replace()``, ``pipeline.update_providers()``,
``collector.update_vad_provider()``), first-class provider
configuration (``configure()``, ``list_voices()``), and lifecycle
events (``provider.loading``, ``provider.loaded``, ``provider.error``).

The framework does **not** own the application. The following stay in
the consumer app (e.g. the reference harness):

* FastAPI app, REST endpoints, WebSocket handler.
* Profile files and runtime settings persistence.
* Character card parsing and first-message seeding.
* Companion mode policy and memory store.
* TTS preset manager and provider settings UX.
* The WebSocket transport itself.

### Transport boundary

The framework defines a generic `Transport` protocol and ships a
`QueueTransport` for tests. The consumer app owns the real
WebSocket transport — `WebSocketTransport` (or equivalent) lives in
the app, not in the framework, so the framework never takes a hard
dependency on FastAPI. The reference harness exposes
`conversational_harness.transport.WebSocketTransport` for that
purpose.

## Status

The package is in v0.2 pre-release. The test matrix below is the
current contract:

| Surface | Tests |
|---|---:|
| `converse_framework` (base) | 126 |
| Reference harness (`Reference-Repository-Conversational-AI-Harness`) | 91 passed, 1 skipped |

Run them locally:

```bash
# Framework (run from the package root)
python -m pytest

# Harness (run from inside the harness directory)
python -m pytest
```
