Metadata-Version: 2.4
Name: openspeechapi
Version: 0.2.9
Summary: Unified speech interface for STT/TTS providers
Requires-Python: >=3.11
Requires-Dist: httpx>=0.27
Requires-Dist: loguru>=0.7
Requires-Dist: msgpack>=1.0
Requires-Dist: pydantic>=2.0
Requires-Dist: pyyaml>=6.0
Provides-Extra: alibaba
Provides-Extra: alibaba-stt
Provides-Extra: alibaba-tts
Provides-Extra: all
Requires-Dist: elevenlabs; extra == 'all'
Requires-Dist: faster-whisper; extra == 'all'
Requires-Dist: openai; extra == 'all'
Requires-Dist: openai-whisper; extra == 'all'
Requires-Dist: piper-tts; extra == 'all'
Requires-Dist: pyttsx3; (sys_platform == 'win32') and extra == 'all'
Requires-Dist: torchaudio; extra == 'all'
Requires-Dist: tts; extra == 'all'
Requires-Dist: websockets; extra == 'all'
Provides-Extra: assemblyai-stt
Provides-Extra: audio
Requires-Dist: numpy; extra == 'audio'
Requires-Dist: sounddevice; extra == 'audio'
Provides-Extra: azure
Provides-Extra: azure-stt
Provides-Extra: azure-tts
Provides-Extra: baidu
Provides-Extra: baidu-stt
Provides-Extra: baidu-tts
Provides-Extra: cloud
Requires-Dist: websockets; extra == 'cloud'
Provides-Extra: coqui-tts
Requires-Dist: tts; extra == 'coqui-tts'
Provides-Extra: cosyvoice-tts
Requires-Dist: torchaudio; extra == 'cosyvoice-tts'
Provides-Extra: deepgram
Requires-Dist: websockets; extra == 'deepgram'
Provides-Extra: deepgram-stt
Requires-Dist: websockets; extra == 'deepgram-stt'
Provides-Extra: deepgram-tts
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.24; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest-dotenv; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff==0.15.*; extra == 'dev'
Provides-Extra: elevenlabs
Requires-Dist: elevenlabs; extra == 'elevenlabs'
Requires-Dist: websockets; extra == 'elevenlabs'
Provides-Extra: elevenlabs-stt
Requires-Dist: websockets; extra == 'elevenlabs-stt'
Provides-Extra: elevenlabs-tts
Requires-Dist: elevenlabs; extra == 'elevenlabs-tts'
Provides-Extra: faster-whisper-stt
Requires-Dist: faster-whisper; extra == 'faster-whisper-stt'
Provides-Extra: fish-speech-tts
Provides-Extra: gemma4-stt
Requires-Dist: mlx-vlm; (sys_platform == 'darwin') and extra == 'gemma4-stt'
Provides-Extra: google
Provides-Extra: google-stt
Provides-Extra: google-tts
Provides-Extra: iflytek
Requires-Dist: websockets; extra == 'iflytek'
Provides-Extra: iflytek-stt
Requires-Dist: websockets; extra == 'iflytek-stt'
Provides-Extra: iflytek-tts
Requires-Dist: websockets; extra == 'iflytek-tts'
Provides-Extra: macos-native
Provides-Extra: minimax-tts
Provides-Extra: openai
Requires-Dist: openai; extra == 'openai'
Provides-Extra: openai-stt
Requires-Dist: openai; extra == 'openai-stt'
Provides-Extra: openai-tts
Requires-Dist: openai; extra == 'openai-tts'
Provides-Extra: piper-tts
Requires-Dist: piper-tts; extra == 'piper-tts'
Provides-Extra: server
Requires-Dist: fastapi; extra == 'server'
Requires-Dist: python-multipart; extra == 'server'
Requires-Dist: uvicorn; extra == 'server'
Requires-Dist: websockets; extra == 'server'
Provides-Extra: sherpa-onnx-stt
Requires-Dist: websockets; extra == 'sherpa-onnx-stt'
Provides-Extra: tencent
Provides-Extra: tencent-stt
Provides-Extra: tencent-tts
Provides-Extra: tracing
Requires-Dist: opentelemetry-api; extra == 'tracing'
Requires-Dist: opentelemetry-sdk; extra == 'tracing'
Provides-Extra: volcengine
Provides-Extra: volcengine-stt
Provides-Extra: volcengine-tts
Provides-Extra: whisper-stt
Requires-Dist: openai-whisper; extra == 'whisper-stt'
Provides-Extra: whisperlivekit-stt
Requires-Dist: websockets; extra == 'whisperlivekit-stt'
Provides-Extra: windows-native
Requires-Dist: pyttsx3; (sys_platform == 'win32') and extra == 'windows-native'
Description-Content-Type: text/markdown

# OpenSpeechAPI

> Unified speech interface for STT/TTS providers — one API, multiple backends.

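OpenSpeechAPI provides a unified speech interface: pass a provider name as a string to switch between STT/TTS backends (cloud APIs or local models) without touching the underlying implementation.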

## Quick Start

### Installation

```bash
# Install all providers
pip install -e ".[all]"

# Or install only what you need
pip install -e ".[openai]"                     # OpenAI Whisper STT + TTS
pip install -e ".[faster-whisper-stt]"         # local faster-whisper STT
pip install -e ".[openai,faster-whisper-stt]"  # several extras at once

# Core package only (no providers)
pip install -e .

# Development environment
pip install -e ".[dev]"
```

### 30-Second Start: TTS

```python
import asyncio
from openspeechapi import create_provider

async def main():
    tts = create_provider("openai-tts", api_key="sk-...")
    await tts.start()

    audio = await tts.synthesize("Hello, OpenSpeechAPI!")

    import wave
    with wave.open("output.wav", "wb") as wf:
        wf.setnchannels(audio.channels)
        wf.setsampwidth(2)
        wf.setframerate(audio.sample_rate)
        wf.writeframes(audio.data)

    await tts.stop()

asyncio.run(main())
```
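
The `wave` calls above assume that `audio.data` holds raw 16-bit PCM samples (hence `setsampwidth(2)`). As a sketch of the same step without a network call, the hypothetical helper `pcm_to_wav_bytes` below wraps a PCM buffer in a WAV container in memory. The helper names and the 24 kHz test tone are assumptions for illustration and are not part of OpenSpeechAPI.

```python
import io
import math
import struct
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int, channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM samples in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buf.getvalue()


def sine_pcm(freq_hz: float, seconds: float, sample_rate: int) -> bytes:
    """Generate a mono 16-bit sine tone, used here only as stand-in audio."""
    n = int(seconds * sample_rate)
    samples = (
        int(32767 * 0.3 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


# Stand-in for `audio.data`: half a second of a 440 Hz tone at 24 kHz.
wav_bytes = pcm_to_wav_bytes(sine_pcm(440.0, 0.5, 24000), sample_rate=24000)
```

Writing `wav_bytes` to disk yields the same kind of file as the `wave.open("output.wav", "wb")` block above.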

### 30-Second Start: STT

```python
import asyncio
from openspeechapi import create_provider, AudioData, AudioFormat
from pathlib import Path

async def main():
    stt = create_provider("faster-whisper", model_size="tiny")
    await stt.start()

    audio = AudioData(
        data=Path("output.wav").read_bytes(),
        sample_rate=16000, channels=1, format=AudioFormat.WAV,
    )
    result = await stt.transcribe(audio)
    print(result.text)        # "Hello, OpenSpeechAPI!"
    print(result.language)    # "en"
    print(result.confidence)  # 0.98

    await stt.stop()

asyncio.run(main())
```
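
The example hard-codes `sample_rate=16000`, which is only correct if `output.wav` really was recorded at 16 kHz. A minimal sketch, using only the standard-library `wave` module, of reading the actual parameters from the file header first. `read_wav_params` is a hypothetical helper, not an OpenSpeechAPI function.

```python
import io
import wave


def read_wav_params(wav_bytes: bytes) -> dict:
    """Return sample rate, channel count, sample width and duration of a WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as rf:
        frames = rf.getnframes()
        rate = rf.getframerate()
        return {
            "sample_rate": rate,
            "channels": rf.getnchannels(),
            "sample_width": rf.getsampwidth(),
            "duration_ms": round(frames * 1000 / rate),
        }


def _demo_wav() -> bytes:
    """Build a one-second silent 16 kHz mono WAV so the sketch runs without files."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(16000)
        wf.writeframes(b"\x00\x00" * 16000)
    return buf.getvalue()


params = read_wav_params(_demo_wav())
print(params)
```

The resulting values could then be passed as `sample_rate=params["sample_rate"]` and `channels=params["channels"]` when constructing `AudioData`.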

### Zero-Dependency Quick Start on macOS

On macOS this works out of the box, with no API key and no model download:

```bash
# 1. Clone the project
git clone https://github.com/wingsfly/OpenSpeechAPI.git
cd OpenSpeechAPI

# 2. Install (core package + server dependencies only)
pip install -e ".[server]"

# 3. Start the server and WebUI
python -m openspeechapi.cli --config providers.yaml serve

# 4. Open http://127.0.0.1:8600/ui/ in a browser
#    - TTS: select macos_tts → pick a voice (e.g. Tingting) → enter text → Run TTS
#    - STT: go to Engine Catalog → macOS STT → Install (downloads the prebuilt package automatically)
```

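#### Installing macOS STT (via the WebUI)

`macos-stt` is not written to the configuration by default (to avoid a provider that looks available but is not). Install it with one click from the Engine Catalog:

1. WebUI → **Engine Catalog** → macOS STT → **Install**
   - Prefers downloading the CI-built universal `.app` (no Xcode needed; done automatically through `gh`)
   - If `gh` is unavailable or the asset download fails, it falls back automatically to a local build (requires Xcode Command Line Tools)
2. After installation, **grant speech recognition permission manually** (once per machine):
   ```bash
   open scripts/engines/macos-stt/MacOSSTTHelper.app
   # Click "Allow" when the dialog appears
   ```
3. **Download the dictation language models manually** (once per machine):
   System Settings > Keyboard > Dictation > download the language models you need (Chinese, English, etc.)

Once installation finishes, `macos_stt` is written to the configuration and hot-reloaded, and the Dashboard shows it as healthy.

> Authorization and the dictation model download are restricted by macOS TCC and cannot be automated; both must be done by hand once per machine.
> See [docs/architecture/native-engine-install.md](docs/architecture/native-engine-install.md) for the detailed mechanism.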

## CLI Demo

Try it from the command line without writing any code:

```bash
# TTS: text → speech
python -m openspeechapi.demo tts -t "Hello world" -o output.wav

# STT: speech → text
python -m openspeechapi.demo stt -i output.wav -p faster-whisper

# Roundtrip: text → TTS → STT → text
python -m openspeechapi.demo roundtrip -t "Hello world"

# Compare: run several engines side by side
python -m openspeechapi.demo compare -i output.wav -p openai-stt,faster-whisper

# REPL: interactive mode
python -m openspeechapi.demo repl

# WebUI (Phase A)
python -m openspeechapi.cli serve --host 0.0.0.0 --port 8600
# Open http://127.0.0.1:8600/ui in a browser

# Real-time STT: prefers WebSocket PCM streaming (/v1/stt/stream);
# if the browser or connection misbehaves, it falls back automatically to chunked HTTP transcription.
```

### Local Engine Management (experimental)

```bash
# 1) Download or update the runtime image
python -m openspeechapi.cli engine install --name fish-speech --runtime docker --follow

# 2) Start the local engine (includes a health check)
python -m openspeechapi.cli engine start --name fish-speech --runtime docker --follow

# 3) Show runtime status and logs
python -m openspeechapi.cli engine status --name fish-speech --runtime docker
python -m openspeechapi.cli engine logs --name fish-speech --runtime docker --lines 200

# 4) Stop
python -m openspeechapi.cli engine stop --name fish-speech --runtime docker --follow

# 5) Query tasks across processes
python -m openspeechapi.cli engine task list --name fish-speech --limit 20
python -m openspeechapi.cli engine task status --task-id <TASK_ID>
python -m openspeechapi.cli engine task follow --task-id <TASK_ID>
python -m openspeechapi.cli engine task cancel --task-id <TASK_ID>
```

Progress output shows the task id, stage, percentage, and current message, which makes long-running tasks easy to track.

#### Local STT Model Engines (reusing existing model paths)

```bash
# faster-whisper model assets (native, no resident service)
python -m openspeechapi.cli engine install --name faster-whisper --runtime native --follow
python -m openspeechapi.cli engine start   --name faster-whisper --runtime native --follow
python -m openspeechapi.cli engine status  --name faster-whisper --runtime native

# whisper model assets (native, no resident service)
python -m openspeechapi.cli engine install --name whisper --runtime native --follow
python -m openspeechapi.cli engine start   --name whisper --runtime native --follow
python -m openspeechapi.cli engine status  --name whisper --runtime native
```

Note: installation first reads the provision information in `~/.aim/config.json` and `~/.aim/registry.json` to locate the model. If AIM has no match, it falls back to the default local path candidates. If the model is still not found, a "simulated download" flow can be run, depending on configuration, to verify installation progress.

### Demo Audio Playback

```bash
# Synthesize and play immediately
python -m openspeechapi.demo tts -t "Hello world" --play

# Specify playback parameters
python -m openspeechapi.demo tts -t "Hello world" --play \
  --play-backend sounddevice --play-device 2 --play-volume 0.8
```

## Providers

### Implemented

| Provider | Type | Description | Execution mode | Install |
|----------|------|-------------|----------------|---------|
| `openai-stt` | STT | OpenAI Whisper API (cloud) | remote | `pip install -e ".[openai]"` |
| `faster-whisper` | STT | Local Whisper inference (GPU/CPU) | subprocess | `pip install -e ".[faster-whisper-stt]"` |
| `whisper` | STT | OpenAI Whisper local inference (CPU/GPU) | subprocess | `pip install -e ".[whisper-stt]"` |
| `whisperlivekit-stt` | STT | WhisperLiveKit local service (Deepgram-compatible WS, MLX backend supported) | local | `pip install -e ".[whisperlivekit-stt]"` |
| `elevenlabs-stt` | STT | ElevenLabs Scribe API (cloud, real-time streaming over WS + batch) | remote | `pip install -e ".[elevenlabs-stt]"` |
| `deepgram` | STT | Deepgram API (cloud, real-time streaming) | remote | `pip install -e ".[deepgram]"` |
| `gemma4` | STT | Google Gemma 4 multimodal ASR (local on macOS/MLX; E4B by default, 12B optional; audio longer than 30 s is segmented automatically; supports transcription, translation, and understanding) | subprocess | `pip install -e ".[gemma4-stt]"` |
| `openai-tts` | TTS | OpenAI Speech API (cloud, streaming supported) | remote | `pip install -e ".[openai]"` |
| `elevenlabs` | TTS | ElevenLabs high-quality voices (cloud, HTTP/WS streaming) | remote | `pip install -e ".[elevenlabs-tts]"` |
| `minimax` | TTS | Minimax speech synthesis (cloud) | remote | `pip install -e ".[minimax-tts]"` |
| `cosyvoice` | TTS | CosyVoice local Chinese speech synthesis (GPU) | subprocess | CosyVoice must be installed manually |
| `fish-speech` | TTS | Fish-Speech local multilingual TTS + voice clone | local | `pip install -e ".[fish-speech-tts]"` |
| `piper` | TTS | Piper lightweight local TTS (CPU is enough) | in_process | `pip install -e ".[piper-tts]"` |
| `macos-say` | TTS | macOS built-in speech synthesis (`say` command, zero dependencies) | in_process | Nothing to install; ships with macOS |
| `macos-stt` | STT | macOS built-in speech recognition (SFSpeechRecognizer) | in_process | WebUI Engine Catalog → Install (prebuilt package first, local build as fallback) |

### Stub (not yet implemented)

`coqui`

### List All Providers

```python
from openspeechapi import list_providers
print(list_providers())
# Example output (the exact list depends on the installed version):
# ['coqui', 'cosyvoice', 'deepgram', 'elevenlabs', 'faster-whisper',
#  'fish-speech', 'minimax', 'openai-stt', 'openai-tts', 'piper', 'whisper',
#  'whisperlivekit-stt']
```

## Provider Parameters

### `openai-stt`

```python
create_provider("openai-stt",
    api_key="sk-...",         # required: OpenAI API key
    model="whisper-1",        # model name
)
```

Transcription options are passed via `STTOptions`:

```python
from openspeechapi import STTOptions
result = await stt.transcribe(audio, STTOptions(
    language="zh",            # language hint
    prompt="技术会议记录",     # context hint (here: "technical meeting notes")
    temperature=0.0,          # 0.0-1.0
))
```

### `faster-whisper`

```python
create_provider("faster-whisper",
    model_size="base",        # tiny / base / small / medium / large-v3
    device="auto",            # auto / cuda / cpu
    compute_type="default",   # default / int8 / float16
    beam_size=5,              # beam search width
    download_root=None,       # model cache directory
)
```

### `openai-tts`

```python
create_provider("openai-tts",
    api_key="sk-...",         # required: OpenAI API key
    model="tts-1",            # tts-1 / tts-1-hd
    voice="alloy",            # alloy / echo / fable / onyx / nova / shimmer
    response_format="pcm",    # output format
)
```

Synthesis options are passed via `TTSOptions`:

```python
from openspeechapi import TTSOptions
audio = await tts.synthesize("Hello", TTSOptions(
    voice="nova",             # overrides the default voice
    speed=1.2,                # speaking-rate multiplier
))
```

### `deepgram`

```python
create_provider("deepgram",
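    api_key="...",            # required: Deepgram API key
    model="nova-2",           # model name
    language="en",            # default language
    punctuate=True,           # automatic punctuation
    smart_format=True,        # smart formatting
)
```

Supports real-time streaming transcription (`transcribe_stream`); see the streaming STT section of this README.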

### `elevenlabs`

```python
create_provider("elevenlabs",
    api_key="...",            # required: ElevenLabs API key
    voice_id="21m00Tcm4TlvDq8ikWAM",  # voice ID
    model_id="eleven_monolingual_v1",  # model
    stability=0.5,            # voice stability
    similarity_boost=0.75,    # similarity boost
)
```

### `minimax`

```python
create_provider("minimax",
    api_key="...",            # required: Minimax API key
    group_id="...",           # required: Minimax Group ID
    model="speech-01-turbo",  # model
    voice_id="male-qn-qingse", # voice ID
    speed=1.0,                # speaking rate
)
```

### `cosyvoice`

```python
create_provider("cosyvoice",
    model_dir="/path/to/model",  # required: local model directory
    device="auto",               # auto / cuda / cpu
    spk_id="中文女",              # speaker ID (here: the "Chinese female" voice)
)
```

### `fish-speech`

```python
create_provider("fish-speech",
    api_url="http://localhost:8080",  # address of the local Fish-Speech service
    reference_audio=None,            # reference audio path (voice clone)
)
```

### `piper`

```python
create_provider("piper",
    model_path="/path/to/model.onnx",  # required: model file path
    config_path="/path/to/config.json", # required: config file path
    use_cuda=False,           # whether to use the GPU
    length_scale=1.0,         # speaking rate (larger is slower)
    noise_scale=0.667,        # noise scale
)
```

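### `macos-say` (native macOS TTS)

No extra dependencies: uses the built-in macOS `say` command and supports every system voice.

```python
create_provider("macos-say",
    default_voice="Tingting",  # default voice (run say -v '?' to list all)
    default_rate=200,          # default speaking rate (words per minute)
)
```

`list_voices()` returns all available voices (grouped by language). At synthesis time, pass `TTSOptions(voice="Samantha", speed=1.5)` to choose the voice and speaking rate.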

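### `macos-stt` (native macOS STT)

Uses the built-in macOS SFSpeechRecognizer through a Swift CLI helper (an `.app` bundle).
**Installing with one click from the WebUI Engine Catalog is recommended** (downloads the prebuilt universal package automatically; no Xcode needed).

```python
create_provider("macos-stt",
    language="zh-CN",          # default recognition language
    binary_path="",            # path to the Swift helper; auto-detected when empty
)
```

**Recommended installation:** WebUI → Engine Catalog → macOS STT → **Install**

Install flow: option B (prebuilt, via `gh release download`) is tried first; if B is unavailable, option C (`bash install.sh`, requires Xcode CLT) is used automatically as a fallback.
See [docs/architecture/native-engine-install.md](docs/architecture/native-engine-install.md) for details.

**Must be done by hand once per machine (cannot be automated):**

```bash
# 1. Grant speech recognition permission (run after installation; click "Allow" in the dialog)
open scripts/engines/macos-stt/MacOSSTTHelper.app

# 2. Verify the authorization status
scripts/engines/macos-stt/MacOSSTTHelper.app/Contents/MacOS/macos-stt-helper --check --language en-US
```

- **System Settings > Keyboard > Dictation** → download the offline dictation model for each language you need (Chinese, English, etc.)
- macOS 13+ supports fully offline recognition; older versions need a network connection

**Advanced / offline manual installation (no WebUI or `gh` required):**

```bash
# Requires Xcode Command Line Tools (xcode-select --install)
bash scripts/engines/macos-stt/install.sh
```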

## HTTP Server + Client Mode

### Start the Server

```bash
openspeechapi serve --config providers.yaml --port 8600
```

### Python Client (same interface as library mode)

```python
from openspeechapi import Client

async with Client("http://localhost:8600") as c:
    # STT
    result = await c.stt.transcribe("faster-whisper", audio)

    # TTS
    audio = await c.tts.synthesize("openai-tts", "Hello world")

    # FanOut
    result = await c.stt.fanout(["openai-stt", "faster-whisper"], audio, strategy="collect_all")

    # Management
    providers = await c.list_providers()
    health = await c.health()
```

### REST API

```bash
# STT
curl -X POST http://localhost:8600/v1/stt/transcribe \
  -F audio=@audio.wav -F provider=faster-whisper

# TTS
curl -X POST http://localhost:8600/v1/tts/synthesize \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello", "provider": "openai-tts"}' --output out.wav

# Management
curl http://localhost:8600/v1/providers
curl http://localhost:8600/v1/health
curl http://localhost:8600/v1/metrics
```

## Advanced Usage

### Config-Driven (YAML configuration)

```yaml
# providers.yaml
providers:
  cloud-stt:
    provider: openai-stt
    exec_mode: remote
    settings:
      api_key: ${OPENAI_API_KEY}

  local-stt:
    provider: faster-whisper
    exec_mode: subprocess      # separate process, isolates GPU memory
    settings:
      model_size: large-v3
      device: cuda
```

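`exec_mode` conventions:
- `subprocess`: model inference in a child process (IPC)
- `local`: local service engine (HTTP/HTTPS)
- `remote`: cloud service API
- `in_process`: reserved for true in-process inference (kept for compatibility with older configs; migration is recommended)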

```python
from openspeechapi import ServiceDispatcher, ProviderRegistry
from openspeechapi.providers.stt.openai import OpenAISTT
from openspeechapi.providers.stt.faster_whisper import FasterWhisperSTT

registry = ProviderRegistry()
registry.register("openai-stt", OpenAISTT)
registry.register("faster-whisper", FasterWhisperSTT)

dispatcher = ServiceDispatcher.from_config("providers.yaml", registry)
await dispatcher.start()

result = await dispatcher.stt.transcribe("cloud-stt", audio)
await dispatcher.stop()
```
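
The YAML above relies on two conventions: `${VAR}` placeholders and a fixed set of `exec_mode` values. The sketch below shows one way such a config could be checked after parsing. It is not OpenSpeechAPI's loader: `expand_env` and `validate_providers` are hypothetical helpers, the config is given as an already-parsed dict to avoid a YAML dependency, and leaving unknown variables untouched is an assumption.

```python
import os
import re

ALLOWED_EXEC_MODES = {"subprocess", "local", "remote", "in_process"}
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=os.environ):
    """Recursively replace ${VAR} in strings; unknown variables are left as-is."""
    if isinstance(value, str):
        return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {k: expand_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v, env) for v in value]
    return value


def validate_providers(config: dict) -> list[str]:
    """Return a list of human-readable problems found in the providers section."""
    problems = []
    for name, entry in config.get("providers", {}).items():
        if "provider" not in entry:
            problems.append(f"{name}: missing 'provider'")
        mode = entry.get("exec_mode")
        # Assumption: a missing exec_mode is allowed; only unknown values are flagged.
        if mode is not None and mode not in ALLOWED_EXEC_MODES:
            problems.append(f"{name}: unknown exec_mode {mode!r}")
    return problems


config = {
    "providers": {
        "cloud-stt": {"provider": "openai-stt", "exec_mode": "remote",
                      "settings": {"api_key": "${OPENAI_API_KEY}"}},
        "local-stt": {"provider": "faster-whisper", "exec_mode": "subprocess",
                      "settings": {"model_size": "large-v3", "device": "cuda"}},
    }
}
expanded = expand_env(config, env={"OPENAI_API_KEY": "sk-test"})
problems = validate_providers(expanded)
```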

### FanOut: Concurrent Multi-Engine Execution

```python
from openspeechapi.dispatch.fanout import FirstCompleted, CollectAll

# Take whichever result comes back first
result = await dispatcher.stt.fanout(
    ["cloud-stt", "local-stt"], audio, strategy=FirstCompleted()
)

# Collect every result for comparison
results = await dispatcher.stt.fanout(
    ["cloud-stt", "local-stt"], audio, strategy=CollectAll()
)
for name, t in results.successes.items():
    print(f"{name}: {t.text}")
```
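
Conceptually, the two strategies differ in when the fan-out returns. The standalone `asyncio` sketch below illustrates that difference with fake engines. It is not the library's implementation: `fake_engine`, `first_completed`, and `collect_all` are hypothetical names, and cancelling the slower calls once a winner is known is an assumption the source does not confirm.

```python
import asyncio


async def fake_engine(name: str, delay: float, fail: bool = False) -> str:
    """Stand-in for a provider call: sleeps, then returns text or raises."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} failed")
    return f"text from {name}"


async def first_completed(calls: dict) -> tuple[str, str]:
    """Return (name, result) of the fastest call and cancel the rest."""
    tasks = {asyncio.create_task(coro): name for name, coro in calls.items()}
    done, pending = await asyncio.wait(list(tasks), return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    # Wait for the cancellations to finish so no task is left dangling.
    await asyncio.gather(*pending, return_exceptions=True)
    winner = next(iter(done))
    return tasks[winner], winner.result()  # re-raises if the winner failed


async def collect_all(calls: dict) -> tuple[dict, dict]:
    """Wait for every call; split outcomes into successes and failures."""
    names = list(calls)
    outcomes = await asyncio.gather(*calls.values(), return_exceptions=True)
    successes, failures = {}, {}
    for name, outcome in zip(names, outcomes):
        (failures if isinstance(outcome, Exception) else successes)[name] = outcome
    return successes, failures


async def demo():
    fastest = await first_completed({
        "cloud-stt": fake_engine("cloud-stt", 0.2),
        "local-stt": fake_engine("local-stt", 0.01),
    })
    successes, failures = await collect_all({
        "cloud-stt": fake_engine("cloud-stt", 0.01),
        "local-stt": fake_engine("local-stt", 0.01, fail=True),
    })
    return fastest, successes, failures


fastest, successes, failures = asyncio.run(demo())
```

A first-completed strategy trades completeness for latency, while a collect-all strategy waits for the slowest engine but keeps per-engine failures visible.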

### Result Filters

```yaml
providers:
  my-stt:
    provider: faster-whisper
    exec_mode: subprocess
    settings:
      model_size: base
    filters:
      - type: confidence
        min: 0.8              # drop low-confidence results
      - type: language
        allow: ["zh", "en"]   # keep only Chinese and English
```

### Observers (observability)

```python
from openspeechapi.observe.metrics import MetricsObserver
from openspeechapi.observe.debug import DebugLogObserver

dispatcher.add_observer(MetricsObserver())    # TTFB, duration, throughput
dispatcher.add_observer(DebugLogObserver())   # detailed logs
```

Five observers are built in: `MetricsObserver`, `LatencyObserver`, `DebugLogObserver`, `UsageObserver`, and `TracingObserver`.

## Data Models

```python
AudioData(data=bytes, sample_rate=int, channels=int, format=AudioFormat, duration_ms=int|None)
Transcription(text=str, language=str|None, confidence=float|None, words=list[Word]|None)
Word(text=str, start_ms=int, end_ms=int, confidence=float|None)
STTOptions(language=str|None, prompt=str|None, temperature=float|None)
TTSOptions(voice=str|None, speed=float, output_format=AudioFormat)
AudioFormat: PCM_16K | PCM_44K | WAV | AIFF | MP3 | OGG | FLAC | OPUS
```
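
`duration_ms` is optional in `AudioData`. For raw PCM it can be derived from the buffer length: bytes divided by (sample rate × channels × bytes per sample). The sketch below assumes 16-bit samples by default, and `pcm_duration_ms` is a hypothetical helper; compressed formats such as MP3 cannot be measured this way.

```python
def pcm_duration_ms(data: bytes, sample_rate: int, channels: int = 1,
                    sample_width: int = 2) -> int:
    """Duration in milliseconds of an uncompressed PCM buffer."""
    bytes_per_second = sample_rate * channels * sample_width
    if bytes_per_second <= 0:
        raise ValueError("sample_rate, channels and sample_width must be positive")
    return round(len(data) * 1000 / bytes_per_second)


one_second = b"\x00\x00" * 16000                      # 16 kHz, mono, 16-bit
half_second_stereo = b"\x00" * (44100 * 2 * 2 // 2)   # 44.1 kHz, stereo, 16-bit

mono_ms = pcm_duration_ms(one_second, 16000)
stereo_ms = pcm_duration_ms(half_second_stereo, 44100, channels=2)
print(mono_ms, stereo_ms)
```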

## Project Structure

```
openspeechapi/
  core/           # L1: provider abstraction layer (models, enums, base, registry)
  providers/      # Provider adapters (stt/: 5 including macOS, tts/: 8 including macOS)
  utils/          # Utility modules (audio_converter, audio_playback)
  dispatch/       # L2: dispatch layer (dispatcher, executors, fanout, filters)
  observe/        # Observability (metrics, latency, debug, usage, tracing)
  server/         # L3: FastAPI HTTP/WebSocket server
  client/         # Thin Python client
  factory.py      # create_provider() factory function
  config.py       # YAML config loading
  cli.py          # openspeechapi list / check / serve
  demo.py         # Interactive demo CLI
examples/         # Example scripts (library mode + client mode)
tests/            # 332 tests (unit + integration + E2E)
Dockerfile        # Containerized deployment
docker-compose.yml
.github/workflows/ci.yml  # GitHub Actions CI
```

## Environment Variables

| Variable | Purpose |
|----------|---------|
| `OPENAI_API_KEY` | API key for OpenAI STT/TTS |
| `DEEPGRAM_API_KEY` | API key for Deepgram STT |
| `ELEVENLABS_API_KEY` | API key for ElevenLabs TTS |
| `MINIMAX_API_KEY` | API key for Minimax TTS |
| `OPENSPEECH_API_KEY` | Key used for Bearer token authentication on the HTTP server |

`.env` files are loaded automatically (requires `python-dotenv`).

## Deployment

**Docker:**
```bash
# Build and start
docker-compose up -d

# View logs
docker-compose logs -f

# GPU support (edit docker-compose.yml and uncomment the openspeechapi-gpu service)
```

**Direct start:**
```bash
openspeechapi serve --config providers.yaml --port 8600
```

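## Authentication

Configure API key authentication in `providers.yaml`:

```yaml
server:
  auth:
    enabled: true
    api_keys:
      - ${OPENSPEECH_API_KEY}
```

Once enabled, every REST request must carry a Bearer token:
```bash
curl -H "Authorization: Bearer your-key" http://localhost:8600/v1/providers
```

WebSocket connections pass the token as a query parameter:
```
ws://localhost:8600/v1/stt/stream?provider=deepgram&token=your-key
```

The `/v1/health` endpoint is exempt from authentication. If `server.auth` is not configured, no authentication is applied (development mode).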

## Streaming STT

Deepgram supports real-time streaming transcription:

```python
from openspeechapi import Client

async with Client("http://localhost:8600") as c:
    async def audio_source():
        # Read PCM audio chunks from a microphone or a file;
        # pcm_chunk is a placeholder for one chunk of raw PCM bytes.
        yield pcm_chunk

    async for transcription in c.stt.transcribe_stream("deepgram", audio_source()):
        print(transcription.text)
```

Over WebSocket:
```
ws://localhost:8600/v1/stt/stream?provider=deepgram
# Send: binary PCM audio frames
# Receive: {"type": "partial", "text": "..."}
```
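
The `audio_source()` generator above is left abstract. One possible shape, sketched with the standard library only, slices a PCM buffer into fixed-duration frames and parses the JSON messages coming back. The 100 ms frame size, the helper names `pcm_chunks` and `parse_message`, and any message type other than `partial` are assumptions rather than documented behavior.

```python
import asyncio
import json
from typing import AsyncIterator


async def pcm_chunks(pcm: bytes, sample_rate: int = 16000,
                     frame_ms: int = 100) -> AsyncIterator[bytes]:
    """Yield 16-bit mono PCM in frames of roughly frame_ms milliseconds."""
    frame_bytes = sample_rate * 2 * frame_ms // 1000
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]
        await asyncio.sleep(0)  # let other tasks run between frames


def parse_message(raw: str) -> tuple[str, str]:
    """Return (type, text) from a server message such as {"type": "partial", ...}."""
    msg = json.loads(raw)
    return msg.get("type", ""), msg.get("text", "")


async def _collect(pcm: bytes) -> list[bytes]:
    """Drain the generator into a list, only to make the sketch observable."""
    return [chunk async for chunk in pcm_chunks(pcm)]


chunks = asyncio.run(_collect(b"\x00\x00" * 16000))  # one second of silence
parsed = parse_message('{"type": "partial", "text": "hello"}')
print(len(chunks), len(chunks[0]), parsed)
```

A generator like `pcm_chunks(...)` could be passed wherever the example uses `audio_source()`, provided the server expects 16-bit mono PCM at the chosen sample rate.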

## CI

The project uses GitHub Actions for automated testing. Every push to main and every PR runs:
- ruff lint
- unit tests + integration tests
- code coverage check (≥70%)

## License

Private — personal multi-project reuse.
