Metadata-Version: 2.4
Name: dtelecom-stt
Version: 0.2.0
Summary: Python SDK for dTelecom real-time speech-to-text with x402 micropayments
Project-URL: Homepage, https://github.com/dTelecom/stt-client-python
Project-URL: Repository, https://github.com/dTelecom/stt-client-python
Project-URL: Issues, https://github.com/dTelecom/stt-client-python/issues
Author-email: dTelecom <dev@dtelecom.org>
License-Expression: MIT
License-File: LICENSE
Keywords: micropayments,realtime,speech-to-text,stt,x402
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.10
Requires-Dist: eth-account>=0.10.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: solders>=0.21.0
Requires-Dist: websockets>=12.0
Requires-Dist: x402[evm,httpx,svm]>=2.0.0
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Description-Content-Type: text/markdown

# dtelecom-stt

Python SDK for dTelecom real-time speech-to-text with [x402](https://www.x402.org/) micropayments.

Pay-per-minute STT powered by Whisper and Parakeet, with automatic blockchain payments via USDC on Base or Solana.

## Install

```bash
pip install dtelecom-stt
```

## Quick Start

```python
import asyncio
from dtelecom_stt import STTClient

async def main():
    # EVM wallet (Base)
    client = STTClient(private_key="0x...")
    # Or Solana wallet — detected automatically
    # client = STTClient(private_key="base58...")

    async with client.session(minutes=5, language="en") as stream:
        async for t in stream.transcribe_file("meeting.wav"):
            print(f"[{t.start:.1f}s] {t.text}")

asyncio.run(main())
```

## Real-Time Streaming

```python
import asyncio
from dtelecom_stt import STTClient

async def main():
    client = STTClient(private_key="0x...")  # or Solana base58 key

    async with client.session(minutes=5, language="en") as stream:
        # Callback-based
        stream.on_transcription(lambda t: print(t.text))

        # Send audio chunks (PCM16, 16 kHz, mono).
        # pcm_bytes is a placeholder here: one second of silence.
        pcm_bytes = b"\x00\x00" * 16000
        await stream.send_audio(pcm_bytes)

        # Or iterate asynchronously
        async for t in stream.transcriptions():
            print(t.text)

asyncio.run(main())
```
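
When streaming from a buffer or a capture device, audio is usually sent in small fixed-duration chunks rather than one large blob. The sketch below splits a PCM16, 16 kHz, mono buffer into chunks. The helper name `iter_pcm_chunks` and the 100 ms default are assumptions for illustration, not part of the SDK, and this README does not state a required chunk size.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # 16 kHz, as required by the server
BYTES_PER_SAMPLE = 2     # PCM16 mono: 2 bytes per sample


def iter_pcm_chunks(pcm: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive chunks holding `chunk_ms` milliseconds of audio.

    Hypothetical helper; the last chunk may be shorter than the others.
    """
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    # Keep every chunk aligned to a whole number of samples.
    chunk_bytes -= chunk_bytes % BYTES_PER_SAMPLE
    for offset in range(0, len(pcm), chunk_bytes):
        yield pcm[offset:offset + chunk_bytes]
```

Inside a session, each chunk would then be passed to `stream.send_audio(chunk)` in a loop.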

## Auto-Extend Sessions

Sessions automatically buy more time when running low (enabled by default):

```python
# 30-minute session that auto-extends
async with client.session(minutes=30, language="en") as stream:
    # When <60s remaining, SDK buys 5 more minutes automatically
    async for t in stream.transcriptions():
        print(t.text)

# Disable auto-extend
async with client.session(minutes=5, auto_extend=False) as stream:
    ...
```
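
For budgeting, it can help to estimate how many minutes a session ends up buying. The sketch below simulates the rule described in the comment above (buy 5 more minutes when fewer than 60 seconds remain). The function name `minutes_purchased` is hypothetical, and the SDK's real timing may differ from this simplified model.

```python
def minutes_purchased(
    stream_seconds: float,
    initial_minutes: int,
    threshold_seconds: int = 60,  # assumed from the "<60s remaining" comment
    extend_minutes: int = 5,      # assumed from the "5 more minutes" comment
) -> int:
    """Estimate total minutes bought for a stream lasting `stream_seconds`.

    Simplified model, not the SDK's implementation.
    """
    if extend_minutes <= 0:
        raise ValueError("extend_minutes must be positive")
    purchased_seconds = initial_minutes * 60
    # An extension fires if, before the stream ends, the remaining
    # time drops below the threshold.
    while stream_seconds > purchased_seconds - threshold_seconds:
        purchased_seconds += extend_minutes * 60
    return purchased_seconds // 60
```

Under this model, a stream that runs for the full 30 minutes of a 30-minute session triggers one extension near the end, so 35 minutes are bought in total.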

## Audio Format

The server expects **PCM16 (signed 16-bit little-endian), 16 kHz, mono** audio. Convert other formats with ffmpeg:

```bash
ffmpeg -i input.mp3 -ar 16000 -ac 1 -acodec pcm_s16le output.wav
```
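
Before streaming a file, you can check that it already matches the expected format. This is a minimal sketch using only the standard-library `wave` module; the function name `check_wav_format` is an assumption for illustration and is not part of the SDK.

```python
import wave


def check_wav_format(path: str) -> list[str]:
    """Return a list of format problems; an empty list means the file matches.

    Hypothetical helper. `wave` only reads uncompressed PCM WAV files and
    raises `wave.Error` for anything else.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {wav.getsampwidth() * 8}-bit")
        if wav.getframerate() != 16000:
            problems.append(f"expected 16000 Hz, got {wav.getframerate()} Hz")
    return problems
```

If the list is not empty, re-encode the file with the ffmpeg command above.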

## Pricing

```python
# Run inside an async function
info = await client.pricing()
print(f"${info.price_per_minute_usd}/min ({info.currency} on {info.network})")
```

Current pricing: **$0.005/min** (USDC on Base or Solana).
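
As a quick worked example of the arithmetic, the sketch below estimates the cost of a session from the per-minute price quoted above. The helper `estimate_cost_usd` is hypothetical, and the price is hard-coded from this README, so prefer the value returned by `client.pricing()` in real code.

```python
from decimal import Decimal

# Price quoted in this README; it may change, so treat it as an assumption.
PRICE_PER_MINUTE_USD = Decimal("0.005")


def estimate_cost_usd(
    minutes: int, price_per_minute: Decimal = PRICE_PER_MINUTE_USD
) -> Decimal:
    """Estimate the USD cost of `minutes` of transcription (hypothetical helper)."""
    if minutes < 0:
        raise ValueError("minutes must not be negative")
    # Decimal avoids binary floating-point rounding on small amounts.
    return minutes * price_per_minute
```

At this rate, one hour of audio costs `estimate_cost_usd(60)`, which is $0.30.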

## API Reference

### `STTClient(private_key, url=None)`

Main client. Default URL: `https://x402stt.dtelecom.org`.

- **EVM key** (hex, 0x-prefixed): pays with USDC on Base
- **Solana key** (base58): pays with USDC on Solana
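
The two key formats can be told apart by shape alone. The sketch below shows one way to do that for validating configuration before constructing a client. It is an illustration only, not the SDK's actual detection logic, and `guess_key_type` is a hypothetical name.

```python
import re

# 0x followed by 64 hex characters (a 32-byte EVM private key).
_HEX_KEY = re.compile(r"0x[0-9a-fA-F]{64}")
# Base58 alphabet: no 0, O, I, or l. Length is not checked in this sketch.
_BASE58 = re.compile(r"[1-9A-HJ-NP-Za-km-z]+")


def guess_key_type(private_key: str) -> str:
    """Return "evm" or "solana" based on the key's textual format."""
    if _HEX_KEY.fullmatch(private_key):
        return "evm"
    if _BASE58.fullmatch(private_key):
        return "solana"
    # Never include the key itself in the error message.
    raise ValueError("unrecognized private key format")
```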

- `session(minutes=5, language="en", auto_extend=True)` — Create a paid session (async context manager)
- `pricing()` — Get pricing info
- `health()` — Check server health

### `Stream`

Returned by `client.session()`. Async context manager.

- `send_audio(data: bytes)` — Send raw PCM16 audio
- `transcriptions()` — Async iterator of `Transcription` objects
- `transcribe_file(path)` — Stream a WAV file and yield transcriptions
- `on_transcription(callback)` — Register callback for transcriptions
- `close()` — Close the stream

### `Transcription`

- `text: str` — Transcribed text
- `start: float | None` — Start time in seconds
- `end: float | None` — End time in seconds
- `confidence: float | None` — Confidence score
- `is_final: bool` — Whether this is a final transcription
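
A common use of `is_final` is to build a clean transcript that skips interim results. The sketch below uses a local stand-in dataclass that mirrors the documented fields so it runs without the SDK; the default values and the helper `final_transcript` are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Transcription:
    """Local stand-in mirroring the documented fields (not the SDK class)."""
    text: str
    start: Optional[float] = None
    end: Optional[float] = None
    confidence: Optional[float] = None
    is_final: bool = False


def final_transcript(items: Iterable[Transcription]) -> str:
    """Join the text of final, non-empty transcriptions with single spaces."""
    return " ".join(
        t.text.strip() for t in items if t.is_final and t.text.strip()
    )
```

With the real SDK, the same filter would apply to the objects yielded by `stream.transcriptions()`.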

## Supported Languages

25 languages via Parakeet-TDT (fast) with Whisper fallback:

English, Russian, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Czech, Romanian, Hungarian, Greek, Turkish, Ukrainian, Swedish, Norwegian, Danish, Finnish, Catalan, Croatian, Lithuanian, Slovenian, Latvian, Estonian.

## Error Handling

```python
# Note: importing ConnectionError shadows Python's built-in ConnectionError in this module.
from dtelecom_stt import PaymentError, SessionExpiredError, ConnectionError

try:
    async with client.session(minutes=5) as stream:
        async for t in stream.transcriptions():
            print(t.text)
except PaymentError:
    print("Payment failed — check wallet balance")
except SessionExpiredError:
    print("Session time ran out")
except ConnectionError:
    print("Cannot connect to server")
```
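
Transient connection failures are often handled with a retry and exponential backoff. The sketch below is a generic helper, not part of the SDK; `retry_async` and its parameters are hypothetical. Which SDK errors are safe to retry is an assumption you should verify, since this README does not say whether a retried session triggers a new payment.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_async(
    operation: Callable[[], Awaitable[T]],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Await `operation()`, retrying on the given exceptions with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each time: base_delay, 2x, 4x, ...
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A call might look like `await retry_async(run_session, retry_on=(ConnectionError,))`, where `run_session` is your own coroutine function wrapping the `async with client.session(...)` block.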

## License

MIT
