Metadata-Version: 2.4
Name: talking-snake
Version: 0.1.0
Summary: Just a talking snake that reads PDFs and web pages aloud.
Author: Luca
License: MIT
License-File: LICENSE
Keywords: audiobook,listening,pdf,speech,text-to-speech,tts
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.11
Requires-Dist: fastapi>=0.115.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: jinja2>=3.1.4
Requires-Dist: pdfminer-six>=20260107
Requires-Dist: python-multipart>=0.0.12
Requires-Dist: qwen-tts>=0.1.1
Requires-Dist: torch>=2.5.0
Requires-Dist: trafilatura>=2.0.0
Requires-Dist: uvicorn[standard]>=0.32.0
Provides-Extra: dev
Requires-Dist: httpx>=0.27.0; extra == 'dev'
Requires-Dist: mypy>=1.14.0; extra == 'dev'
Requires-Dist: pre-commit>=4.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24.0; extra == 'dev'
Requires-Dist: pytest-cov>=6.0.0; extra == 'dev'
Requires-Dist: pytest>=8.3.0; extra == 'dev'
Requires-Dist: ruff>=0.8.0; extra == 'dev'
Provides-Extra: fast
Requires-Dist: flash-attn>=2.5.0; extra == 'fast'
Description-Content-Type: text/markdown

# Talking Snake

<img src="https://raw.githubusercontent.com/LucaCappelletti94/talking-snake/main/talking_snake.png" alt="Talking Snake" width="400">

[![CI](https://github.com/LucaCappelletti94/talking-snake/actions/workflows/ci.yml/badge.svg)](https://github.com/LucaCappelletti94/talking-snake/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/LucaCappelletti94/talking-snake/branch/main/graph/badge.svg)](https://codecov.io/gh/LucaCappelletti94/talking-snake)

Converts PDFs and web pages to speech using [Qwen3-TTS](https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice). Upload a document or submit a URL to have it read aloud in one of 9 natural voices covering English, Chinese, Japanese, and Korean. Audio streams progressively, so playback can begin while the rest of the document is still being generated.
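The progressive streaming described above can be illustrated with a minimal sketch. It assumes the text is split into sentence-sized chunks and each chunk is synthesized independently. It is not the project's actual implementation: `split_into_chunks`, `stream_speech`, and the `synthesize` callable are hypothetical names, and a real server would call Qwen3-TTS where a synthesizer is passed in.

```python
import re
from collections.abc import Callable, Iterator

# Hypothetical type: any function that turns a short piece of text into audio bytes.
Synthesizer = Callable[[str], bytes]


def split_into_chunks(text: str, max_chars: int = 400) -> list[str]:
    """Group sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the chunk: flush and start anew.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def stream_speech(
    text: str, synthesize: Synthesizer, max_chars: int = 400
) -> Iterator[bytes]:
    """Lazily yield audio for each chunk as soon as it has been synthesized."""
    for chunk in split_into_chunks(text, max_chars):
        yield synthesize(chunk)
```

Because `stream_speech` is a generator, a web framework can send each chunk as soon as it is produced; with FastAPI, for example, such a generator can be passed to `fastapi.responses.StreamingResponse`. Smaller chunks shorten the wait before the first audio, at the cost of more synthesis calls.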

## Deploy Your Own

[![Deploy on Hugging Face Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/deploy-on-spaces-lg.svg)](https://huggingface.co/spaces/LucaCappelletti94/talking-snake?duplicate=true)

Click the button above to deploy your own GPU-powered instance. You'll be prompted to sign in to a Hugging Face account (or create one) and select hardware; an L4 or A100 GPU is recommended for speed, at roughly $0.80 to $4 per hour.

## Run Locally

Requires Python 3.11+, an NVIDIA GPU with ~6 GB of VRAM, [SoX](https://sourceforge.net/projects/sox/) (`apt install sox libsox-dev`), and [uv](https://docs.astral.sh/uv/).

```bash
uv sync && uv run talking-snake --port 8888  # Open http://localhost:8888
```


[▶️ Listen to a sample](https://github.com/LucaCappelletti94/talking-snake/raw/main/src/talking_snake/static/sample.wav)

The web interface looks like this:

<img src="https://raw.githubusercontent.com/LucaCappelletti94/talking-snake/main/landing.png" alt="Upload interface" width="400">
<img src="https://raw.githubusercontent.com/LucaCappelletti94/talking-snake/main/rendering.png" alt="Audio playback with progress" width="400">

## License

This project is licensed under the [MIT License](LICENSE). Dependencies and third-party components (e.g., Qwen3-TTS, SoX) are subject to their own licenses.
