Metadata-Version: 2.4
Name: saymo
Version: 0.10.0
Summary: Local AI voice assistant for macOS — speaks in your cloned voice on live calls, fully on-device
Author: Mikhail Shchegolev
License: MIT
Project-URL: Homepage, https://github.com/mshegolev/saymo
Project-URL: Repository, https://github.com/mshegolev/saymo
Project-URL: Issues, https://github.com/mshegolev/saymo/issues
Project-URL: Documentation, https://github.com/mshegolev/saymo#readme
Keywords: voice-assistant,voice-cloning,local-ai,ollama,xtts,qwen3-tts,tts,stt,llm,agent,macos
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: sounddevice>=0.5.0
Requires-Dist: soundfile>=0.12.0
Requires-Dist: numpy
Requires-Dist: click>=8.1.0
Requires-Dist: rich>=13.0.0
Requires-Dist: pynput>=1.7.6
Requires-Dist: pyyaml>=6.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: jira>=3.8.0
Provides-Extra: tts
Requires-Dist: torch>=2.6.0; extra == "tts"
Requires-Dist: torchaudio; extra == "tts"
Requires-Dist: coqui-tts[codec]; extra == "tts"
Requires-Dist: transformers<5,>=4.46; extra == "tts"
Requires-Dist: piper-tts; extra == "tts"
Requires-Dist: scipy; extra == "tts"
Provides-Extra: apple
Requires-Dist: mlx>=0.22.0; extra == "apple"
Requires-Dist: mlx-lm>=0.21.0; extra == "apple"
Requires-Dist: mlx-audio>=0.1.0; extra == "apple"
Provides-Extra: stt
Requires-Dist: faster-whisper>=1.0.0; extra == "stt"
Provides-Extra: cloud
Requires-Dist: anthropic>=0.43.0; extra == "cloud"
Requires-Dist: openai>=1.57.0; extra == "cloud"
Requires-Dist: deepgram-sdk>=3.0.0; extra == "cloud"
Requires-Dist: aiohttp>=3.9.0; extra == "cloud"
Provides-Extra: all
Requires-Dist: saymo[apple,cloud,stt,tts]; extra == "all"
Dynamic: license-file

# Saymo — Local AI Voice Assistant

Fully local AI voice assistant for macOS. Speaks into any live call in **your cloned voice** — no cloud APIs required.

Saymo composes short, natural speech from optional data sources (tracker, notes, text files), synthesizes it with voice cloning, and routes audio into the active call through a virtual microphone. Everything — language model, speech-to-text, text-to-speech — runs on-device.

- **Local:** Ollama + faster-whisper + Coqui XTTS v2 (or Piper / macOS `say` as fallback).
- **Voice cloning:** 5-minute sample → your voice, fine-tuning optional.
- **Routing:** BlackHole virtual mic → any browser-based call app.
- **Call automation:** Chrome-driven mute/unmute for 8 providers (Glip, Zoom, Google Meet, MS Teams, Telegram, Yandex Telemost, VK Teams, MTS Link).
- **Listening mode:** auto-detects when your name is called, answers questions from provided context.
- **User-configurable prompts and vocabulary** — no source edits required.

> **Project status:** early public alpha. Expect rough edges. Contributions welcome.

---

## Requirements

- macOS with Apple Silicon (M1/M2/M3/M4), **arm64 terminal, not Rosetta**
- Python 3.11+
- Homebrew
- Google Chrome
- ~10 GB free disk space

## Quick install

```bash
git clone https://github.com/mshegolev/saymo && cd saymo
cp config.example.yaml config.yaml   # fill in your details
./install.sh
```

The installer handles Homebrew dependencies, Python packages (via `uv` or `pip`), an Ollama availability check, a Piper voice model download, and Chrome permissions.

## First-time setup

```bash
saymo setup                        # Interactive wizard: name, devices, profiles
saymo record-voice -d 300          # Record a 5-minute voice sample
saymo test-devices                 # Verify audio devices
saymo test-tts "Привет, это тест"  # Check TTS works ("Hi, this is a test")
```

### One-time audio routing

```
┌─────────────────────────────────────────────────────────────┐
│                   Audio MIDI Setup                          │
│  Create "Multi-Output Device":                              │
│    ✓ Your headphones   (master, no drift correction)        │
│    ✓ BlackHole 16ch    (drift correction ON)                │
│                                                             │
│  In your call app:                                          │
│    Microphone → BlackHole 2ch                               │
│    Speakers   → Multi-Output Device                         │
└─────────────────────────────────────────────────────────────┘
```
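
Once routing is set up, you can sanity-check that the virtual devices are actually visible from Python. A minimal sketch using `sounddevice` (already one of Saymo's dependencies); the `find_devices` helper and the fallback sample data are illustrative, not part of Saymo:

```python
# Sketch: check that the BlackHole virtual devices are visible to PortAudio.
# sounddevice.query_devices() returns dict-like entries with a "name" key;
# the helper works on any such list, so it also runs on the sample data below.

def find_devices(devices, name_substring):
    """Return the names of all devices whose name contains the substring."""
    return [d["name"] for d in devices if name_substring in d["name"]]

try:
    import sounddevice as sd
    devices = sd.query_devices()
except Exception:
    # Sample data so the sketch runs even without PortAudio installed.
    devices = [
        {"name": "MacBook Pro Microphone"},
        {"name": "BlackHole 2ch"},
        {"name": "BlackHole 16ch"},
        {"name": "Multi-Output Device"},
    ]

found = find_devices(devices, "BlackHole")
print("BlackHole devices:", found if found else "none - is the driver installed?")
```

If the list comes back empty on a real machine, the BlackHole driver is not installed or the terminal lacks microphone permission.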

## Daily usage

```bash
# Before the call: prepare text + cached audio
saymo prepare -p personal
saymo prepare-responses         # pre-synthesize the Q&A library for live mode
saymo review                    # optional: check generated audio

# During the call
saymo speak -p personal         # manual trigger, instant playback
saymo auto -p personal          # listen for your name, speak when called
saymo auto -p personal --mic    # same, but from laptop mic (for testing)

# Extras
saymo dashboard                 # interactive TUI
```

### Call providers

`saymo auto` works with all Chrome-based call apps — the provider is
picked by `meetings.<profile>.provider` in config:

| `provider:` | Service |
|---|---|
| `glip` (default) | RingCentral Glip |
| `zoom` | Zoom |
| `google_meet` | Google Meet |
| `ms_teams` | Microsoft Teams |
| `telegram` | Telegram calls (web) |
| `telemost` | Yandex Telemost |
| `vk_teams` | VK Teams |
| `mts_link` | MTS Link |

Run `saymo list-plugins` to see everything available in your install.
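
For example, switching a profile to Zoom is a one-line config change (the `personal` profile name is illustrative; check `config.example.yaml` for the exact `meetings` schema):

```yaml
meetings:
  personal:
    provider: zoom   # any value from the table above
```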

### Live Q&A mode

When your name is called and the surrounding transcript looks like a
question, `auto` consults a **pre-synthesized response library** and plays
the best-matching cached variant: no network hop, no synthesis lag.
Populate the library once with `saymo prepare-responses`. Built-in
intents cover status (`как дела`, "how are things"), blockers, ETA,
testing stage, and review. Extend them with your own wording via
`config.responses.library`.
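
Matching a live transcript against the library is essentially a fuzzy string comparison against each trigger phrase. A minimal sketch of the idea using only the standard library (the intent names and the 0.6 threshold are illustrative, not Saymo's actual matcher):

```python
from difflib import SequenceMatcher

# Illustrative intent library: trigger phrase -> intent name.
LIBRARY = {
    "how are things going": "status",
    "anything blocking you": "blockers",
    "when will it be done": "eta",
}

def best_intent(transcript, library, threshold=0.6):
    """Return the intent whose trigger phrase best matches the transcript,
    or None if nothing clears the similarity threshold (a cache miss)."""
    best, score = None, 0.0
    for phrase, intent in library.items():
        ratio = SequenceMatcher(None, transcript.lower(), phrase).ratio()
        if ratio > score:
            best, score = intent, ratio
    return best if score >= threshold else None

print(best_intent("so, how are things going?", LIBRARY))  # prints: status
```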

On cache miss, you can opt into a **live fallback**: Ollama composes an
answer from your standup summary + JIRA context, the TTS engine
synthesizes it, and Saymo plays it back. This adds a few seconds of
latency but covers any question. Enable it in config:

```yaml
responses:
  live_fallback: true
```

Without `live_fallback` (default), a cache miss falls back to the
generic standup audio — quiet, reliable, no LLM dependency.
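
Put together, the lookup order on each detected question can be sketched as follows (the function and argument names are illustrative, not Saymo's internals):

```python
def pick_response(cached, live_fallback_enabled, compose_live, generic_audio):
    """Cache hit -> cached clip; miss -> live LLM answer if enabled,
    otherwise the generic standup audio."""
    if cached is not None:
        return cached             # pre-synthesized clip, instant playback
    if live_fallback_enabled:
        return compose_live()     # Ollama + TTS, a few seconds of latency
    return generic_audio          # quiet default, no LLM dependency

print(pick_response(None, False, lambda: "live.wav", "standup.wav"))  # prints: standup.wav
```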

## Configurable prompts

All LLM prompts are templates loaded at runtime from `config.yaml` → `prompts.*`, with sensible generic defaults in the source. To customize the voice and tone:

```yaml
prompts:
  standup_ru: |
    Ты — помощник для ежедневных встреч. Составь отчёт на русском...
    {yesterday_notes}
    {today_notes}
  qa_system_ru: |
    Ты — {user_name}, {user_role}. Отвечай кратко, 1-3 предложения...
```

See `config.example.yaml` for all available keys and the default set.
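
Judging by the brace syntax, the placeholders look like standard Python `str.format` fields. Assuming that, a template of the same shape renders like this (the English template and the note text are illustrative):

```python
# Illustrative template in the same shape as prompts.standup_ru above.
TEMPLATE = (
    "You are a daily-standup assistant. Compose a short report.\n"
    "Yesterday: {yesterday_notes}\n"
    "Today: {today_notes}\n"
)

rendered = TEMPLATE.format(
    yesterday_notes="finished the TTS cache",
    today_notes="wire up the Zoom provider",
)
print(rendered)
```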

## Project-specific vocabulary

Add your own abbreviations or fuzzy name expansions to the TTS normalizer through config; no source edits are needed:

```yaml
vocabulary:
  abbreviations:
    MYAPI: "май-эй-пи-ай"
    K8S: "кубернетес"
  fuzzy_expansions:
    Alex: ["Alex", "Алекс", "Саша", "Саня"]
```
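
Conceptually, the abbreviation table drives a whole-word, longest-key-first substitution pass before synthesis. A minimal sketch of that idea (the English table and the `expand_abbreviations` helper are illustrative, not Saymo's actual normalizer):

```python
import re

# Illustrative table, same shape as vocabulary.abbreviations above.
ABBREVIATIONS = {
    "K8S": "kubernetes",
    "API": "a-p-i",
}

def expand_abbreviations(text, table):
    """Replace whole-word abbreviations, longest keys first so that
    a longer key always wins over any shorter key it contains."""
    for abbr in sorted(table, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(abbr)}\b", table[abbr], text)
    return text

print(expand_abbreviations("Deploy to K8S via the API", ABBREVIATIONS))
# prints: Deploy to kubernetes via the a-p-i
```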

## Architecture

```
┌───────────────┐   ┌──────────────┐   ┌────────────────┐   ┌──────────────┐
│ Source plugin │──▶│ LLM composer │──▶│ Text normalizer│──▶│  TTS engine  │
│  (optional)   │   │   (Ollama)   │   │   (abbrevs,    │   │  (XTTS clone │
│               │   │              │   │    numbers)    │   │  / Piper)    │
└───────────────┘   └──────────────┘   └────────────────┘   └──────┬───────┘
                                                                   │
┌──────────────┐   ┌──────────────┐   ┌────────────────┐           │
│Call provider │◀──│ Auto trigger │◀──│  STT (Whisper) │       Audio bytes
│(mute/unmute) │   │(name detect) │   │ (capture call) │           │
└──────┬───────┘   └──────────────┘   └────────────────┘           │
       │                                                           │
       ▼                                                           ▼
  BlackHole 2ch ─────────────────────────────────────────── Audio output + monitor
  (virtual mic)
```

Details in `docs/PRD.md` and ADRs under `docs/adr/`.

## Security & privacy

- Everything runs on-device by default. Cloud TTS / STT providers are optional and disabled in the example config.
- Voice samples and secrets are listed in `.gitignore`, so they are never committed to the repository and stay on your machine.
- Prompts, vocabulary, trigger phrases are all in your config file — source stays generic.

## License

MIT — see [`LICENSE`](LICENSE).

## Acknowledgements

- [Coqui TTS](https://github.com/coqui-ai/TTS) for XTTS v2.
- [Ollama](https://ollama.com) for local LLM hosting.
- [faster-whisper](https://github.com/SYSTRAN/faster-whisper) for transcription.
- [BlackHole](https://existential.audio/blackhole/) for virtual audio routing.
