Metadata-Version: 2.4
Name: slidesonnet
Version: 0.1.0a1
Summary: Compile text-based presentations into narrated videos
Author: Aviv Zohar
License-Expression: MIT
Project-URL: Homepage, https://github.com/avivz/slideSonnet
Project-URL: Source, https://github.com/avivz/slideSonnet
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Education
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Video
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: doit>=0.36
Requires-Dist: python-dotenv>=1.0
Requires-Dist: playwright>=1.40
Requires-Dist: rich>=13.0
Provides-Extra: piper
Requires-Dist: piper-tts>=1.4; extra == "piper"
Provides-Extra: elevenlabs
Requires-Dist: elevenlabs>=1.0; extra == "elevenlabs"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: types-PyYAML; extra == "dev"
Dynamic: license-file

<p align="center">
  <img src="assets/logo/slidesonnet-logo.png" alt="SlideSonnet — Text → Video" width="500">
</p>

Compile text-based slide presentations into narrated MP4 videos.

Write your slides in [MARP](https://marp.app/) Markdown or LaTeX Beamer, add narration with `<!-- say: -->` comments, and slideSonnet handles TTS synthesis, video composition, and assembly — with incremental builds that only re-synthesize changed slides.

## How it works

```
slidesonnet.yaml (playlist)
    |
    ├── 01-intro/slides.md   → [parse → TTS → compose] → module_01.mp4
    ├── animations/euler.mp4  → [passthrough]            → module_02.mp4
    ├── 02-proofs/slides.tex  → [parse → TTS → compose] → module_03.mp4
    └── [assemble] ─────────────────────────────────────→ my-course.mp4
```

A **playlist** file chains modules together — MARP slides, Beamer slides, and pre-existing video files. Each module is built independently, then concatenated into the final video. [pydoit](https://pydoit.org/) manages the build graph with content-hash caching, so only changed slides trigger TTS.

## Installation

### External dependencies

Install these system packages first:

| Tool | Required? | What it does | Install |
|---|---|---|---|
| **ffmpeg** | Yes | Video composition and concatenation | `sudo apt install ffmpeg` |
| **marp-cli** | Yes (for MARP slides) | Converts Markdown slides to PNG images | `npm install -g @marp-team/marp-cli` |
| **pdflatex + pdftoppm** | Only for Beamer | Compiles LaTeX and extracts slide images | `sudo apt install texlive-latex-base poppler-utils` |

After installing, run `slidesonnet doctor` to verify everything is set up correctly.

### Install slideSonnet

With [uv](https://docs.astral.sh/uv/) (recommended):

```bash
uv tool install slidesonnet[piper]
```

With [pipx](https://pipx.pypa.io/):

```bash
pipx install slidesonnet[piper]
```

The `[piper]` extra includes [Piper TTS](https://github.com/rhasspy/piper) for free local speech synthesis. Omit it if you plan to use ElevenLabs instead.

## Quick start

```bash
# Create an example project (MARP Markdown)
slidesonnet init md myproject

# Build the video
cd myproject
slidesonnet build
```

## Example: The Basel Problem

A 10-minute narrated lecture on the Basel Problem, built entirely from a single `slides.md` file:

[![The Basel Problem — slideSonnet example](https://img.youtube.com/vi/IkzeCoDuU5o/maxresdefault.jpg)](https://youtu.be/IkzeCoDuU5o)

Source: [`examples/basel-problem/`](examples/basel-problem/)

## Showcase example

The `examples/showcase/` directory is a single-file MARP presentation introducing slideSonnet through a dialog between two voices. It demonstrates narration, fragment animation, voice switching, silent/skipped slides, math, code, and images — all in one `slides.md` file.

[![slideSonnet Showcase](https://img.youtube.com/vi/u6yOFujj_f0/maxresdefault.jpg)](https://youtu.be/u6yOFujj_f0)

Source: [`examples/showcase/`](examples/showcase/) — includes pronunciation dictionaries and a playlist with ElevenLabs and Piper voice configuration.

## Writing slides

### MARP Markdown

Add narration with `<!-- say: -->` HTML comments:

```markdown
---
marp: true
---

# Introduction

<!-- say: Welcome to the lecture. Today we cover graph theory basics. -->

---

# Euler's Theorem

<!-- say(voice=alice): Let me explain this theorem carefully. -->

---

# Diagram

<!-- nonarration -->

---

# Hidden Notes

<!-- skip -->
```

| Annotation | Effect |
|---|---|
| `<!-- say: text -->` | Narrate with default voice |
| `<!-- say(voice=alice): text -->` | Narrate with a named voice preset |
| `<!-- nonarration -->` | Show slide with silence (uses global `silence_duration`) |
| `<!-- nonarration(5) -->` | Show slide with silence for 5 seconds (per-slide override) |
| `<!-- skip -->` | Omit slide from video entirely |
| *(none)* | Treated as silent, emits a warning |

Multi-line narration is supported. Slides with multiple `<!-- say: -->` directives are expanded into animated sub-slides with progressive fragment reveal — see [MARP documentation](docs/marp.md) for details.

### Beamer LaTeX

Use the `\say` command (defined as a no-op by `slidesonnet.sty` so LaTeX compiles normally):

```latex
\usepackage{slidesonnet}

\begin{frame}
  \frametitle{Euler's Theorem}
  \say{The sum of all vertex degrees equals twice the number of edges.}
  \say[voice=alice]{Let me explain more carefully.}
\end{frame}
```

Beamer equivalents: `\say{}`, `\say[voice=alice]{}`, `\nonarration`, `\nonarration[5]` (per-slide duration override), `\slidesonnetskip`. Frames with `\pause` produce multiple sub-slides that can be narrated independently — see [Beamer documentation](docs/beamer.md) for details.

## Playlist format

A single `.yaml` file per presentation. Configuration and module list in pure YAML:

```yaml
title: Graph Theory Lecture 1
tts:
  backend: piper
  piper:
    model: en_US-lessac-medium
  elevenlabs:
    api_key_env: ELEVENLABS_API_KEY
    voice_id: pNInz6obpgDQGcFmaJgB
voices:
  alice:
    piper: en_US-amy-medium
    elevenlabs: 21m00Tcm4TlvDq8ikWAM
pronunciation:
  shared:
    - pronunciation/cs-terms.md
    - pronunciation/math-terms.md
  # piper:
  #   - pronunciation/piper-hacks.md
  # elevenlabs:
  #   - pronunciation/elevenlabs-hacks.md
video:
  resolution: 1920x1080
  fps: 24
  crf: 23
  pad_seconds: 1.5
  pre_silence: 1.0
  silence_duration: 3.0
  crossfade: 0.5
modules:
  - 01-intro/slides.md
  - animations/euler.mp4
  - 02-proofs/slides.tex
  - 03-summary/slides.md
```

- Module type is auto-detected from extension (`.md` → MARP, `.tex` → Beamer, `.mp4` / `.mkv` / `.webm` / `.mov` → video passthrough)
- Lines starting with `//` are comments (filtered before YAML parsing)
- Video files are used as-is

## Pronunciation files

Reusable `.md` files with `**word**: replacement` pairs:

```markdown
# CS Pronunciation Guide

## People
**Dijkstra**: DYKE-struh
**Euler**: OY-ler

## Terms
**adjacency**: uh-JAY-suhn-see
```

Replacements are word-boundary aware (won't change "Eulerian") and case-insensitive. Reference them in the playlist under `pronunciation:`.

### Per-backend pronunciation

Pronunciation workarounds that fix one TTS engine often break another. You can specify separate files per backend:

```yaml
pronunciation:
  shared:
    - pronunciation/names.md
  piper:
    - pronunciation/piper-hacks.md
  elevenlabs:
    - pronunciation/elevenlabs-hacks.md
```

When building with `--tts piper`, the effective dictionary is `shared + piper`. With `--tts elevenlabs`, it's `shared + elevenlabs`. Backend-specific entries override shared entries for the same word.

The flat list format still works and is treated as `shared`:

```yaml
pronunciation:
  - pronunciation/names.md
```

## Voice presets

Define named voices in the playlist. Each preset can map to different voice IDs per TTS backend, so `--tts piper` and `--tts elevenlabs` both resolve correctly:

```yaml
voices:
  alice:
    piper: en_US-amy-medium
    elevenlabs: 21m00Tcm4TlvDq8ikWAM
  bob:
    piper: en_US-joe-medium
    elevenlabs: pNInz6obpgDQGcFmaJgB
```

A simple string value is also supported — it is used as-is regardless of backend:

```yaml
voices:
  alice: en_US-amy-medium
```

Then use presets per-slide: `<!-- say(voice=alice): ... -->`. If a preset has no mapping for the active backend, the slide falls back to the default voice with a warning.

## API keys

For ElevenLabs, store keys in a `.env` file at the project root (auto-loaded at build time):

```
ELEVENLABS_API_KEY=sk-xxx-your-key
```

The playlist references env var names, never values: `api_key_env: ELEVENLABS_API_KEY`.

## CLI reference

```
slidesonnet build                          # build video + SRT subtitles
slidesonnet build --tts piper              # override TTS backend
slidesonnet build --no-srt                 # build without generating subtitles
slidesonnet build --dry-run                # show what would be built (no TTS/FFmpeg)
slidesonnet preview                        # quick build with local Piper TTS
slidesonnet subtitles                      # regenerate SRT from cached audio
slidesonnet preview-slide slides.md 3       # play one slide's audio
slidesonnet preview-slide slides.md 3 -p slidesonnet.yaml  # with playlist config
slidesonnet init md myproject               # MARP Markdown project
slidesonnet init tex myproject              # Beamer LaTeX project
slidesonnet list                           # list slides with cache status per slide
slidesonnet utterances                     # export narration text for proofreading
slidesonnet clean                          # clean cache (keeps API audio by default)
slidesonnet doctor                         # check installed dependencies
```

## Incremental builds

TTS audio is cached by content hash of the narration text, not by slide number. This means:

- **No changes** → entire build is skipped
- **Edit one slide** → only that slide's audio is re-synthesized
- **Insert a slide** → existing slides hit the cache, only the new slide triggers TTS
- **Change voice preset** → affected slides rebuild (voice is part of the hash)

Use `--dry-run` (or `-n`) to see what a build would do without making any API calls:

```
$ slidesonnet build --dry-run
8 narrated slides: 5 cached, 3 need TTS (~1,200 characters via elevenlabs)
```

This is especially useful before ElevenLabs builds to estimate API usage and cost.

Build artifacts live in `cache/` next to the playlist file. Add it to `.gitignore`.

## Subtitles

Every build automatically generates an SRT subtitle file alongside the video (e.g., `my-course.srt` next to `my-course.mp4`). The subtitles use the original narration text (before pronunciation substitutions) and are timed to match the audio.

Long narrations are split into subtitle-sized chunks at sentence boundaries, then clause boundaries, then word boundaries — each chunk timed proportionally by character count.

Use the SRT file as a starting point for translation or editing with any subtitle tool. To skip generation, pass `--no-srt`. To regenerate from cache without rebuilding:

```
slidesonnet subtitles
```

## Project layout

```
my-course/
├── slidesonnet.yaml           # playlist + config
├── pronunciation/
│   └── cs-terms.md
├── 01-intro/slides.md        # MARP module
├── 02-proofs/slides.tex      # Beamer module
├── animations/euler.mp4      # video module
├── .env                      # API keys (gitignored)
├── my-course.mp4             # final output video
├── my-course.srt             # auto-generated subtitles
├── cache/                    # build artifacts (gitignored)
│   ├── audio/                # TTS cache (content-addressed)
│   ├── 01-intro/
│   │   ├── slides/           # extracted PNGs + manifest
│   │   ├── utterances/       # text sent to TTS (for debugging)
│   │   └── segments/         # per-slide video segments
│   └── .doit.db
└── .gitignore
```

## Development

```bash
git clone https://github.com/avivz/slideSonnet.git
cd slideSonnet
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[piper,dev]"

make test-unit     # unit tests only (fast, no external tools)
make test          # all tests (requires ffmpeg, marp, pdflatex, piper)
make lint          # ruff check + format
make typecheck     # mypy --strict
```

## License

MIT
