Metadata-Version: 2.4
Name: lyric-sync
Version: 1.0.0
Summary: Audio/lyric synchronisation — dual transcription, word-level alignment, gap detection, timeline validation
Author-email: Trollfabriken AITrix AB <dev@trollfabriken.se>
License: Proprietary
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: pydantic>=2.5
Requires-Dist: python-dotenv>=1.0
Provides-Extra: openai
Requires-Dist: openai>=1.40; extra == "openai"
Provides-Extra: elevenlabs
Requires-Dist: httpx>=0.27; extra == "elevenlabs"
Provides-Extra: audio
Requires-Dist: pydub>=0.25; extra == "audio"
Provides-Extra: all
Requires-Dist: openai>=1.40; extra == "all"
Requires-Dist: httpx>=0.27; extra == "all"
Requires-Dist: pydub>=0.25; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Requires-Dist: pytest-cov>=5.0; extra == "dev"
Requires-Dist: black>=24.0; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"

# lyric-sync

Audio/lyric synchronisation for the **AIMOScript** video generation pipeline. It grew out of debugging the MusicVideoCreator project and is designed to eliminate the negative durations, backwards timestamps, and reused-word matches that affected the earlier implementation.

---

## What it solves

| Previous problem | Solution |
|---|---|
| Negative durations (`-42.37s`) | `TimelineValidator` 5-pass repair |
| Backwards timestamps (end before start) | `ExclusionPoolMatcher` prevents word reuse |
| Same transcription words matched to multiple lines | Exclusion pool — once a word is used, it's gone |
| Swedish encoding garbage (`Ã¤` instead of `ä`) | `fix_swedish_encoding()` in all text paths |
| ElevenLabs timestamps ignored in favour of fixed durations | Consensus builder uses ElevenLabs timing as ground truth |
| Excessive instrumental segments | `min_gap_duration` threshold (default 1.5s) |

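The `Ã¤`-for-`ä` garbage in the table is the usual result of UTF-8 bytes being decoded as Windows-1252. As a minimal sketch of that kind of repair (not the package's `fix_swedish_encoding()`, whose implementation is not shown here; the name `repair_mojibake` is hypothetical), the mis-decoding can be reversed by re-encoding and decoding again. Text that is already correct fails the round trip and is returned unchanged.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mistakenly decoded as Windows-1252."""
    try:
        # Recover the original bytes, then decode them as the UTF-8 they were
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not mojibake (or not cleanly reversible): leave the text untouched
        return text


print(repair_mojibake("hÃ¤r"))    # här
print(repair_mojibake("här"))     # här (already correct, unchanged)
```
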
---

## Installation

```bash
pip install lyric-sync                      # core only
pip install "lyric-sync[openai]"            # + Whisper transcription
pip install "lyric-sync[elevenlabs]"        # + ElevenLabs Scribe transcription
pip install "lyric-sync[all]"               # all providers + pydub
```

Set the API keys for the providers you enable as environment variables:

```bash
export OPENAI_API_KEY=sk-...
export ELEVENLABS_API_KEY=...
```

---

## Quick start

```python
from lyric_sync import LyricSyncer
from lyric_sync.exporter.json_exporter import JSONExporter

# Transcribe the audio and align it against the lyric lines
syncer = LyricSyncer()
result = syncer.sync(
    audio_path  = "conny.mp3",
    lyrics_path = "conny.txt",
)

# Print each segment with its start and end time in seconds
for seg in result.segments:
    print(f"{seg.start_time:.2f}–{seg.end_time:.2f}  {seg.text}")

# Export for video renderer
JSONExporter().export(result, "output/conny/")
```

---

## The pipeline

```
audio + lyrics
      │
  ① Transcribe
      ├─ OpenAI Whisper (verbose_json, word timestamps)
      └─ ElevenLabs Scribe (word timestamps, 99 languages)
      │
  ② Consensus merge
      └─ ElevenLabs preferred for Swedish; fills gaps from Whisper
      │
  ③ Align (ExclusionPoolMatcher)
      └─ Sliding window fuzzy match, exclusion pool prevents reuse
      │
  ④ Interpolate missing
      └─ Linear interpolation between anchors for unmatched lines
      │
  ⑤ Detect instrumental gaps
      └─ [Intro] / [Instrumental] / [Outro] for gaps ≥ min_gap
      │
  ⑥ Validate timeline
      └─ Fix negatives, overlaps, enforce min duration, redistribute
      │
   SyncResult → JSON
```
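
Step ③ can be illustrated with a small standalone sketch. It is not the `ExclusionPoolMatcher` source: the function names, the `(text, start, end)` tuple layout, and the `slack` parameter are assumptions made for illustration, and `difflib.SequenceMatcher` stands in for whatever fuzzy comparison the package actually uses. The point it shows is the exclusion pool: every transcript word claimed by one lyric line is skipped when matching later lines, so a repeated chorus cannot be mapped onto the same words twice.

```python
from difflib import SequenceMatcher

Word = tuple[str, float, float]  # (text, start, end), hypothetical layout


def _norm(s: str) -> str:
    return "".join(c for c in s.lower() if c.isalnum())


def _similar(a: str, b: str, threshold: float) -> bool:
    a, b = _norm(a), _norm(b)
    if not a or not b:
        return False
    return SequenceMatcher(None, a, b).ratio() >= threshold


def match_line(tokens, words, used, threshold=0.70, slack=2):
    """Best window for one line, ignoring transcript words already in `used`."""
    best, best_ratio = [], 0.0
    span = len(tokens) + slack
    for start in range(len(words)):
        hits, pos = [], start
        for token in tokens:
            # Take the first unused, similar word left in this window
            for i in range(pos, min(start + span, len(words))):
                if i not in used and _similar(token, words[i][0], threshold):
                    hits.append(i)
                    pos = i + 1
                    break
        ratio = len(hits) / len(tokens)
        if ratio > best_ratio:
            best, best_ratio = hits, ratio
    return best, best_ratio


def align(lines, words, min_confidence=0.55):
    """Return (start, end, confidence) per line, or None if unmatched."""
    used: set[int] = set()  # the exclusion pool
    out = []
    for line in lines:
        tokens = line.split()
        hits, ratio = match_line(tokens, words, used) if tokens else ([], 0.0)
        if hits and ratio >= min_confidence:
            used.update(hits)  # these words can never be matched again
            out.append((words[hits[0]][1], words[hits[-1]][2], ratio))
        else:
            out.append(None)
    return out


words = [("la", 0.0, 0.4), ("la", 0.5, 0.9), ("la", 1.0, 1.4), ("hej", 2.0, 2.4),
         ("la", 5.0, 5.4), ("la", 5.5, 5.9), ("la", 6.0, 6.4)]
print(align(["la la la", "hej", "la la la"], words))
# [(0.0, 1.4, 1.0), (2.0, 2.4, 1.0), (5.0, 6.4, 1.0)]
```

Without the `used` set, the second `la la la` would match the first three words again and produce a segment that starts before the preceding line ends. Lines that come back as `None` are the ones step ④ would have to interpolate.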

---

## Configuration

```python
from lyric_sync import LyricSyncer, SyncConfig

config = SyncConfig(
    # Transcription
    use_openai             = True,
    use_elevenlabs         = True,
    openai_model           = "whisper-1",
    language               = "sv",          # ISO 639-1; Swedish default
    prefer_elevenlabs      = True,          # ElevenLabs timing = ground truth

    # Alignment
    match_min_confidence   = 0.55,          # minimum word-match ratio
    word_similarity_threshold = 0.70,       # fuzzy word similarity threshold

    # Gap detection
    min_gap_duration       = 1.5,           # seconds; shorter gaps ignored
    min_instrumental_duration = 1.0,

    # Validation
    fix_negative_durations = True,
    fix_overlaps           = True,
    redistribute_on_violation = True,
    min_segment_duration   = 0.5,
)

syncer = LyricSyncer(config=config)
```
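
To make the effect of `min_gap_duration` concrete, here is a minimal sketch of threshold-based gap labelling. It is an illustration under assumptions, not the code in `gap_detector.py`: the function name `detect_gaps` and the tuple shapes are hypothetical, and `min_instrumental_duration` is not modelled.

```python
def detect_gaps(lyric_spans, audio_duration, min_gap=1.5):
    """Label silent stretches between lyric spans.

    lyric_spans: (start, end) pairs in seconds, sorted by start time.
    Returns (label, start, end) for every gap of at least min_gap seconds.
    """
    gaps = []
    cursor = 0.0  # end of the audio covered so far
    for index, (start, end) in enumerate(lyric_spans):
        if start - cursor >= min_gap:
            label = "[Intro]" if index == 0 else "[Instrumental]"
            gaps.append((label, cursor, start))
        cursor = max(cursor, end)
    if audio_duration - cursor >= min_gap:
        # With no lyric spans at all, the whole track is one instrumental
        label = "[Outro]" if lyric_spans else "[Instrumental]"
        gaps.append((label, cursor, audio_duration))
    return gaps


spans = [(3.13, 8.119), (8.5, 12.0), (20.0, 25.0)]
for gap in detect_gaps(spans, audio_duration=30.0):
    print(gap)
# ('[Intro]', 0.0, 3.13)
# ('[Instrumental]', 12.0, 20.0)
# ('[Outro]', 25.0, 30.0)
```

The 0.381 s pause between the first two spans falls below the 1.5 s threshold and produces no segment, which is how the threshold keeps short breaths from being reported as instrumentals.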

---

## Output format

`video_project_final.json` (compatible with MusicVideoCreator / CineForge):

```json
{
  "version": "1.0",
  "song_name": "Conny",
  "audio_duration": 195.3,
  "language": "sv",
  "stats": {
    "segment_count": 28,
    "lyric_count": 22,
    "instrumental_count": 6,
    "interpolated_count": 2,
    "mean_confidence": 0.847,
    "redistributed_count": 0
  },
  "segments": [
    {
      "index": 0,
      "text": "[Intro]",
      "start_time": 0.0,
      "end_time": 3.13,
      "duration": 3.13,
      "has_lyrics": false,
      "segment_type": "intro",
      "confidence": 1.0,
      "is_interpolated": false
    },
    {
      "index": 1,
      "text": "Min handledare hette Conny han var rak som ett vattenpass",
      "start_time": 3.13,
      "end_time": 8.119,
      "duration": 4.989,
      "has_lyrics": true,
      "segment_type": "lyric",
      "confidence": 0.982,
      "is_interpolated": false
    }
  ]
}
```
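
A consumer of this file may want to confirm the timeline invariants before rendering. The sketch below is one way to do that using only the fields shown in the example; the helper name `check_timeline` and the 1 ms tolerance are assumptions, and it is not part of the package.

```python
import json


def check_timeline(project: dict, tolerance: float = 1e-3) -> list[str]:
    """Return human-readable problems found in an exported project dict."""
    problems = []
    previous_end = 0.0
    for seg in project["segments"]:
        start, end = seg["start_time"], seg["end_time"]
        if end <= start:
            problems.append(f"segment {seg['index']}: non-positive duration")
        if abs((end - start) - seg["duration"]) > tolerance:
            problems.append(f"segment {seg['index']}: duration field mismatch")
        if start < previous_end - tolerance:
            problems.append(f"segment {seg['index']}: overlaps previous segment")
        previous_end = max(previous_end, end)
    if previous_end > project["audio_duration"] + tolerance:
        problems.append("timeline runs past audio_duration")
    return problems


# Trimmed copy of the example above, parsed the same way a file would be
project = json.loads("""
{
  "audio_duration": 195.3,
  "segments": [
    {"index": 0, "start_time": 0.0,  "end_time": 3.13,  "duration": 3.13},
    {"index": 1, "start_time": 3.13, "end_time": 8.119, "duration": 4.989}
  ]
}
""")
print(check_timeline(project))   # []
```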

---

## Testing without API calls

```python
from lyric_sync import LyricSyncer
from lyric_sync.models import TimedWord

words = [
    TimedWord(word="Min", start=3.13, end=3.5),
    TimedWord(word="handledare", start=3.6, end=4.2),
    TimedWord(word="hette", start=4.3, end=4.7),
    TimedWord(word="Conny", start=4.8, end=5.4),
]

lines = ["Min handledare hette Conny"]

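result = LyricSyncer().sync_from_text(words, lines, audio_duration=60.0)

# With the default min_gap_duration of 1.5 s, the 3.13 s before the first word
# may be emitted as an [Intro] segment at index 0, so pick the first lyric
# segment explicitly (has_lyrics is assumed to mirror the JSON field).
first_lyric = next(seg for seg in result.segments if seg.has_lyrics)
print(first_lyric.start_time)   # 3.13
print(first_lyric.end_time)     # 5.4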
```

---

## CLI

```bash
lyric-sync conny.mp3 --lyrics conny.txt --output output/conny/ --language sv
lyric-sync conny.mp3 --lyrics conny.txt --no-openai   # ElevenLabs only
lyric-sync conny.mp3 --lyrics conny.txt --min-gap 2.0 --verbose
```

---

## Package structure

```
lyric_sync/
├── __init__.py                   ← LyricSyncer + re-exports
├── syncer.py                     ← Main pipeline orchestrator
├── models.py                     ← TimedWord, LyricSegment, SyncResult, SyncConfig
├── utils.py                      ← Swedish encoding fix, normalise, fuzzy match
├── cli.py                        ← lyric-sync CLI
├── transcriber/
│   ├── openai_transcriber.py     ← Whisper word timestamps
│   ├── elevenlabs_transcriber.py ← Scribe word timestamps
│   └── consensus.py              ← Merge two transcriptions
├── aligner/
│   ├── exclusion_pool.py         ← Core word-matching engine
│   └── aligner.py                ← Align lines + interpolation
├── detector/
│   └── gap_detector.py           ← Intro/Instrumental/Outro detection
├── validator/
│   └── timeline_validator.py     ← Fix negatives, overlaps, redistribute
└── exporter/
    └── json_exporter.py          ← video_project_final.json
```

---

© Trollfabriken AITrix AB — Proprietary
