Metadata-Version: 2.4
Name: lore-ai
Version: 0.1.2
Summary: Privacy-first, local-only oral history transcription.
Author: Digital Heritage Lab
License: MIT
Project-URL: Homepage, https://github.com/mabo-du/lore
Project-URL: Documentation, https://github.com/mabo-du/lore/blob/main/USER_GUIDE.md
Project-URL: Repository, https://github.com/mabo-du/lore.git
Project-URL: Issue Tracker, https://github.com/mabo-du/lore/issues
Keywords: oral-history,transcription,whisper,archival,offline
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: faster-whisper
Requires-Dist: PyQt6
Requires-Dist: imageio-ffmpeg
Requires-Dist: platformdirs
Requires-Dist: lxml
Requires-Dist: numpy
Requires-Dist: tsdownsample
Requires-Dist: pyannote-audio==3.3.2
Requires-Dist: resemblyzer>=0.1.4
Requires-Dist: gliner2-onnx>=0.1.1
Requires-Dist: llama-cpp-python>=0.3.25
Requires-Dist: sqlite-vec>=0.1.9
Requires-Dist: fastembed>=0.8.0
Requires-Dist: cryptography>=42.0
Provides-Extra: diarization
Requires-Dist: pyannote-audio; extra == "diarization"
Requires-Dist: Resemblyzer; extra == "diarization"
Dynamic: license-file

<div align="center">
  <img src="https://img.shields.io/badge/Local_First-100%25-brightgreen.svg?style=for-the-badge" alt="Local First">
  <img src="https://img.shields.io/badge/Python-3.12+-blue.svg?style=for-the-badge&logo=python" alt="Python 3.12+">
  <img src="https://img.shields.io/badge/PyQt6-UI-blueviolet.svg?style=for-the-badge" alt="PyQt6">
  <img src="https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge" alt="MIT License">
  <img src="https://img.shields.io/github/actions/workflow/status/mabo-du/lore/ci.yml?branch=main&style=for-the-badge" alt="CI">
  <img src="https://img.shields.io/pypi/v/lore-ai.svg?style=for-the-badge" alt="PyPI">

  <h1>Lore 🎙️</h1>

  <p><strong>Privacy-First, Local-Only Oral History Transcription & Archiving</strong></p>
</div>

---

**Lore** is a desktop application designed for historians, archivists, and researchers. It provides AI transcription, speaker diarization, named entity recognition, and translation, all running **100% offline on your own hardware.**

No data leaves your computer. No cloud subscriptions. Just powerful, open-source AI packaged into a clean, intuitive PyQt6 interface.

<img src="https://raw.githubusercontent.com/mabo-du/lore/main/docs/images/lore_main.png" alt="Lore Main Window" width="800">

## ✨ Features

- 🎧 **Offline Transcription:** Powered by `faster-whisper`, optimized for CPU inference and designed to run in under 8 GB of RAM.
- 🗣️ **Speaker Diarization:** Automatically identifies and labels different speakers using `pyannote.audio`.
- 🔍 **Word-Level Confidence:** Low-confidence words are visually highlighted so you can quickly spot potential hallucinations.
- 🌍 **Local Translation:** Translate transcripts into the 200 languages covered by Meta's `NLLB-200` model, completely offline.
- 📖 **Custom Vocabulary:** Provide local jargon, proper nouns, and historical terms to bias Whisper's decoding toward the correct spellings.
- 🏷️ **Named Entity Recognition:** Uses `GLiNER` to automatically extract people, organizations, dates, and locations.
- 📦 **Archival Exporting:** Export your work to the **OHMS XML** format or create an **RFC 8493 BagIt** archival package with SHA-256 checksum verification.
- 🔎 **Global Archive Search:** A unified SQLite database (`FTS5` + `sqlite-vec`) lets you search across all your past projects by keyword or by meaning (semantic search).
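The Custom Vocabulary and Word-Level Confidence features can be approximated with `faster-whisper` roughly as follows. This is a minimal sketch, not Lore's implementation: it assumes the vocabulary is passed through the `initial_prompt` argument of `WhisperModel.transcribe`, and the 0.5 threshold, the `small` model size, and the helper names (`build_initial_prompt`, `low_confidence_words`, `transcribe`) are hypothetical.

```python
from dataclasses import dataclass

LOW_CONFIDENCE = 0.5  # hypothetical cut-off; Lore's actual threshold may differ


@dataclass
class Word:
    """Stand-in mirroring the `word` and `probability` attributes of faster-whisper's word objects."""
    word: str
    probability: float


def build_initial_prompt(vocabulary: list[str]) -> str:
    """Join custom terms into a prompt that nudges Whisper toward these spellings."""
    terms = [t.strip() for t in vocabulary if t.strip()]
    return "Glossary: " + ", ".join(terms) + "." if terms else ""


def low_confidence_words(words, threshold: float = LOW_CONFIDENCE) -> list[str]:
    """Return the words a reviewer should check (probability below threshold)."""
    return [w.word.strip() for w in words if w.probability < threshold]


def transcribe(path: str, vocabulary: list[str]):
    """Yield (segment text, flagged words) pairs; assumed usage, not Lore's code."""
    # Imported lazily so the helpers above work without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel("small", device="cpu", compute_type="int8")
    segments, _info = model.transcribe(
        path,
        initial_prompt=build_initial_prompt(vocabulary) or None,
        word_timestamps=True,  # required for per-word probabilities
    )
    for segment in segments:
        yield segment.text, low_confidence_words(segment.words)
```

Setting `word_timestamps=True` is what makes faster-whisper attach per-word `probability` values to each segment. A prompt biases decoding toward the listed terms but does not guarantee them.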
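For the keyword half of Global Archive Search, a minimal sketch using Python's built-in `sqlite3` module and an FTS5 virtual table might look like this. The `segments` table and its columns are hypothetical rather than Lore's schema, the semantic (`sqlite-vec`) side is left out, and it assumes your SQLite build includes FTS5, as most do.

```python
import sqlite3


def open_archive(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # FTS5 virtual table: only 'text' is indexed; the other columns are stored as-is.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS segments "
        "USING fts5(project UNINDEXED, start_sec UNINDEXED, text)"
    )
    return conn


def add_segment(conn: sqlite3.Connection, project: str, start_sec: float, text: str) -> None:
    conn.execute(
        "INSERT INTO segments (project, start_sec, text) VALUES (?, ?, ?)",
        (project, start_sec, text),
    )
    conn.commit()


def keyword_search(conn: sqlite3.Connection, query: str, limit: int = 10):
    """Return (project, start_sec, snippet) rows, best BM25 match first.

    `query` uses FTS5 query syntax (plain words, "phrases", AND/OR/NOT).
    """
    return conn.execute(
        "SELECT project, start_sec, snippet(segments, 2, '[', ']', '...', 8) "
        "FROM segments WHERE segments MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
```

Ordering by FTS5's `rank` column sorts results by BM25 relevance, and `snippet()` returns a short excerpt of column 2 (`text`) with matched terms wrapped in the given markers.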

## 🚀 Installation

### Option 1: Pre-built Installers (Recommended)

Download the installer for your platform from the [latest release](https://github.com/mabo-du/lore/releases/latest):

| Platform | Installer |
|----------|-----------|
| 🪟 **Windows** | `lore-windows-x86_64.zip` — Extract and run `lore.exe` |
| 🍎 **macOS** | `lore-macos-arm64.tar.gz` — Extract and run `lore` |
| 🐧 **Linux** | `lore-linux-x86_64.tar.gz` — Extract and run `lore` |

### Option 2: Install from PyPI

```bash
pip install lore-ai
lore
```

### Option 3: Install from Source

Lore requires **Python 3.12+** and is cross-platform (Windows, macOS, Linux).

1. **Clone the repository:**
   ```bash
   git clone https://github.com/mabo-du/lore.git
   cd lore
   ```

2. **Set up a virtual environment:**
   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```

3. **Install the application:**
   ```bash
   pip install -e .
   ```

## 🎮 Usage

Start the Lore application:

```bash
lore
```

Or launch from your system's application menu if installed via the pre-built installer.

1. **Select an Audio File:** Click "Browse" and choose a file in a common audio format (WAV, MP3, M4A, OGG, FLAC).
2. **Configure Settings:** Click the ⚙️ Settings icon to set your Custom Vocabulary and speaker diarization preferences.

   <img src="https://raw.githubusercontent.com/mabo-du/lore/main/docs/images/lore_settings.png" alt="Lore Settings" width="400">

3. **Transcribe & Diarize:** If the recording has multiple speakers, check the "Enable Speaker Diarization" box, then click "Transcribe" on the toolbar.
4. **Edit & Review:** Play the audio, click on segments to edit them, and review any low-confidence words highlighted in red.
5. **Translate:** Select a target language from the dropdown and click "Translate" for fully offline translation.
6. **Export:** Fill out the Metadata panel and export to **OHMS XML** or an archival **BagIt package**.
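To illustrate what a BagIt export contains, here is a minimal, standard-library sketch of an RFC 8493 bag: a `bagit.txt` declaration, a `data/` payload directory, and a `manifest-sha256.txt` listing each payload file's SHA-256 checksum. It is an illustration under stated assumptions rather than Lore's exporter: `make_bag` and `verify_bag` are hypothetical names, and `bag-info.txt`, tag manifests, and the spec's percent-encoding of unusual filenames are omitted.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large audio files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(bag_dir: Path, payload: list[Path]) -> Path:
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=False)
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    lines = []
    for src in payload:
        dest = data_dir / src.name
        shutil.copy2(src, dest)
        # Manifest paths are relative to the bag root and use forward slashes.
        lines.append(f"{sha256_of(dest)}  data/{src.name}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir


def verify_bag(bag_dir: Path) -> bool:
    """Recompute each checksum listed in the manifest and compare."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        expected, rel_path = line.split(maxsplit=1)
        if sha256_of(bag_dir / rel_path) != expected:
            return False
    return True
```

Because verification recomputes every checksum, any change to a payload file after packaging makes `verify_bag` return `False`.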

For detailed instructions, see the [User Guide](https://github.com/mabo-du/lore/blob/main/USER_GUIDE.md).

## 🏗️ Architecture

Lore is designed with strict sequential memory management so it can run on older hardware:

- Models are loaded into memory one at a time (e.g., Whisper loads, transcribes, and unloads; then NLLB loads, translates, and unloads).
- Heavy use of CTranslate2 with INT8 quantization lets models run at practical speeds on a CPU, without a dedicated GPU.
- AI workloads run in background `QThread` workers that report back to the UI through PyQt6 signals, so the interface stays responsive during long jobs.
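The one-model-at-a-time strategy can be sketched in plain Python as a loop that loads a model, runs its stage, and drops the reference before the next model is loaded. This is a hypothetical illustration (`run_sequentially` and the `(loader, work)` stage pairs are not Lore's API). It assumes each stage keeps no other references to its model, so CPython's reference counting frees the model as soon as it is deleted, letting native libraries such as CTranslate2 release their memory.

```python
import gc
from typing import Any, Callable

Loader = Callable[[], Any]      # hypothetical: builds and returns a model
Work = Callable[[Any, Any], Any]  # hypothetical: (model, input) -> output


def run_sequentially(stages: list[tuple[Loader, Work]], data: Any) -> Any:
    """Run each (loader, work) stage in order with at most one model in memory."""
    for load, work in stages:
        model = load()            # e.g. Whisper first, then NLLB
        data = work(model, data)  # this stage's output feeds the next stage
        del model                 # drop the only reference we hold
        gc.collect()              # also reclaim any reference cycles it left
    return data
```

With this pattern, peak memory is roughly that of the largest single model rather than the sum of all of them.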

## 🤝 Contributing

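Lore is an open-source project, and we welcome pull requests, bug reports, and feature requests; please file issues on the [issue tracker](https://github.com/mabo-du/lore/issues). See the [User Guide](https://github.com/mabo-du/lore/blob/main/USER_GUIDE.md) for detailed workflows and documentation on the codebase.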

## 📜 License

This project is licensed under the MIT License; see the [LICENSE](https://github.com/mabo-du/lore/blob/main/LICENSE) file for details.
