Metadata-Version: 2.4
Name: audiorag
Version: 0.3.1
Summary: RAG over audio files with provider-agnostic pipeline
Author: Atharva Verma
Author-email: Atharva Verma <atharva.verma18@gmail.com>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio :: Analysis
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: pydantic>=2.0
Requires-Dist: pydantic-settings>=2.0
Requires-Dist: aiosqlite>=0.19
Requires-Dist: tenacity>=8.0
Requires-Dist: structlog>=24.0
Requires-Dist: textual>=0.47.0
Requires-Dist: rich>=13.0.0
Requires-Dist: questionary>=2.0.0
Requires-Dist: openai>=1.0 ; extra == 'all'
Requires-Dist: chromadb>=0.4 ; extra == 'all'
Requires-Dist: yt-dlp>=2024.0 ; extra == 'all'
Requires-Dist: pydub>=0.25 ; extra == 'all'
Requires-Dist: cohere>=5.0 ; extra == 'all'
Requires-Dist: pinecone-client>=2.0 ; extra == 'all'
Requires-Dist: weaviate-client>=3.0 ; extra == 'all'
Requires-Dist: supabase>=1.0 ; extra == 'all'
Requires-Dist: anthropic>=0.18 ; extra == 'all'
Requires-Dist: google-generativeai>=0.3 ; extra == 'all'
Requires-Dist: voyageai>=0.2 ; extra == 'all'
Requires-Dist: groq>=0.4 ; extra == 'all'
Requires-Dist: assemblyai>=0.51 ; extra == 'all'
Requires-Dist: chromadb>=0.4 ; extra == 'chromadb'
Requires-Dist: cohere>=5.0 ; extra == 'cohere'
Requires-Dist: yt-dlp>=2024.0 ; extra == 'defaults'
Requires-Dist: pydub>=0.25 ; extra == 'defaults'
Requires-Dist: openai>=1.0 ; extra == 'openai'
Requires-Dist: yt-dlp>=2024.0 ; extra == 'scraping'
Requires-Dist: pydub>=0.25 ; extra == 'scraping'
Maintainer: Atharva Verma
Maintainer-email: Atharva Verma <atharva.verma18@gmail.com>
Requires-Python: >=3.12
Project-URL: Homepage, https://github.com/atharva-again/audiorag
Project-URL: Repository, https://github.com/atharva-again/audiorag
Project-URL: Issues, https://github.com/atharva-again/audiorag/issues
Provides-Extra: all
Provides-Extra: chromadb
Provides-Extra: cohere
Provides-Extra: defaults
Provides-Extra: openai
Provides-Extra: scraping
Description-Content-Type: text/markdown

# AudioRAG

Provider-agnostic RAG pipeline for audio content. Download, transcribe, chunk, embed, and search audio from YouTube and other sources.

## Features

- **Multi-provider support**: OpenAI, Deepgram, AssemblyAI, Groq (STT); OpenAI, Voyage, Cohere (embeddings); OpenAI, Anthropic, Gemini (generation); ChromaDB, Pinecone, Weaviate, Supabase (vector stores)
- **Batch indexing**: Index multiple URLs, playlists, and local directories in one command
- **Source discovery**: Automatically expand playlists and recursively scan directories
- **Resumable processing**: SQLite state tracking with hash-based IDs
- **Proactive budget governor**: Optional fail-fast limits for RPM, TPM, and audio-seconds/hour
- **Atomic vector verification**: Optional post-write verification with strict or best-effort modes
- **Automatic chunking**: Time-based segmentation with configurable duration
- **Audio splitting**: Handles large files by splitting before transcription
- **Structured logging**: Context-aware logging with operation timing
- **Type-safe**: Python 3.12+ with full type annotations

## Quick Start

```python
import asyncio
from audiorag import AudioRAGPipeline, AudioRAGConfig

async def main():
    # Configure with your chosen providers
    config = AudioRAGConfig(
        stt_provider="openai",
        stt_model="whisper-1",
        embedding_provider="openai",
        embedding_model="text-embedding-3-small",
        vector_store_provider="chromadb",
        generation_provider="openai",
        generation_model="gpt-4o-mini",
        # API keys can also be set via environment variables
        openai_api_key="sk-...",
    )
    
    # Initialize pipeline
    pipeline = AudioRAGPipeline(config)
    
    # Index audio from YouTube
    await pipeline.index("https://youtube.com/watch?v=...")
    
    # Query the indexed content
    result = await pipeline.query("What are the main points discussed?")
    print(result.answer)
    
    # Access sources with timestamps
    for source in result.sources:
        print(f"{source.title} at {source.start_time}s")
        print(f"URL: {source.source_url}")

asyncio.run(main())
```

## Installation

```bash
# Install with uv (recommended)
uv add audiorag

# Or with pip
pip install audiorag
```

### Optional Dependencies

```bash
# Audio scraping utilities (yt-dlp, pydub)
# Quote the extras: shells such as zsh treat unquoted [...] as a glob pattern
uv add "audiorag[defaults]"  # or: pip install "audiorag[defaults]"

# All providers and utilities
uv add "audiorag[all]"  # or: pip install "audiorag[all]"

# Only the extras you need
uv add "audiorag[openai,chromadb,scraping,cohere]"
```

## Command Line Interface

AudioRAG includes a CLI for setup, indexing, and querying.

### Setup

Configure your providers and API keys interactively:

```bash
audiorag setup
```

The command walks you through selecting providers for STT, embeddings, vector stores, and generation, then saves your choices to a `.env` file.

### Indexing

Index audio from multiple sources in a single command:

```bash
# Single YouTube video
audiorag index "https://youtube.com/watch?v=..."

# YouTube playlist (auto-expanded to individual videos)
audiorag index "https://youtube.com/playlist?list=..."

# Local audio files and folders
audiorag index "./podcast.mp3" "./audio_folder/"

# Multiple URLs at once
audiorag index "https://youtube.com/watch?v=video1" "https://youtube.com/watch?v=video2"

# Mixed inputs
audiorag index "./local_audio/" "https://youtube.com/watch?v=..." "./interview.wav"
```

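**Note:** Always quote URLs and paths. Unquoted, the shell may interpret characters such as `?` and `&` in a URL, and it splits a path that contains spaces into separate arguments.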

**Options:**
- `--force`: Re-process and re-index a source even if it has been processed before.

The CLI automatically:
- Expands YouTube playlists/channels into individual video URLs
- Recursively discovers audio files in directories
- Shows progress tracking for batch operations
- Handles errors per source without stopping the entire batch
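
The directory-scanning step can be pictured with a short sketch. This is an illustration under stated assumptions, not AudioRAG's implementation: `discover_sources` and `AUDIO_EXTENSIONS` are hypothetical names, the extension list is a guess, and playlist expansion is left out because it needs a downloader such as yt-dlp.

```python
from pathlib import Path

# Assumed extension list; the set AudioRAG actually recognizes may differ.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def discover_sources(inputs):
    """Expand directories into audio files; pass URLs and single files through."""
    sources = []
    for item in inputs:
        if item.startswith(("http://", "https://")):
            sources.append(item)  # URLs are left for the download stage to resolve
            continue
        path = Path(item)
        if path.is_dir():
            # rglob walks the tree recursively; sorting keeps the order stable.
            sources.extend(
                str(p)
                for p in sorted(path.rglob("*"))
                if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
            )
        else:
            sources.append(str(path))
    return sources
```

Calling `discover_sources(["./audio_folder/", "https://youtube.com/watch?v=..."])` would return every matching file under the folder followed by the untouched URL, which is the flat list a batch indexer can then process one source at a time.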

### Querying

Ask questions about your indexed audio content. Results are rendered in a formatted layout:

```bash
audiorag query "What are the main points discussed in the audio?"
```

## Configuration

AudioRAG uses pydantic-settings with environment variable support. All settings use the `AUDIORAG_` prefix.

```bash
# Example: Using OpenAI for STT, embeddings, and generation
export AUDIORAG_OPENAI_API_KEY="sk-..."
export AUDIORAG_STT_PROVIDER="openai"
export AUDIORAG_EMBEDDING_PROVIDER="openai"
export AUDIORAG_VECTOR_STORE_PROVIDER="chromadb"
export AUDIORAG_GENERATION_PROVIDER="openai"

# Example: Using different providers
export AUDIORAG_DEEPGRAM_API_KEY="..."
export AUDIORAG_STT_PROVIDER="deepgram"
export AUDIORAG_VOYAGE_API_KEY="..."
export AUDIORAG_EMBEDDING_PROVIDER="voyage"

# Processing settings
export AUDIORAG_CHUNK_DURATION_SECONDS="30"
export AUDIORAG_RETRIEVAL_TOP_K="10"
export AUDIORAG_RERANK_TOP_N="3"

# Optional budget governor
export AUDIORAG_BUDGET_ENABLED="true"
export AUDIORAG_BUDGET_RPM="60"
export AUDIORAG_BUDGET_TPM="120000"
export AUDIORAG_BUDGET_AUDIO_SECONDS_PER_HOUR="7200"

# Optional vector write verification
export AUDIORAG_VECTOR_STORE_VERIFY_MODE="best_effort"  # off | best_effort | strict
export AUDIORAG_VECTOR_STORE_VERIFY_MAX_ATTEMPTS="5"
export AUDIORAG_VECTOR_STORE_VERIFY_WAIT_SECONDS="0.5"
```

See [Configuration Guide](docs/configuration.md) for all options.
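
To make the prefix convention concrete, here is a stdlib-only approximation of how `AUDIORAG_`-prefixed variables map onto typed settings. It is not how pydantic-settings or `AudioRAGConfig` is implemented: `SettingsSketch` and `load_settings` are hypothetical, and the default values are placeholders rather than the library's real defaults.

```python
import os
from dataclasses import dataclass, fields

ENV_PREFIX = "AUDIORAG_"


@dataclass
class SettingsSketch:
    """Hypothetical subset of settings; defaults are placeholders."""

    stt_provider: str = "openai"
    chunk_duration_seconds: int = 30
    retrieval_top_k: int = 10
    budget_enabled: bool = False


def load_settings(environ=None):
    """Build settings from AUDIORAG_* variables, falling back to defaults."""
    environ = os.environ if environ is None else environ
    values = {}
    for field in fields(SettingsSketch):
        raw = environ.get(ENV_PREFIX + field.name.upper())
        if raw is None:
            continue  # variable not set: keep the dataclass default
        if field.type is bool:
            # bool("false") would be True, so parse truthy strings explicitly.
            values[field.name] = raw.strip().lower() in {"1", "true", "yes", "on"}
        else:
            values[field.name] = field.type(raw)  # str or int conversion
    return SettingsSketch(**values)
```

With `AUDIORAG_CHUNK_DURATION_SECONDS="45"` exported, `load_settings().chunk_duration_seconds` is the integer `45`, while unset fields keep their defaults.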

## Documentation

- [Quick Start Guide](docs/quickstart.md) - Get up and running
- [Configuration](docs/configuration.md) - All configuration options
- [Providers](docs/providers.md) - Available providers and setup
- [Architecture](docs/architecture.md) - Pipeline stages and data flow
- [API Reference](docs/api-reference.md) - Complete API documentation

## Development

```bash
# Clone and setup
git clone <repository-url>
cd audiorag
uv sync

# Run tests
uv run pytest

# Run checks
uv run ruff check . --fix
uv run ty check

# Install pre-commit hooks
uv run prek install
```

## Pipeline Stages

1. **Download**: Fetch audio from a URL (YouTube supported)
2. **Split**: Divide large files into pieces small enough to transcribe
3. **Transcribe**: Convert audio to text using the configured STT provider
4. **Chunk**: Group the transcript into time-based segments
5. **Embed**: Generate a vector embedding for each chunk
6. **Store**: Persist the embeddings in the vector store
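
The chunking stage is the least obvious of these, so a minimal sketch may help. It assumes the STT provider returns timestamped segments and groups consecutive ones until a time window is full. `Segment` and `chunk_by_time` are hypothetical names, and the real pipeline may split differently.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the beginning of the audio
    end: float
    text: str


def _merge(segments):
    """Collapse consecutive segments into one chunk spanning all of them."""
    text = " ".join(s.text for s in segments)
    return Segment(segments[0].start, segments[-1].end, text)


def chunk_by_time(segments, chunk_duration_seconds=30.0):
    """Group consecutive segments into chunks of at most the given duration."""
    chunks = []
    current = []
    for seg in segments:
        # Close the chunk once adding this segment would overrun the window.
        if current and seg.end - current[0].start > chunk_duration_seconds:
            chunks.append(_merge(current))
            current = []
        current.append(seg)
    if current:
        chunks.append(_merge(current))
    return chunks
```

In this sketch a single segment longer than the window becomes its own chunk rather than being cut mid-sentence. Each chunk keeps its start and end times, which is what lets query results point back to a timestamp in the source.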

## Reliability Controls

- **Budget governor** (`AUDIORAG_BUDGET_ENABLED=true`): reserves budget before expensive calls and fails fast with `BudgetExceededError` when limits would be exceeded.
- **Preflight transcription reservation**: when audio duration is known, indexing reserves full audio-seconds budget before STT starts.
- **Persistent budget accounting**: budget usage is persisted in SQLite for cross-process and restart safety.
- **Vector write verification**: after `add()`, providers that support `verify(ids)` are checked.
- **Verification modes**: `off` disables checks, `best_effort` warns on failure, `strict` fails indexing when verification fails.
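
The two main controls above can be sketched as follows. This is an in-memory illustration under stated assumptions, not the library's code: AudioRAG persists budget usage in SQLite, whereas `WindowBudget` here keeps it in a deque, and `verify_write` is a hypothetical helper. Only `BudgetExceededError`, `verify(ids)`, and the three mode names come from the text above.

```python
import time
import warnings
from collections import deque


class BudgetExceededError(RuntimeError):
    """Raised when a reservation would push usage past the limit."""


class WindowBudget:
    """Sliding-window budget, e.g. limit=60 and window_seconds=60 for 60 RPM."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._reservations = deque()  # (timestamp, amount), oldest first

    def reserve(self, amount):
        now = self.clock()
        # Forget reservations that have aged out of the window.
        while self._reservations and now - self._reservations[0][0] >= self.window_seconds:
            self._reservations.popleft()
        used = sum(a for _, a in self._reservations)
        if used + amount > self.limit:
            # Fail fast: nothing is recorded and the expensive call never starts.
            raise BudgetExceededError(f"requested {amount}, {self.limit - used} remaining")
        self._reservations.append((now, amount))


def verify_write(verify, ids, mode="best_effort", max_attempts=5,
                 wait_seconds=0.5, sleep=time.sleep):
    """Poll verify(ids) after a write; True if confirmed or if checks are off."""
    if mode == "off":
        return True
    for attempt in range(max_attempts):
        if verify(ids):
            return True
        if attempt < max_attempts - 1:
            sleep(wait_seconds)  # give an eventually consistent store time to catch up
    if mode == "strict":
        raise RuntimeError(f"vector write verification failed for {len(ids)} ids")
    warnings.warn(f"vector write verification failed for {len(ids)} ids")
    return False
```

Reserving before the call, rather than recording usage afterwards, is what makes the governor proactive: a request that would exceed the limit is rejected before any provider is billed. The `max_attempts` and `wait_seconds` parameters mirror the `AUDIORAG_VECTOR_STORE_VERIFY_*` settings shown in the configuration section.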

## License

MIT License
