Metadata-Version: 2.4
Name: mini-videorag
Version: 0.1.0
Summary: Multi-modal video analysis with AI-powered RAG
Author-email: Benjamin Dallard <dallard.benjamin@proton.me>
License-Expression: AGPL-3.0-or-later
Project-URL: Homepage, https://github.com/bdallard/mini-videorag
Project-URL: Documentation, https://github.com/bdallard/mini-videorag#readme
Project-URL: Repository, https://github.com/bdallard/mini-videorag
Project-URL: Issues, https://github.com/bdallard/mini-videorag/issues
Project-URL: Changelog, https://github.com/bdallard/mini-videorag/blob/main/CHANGELOG.md
Keywords: video,rag,ai,multimodal,llm,computer-vision
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Multimedia :: Video
Requires-Python: <3.13,>=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: python-dotenv<1.1.0,>=1.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: litellm>=1.0.0
Requires-Dist: openai==1.60.0
Requires-Dist: llama-index==0.12.14
Requires-Dist: llama-index-vector-stores-lancedb==0.3.0
Requires-Dist: llama-index-embeddings-clip==0.3.1
Requires-Dist: openai-clip==1.0.1
Requires-Dist: lancedb==0.18.0
Requires-Dist: opencv-python==4.11.0.86
Requires-Dist: moviepy==2.1.2
Requires-Dist: numpy<2.0.0,>=1.26.0
Requires-Dist: torch==2.2.2
Requires-Dist: torchvision==0.17.2
Requires-Dist: torchaudio==2.2.2
Requires-Dist: transformers==4.36.0
Requires-Dist: sentence-transformers==2.7.0
Requires-Dist: boto3==1.36.8
Requires-Dist: requests==2.32.3
Provides-Extra: ocr
Requires-Dist: easyocr==1.7.2; extra == "ocr"
Provides-Extra: yolo
Requires-Dist: ultralytics==8.3.162; extra == "yolo"
Requires-Dist: ultralytics-thop==2.0.14; extra == "yolo"
Provides-Extra: audio
Requires-Dist: shazamio_core==1.1.2; extra == "audio"
Provides-Extra: subtitle
Requires-Dist: nltk==3.9.1; extra == "subtitle"
Requires-Dist: wordninja==2.0.0; extra == "subtitle"
Requires-Dist: Levenshtein==0.27.1; extra == "subtitle"
Requires-Dist: langdetect==1.0.9; extra == "subtitle"
Provides-Extra: temporal
Requires-Dist: temporalio<2.0.0,>=1.18.1; extra == "temporal"
Provides-Extra: api
Requires-Dist: fastapi==0.115.7; extra == "api"
Requires-Dist: uvicorn==0.34.0; extra == "api"
Requires-Dist: python-multipart>=0.0.9; extra == "api"
Provides-Extra: dev
Requires-Dist: pytest>=9.0.2; extra == "dev"
Requires-Dist: pytest-cov<3.0.0,>=2.10.1; extra == "dev"
Requires-Dist: pylint==4.0.4; extra == "dev"
Requires-Dist: mypy==0.991; extra == "dev"
Requires-Dist: black==24.10.0; extra == "dev"
Requires-Dist: tox<5.0.0,>=4.0.0; extra == "dev"
Requires-Dist: tox-gh-actions<4.0.0,>=3.0.0; extra == "dev"
Requires-Dist: twine==5.1.1; extra == "dev"
Requires-Dist: build==0.10.0; extra == "dev"
Provides-Extra: processors
Requires-Dist: mini-videorag[audio,ocr,subtitle,yolo]; extra == "processors"
Provides-Extra: all
Requires-Dist: mini-videorag[api,dev,processors,temporal]; extra == "all"
Dynamic: license-file

# Mini VideoRAG

A lightweight multi-modal video analysis framework, so you don't spend too many tokens on your video analysis tasks 🤗

## Installation

```bash
git clone https://github.com/bdallard/mini-videorag
cd mini-videorag

# Install everything (recommended - unified installation)
pip install -e ".[all]"
```
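
If you only need part of the stack, the optional extras declared for the package (`ocr`, `yolo`, `audio`, `subtitle`, `temporal`, `api`, `dev`, `processors`) can be installed selectively:

```bash
# Core install only
pip install -e .

# Pick specific extras, e.g. processors + API
pip install -e ".[processors,api]"
```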

### Environment Setup

```bash
cp .env.example .env
```

Required variables (see the `.env` sketch below):
- `OPENAI_API_KEY` - API key for transcription and VLM queries
- `STORAGE_TYPE` - Object storage backend (default: `minio`)
- `STORAGE_ENDPOINT_URL` - Object storage endpoint (e.g. `http://localhost:9000`)
- `LLM_MODEL` - LLM provider/model for RAG (e.g. `openai/gpt-4o-mini`)
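
A minimal `.env` sketch using the variables above (values mirror the list; the API key is a placeholder):

```bash
OPENAI_API_KEY=sk-your-key-here
STORAGE_TYPE=minio
STORAGE_ENDPOINT_URL=http://localhost:9000
LLM_MODEL=openai/gpt-4o-mini
```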


## Quickstart 

```bash
# Start the stack with Docker
make docker-up
```

Then query videos in natural language; 100+ LLM providers are supported via [LiteLLM](https://github.com/BerriAI/litellm).

```python
from mini_videorag.utils.config_loader import load_video_rag_config, create_pipeline_from_config
from mini_videorag.core.video_rag import VideoRAG
from mini_videorag.core import VideoRAGSession
from pydantic import BaseModel, Field

class DetailedAnswer(BaseModel):
    answer: str
    confidence_score: float = Field(ge=0.0, le=1.0)
    reasoning: str
    sources: list[str] = Field(default_factory=list)

config = load_video_rag_config()
pipeline = create_pipeline_from_config(config)

# Option 1: Context manager (automatic cleanup) for scripts
with VideoRAG(pipeline, output_schema=DetailedAnswer) as rag:
    rag.init("video.mp4", num_frames=10)
    answer = rag.ask("Does this have subtitles?")
    print(f"{answer.answer} (confidence: {answer.confidence_score})")

# Option 2: Session (manual control) for services/APIs
session = VideoRAGSession(pipeline, output_schema=DetailedAnswer)
session.initialize("video.mp4", num_frames=10)
answer = session.ask("Does this have subtitles?")
print(f"{answer.answer}\nReasoning: {answer.reasoning}")
session.cleanup()
```
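
Since LiteLLM routes the provider call, switching backends is just a matter of changing `LLM_MODEL`; the model strings below are illustrative examples, not a tested list:

```bash
# Anthropic
LLM_MODEL=anthropic/claude-3-5-sonnet-20241022

# Local model via Ollama
LLM_MODEL=ollama/llama3
```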

---
## Features

- **Multi-modal processors**: Transcription, OCR, object/person detection, brand detection, NSFW detection, music recognition
- **DAG workflow engine**: Configure processor dependencies and execution order
- **Type-safe config**: Pydantic models with validation
- **Pluggable architecture**: Custom processors via factory pattern, entry points, or YAML config
- **100+ LLM providers**: OpenAI, Anthropic, Ollama via LiteLLM
- **Production-ready**: Thread safety, resource cleanup, and error handling

## Extending with Custom Processors

Three methods to add custom processors without forking:

### 1. Runtime Registration

```python
from mini_videorag.processors import ProcessorFactory

factory = ProcessorFactory()
factory.register_processor(
    processor_type="scene_detection",
    provider="opencv",
    class_path="my_package.processors:SceneDetector",
    set_as_default=True
)
```

### 2. Plugin System (Entry Points)

```toml
# pyproject.toml
[project.entry-points."mini_videorag.processors"]
scene_detection.opencv = "my_package.processors:SceneDetector"
```

### 3. YAML Configuration

```yaml
# config/video_rag_config.yml
custom_processors:
  - processor_type: scene_detection
    provider: opencv
    class_path: "my_package.processors:SceneDetector"
    default: true
```

See **[CONTRIBUTING.md](CONTRIBUTING.md)** for detailed plugin development guide.


## Workflow Configuration

Edit `config/video_rag_config.yml` to control processor execution order and dependencies; a sketch of the corresponding `workflow:` section follows the table below.

| Processor | Purpose | Dependencies |
|-----------|---------|--------------|
| `transcription` | Speech-to-text (Whisper) | OPENAI_API_KEY |
| `frame_extraction` | Extract frames (OpenCV) | - |
| `music_detection` | Music recognition (Shazam) | - |
| `ocr` | Text extraction (EasyOCR) | `frame_extraction`* |
| `person_detection` | Detect people (YOLO) | `frame_extraction`* |
| `object_detection` | Detect objects (YOLO) | `frame_extraction`* |
| `brand_detection` | Detect brands (HF) | `frame_extraction`* |
| `content_safety` | NSFW detection (HF) | `frame_extraction`* |
| `subtitle_check` | Subtitle presence (N-gram) | `ocr`, `transcription` |

\* Optional dependency - processors can run independently but benefit from shared frame extraction
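
The dependency column maps onto the `workflow:` DAG section of the config; a shortened sketch (exact keys may differ in your `config/video_rag_config.yml`):

```yaml
workflow:
  processors:
    frame_extraction: {}
    transcription: {}
    ocr:
      depends_on: [frame_extraction]
    subtitle_check:
      depends_on: [ocr, transcription]
```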


## Temporal Workflows

Distributed video processing with fault tolerance, retries, and progress tracking via [Temporal](https://temporal.io/).

### Prerequisites

- Temporal server running (via Docker or cloud)
- MinIO/S3 for video storage (optional, supports local paths too)

```bash
# Start Temporal + MinIO
docker-compose -f docker-compose.prod.yml up
```

Or start the worker manually with your own settings:

```bash
python -m mini_videorag.temporal.run_worker

# With custom settings
TEMPORAL_URL=localhost:7233 \
TASK_QUEUE=video-processing \
MAX_CONCURRENT_ACTIVITIES=10 \
python -m mini_videorag.temporal.run_worker
```

Then start the API to trigger workflows and query results:

```bash
python -m mini_videorag --host 0.0.0.0 --port 8000
```

Swagger UI available at `http://localhost:8000/docs`.

### API Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/storage/upload` | POST | Upload video to MinIO/S3 |
| `/storage/objects` | GET | List stored objects |
| `/storage/url/{key}` | GET | Get presigned download URL |
| `/workflows/analyze` | POST | Start video analysis workflow |
| `/workflows/{id}/progress` | GET | Query workflow progress |
| `/workflows/{id}/result` | GET | Get final result |
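
A typical flow against these endpoints, sketched with `curl` (payload fields are assumptions; check the Swagger UI at `/docs` for the real schema):

```bash
# Upload a video to MinIO/S3
curl -X POST http://localhost:8000/storage/upload -F "file=@video.mp4"

# Start an analysis workflow (body shape is illustrative)
curl -X POST http://localhost:8000/workflows/analyze \
  -H "Content-Type: application/json" \
  -d '{"video_key": "video.mp4"}'

# Poll progress, then fetch the final result
curl http://localhost:8000/workflows/<workflow_id>/progress
curl http://localhost:8000/workflows/<workflow_id>/result
```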


### Configuration

Temporal settings in `config/video_rag_config.yml`:

```yaml
temporal:
  activity:
    start_to_close_timeout_minutes: 30
    heartbeat_timeout_minutes: 10
    processor_overrides:
      ocr:
        heartbeat_timeout_minutes: 15
    retry:
      maximum_attempts: 3
      non_retryable_errors: [ModelInitError, VideoLoadError]
  workflow:
    execution_timeout_minutes: 120
```

The same `workflow:` DAG configuration drives both local `ProcessorPipeline` and Temporal execution.

### Monitoring

Enable Prometheus metrics on the worker:

```bash
PROMETHEUS_ENABLED=true PROMETHEUS_PORT=9091 python -m mini_videorag.temporal.run_worker
```

Access metrics at `http://localhost:9091/metrics` or Prometheus UI at `http://localhost:9090` (when using docker-compose).


---

## Testing

```bash
pip install -e ".[dev]"

# Fast mode (skips model downloads and heavy tasks) - recommended for dev & CI
make test-ci
pytest -m "not requires_download"

# Full test suite
pytest
```

**Markers**: `unit`, `integration`, `slow`, `requires_download`
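
Markers can be combined in a standard pytest `-m` expression, for example:

```bash
# Unit tests only, excluding anything marked slow
pytest -m "unit and not slow"
```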


## Documentation

- **[CLAUDE.md](CLAUDE.md)** - AI/Developer guide and architecture
- **[CONTRIBUTING.md](CONTRIBUTING.md)** - Development guide
- **[demo_video_rag.ipynb](demo_video_rag.ipynb)** - Example notebook
- **[tests/TESTING_GUIDE.md](tests/TESTING_GUIDE.md)** - Testing guide
