Metadata-Version: 2.4
Name: videorag
Version: 0.1.0
Summary: Retrieve relevant transcript chunks from YouTube videos
Author-email: Atharva Deshmukh <atharvad38@gmail.com>
License: MIT
Keywords: youtube,transcript,rag,vector-search,faiss,nlp
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: langchain-community
Requires-Dist: langchain-text-splitters
Requires-Dist: youtube-transcript-api
Requires-Dist: faiss-cpu
Requires-Dist: sentence-transformers
Dynamic: license-file



# videorag

A simple Python library that retrieves the transcript chunks of a **YouTube video** most relevant to a user query, using vector search. It helps developers quickly index video transcripts and answer questions about the video content.

---

## 🧠 What It Does

`videorag` fetches the transcript of a YouTube video and uses a **vector store with MMR (Maximal Marginal Relevance) retrieval** to find the transcript chunks most relevant to a user query.
This makes it easy to build tools like video Q&A bots, summarizers, and RAG systems.

---

## 🚀 Features

* Automatically download YouTube transcripts (English first, then fallback languages)
* Clean and split transcripts into searchable chunks
* Build a FAISS vector index for fast retrieval
* Retrieve top-k relevant chunks for a query
* Minimal and easy-to-use API
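
To make the "clean and split" feature concrete, here is a minimal standard-library sketch of the idea. It is not the library's actual code: the package metadata lists `langchain-text-splitters`, which presumably does the real splitting. The function names `clean_text` and `split_into_chunks`, the word-based chunk size, and the bracketed-cue cleanup rule are all assumptions made for illustration.

```python
import re
from typing import List


def clean_text(raw: str) -> str:
    """Remove bracketed cues such as [Music] and collapse whitespace (hypothetical rule)."""
    without_cues = re.sub(r"\[[^\]]*\]", " ", raw)
    return re.sub(r"\s+", " ", without_cues).strip()


def split_into_chunks(text: str, chunk_words: int = 50, overlap_words: int = 10) -> List[str]:
    """Split text into chunks of up to chunk_words words, each sharing overlap_words with the previous one."""
    if not 0 <= overlap_words < chunk_words:
        raise ValueError("overlap_words must be non-negative and smaller than chunk_words")
    words = text.split()
    step = chunk_words - overlap_words
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        # Stop once the current chunk has reached the end of the text.
        if start + chunk_words >= len(words):
            break
    return chunks


if __name__ == "__main__":
    transcript = clean_text("[Music]  hello and welcome\nto the show [Applause]")
    print(split_into_chunks(transcript, chunk_words=4, overlap_words=1))
```

Overlapping chunks reduce the chance that a sentence relevant to the query is cut in half at a chunk boundary.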

---

## 🧾 Installation

Install directly via `pip` in your Python environment:

```bash
pip install videorag
```

Or install from the local repository (after cloning):

```bash
pip install -e .
```

> Make sure you are using a virtual environment (`venv`) when installing dependencies.

---

## 📦 Usage Example

Here is a simple usage example in Python:

```python
from videorag import get_relevant_chunks_from_video

video_link = "https://www.youtube.com/watch?v=HbZD0XoN5fc"
query = "Who is Sunita Williams?"
k = 3

relevant_text = get_relevant_chunks_from_video(video_link, query, k)
print(relevant_text)
```

This returns the **top-k relevant transcript text chunks** that best match your query.
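
The example above passes a standard `watch?v=` link. Whether `videorag` also accepts short or embed links is not stated here, so if your links come in mixed forms you may want to normalize them first. The sketch below uses only the standard library; `extract_video_id` is a hypothetical helper, not part of the `videorag` API, and the set of recognized hosts and paths is an assumption.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL forms, or None if not recognized."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
        return candidate or None

    if host in ("youtube.com", "www.youtube.com", "m.youtube.com"):
        # Standard links: https://www.youtube.com/watch?v=<id>
        if parsed.path == "/watch":
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        # Embed and Shorts links: /embed/<id>, /shorts/<id>
        for prefix in ("/embed/", "/shorts/"):
            if parsed.path.startswith(prefix):
                candidate = parsed.path[len(prefix):].split("/")[0]
                return candidate or None

    return None


if __name__ == "__main__":
    video_id = extract_video_id("https://youtu.be/HbZD0XoN5fc?t=10")
    if video_id is not None:
        # Rebuild the canonical form used in the usage example.
        print("https://www.youtube.com/watch?v=" + video_id)
```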

---

## 🛠️ How It Works

1. Fetch the **YouTube video transcript**
2. Clean and split it into chunks
3. Create or load a **FAISS vector index**
4. Run **MMR (Maximal Marginal Relevance)** to get diverse, relevant chunks
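
The MMR step can be sketched in plain Python to show what "diverse, relevant" means in practice. This is an illustration of the general technique under stated assumptions, not the code `videorag` runs: the library presumably delegates to its vector store. The names `cosine`, `mmr_select`, and `lambda_mult` are chosen for this example, and the toy 2-D vectors stand in for real sentence embeddings.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 3,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Greedily pick up to k document indices, trading relevance against redundancy.

    lambda_mult = 1.0 ranks purely by relevance to the query;
    lower values penalize similarity to chunks that were already picked.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    candidates = list(range(len(doc_vecs)))
    selected: List[int] = []
    while candidates and len(selected) < k:
        best_idx = candidates[0]
        best_score = -math.inf
        for i in candidates:
            # Redundancy is the highest similarity to any already selected chunk.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance[i] - (1.0 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


if __name__ == "__main__":
    query = [1.0, 0.0]
    docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]  # docs 0 and 1 are near-duplicates
    print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # relevance only
    print(mmr_select(query, docs, k=2, lambda_mult=0.3))  # favors diversity
```

With relevance only, the two near-duplicate vectors are both returned. With a lower `lambda_mult`, the second pick shifts to the less similar vector, which is the behavior that keeps retrieved transcript chunks from repeating each other.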

---

## 🧪 Contributing

Contributions are welcome! You can:

* Report bugs
* Suggest new features
* Improve documentation
* Add tests

Feel free to open issues or submit pull requests.



## Legal Notice

This software is provided under the MIT License. Users are responsible for 
complying with YouTube's Terms of Service when using this library. The author 
assumes no liability for how this library is used.
