Metadata-Version: 2.4
Name: ragscore
Version: 0.5.1
Summary: Generate QA datasets & evaluate RAG systems in 2 commands. Privacy-first, works with any LLM (local or cloud). Async, fast, zero config.
Author-email: RAGScore Team <team@ragscore.io>
License: Apache-2.0
Project-URL: Homepage, https://github.com/HZYAI/RagScore
Project-URL: Documentation, https://github.com/HZYAI/RagScore#readme
Project-URL: Repository, https://github.com/HZYAI/RagScore
Project-URL: Changelog, https://github.com/HZYAI/RagScore/blob/main/CHANGELOG.md
Project-URL: Bug Tracker, https://github.com/HZYAI/RagScore/issues
Keywords: rag,rag-evaluation,qa-generation,llm,llm-as-judge,local-llm,ollama,async,privacy,fine-tuning,synthetic-data,evaluation,golden-dataset
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: <3.13,>=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pypdf2>=3.0.1
Requires-Dist: nltk>=3.8.1
Requires-Dist: tqdm>=4.66.1
Requires-Dist: typer[all]>=0.16.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: aiohttp>=3.9.0
Provides-Extra: openai
Requires-Dist: openai>=1.0.0; extra == "openai"
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.18.0; extra == "anthropic"
Provides-Extra: dashscope
Requires-Dist: dashscope>=1.14.1; extra == "dashscope"
Provides-Extra: providers
Requires-Dist: ragscore[anthropic,dashscope,openai]; extra == "providers"
Provides-Extra: all
Requires-Dist: ragscore[providers]; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.4.3; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: ruff>=0.1.6; extra == "dev"
Requires-Dist: black>=23.11.0; extra == "dev"
Requires-Dist: mypy>=1.7.0; extra == "dev"
Requires-Dist: types-requests>=2.31.0; extra == "dev"
Requires-Dist: pre-commit>=3.5.0; extra == "dev"
Requires-Dist: requests>=2.31.0; extra == "dev"
Dynamic: license-file

<div align="center">
  <img src="RAGScore.png" alt="RAGScore Logo" width="400"/>
  
  [![PyPI version](https://badge.fury.io/py/ragscore.svg)](https://pypi.org/project/ragscore/)
  [![PyPI Downloads](https://static.pepy.tech/personalized-badge/ragscore?period=total&units=international_system&left_color=black&right_color=green&left_text=downloads)](https://pepy.tech/projects/ragscore)
  [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
  [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
  [![Ollama](https://img.shields.io/badge/Ollama-Supported-orange)](https://ollama.ai)
  
  **Generate QA datasets & evaluate RAG systems in 2 commands**
  
  🔒 Privacy-First • ⚡ Async & Fast • 🤖 Any LLM • 🏠 Local or Cloud
  
  [English](README.md) | [中文](README_CN.md) | [日本語](README_JP.md)
</div>

---

## ⚡ 2-Line RAG Evaluation

```bash
# Step 1: Generate QA pairs from your docs
ragscore generate docs/

# Step 2: Evaluate your RAG system
ragscore evaluate http://localhost:8000/query
```

**That's it.** RAGScore prints an accuracy score and lists the QA pairs your RAG system answered incorrectly.

```
============================================================
✅ EXCELLENT: 85/100 correct (85.0%)
Average Score: 4.20/5.0
============================================================

❌ 15 Incorrect Pairs:

  1. Q: "What is RAG?"
     Score: 2/5 - Factually incorrect

  2. Q: "How does retrieval work?"
     Score: 3/5 - Incomplete answer
```

---

## 🚀 Quick Start

### Install

```bash
pip install ragscore              # Core (works with Ollama)
pip install "ragscore[openai]"    # + OpenAI support
pip install "ragscore[all]"       # + All providers
```

### Generate QA Pairs

```bash
# Set API key (or use local Ollama - no key needed!)
export OPENAI_API_KEY="sk-..."

# Generate from any document
ragscore generate paper.pdf
ragscore generate docs/*.pdf --concurrency 10
```

### Evaluate Your RAG

```bash
# Point to your RAG endpoint
ragscore evaluate http://localhost:8000/query

# Custom options
ragscore evaluate http://api/ask --model gpt-4o --output results.json
```

---

## 🏠 100% Private with Local LLMs

```bash
# Use Ollama - no API keys, no cloud, 100% private
ollama pull llama3.1
ragscore generate confidential_docs/*.pdf
ragscore evaluate http://localhost:8000/query
```

**Perfect for:** Healthcare 🏥 • Legal ⚖️ • Finance 🏦 • Research 🔬

---

## 🔌 Supported LLMs

| Provider | Setup | Notes |
|----------|-------|-------|
| **Ollama** | `ollama serve` | Local, free, private |
| **OpenAI** | `export OPENAI_API_KEY="sk-..."` | Best quality |
| **Anthropic** | `export ANTHROPIC_API_KEY="..."` | Long context |
| **DashScope** | `export DASHSCOPE_API_KEY="..."` | Qwen models |
| **vLLM** | `export LLM_BASE_URL="..."` | Production-grade |
| **Any OpenAI-compatible** | `export LLM_BASE_URL="..."` | Groq, Together, etc. |

---

## 📊 Output Formats

### Generated QA Pairs (`output/generated_qas.jsonl`)

```json
{
  "id": "abc123",
  "question": "What is RAG?",
  "answer": "RAG (Retrieval-Augmented Generation) combines...",
  "rationale": "This is explicitly stated in the introduction...",
  "support_span": "RAG systems retrieve relevant documents...",
  "difficulty": "medium",
  "source_path": "docs/rag_intro.pdf"
}
```
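
The record above can be inspected with a few lines of Python. This is a minimal sketch, assuming the file follows the usual JSON Lines convention (one JSON object per line) and carries the `difficulty` field shown above; `load_qa_pairs` and `count_by_difficulty` are hypothetical helper names, not part of the RAGScore API.

```python
import json
from collections import Counter
from pathlib import Path


def load_qa_pairs(path):
    """Read a JSON Lines file and return a list of QA records (dicts)."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def count_by_difficulty(records):
    """Count records per value of the "difficulty" field."""
    return Counter(record.get("difficulty", "unknown") for record in records)


if __name__ == "__main__":
    qa_path = Path("output/generated_qas.jsonl")
    if qa_path.exists():
        pairs = load_qa_pairs(qa_path)
        print(f"{len(pairs)} QA pairs")
        for difficulty, count in count_by_difficulty(pairs).items():
            print(f"  {difficulty}: {count}")
```

A quick count like this helps spot a skewed dataset (for example, almost no hard questions) before running an evaluation.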

### Evaluation Results (`--output results.json`)

```json
{
  "summary": {
    "total": 100,
    "correct": 85,
    "incorrect": 15,
    "accuracy": 0.85,
    "avg_score": 4.2
  },
  "incorrect_pairs": [
    {
      "question": "What is RAG?",
      "golden_answer": "RAG combines retrieval with generation...",
      "rag_answer": "RAG is a database system.",
      "score": 2,
      "reason": "Factually incorrect - RAG is not a database"
    }
  ]
}
```
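
The results file can be post-processed from Python, for instance to gate a CI job on accuracy. The sketch below assumes only the `summary` and `incorrect_pairs` fields shown above; `summarize_results` is a hypothetical helper, not a RAGScore function.

```python
import json
from pathlib import Path


def summarize_results(results):
    """Return a one-line headline and the failures, lowest score first."""
    summary = results["summary"]
    # Recompute accuracy from the counts rather than trusting the stored value.
    total = summary["total"]
    accuracy = summary["correct"] / total if total else 0.0
    headline = (
        f'{summary["correct"]}/{total} correct '
        f'({accuracy:.1%}), average score {summary["avg_score"]:.2f}/5'
    )
    failures = sorted(
        results.get("incorrect_pairs", []), key=lambda pair: pair["score"]
    )
    return headline, failures


if __name__ == "__main__":
    results_path = Path("results.json")
    if results_path.exists():
        data = json.loads(results_path.read_text(encoding="utf-8"))
        headline, failures = summarize_results(data)
        print(headline)
        for pair in failures:
            print(f'  [{pair["score"]}/5] {pair["question"]}: {pair["reason"]}')
```

Sorting by score puts the worst answers first, which is usually where debugging a retrieval pipeline should start.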

---

## 🧪 Python API

```python
from ragscore import run_pipeline, run_evaluation

# Generate QA pairs
run_pipeline(paths=["docs/"], concurrency=10)

# Evaluate RAG
results = run_evaluation(
    endpoint="http://localhost:8000/query",
    model="gpt-4o",  # LLM for judging
)
print(f"Accuracy: {results.accuracy:.1%}")
```

---

## 🤖 AI Agent Integration

RAGScore is designed for AI agents and automation:

```bash
# Structured CLI with predictable output
ragscore generate docs/ --concurrency 5
ragscore evaluate http://api/query --output results.json

# Exit codes: 0 = success, 1 = error
# JSON output for programmatic parsing
```

**CLI Reference:**

| Command | Description |
|---------|-------------|
| `ragscore generate <paths>` | Generate QA pairs from documents |
| `ragscore evaluate <endpoint>` | Evaluate RAG against golden QAs |
| `ragscore --help` | Show all commands and options |
| `ragscore generate --help` | Show generate options |
| `ragscore evaluate --help` | Show evaluate options |
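
An agent or script can drive the two commands and branch on the exit code. This is a rough sketch using only the standard library and the commands listed above; `run_cli` and `generate_then_evaluate` are hypothetical wrapper names, and the sketch assumes `ragscore` is on `PATH`.

```python
import subprocess
import sys


def run_cli(command):
    """Run a command and return (succeeded, combined stdout and stderr)."""
    completed = subprocess.run(command, capture_output=True, text=True)
    # Exit code 0 means success; anything else is treated as an error.
    return completed.returncode == 0, completed.stdout + completed.stderr


def generate_then_evaluate(docs_path, endpoint, output_path="results.json"):
    """Run both RAGScore steps, stopping at the first failure."""
    steps = [
        ["ragscore", "generate", docs_path],
        ["ragscore", "evaluate", endpoint, "--output", output_path],
    ]
    for command in steps:
        ok, output = run_cli(command)
        if not ok:
            print(output, file=sys.stderr)
            return False
    return True
```

Calling `generate_then_evaluate("docs/", "http://localhost:8000/query")` would then leave the evaluation in `results.json` for later parsing.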

---

## ⚙️ Configuration

Zero config required. Optional environment variables:

```bash
export RAGSCORE_CHUNK_SIZE=512          # Chunk size for documents
export RAGSCORE_QUESTIONS_PER_CHUNK=5   # QAs per chunk
export RAGSCORE_WORK_DIR=/path/to/dir   # Working directory
```
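
When RAGScore is launched from a script, the same variables can be set per run instead of exported globally. The helper below is a sketch under that assumption; `ragscore_env` is a hypothetical name, and only the three variable names come from the list above.

```python
import os


def ragscore_env(chunk_size=None, questions_per_chunk=None, work_dir=None, base=None):
    """Return a copy of the environment with the RAGScore variables set.

    Arguments left as None are not touched, so RAGScore's own defaults apply.
    """
    env = dict(os.environ if base is None else base)
    overrides = {
        "RAGSCORE_CHUNK_SIZE": chunk_size,
        "RAGSCORE_QUESTIONS_PER_CHUNK": questions_per_chunk,
        "RAGSCORE_WORK_DIR": work_dir,
    }
    for name, value in overrides.items():
        if value is not None:
            env[name] = str(value)  # environment values must be strings
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run`, for example `subprocess.run(["ragscore", "generate", "docs/"], env=ragscore_env(chunk_size=256))`.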

---

## 🔐 Privacy & Security

| Data | Cloud LLM | Local LLM |
|------|-----------|-----------|
| Documents | ✅ Local | ✅ Local |
| Text chunks | ⚠️ Sent to the LLM provider | ✅ Local |
| Generated QAs | ✅ Local | ✅ Local |
| Evaluation results | ✅ Local | ✅ Local |

**Compliance:** GDPR ✅ • HIPAA ✅ (with local LLMs) • SOC 2 ✅

---

## 🧪 Development

```bash
git clone https://github.com/HZYAI/RagScore.git
cd RagScore
pip install -e ".[dev,all]"
pytest
```

---

## 🔗 Links

- [GitHub](https://github.com/HZYAI/RagScore) • [PyPI](https://pypi.org/project/ragscore/) • [Issues](https://github.com/HZYAI/RagScore/issues) • [Discussions](https://github.com/HZYAI/RagScore/discussions)

---

<p align="center">
  <b>⭐ Star us on GitHub if RAGScore helps you!</b><br>
  Made with ❤️ for the RAG community
</p>
