Metadata-Version: 2.4
Name: rag_evaluate
Version: 0.2.0
Description-Content-Type: text/markdown
Requires-Dist: nltk
Requires-Dist: rouge-score
Requires-Dist: bert-score
Dynamic: description
Dynamic: description-content-type
Dynamic: requires-dist

# RAG Evaluator

## Overview

RAG Evaluator is a Python library for evaluating Retrieval-Augmented Generation (RAG) systems. It scores generated text against reference text using several standard metrics.

## Installation
You can install the library using pip:

```bash
pip install rag-evaluate
```
## Usage
Here's how to use the RAG Evaluator library:

```python
from rag_evaluate import RAG_Evaluator

# Initialize the evaluator
evaluator = RAG_Evaluator()

# Input data
question = "What are the causes of climate change?"
generated_text = "Climate change is caused by human activities."
context = "Human activities such as burning fossil fuels cause climate change."

# Evaluate the generated text against the retrieved context
bleu_score = evaluator.bleu_score(question, generated_text, context)
rouge_score = evaluator.rouge_score(question, generated_text, context)
bert_score = evaluator.bert_score(question, generated_text, context)

# Print the results
print(bleu_score)
print(rouge_score)
print(bert_score)
```

## Metrics

The RAG Evaluator provides the following metrics (a sketch of how they can be computed directly follows the list):

- **BLEU (0-100):** Measures n-gram overlap between the generated output and the reference text.
  - 0-20: low similarity, 20-40: medium-low, 40-60: medium, 60-80: high, 80-100: very high
- **ROUGE-1 (0-1):** Measures unigram overlap between the generated output and the reference text.
  - 0.0-0.2: poor overlap, 0.2-0.4: fair, 0.4-0.6: good, 0.6-0.8: very good, 0.8-1.0: excellent
- **BERT Score (0-1):** Measures semantic similarity using BERT embeddings (precision, recall, F1).
  - 0.0-0.5: low similarity, 0.5-0.7: moderate, 0.7-0.8: good, 0.8-0.9: high, 0.9-1.0: very high
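
As a rough guide to what these numbers mean, the sketch below computes the same three metrics directly with the library's declared dependencies (nltk, rouge-score, and bert-score). This illustrates the metrics themselves and is not necessarily how `RAG_Evaluator` computes them internally; the example sentences are taken from the usage section above.

```python
# Illustrative only: computing BLEU, ROUGE-1, and BERT Score with the
# underlying libraries (nltk, rouge-score, bert-score). This may differ
# from RAG_Evaluator's internal implementation.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer
from bert_score import score as bert_score

generated = "Climate change is caused by human activities."
reference = "Human activities such as burning fossil fuels cause climate change."

# BLEU: n-gram overlap. nltk returns a value in [0, 1]; scale by 100 to
# match the 0-100 range above. Smoothing avoids zero scores on short texts.
smoothing = SmoothingFunction().method1
bleu = sentence_bleu(
    [reference.split()], generated.split(), smoothing_function=smoothing
) * 100

# ROUGE-1: unigram overlap, reported as precision/recall/F-measure in [0, 1].
scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
rouge1 = scorer.score(reference, generated)["rouge1"].fmeasure

# BERT Score: semantic similarity from BERT embeddings (downloads a model
# on first use); returns precision, recall, and F1 tensors.
P, R, F1 = bert_score([generated], [reference], lang="en")

print(f"BLEU: {bleu:.1f}")
print(f"ROUGE-1 F: {rouge1:.3f}")
print(f"BERT Score F1: {F1.item():.3f}")
```

Note that nltk's sentence-level BLEU is reported in [0, 1], so it is scaled by 100 here to align with the 0-100 interpretation table above.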

## Contributor
Biplab Sil
