Metadata-Version: 2.4
Name: agentic_qa
Version: 0.1.0
Summary: Autonomous Agentic QA System for testing RAG pipelines and LLM systems.
Home-page: https://github.com/yourusername/multi-agent-qa
Author: Your Name
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: langgraph>=0.2.0
Requires-Dist: langchain>=0.3.0
Requires-Dist: langchain-openai>=0.2.0
Requires-Dist: langchain-core>=0.3.0
Requires-Dist: langsmith>=0.1.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: requests>=2.30.0
Requires-Dist: ragas>=0.1.0
Requires-Dist: datasets>=2.0.0
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# 🛡️ Agentic QA: Autonomous Multi-Agent Testing for RAG & LLMs

**Agentic QA** is a Python library that autonomously generates adversarial test cases, runs them against your RAG/LLM system, evaluates the results, and uses the failures it finds to expand its test coverage in the next round, all without human intervention.

Traditional evaluation frameworks such as RAGAS or TruLens score your system's outputs on a given set of inputs. **Agentic QA acts as an active red team** instead, dynamically generating the tricky edge cases most likely to break your system.

![Python](https://img.shields.io/badge/Python-3.10+-3776AB?logo=python&logoColor=white)
![LangGraph](https://img.shields.io/badge/LangGraph-0.2+-1C3C3C?logo=langchain&logoColor=white)
![LangSmith](https://img.shields.io/badge/LangSmith-Monitored-FF6B35?logo=langchain&logoColor=white)
![Streamlit](https://img.shields.io/badge/Streamlit-Dashboard-FF4B4B?logo=streamlit&logoColor=white)

---

## 🚀 Quick Start

### 1. Installation

Install the library locally:

```bash
git clone https://github.com/yourusername/multi-agent-qa.git
cd multi-agent-qa
pip install -e .
```

Ensure you have your `.env` configured with your API keys:

```bash
cp .env.example .env
# Edit .env and provide OPENAI_API_KEY
```
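Before launching a run, a quick preflight check can catch a missing key up front instead of partway through the first agent call. A minimal sketch, assuming the script is run from the repository root so `.env` resolves there; `missing_env_vars` is a hypothetical helper, not part of `agentic_qa`.

```python
import os

try:
    # python-dotenv is listed in Requires-Dist; this reads ./.env into os.environ
    # without overriding variables that are already set.
    from dotenv import load_dotenv

    load_dotenv(".env")
except ImportError:
    pass  # Fall back to variables already exported in the shell.


def missing_env_vars(required: tuple[str, ...] = ("OPENAI_API_KEY",)) -> list[str]:
    """Return the names of required variables that are unset or empty (hypothetical helper)."""
    return [name for name in required if not os.environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
    print("Environment looks good.")
```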

### 2. Using the Python Library

You can test any RAG or LLM pipeline that is exposed as a Python function or an HTTP endpoint in just a few lines of code.

#### Option A: Testing a Python Function
If your RAG system is a Python function in your codebase:

```python
import agentic_qa

# Your existing RAG or chatbot function
def my_custom_rag(query: str) -> str:
    # Example: return my_langchain_pipeline.invoke(query)
    return "This is my AI response."

# Run the autonomous testing loop
report = agentic_qa.run_autonomous_test(
    target_function=my_custom_rag,
    system_name="YouTube Video Q&A",
    system_description="A chatbot that answers questions about YouTube transcripts.",
    domain="video content",
    max_iterations=3,          # Number of test/refine rounds before the final report
    tests_per_iteration=5      # Adversarial tests generated per round
)
```
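The example above passes a function that takes a query string and returns a string. If your pipeline instead returns a dict or a message object, or can raise, a small wrapper can normalize it first. A minimal sketch; `as_text_target` and the `"answer"` key are hypothetical names, not part of the library's API. Whether the Executor already catches exceptions is not documented here, so the wrapper conservatively turns them into text the Judge can see.

```python
from typing import Any, Callable


def as_text_target(
    pipeline: Callable[[str], Any], answer_key: str = "answer"
) -> Callable[[str], str]:
    """Wrap a pipeline so it always maps str -> str (hypothetical helper)."""

    def target(query: str) -> str:
        try:
            result = pipeline(query)
        except Exception as exc:
            # Report the crash as output instead of aborting the whole test run.
            return f"[ERROR] {type(exc).__name__}: {exc}"
        if isinstance(result, str):
            return result
        if isinstance(result, dict) and answer_key in result:
            return str(result[answer_key])
        # LangChain chat models return message objects with a .content attribute.
        content = getattr(result, "content", None)
        if isinstance(content, str):
            return content
        return str(result)

    return target
```

You would then pass `target_function=as_text_target(my_pipeline.invoke)` instead of the raw function.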

#### Option B: Testing an API Endpoint
If your system is deployed behind a REST API (FastAPI, Flask, LangServe):

```python
import agentic_qa

report = agentic_qa.run_autonomous_test(
    api_endpoint="http://localhost:8000/api/chat",
    system_name="Customer Support Bot",
    system_description="An AI that resolves customer support tickets.",
    domain="customer support"
)
```
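This README does not document the request and response format the API adapter expects. As a starting point, here is a dependency-free sketch of an endpoint at the same URL using only the standard library's `http.server`. The JSON shape (`{"query": ...}` in, `{"response": ...}` out) is an assumption; check the API adapter in `agentic_qa/sut/` for the field names it actually sends and reads.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def my_custom_rag(query: str) -> str:
    # Placeholder: call your real RAG/LLM pipeline here.
    return f"Echo: {query}"


class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/api/chat":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        # ASSUMPTION: request body is {"query": "..."}; verify against agentic_qa/sut/.
        payload = json.loads(self.rfile.read(length) or b"{}")
        answer = my_custom_rag(str(payload.get("query", "")))
        # ASSUMPTION: the adapter reads the answer from a "response" field.
        body = json.dumps({"response": answer}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # Silence per-request logging while the agents send many requests.


def make_server(host: str = "localhost", port: int = 8000) -> HTTPServer:
    return HTTPServer((host, port), ChatHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

For production you would more likely use FastAPI or Flask, as mentioned above; the standard library just keeps the sketch free of extra dependencies.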

### 3. Using the Streamlit UI

If you prefer a visual dashboard for monitoring the agents in real time, run the included Streamlit app:

```bash
streamlit run app.py
```

From the UI, you can connect your API endpoint or use the built-in mock system for a demonstration.

---

## 🏗️ Architecture

The framework is powered by 5 autonomous agents built with LangGraph:

```
START ──▶ 🔴 Red-Team Agent ──▶ ⚡ Executor Agent ──▶ ⚖️ Judge Agent ──▶ Decision
              ▲                                                            │
              │                                                      ┌─────┴─────┐
              │                                                      ▼           ▼
              └──────────────── 🔧 Refiner Agent               📊 Reporter Agent
                                   (loop back)                       (END)
```
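The control flow in the diagram can be sketched in plain Python. This is a dependency-free illustration with stub agents, not the library's LangGraph graph: `QAState` and the agent functions are hypothetical, and the Decision node here simply stops after `max_iterations` rounds (the real one may use other criteria).

```python
from dataclasses import dataclass, field


@dataclass
class QAState:
    """Hypothetical shared state passed between agents."""
    iteration: int = 0
    max_iterations: int = 3
    guidance: str = ""  # Refiner's instructions for the next Red-Team round
    results: list[dict] = field(default_factory=list)


def red_team(state: QAState, n: int) -> list[str]:
    # Stub: the real agent asks an LLM for adversarial inputs.
    return [f"round {state.iteration} case {i} {state.guidance}".strip() for i in range(n)]


def execute(target, tests: list[str]) -> list[tuple[str, str]]:
    return [(t, target(t)) for t in tests]


def judge(pairs: list[tuple[str, str]]) -> list[dict]:
    # Stub verdict: an empty answer fails; the real agent is an LLM-as-a-Judge.
    return [{"input": q, "output": a, "passed": bool(a.strip())} for q, a in pairs]


def refine(verdicts: list[dict]) -> str:
    failures = [v for v in verdicts if not v["passed"]]
    return f"focus on {len(failures)} failing case(s)" if failures else "broaden coverage"


def report(state: QAState) -> str:
    passed = sum(v["passed"] for v in state.results)
    return f"{passed}/{len(state.results)} tests passed over {state.iteration} iteration(s)"


def run_loop(target, max_iterations: int = 3, tests_per_iteration: int = 5) -> str:
    state = QAState(max_iterations=max_iterations)
    while True:
        tests = red_team(state, tests_per_iteration)
        verdicts = judge(execute(target, tests))
        state.results.extend(verdicts)
        state.iteration += 1
        # Decision: finish with the Reporter, or loop back through the Refiner.
        if state.iteration >= state.max_iterations:
            return report(state)
        state.guidance = refine(verdicts)
```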

### Agent Roles

| Agent | Role |
|-------|------|
| 🔴 **Red-Team** | Generates adversarial test inputs targeting edge cases (prompt injections, boundary values, etc.). |
| ⚡ **Executor** | Runs tests through the target system and captures the outputs. |
| ⚖️ **Judge** | Evaluates the outputs using an LLM-as-a-Judge pattern with strict pass/fail criteria. |
| 🔧 **Refiner** | Analyzes the judge's failure patterns and instructs the Red-Team on how to exploit weaknesses in the next iteration. |
| 📊 **Reporter** | Compiles a comprehensive final Markdown QA report. |
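To make the Judge-to-Refiner hand-off concrete, here is a minimal sketch of strict verdict parsing and failure-pattern aggregation. The `Verdict` fields, the JSON reply format, and both helper functions are assumptions for illustration; the project's real Pydantic models live in `agentic_qa/schemas/` and may differ.

```python
import json
from collections import Counter
from dataclasses import dataclass


@dataclass
class Verdict:
    """Hypothetical Judge output record."""
    test_input: str
    passed: bool
    category: str  # e.g. "prompt_injection", "boundary_value"
    reason: str


def parse_verdict(raw: str, test_input: str) -> Verdict:
    """Parse the judge LLM's JSON reply; anything malformed counts as a failure."""
    try:
        data = json.loads(raw)
        return Verdict(
            test_input=test_input,
            passed=data["passed"] is True,  # strict: only a JSON true passes
            category=str(data.get("category", "unknown")),
            reason=str(data.get("reason", "")),
        )
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        return Verdict(test_input, False, "unparseable_judgment", raw[:200])


def refiner_guidance(verdicts: list[Verdict], top_k: int = 3) -> str:
    """Summarize the most frequent failure categories for the next Red-Team round."""
    counts = Counter(v.category for v in verdicts if not v.passed)
    if not counts:
        return "No failures found; explore new attack categories."
    focus = ", ".join(f"{cat} ({n})" for cat, n in counts.most_common(top_k))
    return f"Generate harder variants targeting: {focus}."
```

Treating an unparseable judgment as a failure keeps the pass/fail criteria strict: a confused judge never silently inflates the pass rate.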

---

## 🧠 What Makes This Novel

| Traditional Testing Tools (RAGAS, TruLens) | Agentic QA |
|----------------------------------------|-------------|
| Measures outputs against static inputs | **Generates** the adversarial inputs autonomously |
| Human writes test cases | AI agents write and refine test cases |
| One-shot evaluation | **Self-improving loop** with pattern learning |
| Relies heavily on reference data | Relies on behavioral boundaries and edge-case testing |

---

## 📂 Project Structure

```text
multi-agent-qa/
├── agentic_qa/
│   ├── __init__.py           # Clean developer API (run_autonomous_test)
│   ├── agents/               # 5 LangGraph agent definitions
│   ├── graph/                # State definitions and LangGraph flow
│   ├── schemas/              # Pydantic validation models
│   ├── sut/                  # Adapters (API, Callable, Base)
│   └── utils/                # Prompt templates
├── setup.py                  # Package configuration
├── app.py                    # Streamlit Dashboard UI
├── .env.example              # API key template (copy to .env)
└── README.md
```
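The `sut/` package lists API, Callable, and Base adapters. One plausible shape for that design is a common interface with one adapter per target type, sketched below. The class names, the `invoke` method, and the JSON payload are hypothetical; the sketch uses `urllib` from the standard library to stay dependency-free.

```python
import json
import urllib.request
from abc import ABC, abstractmethod
from typing import Callable, Optional


class BaseSUT(ABC):
    """Hypothetical common interface: every system under test maps a query to a reply."""

    @abstractmethod
    def invoke(self, query: str) -> str: ...


class CallableSUT(BaseSUT):
    def __init__(self, fn: Callable[[str], str]):
        self.fn = fn

    def invoke(self, query: str) -> str:
        return self.fn(query)


class APISUT(BaseSUT):
    # ASSUMPTION: JSON body {"query": ...} and reply {"response": ...}.
    def __init__(self, endpoint: str, timeout: float = 30.0):
        self.endpoint = endpoint
        self.timeout = timeout

    def invoke(self, query: str) -> str:
        req = urllib.request.Request(
            self.endpoint,
            data=json.dumps({"query": query}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=self.timeout) as resp:
            return str(json.loads(resp.read())["response"])


def make_sut(
    target_function: Optional[Callable[[str], str]] = None,
    api_endpoint: Optional[str] = None,
) -> BaseSUT:
    """Pick an adapter, mirroring the two options of run_autonomous_test."""
    if (target_function is None) == (api_endpoint is None):
        raise ValueError("Pass exactly one of target_function or api_endpoint")
    if target_function is not None:
        return CallableSUT(target_function)
    return APISUT(api_endpoint)
```

Because the Executor would only ever call `invoke`, adding a new kind of target means adding one adapter class rather than touching the agents.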

## 📡 LangSmith Monitoring

To trace all agent interactions automatically with LangSmith, add the following to `.env`:

```env
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your-langsmith-api-key
LANGCHAIN_PROJECT=agentic-qa
```
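In a notebook, or anywhere editing `.env` is inconvenient, the same variables can be set from Python. A minimal sketch; set them before importing `agentic_qa` or running any LangChain/LangGraph code, since tracing settings may be read once and cached.

```python
import os

# Same variables as the .env snippet; setdefault keeps any values already exported.
os.environ.setdefault("LANGCHAIN_TRACING_V2", "true")
os.environ.setdefault("LANGCHAIN_PROJECT", "agentic-qa")

# Keep the real API key out of source control; read it from the shell or a secrets manager.
if not os.environ.get("LANGCHAIN_API_KEY"):
    print("LANGCHAIN_API_KEY is not set; traces will not be sent to LangSmith.")
```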

## 📄 License

MIT
