Metadata-Version: 2.4
Name: agentic_qa
Version: 0.2.0
Summary: Autonomous Agentic QA System for testing RAG pipelines and LLM systems.
Home-page: https://github.com/yourusername/multi-agent-qa
Author: Your Name
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: langgraph>=0.2.0
Requires-Dist: langchain>=0.3.0
Requires-Dist: langchain-openai>=0.2.0
Requires-Dist: langchain-core>=0.3.0
Requires-Dist: langsmith>=0.1.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: requests>=2.30.0
Requires-Dist: ragas>=0.1.0
Requires-Dist: datasets>=2.0.0
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# 🛡️ Agentic QA: Autonomous Multi-Agent Testing for RAG & LLMs

**Agentic QA** is a Python library that autonomously generates adversarial test cases, executes them against your RAG/LLM system, evaluates the results, and improves its own test coverage across iterations, all without human intervention.

Unlike evaluation frameworks such as RAGAS or TruLens, which score outputs for static, human-written inputs, **Agentic QA acts as an active red team**, dynamically generating the tricky edge cases needed to break your system.

![Python](https://img.shields.io/badge/Python-3.10+-3776AB?logo=python&logoColor=white)
![LangGraph](https://img.shields.io/badge/LangGraph-0.2+-1C3C3C?logo=langchain&logoColor=white)
![LangSmith](https://img.shields.io/badge/LangSmith-Monitored-FF6B35?logo=langchain&logoColor=white)

---

## 🚀 Installation

Install the library directly via pip:

```bash
pip install agentic-qa
```

*Note: You will need an OpenAI API key for the agents to operate.*

```bash
export OPENAI_API_KEY="sk-..."
```

---

## 📖 How to Use

Agentic QA provides two ways to test your systems: a **High-Level Simple API** for quick tests, and a **Low-Level Advanced API** for custom setups (like Jupyter Notebooks).

### Option 1: High-Level API (Recommended)
You can test any RAG or LLM pipeline in just a few lines of code using the `run_autonomous_test` wrapper.

#### Testing a Python Function
```python
import agentic_qa

# 1. Your existing RAG or Chatbot function
def my_custom_rag(query: str) -> str:
    # Example: return my_langchain_pipeline.invoke(query)
    return "This is my AI response."

# 2. Run the autonomous testing loop
final_state = agentic_qa.run_autonomous_test(
    target_function=my_custom_rag,
    system_name="YouTube Video Q&A",
    system_description="A chatbot that answers questions about YouTube transcripts.",
    domain="video content",
    max_iterations=3,          # How many times agents learn and retry
    tests_per_iteration=5      # Tests generated per round
)

# 3. View the generated report
print(final_state["final_report"])
```

#### Testing an API Endpoint
If your system is deployed behind a REST API (FastAPI, Flask, LangServe):

```python
import agentic_qa

final_state = agentic_qa.run_autonomous_test(
    api_endpoint="http://localhost:8000/api/chat",
    system_name="Customer Support Bot",
    system_description="An AI that resolves customer support tickets.",
    domain="customer support"
)
```
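
If your endpoint expects a request or response shape that the `api_endpoint` option does not handle, one possible workaround is to wrap the HTTP call in a plain Python function and pass it as `target_function` instead. The sketch below uses only the standard library. The `question` and `answer` JSON fields and the helper names `post_json` and `make_http_target` are hypothetical and not part of Agentic QA; adjust them to match your own service.

```python
import json
from typing import Callable
from urllib import request


def post_json(url: str, payload: dict, timeout: float = 30.0) -> dict:
    """POST a JSON payload and decode the JSON response (standard library only)."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def make_http_target(
    url: str,
    transport: Callable[[str, dict], dict] = post_json,
) -> Callable[[str], str]:
    """Build a `query -> answer` function around an HTTP endpoint."""

    def target(query: str) -> str:
        # Hypothetical request schema: {"question": "..."}
        body = transport(url, {"question": query})
        # Hypothetical response schema: {"answer": "..."}
        return str(body.get("answer", ""))

    return target
```

The returned function has the same `str -> str` shape as `my_custom_rag` above, so it can be passed as `target_function=make_http_target("http://localhost:8000/api/chat")`. The `transport` parameter exists only so the wrapper can be exercised without a live server.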

---

### Option 2: Low-Level Advanced API
If you need fine-grained control over the SUT (System Under Test) adapters or want to integrate the workflow directly into a Jupyter Notebook or a custom LangGraph pipeline, use the adapter classes directly.

```python
import os
from agentic_qa.sut import CallableAdapter, set_active_sut
from agentic_qa.graph.workflow import run_qa_pipeline

# 1. Configure testing environment variables
os.environ["MAX_ITERATIONS"] = "3"
os.environ["TESTS_PER_ITERATION"] = "3"

# 2. Define your RAG function
def ask_research_paper(query: str) -> str:
    return "Attention is a mechanism..."

# 3. Wrap your RAG function in the CallableAdapter
adapter = CallableAdapter(
    fn=ask_research_paper,
    system_name="Research Paper RAG",
    description="A Retrieval-Augmented Generation system that answers questions about machine learning research papers.",
    domain="Academic Research / Machine Learning"
)

# 4. Set it as the active System Under Test
set_active_sut(adapter)

# 5. Launch the autonomous multi-agent pipeline
print("Launching Autonomous Multi-Agent QA...\n")
final_state = run_qa_pipeline()

# 6. Extract metrics and the final Markdown report
print(f"Coverage Score: {final_state.get('coverage_score', 0):.0%}")
print(final_state.get("final_report", "No report generated."))
```
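
The returned state can also drive automation, for example saving the report and failing a CI job when coverage is too low. The helper below is a minimal sketch, not part of the library: it assumes only the `final_report` and `coverage_score` keys shown above, and the function name, default path, and 0.8 threshold are illustrative choices.

```python
from pathlib import Path


def save_report_and_check(
    final_state: dict,
    report_path: str = "qa_report.md",
    min_coverage: float = 0.8,
) -> bool:
    """Write the Markdown report to disk and return True if coverage is acceptable."""
    # Fall back to the same placeholder text used in the example above.
    report = final_state.get("final_report", "No report generated.")
    Path(report_path).write_text(report, encoding="utf-8")

    # Treat a missing score as zero coverage.
    coverage = float(final_state.get("coverage_score", 0))
    return coverage >= min_coverage
```

In a CI script, a `False` return value could be turned into a non-zero exit code with `sys.exit(1)`.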

---

## 🏗️ Architecture

The framework is powered by five autonomous agents built with **LangGraph**:

```
START ──▶ 🔴 Red-Team Agent ──▶ ⚡ Executor Agent ──▶ ⚖️ Judge Agent ──▶ Decision
              ▲                                                            │
              │                                                      ┌─────┴─────┐
              │                                                      ▼           ▼
              └──────────────── 🔧 Refiner Agent               📊 Reporter Agent
                                   (loop back)                       (END)
```

### Agent Roles

| Agent | Role |
|-------|------|
| 🔴 **Red-Team** | Generates adversarial test inputs targeting edge cases (prompt injections, boundary values, etc.). |
| ⚡ **Executor** | Runs tests through the target system and captures the outputs. |
| ⚖️ **Judge** | Evaluates the outputs using an LLM-as-a-Judge pattern with strict pass/fail criteria. |
| 🔧 **Refiner** | Analyzes the judge's failure patterns and instructs the Red-Team on how to exploit weaknesses in the next iteration. |
| 📊 **Reporter** | Compiles a comprehensive final Markdown QA report. |
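
The control flow in the diagram can be sketched in plain Python. This is an illustration of the loop only, not the library's LangGraph implementation: the agents are reduced to ordinary callables, every name here (`run_qa_loop`, `generate`, `judge`, `refine`) is hypothetical, and the stopping rule (stop after `max_iterations` rounds) is an assumption.

```python
from typing import Callable


def run_qa_loop(
    target: Callable[[str], str],
    generate: Callable[[int, list], list],
    judge: Callable[[str, str], bool],
    refine: Callable[[list], list],
    max_iterations: int = 3,
    tests_per_iteration: int = 5,
) -> dict:
    """Red-Team -> Executor -> Judge -> Decision -> (Refiner | Reporter)."""
    hints: list = []    # guidance passed from the Refiner to the Red-Team
    results: list = []  # every verdict from every iteration

    for iteration in range(1, max_iterations + 1):
        tests = generate(tests_per_iteration, hints)   # Red-Team
        outputs = [(t, target(t)) for t in tests]      # Executor
        verdicts = [                                   # Judge
            {"input": t, "output": out, "passed": judge(t, out)}
            for t, out in outputs
        ]
        results.extend(verdicts)

        # Decision: loop back through the Refiner unless this was the last round.
        if iteration < max_iterations:
            failures = [v for v in verdicts if not v["passed"]]
            hints = refine(failures)                   # Refiner

    # Reporter: compile a short Markdown summary.
    passed = sum(v["passed"] for v in results)
    report = f"# QA Report\n\n{passed}/{len(results)} tests passed."
    return {"results": results, "final_report": report}
```

Passing stub callables for the four roles is enough to step through the loop by hand before pointing the real agents at your system.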

---

## 📊 Streamlit Dashboard

If you prefer a visual dashboard for monitoring the agents in real time, run the included Streamlit app:

```bash
streamlit run app.py
```

From the UI, you can connect your API endpoint or use the built-in mock system for a demonstration, and watch the agents generate verdicts and failure patterns live.

## 📡 LangSmith Monitoring

If LangSmith is configured in `.env`, all agent interactions are traced automatically.

```env
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your-langsmith-api-key
LANGCHAIN_PROJECT=agentic-qa
```
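
Before a long run, it can help to confirm that these variables are actually visible to the Python process. The check below is a small sketch using only the standard library; the function name `missing_tracing_vars` is hypothetical, and it treats all three variables listed above as required, which is an assumption made for this example.

```python
import os
from typing import Mapping

# The three variables from the `.env` example above.
TRACING_VARS = ("LANGCHAIN_TRACING_V2", "LANGCHAIN_API_KEY", "LANGCHAIN_PROJECT")


def missing_tracing_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of tracing variables that are unset or empty."""
    return [name for name in TRACING_VARS if not env.get(name)]
```

An empty list means all three variables are set; otherwise the returned names indicate what to add to `.env`.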

## 📄 License
MIT
