Metadata-Version: 2.4
Name: a2rag
Version: 0.1.3
Summary: Abstention-Aware RAG Decision Layer
Author: AIBee Research
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Provides-Extra: full
Requires-Dist: requests; extra == "full"
Requires-Dist: rich; extra == "full"
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: provides-extra
Dynamic: requires-python
Dynamic: summary

<div dir="ltr" align="left" style="direction:ltr;text-align:left;">

# A2RAG - Abstention-Aware RAG Decision Layer

[![PyPI version](https://badge.fury.io/py/a2rag.svg)](https://pypi.org/project/a2rag/)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)

> **Decides when your RAG system should answer, ask for clarification, or abstain.**

Standard RAG systems answer every question - even when they shouldn't.
A2RAG adds a decision layer that prevents unsafe or hallucinated answers.

---

## The Problem

```
User: "Can I return this item?"
RAG:  "Yes, returns are accepted within 14 days."  - confident but WRONG
                                                      (user has a digital item - not returnable)
```

## The Solution

```
User: "Can I return this item?"
A2RAG: CLARIFY - "Was this a physical product or a digital item?"
```

---

## Installation

```bash
pip install a2rag
```

Zero required dependencies. Works with any RAG system and any LLM.
The optional `full` extra (`pip install "a2rag[full]"`) adds `requests` and `rich`.

---

## Quick Start

```python
from a2rag import A2RAGClient

client = A2RAGClient(api_key="your_key_here")

# Step 1: Your existing RAG pipeline (unchanged)
contexts     = your_rag.retrieve(user_query)
draft_answer = your_llm.generate(user_query, contexts)

# Step 2: A2RAG decides what to do
decision = client.decide(
    query=user_query,
    contexts=contexts,
    draft_answer=draft_answer,
)

# Step 3: Act on the decision
if decision.should_answer:
    show_to_user(draft_answer)          # safe to show
elif decision.should_clarify:
    ask_user(decision.clarification)    # ask specific follow-up
elif decision.should_abstain:
    escalate_to_human()                 # route to human agent
```

---

## How It Works

A2RAG uses two independent scores to make every decision:

| Score | Question |
|-------|----------|
| **Evidence Score** | Does the knowledge base actually support this answer? |
| **Completeness Score** | Did the user provide enough context for a specific answer? |

This separation prevents a common failure mode: high retrieval confidence
on a question the corpus doesn't actually cover.
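
To illustrate why two scores behave differently from a single confidence value, here is a minimal sketch of a two-score decision rule. It is not A2RAG's actual logic: the threshold values, the order of the checks, and the name `choose_action` are assumptions made for illustration.

```python
# Hypothetical thresholds, chosen only for illustration.
TAU_EVIDENCE = 0.6
TAU_COMPLETENESS = 0.5


def choose_action(evidence_score: float, completeness_score: float) -> str:
    """Map two independent scores to "answer", "clarify", or "abstain"."""
    # Weak corpus support: no follow-up question can fix that, so abstain.
    if evidence_score < TAU_EVIDENCE:
        return "abstain"
    # The corpus covers the topic, but the query is underspecified.
    if completeness_score < TAU_COMPLETENESS:
        return "clarify"
    return "answer"


print(choose_action(0.9, 0.2))  # supported topic, vague query
print(choose_action(0.2, 0.9))  # precise query, unsupported topic
```

A single blended score could not tell these two cases apart, which is the point of keeping the scores separate.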

---

## Decision Object

```python
decision.action             # "answer" | "clarify" | "abstain"
decision.confidence         # 0.0 - 1.0
decision.clarification      # Specific question to ask (if action="clarify")
decision.missing_fields     # What information is missing
decision.should_answer      # bool shortcut
decision.should_clarify     # bool shortcut
decision.should_abstain     # bool shortcut
decision.evidence_score     # How well corpus supports the answer
decision.query_type         # "generic_policy" | "instance_specific"
decision.is_high_confidence # True when confidence >= 0.80
```
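
For unit-testing your own routing code without an API key, a stand-in object with the same attribute names can be useful. The sketch below is an assumption-laden mock, not the library's class: `FakeDecision` and `route` are hypothetical names, and only a subset of the fields listed above is modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeDecision:
    """Hypothetical test double mirroring the documented attribute names."""
    action: str                      # "answer" | "clarify" | "abstain"
    confidence: float = 0.0
    clarification: Optional[str] = None
    missing_fields: List[str] = field(default_factory=list)

    @property
    def should_answer(self) -> bool:
        return self.action == "answer"

    @property
    def should_clarify(self) -> bool:
        return self.action == "clarify"

    @property
    def should_abstain(self) -> bool:
        return self.action == "abstain"

    @property
    def is_high_confidence(self) -> bool:
        # Same cutoff as documented above: confidence >= 0.80
        return self.confidence >= 0.80


def route(decision, draft_answer: str) -> str:
    """Return the text to show, or "ESCALATE" when a human should take over."""
    if decision.should_answer:
        return draft_answer
    if decision.should_clarify:
        return decision.clarification or "Could you share more details?"
    return "ESCALATE"
```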

---

## Benchmark Results

Tested on a controlled benchmark of 40 scenarios across 6 domains
and 5 languages (EN, HE, AR, FR, ES):

| System | UAR (↓ lower is better) | Safe Answers | Abstain Precision |
|--------|------------------------|--------------|-------------------|
| Standard RAG | 80% | 20% | 0% |
| RAG + confidence threshold | 80% | 20% | 0% |
| **A2RAG** | **0%** | **91%** | **100%** |

**UAR (Unsafe Answer Rate):** percentage of answers that were factually wrong or unsupported.
On this benchmark A2RAG reached 0% UAR: it did not return an answer that the corpus could not support.
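
As a rough sketch of how such numbers can be computed from labeled outcomes, the functions below take UAR over the answers that were actually shown and abstain precision over the abstentions that were actually made. The record layout (`action`, `expected`, `answer_correct`) and both denominators are assumptions; the benchmark's exact scoring rules are not described here.

```python
def unsafe_answer_rate(records):
    """Share of shown answers that were wrong or unsupported."""
    answered = [r for r in records if r["action"] == "answer"]
    if not answered:
        return 0.0
    unsafe = sum(1 for r in answered if not r["answer_correct"])
    return unsafe / len(answered)


def abstain_precision(records):
    """Share of abstentions where abstaining was the expected action."""
    abstained = [r for r in records if r["action"] == "abstain"]
    if not abstained:
        return 0.0
    correct = sum(1 for r in abstained if r["expected"] == "abstain")
    return correct / len(abstained)


# Hypothetical outcomes, not taken from the benchmark above.
records = [
    {"action": "answer",  "expected": "answer",  "answer_correct": True},
    {"action": "answer",  "expected": "clarify", "answer_correct": False},
    {"action": "abstain", "expected": "abstain", "answer_correct": None},
    {"action": "clarify", "expected": "clarify", "answer_correct": None},
    {"action": "abstain", "expected": "answer",  "answer_correct": None},
]

print(unsafe_answer_rate(records))  # 1 unsafe of 2 answers -> 0.5
print(abstain_precision(records))   # 1 correct of 2 abstentions -> 0.5
```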

---

## Metrics & Analytics

All metrics are computed from **local storage**: query content, retrieved contexts, and draft answers never leave your machine.

```python
m = client.metrics(days=30)

print(f"Answer rate:    {m.answer_rate:.1%}")   # % of queries answered
print(f"UAR:            {m.uar:.1%}")            # Unsafe Answer Rate (0% = perfect)
print(f"ORS:            {m.ors:.1%}")            # Overall Reliability Score
print(f"Avg latency:    {m.avg_latency_ms:.0f}ms")
print(f"Avg confidence: {m.avg_confidence:.1%}")

# Break down by domain or language
by_domain   = client.metrics_by_domain()
by_language = client.metrics_by_language()

# Trend over time
trends = client.trends(days=30, interval="day")
```

### Key Metrics Explained

| Metric | What It Measures | Good Value |
|--------|-----------------|------------|
| **UAR** | Unsafe Answer Rate: wrong answers shown to users | < 5% |
| **ORS** | Overall Reliability Score: combined quality metric | > 70% |
| **Abstain Precision** | When the system abstains, was abstaining the right call? | > 90% |
| **Coverage** | % of queries that receive an answer | > 50% |

---

## Local Dashboard

```python
client.dashboard()   # opens browser at http://localhost:7860
```

Or from the terminal:
```bash
a2rag dashboard
```

Shows answer/clarify/abstain rates, trends, latency, and confidence
distribution - all from local data, nothing sent externally.

---

## Calibration

Find optimal thresholds for your specific domain and corpus:

```python
labeled_data = [
    {
        "query":        "What is the refund window?",
        "contexts":     ["Refunds available within 14 days for unused items."],
        "draft_answer": "14 days.",
        "label":        "answer",   # answer | clarify | abstain
    },
    # ... 50+ examples recommended
]

result = client.calibrate(labeled_data, domain="insurance")
print(f"tau_evidence:     {result.tau_evidence}")
print(f"Expected accuracy: {result.expected_accuracy:.1%}")
print(f"Expected UAR:      {result.expected_uar:.1%}")
```
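
Before calling `calibrate`, it can help to check the labeled examples for shape errors. The validator below is a hypothetical helper, not part of the library; it checks only the four keys and three labels shown in the example above, plus the recommended minimum of 50 examples.

```python
REQUIRED_KEYS = ("query", "contexts", "draft_answer", "label")
VALID_LABELS = {"answer", "clarify", "abstain"}
RECOMMENDED_MIN = 50  # "50+ examples recommended"


def validate_labeled_data(examples):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for i, ex in enumerate(examples):
        missing = [k for k in REQUIRED_KEYS if k not in ex]
        if missing:
            problems.append(f"example {i}: missing keys {missing}")
            continue
        if ex["label"] not in VALID_LABELS:
            problems.append(f"example {i}: unknown label {ex['label']!r}")
    if len(examples) < RECOMMENDED_MIN:
        problems.append(f"only {len(examples)} examples; 50+ recommended")
    return problems
```

Running it on the single example above would report only the size warning.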

---

## Supported Context Formats

Works with any RAG output format - no changes to your pipeline:

```python
# Plain strings
contexts = ["Policy text here..."]

# Dicts
contexts = [{"text": "...", "score": 0.9, "source": "doc1.pdf"}]

# LangChain Documents (older releases: from langchain.schema import Document)
from langchain_core.documents import Document
contexts = [Document(page_content="...", metadata={"source": "doc1"})]

# LlamaIndex Nodes - works automatically
contexts = [node]

# Custom objects with .text attribute
contexts = [my_chunk]
```
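
If you want to log or inspect the same contexts yourself, a small normalizer can pull the text out of each supported shape. This is a sketch of one possible approach, not the library's internal logic: the function name `context_text` and the attribute lookup order are assumptions.

```python
def context_text(ctx) -> str:
    """Extract plain text from a str, a dict with "text", or an object
    exposing .page_content (LangChain-style) or .text."""
    if isinstance(ctx, str):
        return ctx
    if isinstance(ctx, dict) and "text" in ctx:
        return str(ctx["text"])
    for attr in ("page_content", "text"):
        value = getattr(ctx, attr, None)
        if isinstance(value, str):
            return value
    raise TypeError(f"unsupported context type: {type(ctx).__name__}")


class Chunk:
    """Hypothetical custom object with a .text attribute."""
    def __init__(self, text):
        self.text = text


mixed = ["Policy text here...", {"text": "Refunds within 14 days."}, Chunk("Digital items excluded.")]
print([context_text(c) for c in mixed])
```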

---

## Supported Languages

Language is auto-detected. No configuration needed.

English, Hebrew, Arabic, French, Spanish, and more.

---

## Domain Profiles

Pre-configured thresholds per domain:

```python
decision = client.decide(query=query, contexts=contexts, draft_answer=draft, domain="insurance")
# Options: insurance | legal | medical | support | hr | generic
```

| Domain | Risk Tolerance | Typical Use Case |
|--------|---------------|------------------|
| `insurance` | Conservative | Claims, policy questions |
| `legal` | Very conservative | Contracts, compliance |
| `medical` | Very conservative | Clinical information |
| `support` | Moderate | Customer service |
| `hr` | Moderate | Employee policies |
| `generic` | Balanced | General purpose |
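
When the domain comes from configuration or user input, validating it before calling `decide` avoids passing a typo through. The helper below is hypothetical and only encodes the six names from the table. It raises on unknown names instead of falling back to `generic`, on the assumption that silently applying a less conservative profile to a misspelled `medical` would be the worse failure.

```python
KNOWN_DOMAINS = {"insurance", "legal", "medical", "support", "hr", "generic"}


def resolve_domain(name=None) -> str:
    """Normalize a domain name; default to "generic" only when none is given."""
    if name is None:
        return "generic"
    key = name.strip().lower()
    if key not in KNOWN_DOMAINS:
        raise ValueError(f"unknown domain {name!r}; expected one of {sorted(KNOWN_DOMAINS)}")
    return key
```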

---

## Privacy

| Data | Stored | Sent to A2RAG? |
|------|--------|----------------|
| Query content | Never | Never |
| Retrieved contexts | Never | Never |
| Draft answers | Never | Never |
| Decision metadata | `~/.a2rag/decisions.db` (local only) | Free tier: anonymous only |
| Feedback & comments | Local only | Never |

```python
# Disable all telemetry (paid plans)
client = A2RAGClient(api_key="...", telemetry=False)
```
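
To see for yourself what is stored locally, you can open the decisions file read-only. The sketch below assumes the `.db` file is a SQLite database, which the table above does not state, and makes no assumption about table names: it only lists whatever tables exist. `list_tables` is a hypothetical helper.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path="~/.a2rag/decisions.db"):
    """List table names in a SQLite file without modifying it."""
    # mode=ro opens the file read-only and fails if it does not exist.
    uri = Path(db_path).expanduser().resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [row[0] for row in rows]
```

If the file is missing or is not SQLite, the call raises a `sqlite3` error rather than creating anything.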

---

## Getting Started

1. Contact **stav@aibee.co.il** to request an API key
2. `pip install a2rag`
3. Free tier: 1,000 requests/month - no credit card required

**Status: Private Beta**

---

## License

MIT License

Copyright (c) 2026 Stav Vaknin - [aibee.co.il](https://aibee.co.il)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, subject to the standard MIT terms.


</div>
