Metadata-Version: 2.4
Name: faerie-eval
Version: 0.1.0
Summary: Fact Error Rate (FaER) evaluation framework for deep research reports
License-File: LICENSE
Requires-Python: >=3.10
Requires-Dist: click>=8.0
Requires-Dist: dragen>=0.1.0
Requires-Dist: pydantic>=2.0
Requires-Dist: python-dotenv>=1.0
Requires-Dist: requests>=2.28
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Description-Content-Type: text/markdown

<div align="center">

# Faerie

### Fact Error Rate evaluation for AI research reports.

[![PyPI version](https://img.shields.io/pypi/v/faerie-eval.svg)](https://pypi.org/project/faerie-eval/)
[![License](https://img.shields.io/github/license/chonkie-inc/faerie.svg)](https://github.com/chonkie-inc/faerie/blob/main/LICENSE)
[![GitHub stars](https://img.shields.io/github/stars/chonkie-inc/faerie.svg)](https://github.com/chonkie-inc/faerie/stargazers)

</div>

---

AI systems can now generate research reports with citations, but how do you know that the citations actually support the claims, that each source URL is real, and that the facts are correct?

Faerie (**Fa**ct **E**rror **R**ate) is an evaluation framework that answers these questions. Give it a Markdown report and it will parse every claim, fetch every cited source, verify citations against source content, independently fact-check claims via web search, and produce a detailed grade card with metrics. The pipeline runs in seven steps (parse, verify sources, extract claims, verify claims, calculate metrics, assess quality, extract insights) and outputs a structured `EvaluationReport` with scores, verdicts, and recommendations.

- **Source verification** — fetches every cited URL to check accessibility and content
- **Citation accuracy** — checks whether each claim is actually supported by its cited source
- **Independent fact-checking** — verifies claims against the web, not just the cited source
- **Exponential penalty scoring** — false facts halve the score (0.5^n), unverifiable claims cut 25% each (0.75^n)
- **Quality assessment** — LLM judge evaluates report structure and thoroughness
- **Insight extraction** — identifies Nth-order insights that synthesize across multiple facts
- **Grade card** — S+ through F- with ASCII art, per-category assessment, and actionable recommendations

## Installation

```bash
pip install faerie-eval
```

## Quick Start

### CLI

```bash
faerie evaluate report.md
```

```bash
faerie evaluate report.md --output results.json --verbose
```

### Python

```python
from faerie import FaerieEvaluator

evaluator = FaerieEvaluator(model="gemini-3-flash-preview", verbose=True)
result = evaluator.evaluate_file("report.md")

print(f"Overall Score: {result.overall_quality_score:.0%}")
print(f"Citation Accuracy: {result.faer_metrics.citation_accuracy_rate:.0%}")
print(f"Factual Accuracy: {result.faer_metrics.fact_accuracy_rate:.0%}")
```

### Generate + Evaluate

Faerie pairs with [Dragen](https://github.com/chonkie-inc/dragen) for end-to-end research pipelines. Generate a report, then evaluate it:

```bash
# Generate a research report
python examples/deep_research.py "AI agents in 2025"

# Evaluate the generated report
faerie evaluate report.md --verbose
```

## CLI Options

```
faerie evaluate <report.md> [OPTIONS]

  --output, -o PATH     Save full JSON results to file
  --model, -m MODEL     LLM model for verification (default: gemini-3-flash-preview)
  --skip-fact-check     Skip independent fact verification (faster, citation-only)
  --skip-insights       Skip Nth-order insight extraction
  --max-workers, -w N   Max parallel verification workers (default: 3)
  --verbose, -v         Print detailed progress
  --json                Output full JSON to stdout
```
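
To evaluate a whole folder of reports, the CLI can be driven from a short script. The sketch below is one possible approach, not part of Faerie itself: the helper names `build_command` and `evaluate_all` are hypothetical, only the flags listed above are used, and it assumes `faerie` is on your `PATH` and exits with a non-zero status on failure.

```python
import subprocess
from pathlib import Path


def build_command(
    report: Path,
    output: Path,
    *,
    skip_fact_check: bool = False,
    max_workers: int = 3,
) -> list[str]:
    """Build a `faerie evaluate` argument list from the documented flags."""
    cmd = [
        "faerie", "evaluate", str(report),
        "--output", str(output),
        "--max-workers", str(max_workers),
    ]
    if skip_fact_check:
        # Citation-only mode: faster, but no independent fact verification.
        cmd.append("--skip-fact-check")
    return cmd


def evaluate_all(report_dir: Path, results_dir: Path) -> None:
    """Evaluate every .md file in report_dir, writing one JSON file per report."""
    results_dir.mkdir(parents=True, exist_ok=True)
    for report in sorted(report_dir.glob("*.md")):
        output = results_dir / f"{report.stem}.json"
        # check=True raises CalledProcessError on a non-zero exit status
        # (assumption: the CLI signals failure through its exit status).
        subprocess.run(build_command(report, output), check=True)
```

Calling `evaluate_all(Path("reports"), Path("results"))` would then process each report in turn.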

## Scoring

The overall score combines four dimensions with exponential penalties for errors:

| Dimension | Weight | Description |
|-----------|--------|-------------|
| Citation Accuracy | 30% | Do sources support the claims they're cited for? |
| Factual Accuracy | 60% | Are claims independently verifiable as true? |
| Structure | 5% | Is the report well-organized? |
| Thoroughness | 5% | Does it cover the topic in depth? |

**Penalty system:** Each false fact multiplies the score by 0.5, and each unverifiable claim multiplies it by 0.75. A report with 3 false facts and 2 unverifiable claims gets: `base_score * 0.5^3 * 0.75^2 ≈ base_score * 0.07`.
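
The arithmetic can be reproduced in a few lines of Python. This is a minimal sketch, not Faerie's actual implementation: it assumes the base score is a weighted sum of the four dimensions (each scored in the range 0 to 1) and that penalties are applied to that sum. The function names are hypothetical.

```python
# Weights from the Scoring table. Combining them as a weighted sum is an
# assumption made for illustration.
WEIGHTS = {
    "citation_accuracy": 0.30,
    "factual_accuracy": 0.60,
    "structure": 0.05,
    "thoroughness": 0.05,
}


def base_score(dimensions: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores, each expected in [0, 1]."""
    return sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS)


def penalized_score(base: float, false_facts: int, unverifiable: int) -> float:
    """Apply the exponential penalties: 0.5 per false fact, 0.75 per unverifiable claim."""
    return base * (0.5 ** false_facts) * (0.75 ** unverifiable)


# Penalty multiplier for 3 false facts and 2 unverifiable claims.
multiplier = penalized_score(1.0, false_facts=3, unverifiable=2)
print(f"{multiplier:.4f}")  # 0.0703
```

Because the penalties multiply, a handful of false facts dominates the result: three of them cap the final score at 12.5% of the base score, whatever the report scores on the four dimensions.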

## Citation

If you use Faerie in your research, please cite it as:

```bibtex
@software{faerie,
  title = {Faerie: Fact Error Rate Evaluation Framework},
  author = {Chonkie Inc.},
  url = {https://github.com/chonkie-inc/faerie},
  license = {MIT},
  year = {2025-2026}
}
```
