Metadata-Version: 2.4
Name: arize-phoenix-evals
Version: 2.11.0
Summary: LLM Evaluations
Project-URL: Documentation, https://arize.com/docs/phoenix/
Project-URL: Issues, https://github.com/Arize-ai/phoenix/issues
Project-URL: Source, https://github.com/Arize-ai/phoenix
Author-email: Arize AI <phoenix-devs@arize.com>
License: Elastic-2.0
License-File: IP_NOTICE
License-File: LICENSE
Keywords: Explainability,Monitoring,Observability
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: <3.14,>=3.8
Requires-Dist: jsonpath-ng
Requires-Dist: openinference-instrumentation>=0.1.20
Requires-Dist: openinference-semantic-conventions>=0.1.19
Requires-Dist: opentelemetry-api
Requires-Dist: pandas
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pystache
Requires-Dist: tqdm
Requires-Dist: typing-extensions<5,>=4.5
Provides-Extra: dev
Requires-Dist: anthropic>0.18.0; extra == 'dev'
Requires-Dist: boto3; extra == 'dev'
Requires-Dist: litellm>=1.28.9; extra == 'dev'
Requires-Dist: mistralai>=1.0.0; extra == 'dev'
Requires-Dist: openai>=1.0.0; extra == 'dev'
Requires-Dist: vertexai; extra == 'dev'
Provides-Extra: test
Requires-Dist: anthropic>=0.18.0; extra == 'test'
Requires-Dist: boto3; extra == 'test'
Requires-Dist: lameenc; extra == 'test'
Requires-Dist: litellm>=1.28.9; extra == 'test'
Requires-Dist: mistralai>=1.0.0; extra == 'test'
Requires-Dist: nest-asyncio; extra == 'test'
Requires-Dist: openai>=1.0.0; extra == 'test'
Requires-Dist: openinference-semantic-conventions; extra == 'test'
Requires-Dist: pandas; extra == 'test'
Requires-Dist: pandas-stubs==2.0.3.230814; extra == 'test'
Requires-Dist: respx; extra == 'test'
Requires-Dist: tqdm; extra == 'test'
Requires-Dist: types-tqdm; extra == 'test'
Requires-Dist: typing-extensions<5,>=4.5; extra == 'test'
Requires-Dist: vertexai; extra == 'test'
Description-Content-Type: text/markdown

# arize-phoenix-evals

<p align="center">
    <a href="https://pypi.org/project/arize-phoenix-evals/">
        <img src="https://img.shields.io/pypi/v/arize-phoenix-evals" alt="PyPI Version">
    </a>
    <a href="https://arize-phoenix.readthedocs.io/projects/evals/en/latest/index.html">
        <img src="https://img.shields.io/badge/docs-blue?logo=readthedocs&logoColor=white" alt="Documentation">
    </a>
    <img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=8e8e8b34-7900-43fa-a38f-1f070bd48c64&page=packages/phoenix-evals/README.md" />
</p>

Phoenix Evals provides **lightweight, composable building blocks** for writing and running evaluations on LLM applications, including tools to determine relevance, toxicity, hallucination detection, and much more.

## Features

- **Works with your preferred model SDKs** via adapters (OpenAI, LiteLLM, LangChain)
- **Powerful input mapping and binding** for working with complex data structures
- **Several pre-built metrics** for common evaluation tasks like hallucination detection
- **Evaluators are natively instrumented** via OpenTelemetry tracing for observability and dataset curation
- **Blazing fast performance** - achieve up to 20x speedup with built-in concurrency and batching
- **Tons of convenience features** to improve the developer experience!

## Installation

Install Phoenix Evals 2.0 using pip:

```shell
pip install 'arize-phoenix-evals>=2.0.0' openai
```

## Quick Start

```python
from phoenix.evals import create_classifier
from phoenix.evals.llm import LLM

# Create an LLM instance
llm = LLM(provider="openai", model="gpt-4o")

# Create an evaluator
evaluator = create_classifier(
    name="helpfulness",
    prompt_template="Rate the response to the user query as helpful or not:\n\nQuery: {input}\nResponse: {output}",
    llm=llm,
    choices={"helpful": 1.0, "not_helpful": 0.0},
)

# Simple evaluation
scores = evaluator.evaluate({"input": "How do I reset?", "output": "Go to settings > reset."})
scores[0].pretty_print()

# With input mapping for nested data
scores = evaluator.evaluate(
    {"data": {"query": "How do I reset?", "response": "Go to settings > reset."}},
    input_mapping={"input": "data.query", "output": "data.response"}
)
scores[0].pretty_print()
```

## Evaluating Dataframes

```python
import pandas as pd
from phoenix.evals import create_classifier, evaluate_dataframe
from phoenix.evals.llm import LLM

# Create an LLM instance
llm = LLM(provider="openai", model="gpt-4o")

# Create multiple evaluators
relevance_evaluator = create_classifier(
    name="relevance",
    prompt_template="Is the response relevant to the query?\n\nQuery: {input}\nResponse: {output}",
    llm=llm,
    choices={"relevant": 1.0, "irrelevant": 0.0},
)

helpfulness_evaluator = create_classifier(
    name="helpfulness",
    prompt_template="Is the response helpful?\n\nQuery: {input}\nResponse: {output}",
    llm=llm,
    choices={"helpful": 1.0, "not_helpful": 0.0},
)

# Prepare your dataframe
df = pd.DataFrame([
    {"input": "How do I reset my password?", "output": "Go to settings > account > reset password."},
    {"input": "What's the weather like?", "output": "I can help you with password resets."},
])

# Evaluate the dataframe
results_df = evaluate_dataframe(
    dataframe=df,
    evaluators=[relevance_evaluator, helpfulness_evaluator],
)

print(results_df.head())
```

## Documentation

- **[Full Documentation](https://arize-phoenix.readthedocs.io/projects/evals/en/latest/index.html)** - Complete API reference and guides
- **[Phoenix Docs](https://arize.com/docs/phoenix)** - Detailed use-cases and examples
- **[OpenInference](https://github.com/Arize-ai/openinference)** - Auto-instrumentation libraries for frameworks

## Community

Join our community to connect with thousands of AI builders:

- 🌍 Join our [Slack community](https://join.slack.com/t/arize-ai/shared_invite/zt-3r07iavnk-ammtATWSlF0pSrd1DsMW7g).
- 📚 Read the [Phoenix documentation](https://arize.com/docs/phoenix).
- 💡 Ask questions and provide feedback in the _#phoenix-support_ channel.
- 🌟 Leave a star on our [GitHub](https://github.com/Arize-ai/phoenix).
- 🐞 Report bugs with [GitHub Issues](https://github.com/Arize-ai/phoenix/issues).
- 𝕏 Follow us on [𝕏](https://twitter.com/ArizePhoenix).
- 🗺️ Check out our [roadmap](https://github.com/orgs/Arize-ai/projects/45) to see where we're heading next.
