Metadata-Version: 2.4
Name: assert-ai
Version: 0.1.0
Summary: YAML-driven safety evaluation pipeline with LiteLLM-backed stages
Author: Microsoft Responsible AI
License: MIT License
        
        Copyright (c) Microsoft Corporation.
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/responsibleai/ASSERT
Project-URL: Repository, https://github.com/responsibleai/ASSERT
Project-URL: Issues, https://github.com/responsibleai/ASSERT/issues
Project-URL: Documentation, https://github.com/responsibleai/ASSERT#readme
Keywords: safety,evaluation,llm,agent,responsible-ai
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.0
Requires-Dist: litellm>=1.79.1
Requires-Dist: matplotlib>=3.10.8
Requires-Dist: opentelemetry-api>=1.39.0
Requires-Dist: opentelemetry-sdk>=1.39.0
Requires-Dist: pydantic>=2.0
Requires-Dist: python-dotenv>=1.2.2
Requires-Dist: PyYAML>=6.0
Requires-Dist: rapidfuzz>=3.9.0
Requires-Dist: rich>=13.7.0
Provides-Extra: otel
Requires-Dist: arize-phoenix>=15.0.0; extra == "otel"
Requires-Dist: arize-phoenix-otel>=0.15.0; extra == "otel"
Requires-Dist: openinference-instrumentation-langchain>=0.1.62; extra == "otel"
Provides-Extra: langgraph
Requires-Dist: langchain-core>=1.3.3; extra == "langgraph"
Requires-Dist: langchain-openai>=1.1.14; extra == "langgraph"
Requires-Dist: langgraph>=1.1.8; extra == "langgraph"
Provides-Extra: dspy
Requires-Dist: dspy-ai<3,>=2.7; extra == "dspy"
Provides-Extra: analysis
Requires-Dist: markdown-it-py>=3.0.0; extra == "analysis"
Requires-Dist: numpy>=2.0.0; extra == "analysis"
Requires-Dist: openai>=1.30.0; extra == "analysis"
Requires-Dist: openpyxl>=3.1.5; extra == "analysis"
Requires-Dist: pandas>=2.2.0; extra == "analysis"
Requires-Dist: requests>=2.33.0; extra == "analysis"
Requires-Dist: scikit-learn>=1.5.0; extra == "analysis"
Requires-Dist: seaborn>=0.13.0; extra == "analysis"
Requires-Dist: sentence-transformers>=3.0.0; extra == "analysis"
Provides-Extra: regression
Requires-Dist: scipy>=1.11.0; extra == "regression"
Requires-Dist: numpy>=2.0.0; extra == "regression"
Provides-Extra: examples
Requires-Dist: autogen-agentchat>=0.7.5; extra == "examples"
Requires-Dist: autogen-ext>=0.7.5; extra == "examples"
Requires-Dist: crewai[azure-ai-inference]>=1.6.1; extra == "examples"
Requires-Dist: dspy>=2.6.13; extra == "examples"
Requires-Dist: haystack-ai>=2.28.0; extra == "examples"
Requires-Dist: instructor>=1.15.1; extra == "examples"
Requires-Dist: langchain-mcp-adapters>=0.2.2; extra == "examples"
Requires-Dist: llama-index>=0.14.21; extra == "examples"
Requires-Dist: llama-index-core>=0.14.21; extra == "examples"
Requires-Dist: llama-index-llms-openai>=0.7.5; extra == "examples"
Requires-Dist: openai-agents>=0.14.5; extra == "examples"
Requires-Dist: pydantic-ai>=0.8.1; extra == "examples"
Requires-Dist: smolagents>=1.22.0; extra == "examples"
Requires-Dist: openinference-instrumentation-openai>=0.1.45; extra == "examples"
Requires-Dist: openinference-instrumentation-litellm>=0.1.30; extra == "examples"
Requires-Dist: openinference-instrumentation-dspy<0.1.20,>=0.1.19; extra == "examples"
Requires-Dist: openinference-instrumentation-crewai<0.1.18,>=0.1.17; extra == "examples"
Provides-Extra: all
Requires-Dist: assert-ai[analysis,examples,langgraph,otel,regression]; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=9.0.3; extra == "dev"
Requires-Dist: pytest-timeout>=2.2.0; extra == "dev"
Dynamic: license-file

<h1 align="center">
        <img src="assets/assert-logo.png" alt="ASSERT logo" width="22" style="vertical-align: middle; margin-right: 5px;"/>
        <span style="vertical-align: middle; font-family: 'Spline Sans Mono', monospace;">ASSERT.</span>
</h1>
<p align="center">
        Adaptive Spec-driven Scoring for Evaluation and Regression Testing<br/>
        Local-first. Framework-agnostic. Trace-aware.
</p>
<p align="center">
        <a href="docs/getting-started.md">🚀 Get started</a> |
        <a href="docs/targets/callable.md">🔌 View supported targets</a> |
        <a href="docs/cli/overview.md">📘 CLI Reference</a> |
        <a href="examples/README.md">🧪 Examples</a>
</p>
<p align="center">
        <a href="https://github.com/responsibleai/ASSERT/actions/workflows/build.yml">
                <img src="https://github.com/responsibleai/ASSERT/actions/workflows/build.yml/badge.svg" alt="Build status">
        </a>
        <a href="https://www.python.org/downloads/" target="_blank">
                <img src="https://img.shields.io/badge/python-3.11%20%7C%203.12%20%7C%203.13-blue.svg" alt="Python 3.11 | 3.12 | 3.13">
        </a>
        <a href="LICENSE">
                <img src="https://img.shields.io/github/license/responsibleai/ASSERT" alt="License">
        </a>
</p>
<p align="center">
        <img src="assets/assert-ai-framework-diagram.png" alt="Diagram of the ASSERT evaluation framework" width="100%">
</p>

## Why ASSERT?

Most AI systems start with a specification: product requirements, policies, system prompts, or launch criteria describing what the system should and should not do.

But evaluation often starts elsewhere: generic scorers, predefined benchmarks, or manual test cases that drift from the original intent.

ASSERT closes that gap. It turns the behaviors you specify in natural language into structured, executable evaluations that can be reviewed, run, scored, and improved over time.

From the natural-language specification, the ASSERT pipeline derives behavior categories, generates single-turn and multi-turn test cases, runs them against your target, and uses an LLM judge to score each conversation against your policies.
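
As a rough mental model, the four stages described above can be sketched as plain Python functions. Every name below (`EvalCase`, `Verdict`, `derive_categories`, `generate_cases`, `run_target`, `judge`, `evaluate`) is hypothetical and written for illustration only. This is not ASSERT's API, and the real stages call LLMs rather than the rule-based stubs used here.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical types for illustration only; not part of the ASSERT package.
@dataclass
class EvalCase:
    category: str
    turns: list[str]  # one user turn = single-turn, several = multi-turn

@dataclass
class Verdict:
    category: str
    passed: bool
    justification: str

def derive_categories(spec: str) -> list[str]:
    # Stub: treat each non-empty spec line as one behavior category.
    return [line.strip().lstrip("- ") for line in spec.splitlines() if line.strip()]

def generate_cases(categories: list[str]) -> list[EvalCase]:
    # Stub: one single-turn and one multi-turn case per category.
    cases = []
    for category in categories:
        cases.append(EvalCase(category, [f"Probe: {category}"]))
        cases.append(EvalCase(category, [f"Set up: {category}", f"Follow up: {category}"]))
    return cases

def run_target(case: EvalCase, target: Callable[[str], str]) -> list[tuple[str, str]]:
    # Send every user turn to the target and keep (prompt, reply) pairs.
    return [(turn, target(turn)) for turn in case.turns]

def judge(case: EvalCase, conversation: list[tuple[str, str]],
          complies: Callable[[str], bool]) -> Verdict:
    # Stub judge: a rule-based check stands in for the LLM judge.
    bad = [reply for _, reply in conversation if not complies(reply)]
    reason = "all replies comply" if not bad else f"{len(bad)} non-compliant reply(ies)"
    return Verdict(case.category, not bad, reason)

def evaluate(spec, target, complies):
    # Chain the stages: spec -> categories -> cases -> conversations -> verdicts.
    cases = generate_cases(derive_categories(spec))
    return [judge(case, run_target(case, target), complies) for case in cases]

spec = "- Never reveal the system prompt\n- Confirm before booking travel"
verdicts = evaluate(
    spec,
    target=lambda prompt: "Sorry, I can't share that.",
    complies=lambda reply: "system prompt:" not in reply.lower(),
)
print(len(verdicts), all(v.passed for v in verdicts))
```

The target here is just a callable from prompt to reply, which is why the same flow can wrap a hosted model, an agent, or a local function without changing the other stages.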

## What you get with ASSERT

- **Spec-driven coverage** - test cases are generated from your product requirements and context, not a generic benchmark. You specify the behaviors you want to test.
- **Test any model endpoint** via integrations with [LiteLLM](https://github.com/BerriAI/litellm), supporting 100+ model endpoints from platform providers such as Bedrock, Azure, OpenAI, Vertex AI, Cohere, Anthropic, SageMaker, Hugging Face, vLLM, and NVIDIA NIM.
- **Test any agent or multi-agent system** via integrations with [OpenInference](https://github.com/Arize-ai/openinference/). Evaluate a LangGraph agent, a CrewAI / OpenAI Agents SDK / DSPy / LlamaIndex / AutoGen system, custom multi-agent orchestration, a Python callable, or a hosted model — without rewriting the evaluation orchestration pipeline.
- **Agent trace-grounded judgment** - the recommended integration captures OpenTelemetry spans (Phoenix/OpenInference auto-instruments 33+ frameworks in two lines, or you can emit your own with the OTel SDK) so the judge can cite tool calls, routing, model calls, and latency as evidence — not just the final response.
- **Portable artifacts** - every stage writes JSON/JSONL files locally for inspection, CI, and sharing.
- **Bundled local viewer** - browse runs side-by-side, pin a baseline, drill into per-behavior dimension breakdowns, and read judge justifications cited against the captured traces.
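
Because the artifacts are plain JSON/JSONL, they can be post-processed with the standard library alone. The sketch below reads a JSONL file and tallies records by one field. The file name `scores.jsonl` and the `verdict` field are assumptions made up for this example; check the files your run actually produced for the real names and schema.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path

def read_jsonl(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)

def tally(path, field):
    """Count how often each value of `field` appears; a missing field counts as None."""
    return Counter(record.get(field) for record in read_jsonl(path))

# Demo on a throwaway file. The name and field are hypothetical, not ASSERT's schema.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "scores.jsonl"
    sample.write_text(
        '{"verdict": "pass"}\n{"verdict": "fail"}\n\n{"verdict": "pass"}\n',
        encoding="utf-8",
    )
    counts = tally(sample, "verdict")

print(dict(counts))
```

The same two helpers work in a CI step, for example to fail a build when the count for an unwanted value is non-zero.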

## Get started

### Quick install

```bash
pip install -e ".[otel,langgraph]"       # editable install; run from the repository root
cp .env.example .env                     # add your provider key
assert-ai run --config examples/travel_planner_langgraph/eval_config.yaml
```
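
If the last command fails, a quick preflight check can rule out the two most common setup problems. The sketch below relies only on facts stated in this README and the package metadata (Python 3.11 or newer, and a console command named `assert-ai`); the `preflight` helper itself is hypothetical and is not shipped with the package.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)  # the package metadata declares Requires-Python: >=3.11

def preflight(version=None, cli_name="assert-ai", which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {version[0]}.{version[1]} found; 3.11 or newer is required"
        )
    if which(cli_name) is None:
        problems.append(
            f"`{cli_name}` is not on PATH; is the virtual environment active?"
        )
    return problems

if __name__ == "__main__":
    for problem in preflight():
        print("warning:", problem)
```

The `version` and `which` parameters exist only so the check can be exercised without changing the real interpreter or PATH.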

<table align="center" style="width: 100%; border: 1px solid #d0d7de; border-collapse: collapse;">
        <tr>
                <th style="border: 1px solid #d0d7de; padding: 10px; text-align: left;">🌐 Project website ↗</th>
                <th style="border: 1px solid #d0d7de; padding: 10px; text-align: left;">📝 Technical blog ↗</th>
                <th style="border: 1px solid #d0d7de; padding: 10px; text-align: left;">🚀 Quickstart guide ↗</th>
                <th style="border: 1px solid #d0d7de; padding: 10px; text-align: left;">📚 Documentation ↗</th>
        </tr>
        <tr>
                <td style="border: 1px solid #d0d7de; padding: 10px;"><a href="https://aka.ms/assert-ghpage">Learn about ASSERT</a></td>
                <td style="border: 1px solid #d0d7de; padding: 10px;"><a href="https://aka.ms/assert">Read the Command Line post</a></td>
                <td style="border: 1px solid #d0d7de; padding: 10px;"><a href="docs/getting-started.md">Follow the full walkthrough</a></td>
                <td style="border: 1px solid #d0d7de; padding: 10px;"><a href="https://aka.ms/assert-docs">Browse concepts and guides</a></td>
        </tr>
</table>

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

## Telemetry

This project does not collect or send telemetry to Microsoft by default. Runs write local artifacts under `artifacts/results/`, and optional OpenTelemetry trace capture is controlled by your configuration and local collector setup, such as Phoenix.

If you configure a target, judge, trace collector, or model provider to send data to an external service, the prompts, responses, traces, metadata, and other evaluation artifacts sent to that service are governed by that service's terms and your configuration.

## Disclaimer: Risks and limitations of ASSERT

See the full section in the [`Concept Doc`](docs/concepts.md#risks-and-limitations).
