Metadata-Version: 2.4
Name: gaussia
Version: 1.0.0b2
Summary: Gaussia - AI evaluation framework for measuring fairness, quality, and safety of AI models and assistants
Project-URL: Homepage, https://github.com/gaussia-labs/pygaussia
Project-URL: Repository, https://github.com/gaussia-labs/pygaussia.git
Project-URL: Bug Tracker, https://github.com/gaussia-labs/pygaussia/issues
Project-URL: Changelog, https://github.com/gaussia-labs/pygaussia/releases
Author: Gaussia Labs
Maintainer-email: Alex Fiorenza <alexfiorenza2012@gmail.com>
License: MIT
License-File: LICENSE
Keywords: ai,assistant,bias,evaluation,fairness,llm,metrics,ml,nlp,toxicity
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: jinja2>=3.1.0
Requires-Dist: langchain-core>=1.2.11
Requires-Dist: langchain>=1.2
Requires-Dist: loguru>=0.7.3
Requires-Dist: optuna>=4.7.0
Requires-Dist: pydantic<3.0.0,>=2.0.0
Requires-Dist: scipy>=1.14.1
Requires-Dist: tqdm>=4.65.0
Requires-Dist: transformers>=4.35.0
Provides-Extra: agentic
Provides-Extra: all
Requires-Dist: accelerate>=0.25.0; extra == 'all'
Requires-Dist: hdbscan>=0.8.41; extra == 'all'
Requires-Dist: interpreto>=0.1.0; extra == 'all'
Requires-Dist: nltk>=3.8.0; extra == 'all'
Requires-Dist: numba>=0.57.0; extra == 'all'
Requires-Dist: numpy>=1.24.0; extra == 'all'
Requires-Dist: optuna>=3.0.0; extra == 'all'
Requires-Dist: pandas>=2.0.0; extra == 'all'
Requires-Dist: scikit-learn<1.8,>=1.3.0; extra == 'all'
Requires-Dist: sentence-transformers>=5.0.0; extra == 'all'
Requires-Dist: torch>=2.0.0; extra == 'all'
Requires-Dist: transformers>=4.35.0; extra == 'all'
Requires-Dist: umap-learn<0.6.0,>=0.5.6; extra == 'all'
Provides-Extra: bestof
Provides-Extra: bias
Requires-Dist: torch>=2.0.0; extra == 'bias'
Provides-Extra: context
Provides-Extra: conversational
Provides-Extra: evalhub
Requires-Dist: accelerate>=0.25.0; extra == 'evalhub'
Requires-Dist: eval-hub-sdk[adapter]<0.2.0,>=0.1.5; extra == 'evalhub'
Requires-Dist: hdbscan>=0.8.41; extra == 'evalhub'
Requires-Dist: nltk>=3.8.0; extra == 'evalhub'
Requires-Dist: numba>=0.57.0; extra == 'evalhub'
Requires-Dist: numpy>=1.24.0; extra == 'evalhub'
Requires-Dist: pandas>=2.0.0; extra == 'evalhub'
Requires-Dist: requests>=2.33.0; extra == 'evalhub'
Requires-Dist: scikit-learn<1.8,>=1.3.0; extra == 'evalhub'
Requires-Dist: sentence-transformers>=5.0.0; extra == 'evalhub'
Requires-Dist: torch>=2.0.0; extra == 'evalhub'
Requires-Dist: umap-learn<0.6.0,>=0.5.6; extra == 'evalhub'
Provides-Extra: explainability
Requires-Dist: interpreto>=0.1.0; extra == 'explainability'
Requires-Dist: torch>=2.0.0; extra == 'explainability'
Requires-Dist: transformers>=4.35.0; extra == 'explainability'
Provides-Extra: generators
Provides-Extra: humanity
Requires-Dist: numpy>=1.24.0; extra == 'humanity'
Requires-Dist: pandas>=2.0.0; extra == 'humanity'
Provides-Extra: metrics
Requires-Dist: accelerate>=0.25.0; extra == 'metrics'
Requires-Dist: hdbscan>=0.8.41; extra == 'metrics'
Requires-Dist: nltk>=3.8.0; extra == 'metrics'
Requires-Dist: numba>=0.57.0; extra == 'metrics'
Requires-Dist: numpy>=1.24.0; extra == 'metrics'
Requires-Dist: pandas>=2.0.0; extra == 'metrics'
Requires-Dist: scikit-learn<1.8,>=1.3.0; extra == 'metrics'
Requires-Dist: sentence-transformers>=5.0.0; extra == 'metrics'
Requires-Dist: torch>=2.0.0; extra == 'metrics'
Requires-Dist: umap-learn<0.6.0,>=0.5.6; extra == 'metrics'
Provides-Extra: prompt-optimizer
Requires-Dist: optuna>=3.0.0; extra == 'prompt-optimizer'
Provides-Extra: regulatory
Requires-Dist: accelerate>=0.25.0; extra == 'regulatory'
Requires-Dist: torch>=2.0.0; extra == 'regulatory'
Provides-Extra: toxicity
Requires-Dist: hdbscan>=0.8.41; extra == 'toxicity'
Requires-Dist: nltk>=3.8.0; extra == 'toxicity'
Requires-Dist: numba>=0.57.0; extra == 'toxicity'
Requires-Dist: numpy>=1.24.0; extra == 'toxicity'
Requires-Dist: pandas>=2.0.0; extra == 'toxicity'
Requires-Dist: scikit-learn<1.8,>=1.3.0; extra == 'toxicity'
Requires-Dist: sentence-transformers>=5.0.0; extra == 'toxicity'
Requires-Dist: torch>=2.0.0; extra == 'toxicity'
Requires-Dist: umap-learn<0.6.0,>=0.5.6; extra == 'toxicity'
Provides-Extra: vision
Requires-Dist: numpy>=1.24.0; extra == 'vision'
Requires-Dist: sentence-transformers>=5.0.0; extra == 'vision'
Requires-Dist: torch>=2.0.0; extra == 'vision'
Description-Content-Type: text/markdown

# Gaussia

[![PyPI version](https://img.shields.io/pypi/v/gaussia)](https://pypi.org/project/gaussia/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/gaussia)](https://pypi.org/project/gaussia/)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/gaussia)](https://pypi.org/project/gaussia/)
[![PyPI - License](https://img.shields.io/pypi/l/gaussia)](https://pypi.org/project/gaussia/)

AI evaluation framework for measuring fairness, quality, and safety of AI models and assistants.

## Installation

```bash
pip install gaussia
```

With specific metric dependencies:

```bash
pip install "gaussia[toxicity]"    # Toxicity analysis
pip install "gaussia[bias]"        # Bias detection
pip install "gaussia[evalhub]"     # EvalHub provider adapter
pip install "gaussia[metrics]"     # All metrics
pip install "gaussia[all]"         # Everything
```
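
To see what each extra pulls in, the `Requires-Dist` entries in the package metadata can be grouped by their `extra` marker. The following is a minimal sketch rather than part of Gaussia: `group_by_extra` and `SAMPLE_LINES` are hypothetical names, and the sample lines are copied from this package's metadata.

```python
from collections import defaultdict

# A few Requires-Dist lines copied from this package's metadata.
SAMPLE_LINES = [
    "Requires-Dist: jinja2>=3.1.0",
    "Requires-Dist: torch>=2.0.0; extra == 'bias'",
    "Requires-Dist: numpy>=1.24.0; extra == 'humanity'",
    "Requires-Dist: pandas>=2.0.0; extra == 'humanity'",
]


def group_by_extra(lines):
    """Map each extra name to its requirements; core requirements go under None."""
    grouped = defaultdict(list)
    for line in lines:
        # Drop the field name if present, then split requirement from marker.
        spec = line.removeprefix("Requires-Dist:").strip()
        requirement, _, marker = spec.partition(";")
        extra = None
        if "extra ==" in marker:
            # The marker looks like: extra == 'bias'
            extra = marker.split("==", 1)[1].strip().strip("'\"")
        grouped[extra].append(requirement.strip())
    return dict(grouped)


extras = group_by_extra(SAMPLE_LINES)
print(extras["humanity"])  # ['numpy>=1.24.0', 'pandas>=2.0.0']
```

For an installed copy, `importlib.metadata.requires("gaussia")` returns the same requirement strings without the `Requires-Dist:` prefix, so they can be passed to the function unchanged.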

## Quick Start

```python
from gaussia import Retriever, Dataset, Batch
from gaussia.metrics import Context

# 1. Define your data source
class MyRetriever(Retriever):
    def load_dataset(self) -> list[Dataset]:
        return [
            Dataset(
                session_id="session-1",
                assistant_id="assistant-1",
                language="en",
                context="France is a country in Western Europe.",
                conversation=[
                    Batch(
                        qa_id="q1",
                        query="Where is France?",
                        assistant="France is located in Western Europe.",
                        ground_truth_assistant="France is a country in Western Europe.",
                    )
                ],
            )
        ]

# 2. Run a metric
metrics = Context.run(retriever=MyRetriever())
```

## Metrics

| Metric | Description | Install extra |
|--------|-------------|---------------|
| **Context** | Evaluates response alignment with provided context | none |
| **Conversational** | Dialogue quality scored on memory and language plus Grice's maxims (quality, quantity, relation, manner) | none |
| **BestOf** | King-of-the-hill tournament comparison of multiple assistants | none |
| **Agentic** | Agent evaluation with pass@K and tool correctness | none |
| **Toxicity** | Cluster-based toxicity profiling with demographic and sentiment analysis | `[toxicity]` |
| **Bias** | Bias detection across protected attributes using guardians | `[bias]` |
| **Humanity** | Emotion, empathy, and human-like quality analysis | `[humanity]` |
| **Regulatory** | Compliance evaluation against regulatory documents | `[regulatory]` |
| **VisionSimilarity** | VLM description comparison via semantic similarity | `[vision]` |
| **VisionHallucination** | Hallucination detection in VLM outputs | `[vision]` |
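
The table does not say which pass@K definition the Agentic metric uses. A common choice is the unbiased estimator `1 - C(n - c, k) / C(n, k)`, where `n` attempts were sampled and `c` of them passed. The sketch below assumes that definition; `pass_at_k` is a hypothetical helper, not a Gaussia function.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k attempts drawn from n is correct.

    n: total attempts sampled, c: attempts that passed, k: draw size.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k attempts failed, giving 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical run: 10 attempts, 3 of which passed.
print(pass_at_k(n=10, c=3, k=1))
print(pass_at_k(n=10, c=3, k=5))
```

With `k=1` the estimate reduces to the plain pass rate `c / n`; larger `k` rewards an agent that succeeds at least occasionally.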

## Features

### Guardians

Pluggable bias detection backends:

```python
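from gaussia.guardians import IBMGraniteGuardian, LLamaGuardGuardian
from gaussia.metrics import Bias  # assumed to live alongside Context

# MyRetriever is the class defined in Quick Start; LLamaGuardGuardian is the alternative backend.
metrics = Bias.run(retriever=MyRetriever(), guardian=IBMGraniteGuardian())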
```

### Statistical Modes

Choose between frequentist and Bayesian aggregation:

```python
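from gaussia import FrequentistMode, BayesianMode
from gaussia.metrics import Context

# Same metric and retriever (MyRetriever from Quick Start), two aggregation modes
frequentist_metrics = Context.run(retriever=MyRetriever(), statistical_mode=FrequentistMode())
bayesian_metrics = Context.run(retriever=MyRetriever(), statistical_mode=BayesianMode())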
```
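
As a rough illustration of how the two styles of aggregation can differ, the sketch below summarizes a pass/fail score in both ways: a sample proportion with a normal-approximation interval, and the posterior mean under a Beta prior. This is a generic textbook example, not Gaussia's implementation; the function names, the uniform `Beta(1, 1)` prior, and the sample counts are all assumptions.

```python
from math import sqrt


def frequentist_rate(successes: int, trials: int, z: float = 1.96):
    """Sample proportion with a normal-approximation (Wald) 95% interval."""
    p = successes / trials
    half_width = z * sqrt(p * (1 - p) / trials)
    return p, (max(0.0, p - half_width), min(1.0, p + half_width))


def bayesian_rate(successes: int, trials: int, alpha: float = 1.0, beta: float = 1.0):
    """Posterior mean of a Beta(alpha, beta) prior updated with binomial data."""
    return (alpha + successes) / (alpha + beta + trials)


# Hypothetical outcome: 9 of 10 responses judged acceptable.
print(frequentist_rate(9, 10))
print(bayesian_rate(9, 10))
```

With only 10 observations the Bayesian estimate is pulled toward the prior mean of 0.5, giving about 0.83 instead of 0.9; the gap shrinks as the number of evaluated responses grows.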

### Synthetic Data Generation

Generate evaluation datasets from documents:

```python
from gaussia.generators import BaseGenerator, create_markdown_loader

loader = create_markdown_loader(path="./docs")
generator = BaseGenerator(context_loader=loader)
datasets = generator.generate()
```

### Explainability

Token-level attribution analysis:

```python
from gaussia.explainability import AttributionExplainer

explainer = AttributionExplainer(method="lime")
attributions = explainer.explain(text="Your input text")
```

### Prompt Optimization

Optimize prompts using evolutionary and multi-objective strategies:

```python
from gaussia.prompt_optimizer import GEPAOptimizer, MIPROv2Optimizer
```

### EvalHub Provider

Run Gaussia as an EvalHub BYOF provider:

```bash
python -m gaussia.integrations.evalhub.adapter
```

## Documentation

Full documentation is available at [docs.gaussia.ai](https://docs.gaussia.ai).

## Requirements

- Python >= 3.11

## License

MIT
