Metadata-Version: 2.4
Name: syncreus-eval
Version: 0.1.0
Summary: Open-source AI evaluation toolkit — hallucination detection, safety, industry-specific evals
Project-URL: Homepage, https://syncreus.com
Project-URL: Repository, https://github.com/syncreus/syncreus-eval
Project-URL: Documentation, https://docs.syncreus.com/eval-sdk
Project-URL: Issues, https://github.com/syncreus/syncreus-eval/issues
Author-email: Syncreus <hello@syncreus.com>
License: MIT
License-File: LICENSE
Keywords: ai,evaluation,hallucination,llm,safety,testing
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: google-genai>=1.0.0
Requires-Dist: pydantic>=2.0
Provides-Extra: accuracy
Requires-Dist: fastembed>=0.4; extra == 'accuracy'
Requires-Dist: numpy>=1.24; extra == 'accuracy'
Provides-Extra: all
Requires-Dist: fastembed>=0.4; extra == 'all'
Requires-Dist: httpx>=0.27; extra == 'all'
Requires-Dist: llm-guard>=0.3; extra == 'all'
Requires-Dist: numpy>=1.24; extra == 'all'
Requires-Dist: presidio-analyzer>=2.2; extra == 'all'
Requires-Dist: spacy>=3.7; extra == 'all'
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.24; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Provides-Extra: prompt-injection
Requires-Dist: llm-guard>=0.3; extra == 'prompt-injection'
Provides-Extra: safety
Requires-Dist: presidio-analyzer>=2.2; extra == 'safety'
Requires-Dist: spacy>=3.7; extra == 'safety'
Provides-Extra: upload
Requires-Dist: httpx>=0.27; extra == 'upload'
Description-Content-Type: text/markdown

# syncreus-eval

Open-source AI evaluation toolkit for hallucination detection, safety scanning, bias analysis, and industry-specific evals. Evaluations run locally and do not require the Syncreus platform (the one exception is `REGRESSION`, which is platform-only).

## Installation

```bash
# Core (LLM-as-judge evaluators via Gemini)
pip install syncreus-eval

# With optional extras (quoted so shells such as zsh do not expand the brackets)
pip install "syncreus-eval[accuracy]"          # fastembed for semantic similarity
pip install "syncreus-eval[safety]"            # Presidio PII scanning
pip install "syncreus-eval[prompt-injection]"  # LLM Guard injection detection
pip install "syncreus-eval[upload]"            # Upload results to Syncreus platform
pip install "syncreus-eval[all]"               # Everything
```
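
To check which optional extras are usable in the current environment, a small probe like the one below can help. It is a sketch, not part of `syncreus-eval`: the helper name `available_extras` is hypothetical, and the import names are assumed from the distributions listed for each extra (`fastembed`, `numpy`, `presidio_analyzer`, `spacy`, `llm_guard`, `httpx`).

```python
from importlib.util import find_spec

# Assumed mapping from extra name to the top-level modules it installs.
EXTRA_MODULES = {
    "accuracy": ["fastembed", "numpy"],
    "safety": ["presidio_analyzer", "spacy"],
    "prompt-injection": ["llm_guard"],
    "upload": ["httpx"],
}


def available_extras() -> dict[str, bool]:
    """Return, per extra, whether all of its modules can be found."""
    return {
        extra: all(find_spec(module) is not None for module in modules)
        for extra, modules in EXTRA_MODULES.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"{extra}: {'installed' if ok else 'missing'}")
```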

## Quick Start

```python
from syncreus_eval import evaluate, EvalType

# Hallucination detection
result = evaluate(
    EvalType.HALLUCINATION,
    ai_input="The Eiffel Tower is in Paris, France. It was built in 1889.",
    ai_output="The Eiffel Tower is in Paris, France. It was built in 1889 and is 300 meters tall.",
    gemini_key="your-gemini-api-key",
)
print(result.passed)    # True/False/None
print(result.details)   # Claim-level verdicts

# Performance tracking (no LLM needed)
result = evaluate(
    EvalType.PERFORMANCE,
    trace={"latency_ms": 150, "token_count_input": 100, "token_count_output": 50},
)
print(result.details)   # {"latency_ms": 150, "total_tokens": 150, ...}

# Run multiple evals at once
results = evaluate(
    [EvalType.HALLUCINATION, EvalType.IDEOLOGY],
    ai_input="Context here",
    ai_output="Response here",
    gemini_key="your-key",
)
for r in results:
    print(f"{r.eval_type.value}: passed={r.passed}")
```

## Evaluation Types

### General Purpose
| Type | Description | Requires |
|------|-------------|----------|
| `HALLUCINATION` | Detects unsupported factual claims | Gemini API key |
| `ACCURACY` | Golden dataset comparison via semantic similarity | `[accuracy]` extra |
| `CONSISTENCY` | Pairwise similarity across repeated prompts | `[accuracy]` extra |
| `PERFORMANCE` | Extracts latency, token, and cost metrics | Nothing |
| `AGENT_TASK` | Checks that an agent's task-completion claims are honest | Gemini API key |
| `REGRESSION` | Baseline comparison (platform only) | Syncreus platform |
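
As an illustration of what "pairwise similarity" means for `CONSISTENCY`, the sketch below averages cosine similarity over every pair of embedding vectors. It is not the library's implementation: the function names are hypothetical, and the vectors are hand-written stand-ins for embeddings that a model (for example via fastembed) would produce.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(vectors: list[list[float]]) -> float:
    """Average cosine similarity over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Toy embeddings for three responses to the same prompt (assumed values).
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
score = mean_pairwise_similarity(embeddings)
print(f"mean pairwise similarity: {score:.3f}")
```

A score near 1.0 means the repeated responses are close to each other in embedding space. What counts as "consistent" depends on the threshold chosen.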

### Safety & Compliance
| Type | Description | Requires |
|------|-------------|----------|
| `SAFETY` | PII/sensitive data detection + content safety | `[safety]` extra |
| `BIAS` | Demographic parity / EEOC four-fifths rule | Nothing |
| `IDEOLOGY` | Political neutrality (OMB M-26-04) | Gemini API key |
| `PROMPT_INJECTION` | Injection attempt detection | `[prompt-injection]` extra |
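
The four-fifths rule referenced for `BIAS` compares each group's favorable-outcome rate with the highest group's rate and flags any ratio below 0.8. The sketch below applies that rule to traces shaped like the `traces` argument shown in the API reference. It is an illustration under assumptions, not the library's code: `four_fifths_check` is a hypothetical name, and treating `passed` as the favorable outcome is an assumption.

```python
from collections import defaultdict


def four_fifths_check(traces: list[dict]) -> tuple[bool, dict[str, float]]:
    """Apply the four-fifths rule to per-group pass rates."""
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for trace in traces:
        group = trace["metadata"]["demographic_group"]
        totals[group] += 1
        passes[group] += 1 if trace["passed"] else 0

    rates = {group: passes[group] / totals[group] for group in totals}
    highest = max(rates.values(), default=0.0)
    if highest == 0.0:
        # No group has a passing trace, so there is no disparity to measure.
        return True, rates
    # Every group's rate must be at least 80% of the highest group's rate.
    ok = all(rate / highest >= 0.8 for rate in rates.values())
    return ok, rates


# Assumed sample data: group A passes 8 of 10, group B passes 5 of 10.
traces = (
    [{"metadata": {"demographic_group": "A"}, "passed": True}] * 8
    + [{"metadata": {"demographic_group": "A"}, "passed": False}] * 2
    + [{"metadata": {"demographic_group": "B"}, "passed": True}] * 5
    + [{"metadata": {"demographic_group": "B"}, "passed": False}] * 5
)
ok, rates = four_fifths_check(traces)
print(ok, rates)
```

Here group B's rate is 0.5 against group A's 0.8, a ratio of 0.625, so the check fails.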

### Industry-Specific
| Type | Description | Requires |
|------|-------------|----------|
| `HEALTHCARE` | Medical accuracy, drug safety, PHI detection | Gemini API key |
| `LEGAL` | Citation validity, holding fidelity | Gemini API key |
| `FINANCE` | Regulatory accuracy, numerical precision | Gemini API key |
| `CODE_ACCURACY` | API existence, function signatures | Gemini API key |

## API Reference

### `evaluate()`

```python
from syncreus_eval import evaluate, EvalType

result = evaluate(
    eval_type=EvalType.HALLUCINATION,  # or a list of types
    ai_input="...",
    ai_output="...",
    gemini_key="...",                  # or set GEMINI_API_KEY env var
    # Accuracy-specific:
    test_cases=[{"input_text": "...", "expected_output": "..."}],
    threshold=0.85,
    # Performance-specific:
    trace={"latency_ms": 100},         # other trace fields omitted
    # Agent task-specific:
    verification_result="exit code 0",
    # Bias-specific:
    traces=[{"metadata": {"demographic_group": "A"}, "passed": True}],
    # Consistency-specific:
    outputs=["response1", "response2", "response3"],
    # Safety-specific:
    entity_whitelist=["aspirin"],
    enable_gemini_content_safety=True,
)
```

Returns an `EvalResult`, or a list of them when `eval_type` is a list:

```python
class EvalResult:
    eval_type: EvalType
    passed: bool | None        # True/False/None (None = error or skipped)
    score: float | None        # Numeric score where applicable
    details: dict[str, Any]    # Evaluator-specific details
    error: bool                # Whether an error occurred
    error_message: str | None  # Error description
```

### `upload_results()` (optional)

```python
from syncreus_eval import upload_results

upload_results(
    results=result,           # EvalResult or list
    api_key="syn_...",        # Syncreus API key
    endpoint="https://api.syncreus.com",
    trace_id="trace-123",     # optional
)
```

Requires the `upload` extra: `pip install "syncreus-eval[upload]"`

## Environment Variables

| Variable | Description |
|----------|-------------|
| `GEMINI_API_KEY` | Google Gemini API key for LLM-as-judge evaluators |

## License

MIT
