Metadata-Version: 2.4
Name: hyperplane-eval
Version: 0.1.5
Summary: A modular framework for evaluating and verifying agentic LLM outputs.
Author: Marten Panchev
Author-email: marten@aquithm.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: scipy>=1.10.0
Requires-Dist: litellm>=1.0.0
Requires-Dist: aiohttp>=3.9.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: scikit-learn>=1.2.0
Requires-Dist: openai>=1.0.0
Requires-Dist: pyngrok>=7.1.0
Requires-Dist: rich>=13.0.0
Requires-Dist: questionary>=2.0.0
Requires-Dist: PyYAML>=6.0.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Hyperplane Eval

Hyperplane Eval is a Python testing framework that helps you find out when and where your AI agents break. Instead of writing test cases by hand, you give Hyperplane a target function and a set of rules, and it systematically generates edge cases to map out your agent's "Safe Polytope": the operational volume where your agent is reliable.

## 🚀 How It Works: Breadth-First Evaluation

Testing an AI agent is hard because the space of possible inputs is effectively unbounded. Hyperplane makes the problem tractable by breaking inputs down into "dimensions" of complexity (e.g., Urgency, Ambiguity, Formatting).

Instead of randomly guessing inputs, Hyperplane uses a **breadth-first evaluation** approach:
1. **Dimension Extraction:** It automatically extracts relevant dimensions based on the rules you want to test.
2. **Grid Generation:** It generates an evenly spread set of test scenarios across these dimensions (using Sobol sequences, a low-discrepancy sampling scheme that covers the space more uniformly than random sampling).
3. **Input Synthesis:** It uses a strong LLM to generate realistic user inputs that match those specific dimension coordinates.
4. **Evaluation:** It executes your local agent code with the generated inputs, and evaluates the output against your rules using a Chain-of-Thought (CoT) judge.
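
The sampling in step 2 can be illustrated with SciPy's quasi-Monte Carlo module. This is a minimal sketch, not Hyperplane's internal code: the dimension names are borrowed from the examples above, and the `sample_scenarios` helper and its parameters are assumptions made for illustration.

```python
from scipy.stats import qmc

# Hypothetical dimension names, taken from the examples in the text above.
DIMENSIONS = ["urgency", "ambiguity", "formatting"]


def sample_scenarios(n_log2: int = 4, seed: int = 0) -> list[dict[str, float]]:
    """Return 2**n_log2 Sobol points in the unit cube, one coordinate per dimension."""
    sampler = qmc.Sobol(d=len(DIMENSIONS), scramble=True, seed=seed)
    # Sobol sequences keep their balance properties when drawn in powers of two.
    points = sampler.random_base2(m=n_log2)  # shape: (2**n_log2, len(DIMENSIONS))
    return [dict(zip(DIMENSIONS, map(float, row))) for row in points]


scenarios = sample_scenarios()
print(len(scenarios), scenarios[0])
```

Each dictionary is one scenario: a coordinate between 0 and 1 on every dimension, which step 3 would then turn into a concrete user input.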

By doing this breadth-first scan across multiple dimensions simultaneously, Hyperplane creates a mathematical map of your agent's reliability and calculates its "Reliability Coverage" as a clear, comparable percentage.
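
This README does not spell out how "Reliability Coverage" is computed, so the sketch below only illustrates two plausible readings: the share of sampled points that pass, and the volume of the convex hull around the passing points relative to the unit cube. The function names and the toy data are assumptions, not part of Hyperplane's API.

```python
import numpy as np
from scipy.spatial import ConvexHull


def pass_rate(passed: np.ndarray) -> float:
    """Percentage of sampled points whose output satisfied every rule."""
    return 100.0 * float(np.mean(passed))


def hull_volume_share(points: np.ndarray, passed: np.ndarray) -> float:
    """Convex-hull volume of the passing points, as a percentage of the unit cube."""
    safe = points[passed]
    # A hull in d dimensions needs at least d + 1 points.
    if len(safe) <= points.shape[1]:
        return 0.0
    return 100.0 * float(ConvexHull(safe).volume)


# Toy 2-D example: the agent fails only at the far corner (1, 1).
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.25, 0.25]])
passed = np.array([True, True, True, False, True])

print(pass_rate(passed))
print(hull_volume_share(points, passed))
```

The two numbers answer different questions. The pass rate depends on how the points were sampled, while the hull volume describes the size of the region in which the agent was observed to behave. A convex hull can also enclose failing points, so it should be read as an optimistic estimate.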

## 🚦 CLI Integration

You don't need to write evaluation scripts or boilerplate code; everything is handled through an interactive CLI.

### Setup & Installation

Install the framework via pip:

```bash
pip install hyperplane-eval
```

### Running the CLI

Run the interactive CLI directly in your terminal from inside your project directory:

```bash
hyperplane
```

The wizard guides you through the evaluation setup:
1. **Target Selection:** It will automatically scan your local Python files and let you pick the function that acts as your agent's entry point.
2. **Rule Definition:** You define the rules your agent must follow in plain English (e.g., "Never offer a refund over $50").
3. **Configuration:** You configure the depth (how many points to test) and breadth (how many dimensions to extract).
4. **Execution:** The framework will spin up workers, generate the test space, execute your local code, and render a real-time terminal dashboard.
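
For orientation, here is what a target function and a rule might look like before you start the wizard. This is a hypothetical sketch: the README does not specify the signature Hyperplane expects, so the string-in, string-out shape of `support_agent` is an assumption. The `max_dollar_amount` helper is a hand-written sanity check and not a stand-in for the CoT judge.

```python
import re

# Plain-English rules, as you would type them into the wizard.
RULES = ["Never offer a refund over $50"]


def support_agent(user_message: str) -> str:
    """Hypothetical entry point: takes the user's text, returns the agent's reply."""
    # A real agent would call an LLM here; this stub keeps the example runnable.
    if "refund" in user_message.lower():
        return "I can offer a refund of $25 for the inconvenience."
    return "Thanks for reaching out. Could you share more details?"


def max_dollar_amount(text: str) -> float:
    """Largest dollar figure mentioned in the text, or 0.0 if there is none."""
    found = re.findall(r"\$(\d[\d,]*(?:\.\d+)?)", text)
    return max((float(a.replace(",", "")) for a in found), default=0.0)


reply = support_agent("My order arrived broken, I want a refund!")
print(reply, max_dollar_amount(reply) <= 50)
```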

Once complete, Hyperplane generates an interactive HTML report showing exactly which dimensions cause your agent to fail, allowing you to easily identify blind spots in your system prompts.

## 🛠 Technology Stack
- **Language:** Python 3.10+
- **Data Modeling:** `pydantic`
- **Math/Geometry:** `numpy`, `scipy` (Sobol sequences, ConvexHull analysis)
- **LLM Integration:** `litellm` for provider-agnostic API access (OpenAI, Gemini, Anthropic, or a local vLLM server)

## 📄 License

This project is licensed under the Apache License, Version 2.0.
See the [LICENSE](LICENSE) file for more information.
