Metadata-Version: 2.4
Name: hyperevals
Version: 0.1.0
Summary: Hyperband-optimized parallelized prompt and model parameter tuning for evaluating LLMs
Author-email: Griffin Tarpenning <gtarpenning@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/griffintarpenning/hyperevals
Project-URL: Bug Reports, https://github.com/griffintarpenning/hyperevals/issues
Project-URL: Source, https://github.com/griffintarpenning/hyperevals
Keywords: llm,evaluation,hyperband,optimization,ai,machine-learning
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pyyaml>=6.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: numpy>=1.21.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: isort>=5.10.0; extra == "dev"
Requires-Dist: flake8>=5.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Dynamic: license-file

# HyperEvals

Hyperband-optimized parallelized prompt and model parameter tuning for evaluating LLMs.

## Motivation

Evaluating LLMs is notoriously difficult, yet it is critical before deploying them to production with confidence. Seemingly small prompt tweaks or model upgrades can significantly change performance across tasks, hence the need for carefully crafted evaluations.

HyperEvals tunes prompts and model parameters in parallel and uses Hyperband to stop unpromising configurations early. The approach is inspired by W&B's sweeps combined with Hyperband optimization.

## Installation

```bash
pip install hyperevals
```

For a development installation:
```bash
git clone https://github.com/griffintarpenning/hyperevals.git
cd hyperevals
pip install -e ".[dev]"
```

## Quick Start

```bash
# Install the package
pip install hyperevals

# Run with a configuration file
hyperevals run config.yaml

# Show version
hyperevals --version
```

## Usage

### MVP Flow
1. Create a CSV dataset
2. Create a prompt template
3. Create an executable Model file
4. Create executable scorers
5. Create a config file
6. Run the evaluation
7. Iterate on prompt and model parameters
8. Hyperband stops poorly performing configurations early
9. The final prompt is reported with its accuracy
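
Steps 7 to 9 can be illustrated with successive halving, the pruning routine that Hyperband builds on. The sketch below is not the HyperEvals implementation: `successive_halving`, `toy_score`, the `keep_fraction` parameter, and the prompt names are all hypothetical, and the scores are made-up stand-ins for running a model and scorer on the first `n_examples` rows of a dataset.

```python
from typing import Callable, Dict, List, Sequence


def successive_halving(
    configs: Sequence[str],
    score_fn: Callable[[str, int], float],
    budgets: Sequence[int],
    keep_fraction: float = 0.5,
) -> Dict[str, float]:
    """Score the surviving configs at each budget, then drop the worst ones."""
    survivors: List[str] = list(configs)
    scores: Dict[str, float] = {}
    for budget in budgets:
        # Only configs that survived the previous round are scored again.
        scores = {cfg: score_fn(cfg, budget) for cfg in survivors}
        ranked = sorted(survivors, key=lambda cfg: scores[cfg], reverse=True)
        # Keep the top fraction, but never fewer than one config.
        keep = max(1, int(len(ranked) * keep_fraction))
        survivors = ranked[:keep]
    return {cfg: scores[cfg] for cfg in survivors}


# Hypothetical accuracies; a real run would call the model and scorer instead.
TOY_ACCURACY = {"prompt_a": 0.62, "prompt_b": 0.81, "prompt_c": 0.47, "prompt_d": 0.74}


def toy_score(cfg: str, n_examples: int) -> float:
    # Stand-in for evaluating `cfg` on the first `n_examples` dataset rows.
    return TOY_ACCURACY[cfg]


best = successive_halving(list(TOY_ACCURACY), toy_score, budgets=[10, 20, 30])
print(best)
```

With these toy scores, four candidates are scored on 10 examples, the top two on 20, and only the best one on 30, so the weakest prompts never consume the larger budgets.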

### Sample Configuration

```yaml
dataset: /data/test.csv
prompt_template: /prompts/test.txt
model: /models/test.py
scorer: /scorers/scorer.py
max_parallelism: 2
hyperband:
  min_examples: 10
  bands: [10, 20, 30, 40, 50]
```
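
To make the shape of this configuration concrete, here is a minimal sketch that validates it after it has been loaded into a dictionary (for example with PyYAML's `yaml.safe_load`). The `EvalConfig` and `HyperbandSettings` classes, the `parse_config` function, and every validation rule are assumptions made for illustration; the schema that HyperEvals actually enforces is not documented here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class HyperbandSettings:
    min_examples: int
    bands: List[int]


@dataclass
class EvalConfig:
    dataset: str
    prompt_template: str
    model: str
    scorer: str
    max_parallelism: int
    hyperband: HyperbandSettings


def parse_config(raw: Dict[str, Any]) -> EvalConfig:
    """Build an EvalConfig from a dict, applying a few assumed sanity checks."""
    hb = raw["hyperband"]
    bands = list(hb["bands"])
    # Assumption: bands are example counts that must strictly increase.
    if bands != sorted(set(bands)):
        raise ValueError("hyperband.bands must be strictly increasing")
    # Assumption: no band may use fewer examples than min_examples.
    if bands and bands[0] < hb["min_examples"]:
        raise ValueError("first band must be at least hyperband.min_examples")
    if raw["max_parallelism"] < 1:
        raise ValueError("max_parallelism must be at least 1")
    return EvalConfig(
        dataset=raw["dataset"],
        prompt_template=raw["prompt_template"],
        model=raw["model"],
        scorer=raw["scorer"],
        max_parallelism=raw["max_parallelism"],
        hyperband=HyperbandSettings(min_examples=hb["min_examples"], bands=bands),
    )


# The same values as the sample configuration, written as a Python dict.
raw = {
    "dataset": "/data/test.csv",
    "prompt_template": "/prompts/test.txt",
    "model": "/models/test.py",
    "scorer": "/scorers/scorer.py",
    "max_parallelism": 2,
    "hyperband": {"min_examples": 10, "bands": [10, 20, 30, 40, 50]},
}
config = parse_config(raw)
print(config.hyperband.bands)
```

Failing fast on a malformed configuration is a design choice of this sketch; it avoids spending model calls on a run that would be rejected later.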
