Metadata-Version: 2.4
Name: crowd-eval
Version: 0.1.3
Summary: Human evaluation tools for AI models and datasets
Requires-Python: >=3.11
Requires-Dist: rapidata>=2.27.0
Requires-Dist: wandb>=0.20.0
Description-Content-Type: text/markdown

# Crowd Evaluation for Machine Learning Training

A Python library for integrating crowd evaluation into your machine learning training loops. This library provides asynchronous, non-blocking evaluation of model outputs (currently supporting image generation) with automatic logging to Weights & Biases (wandb).

## Features

- **Asynchronous Evaluation**: Evaluations run in the background without blocking your training loop
- **Wandb Integration**: Results are automatically logged to your wandb runs with proper ordering
- **Image Evaluation**: Built-in support for evaluating generated images on multiple criteria
- **Crowd-in-the-Loop**: Uses [Rapidata](https://rapidata.ai/) for high-quality crowd evaluation
- **Easy Integration**: Add evaluation to your training loop with just a few lines of code

## Quick Start

```python
import wandb
from checkpoint_evaluation.image_checkpoint_evaluator import ImageEvaluator

# Initialize wandb
run = wandb.init(project="my-project")

# Create evaluator
evaluator = ImageEvaluator(wandb_run=run, model_name="my-model")

# In your training loop
for step in range(100):
    # ... your training code ...
    
    # Generate or load validation images (every N steps)
    if step % 10 == 0:
        validation_images = ["path/to/image_1.png", "path/to/image_2.png"]
        
        # Fire-and-forget evaluation - returns immediately!
        evaluator.evaluate(validation_images)
    
    # ... continue training ...

# Wait for all evaluations to complete before finishing
evaluator.wait_for_all_evaluations()
run.finish()
```
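
The final `wait_for_all_evaluations()` call matters because work submitted in the background may still be in flight when the training loop ends. As a rough illustration of that fire-and-forget pattern (not crowd-eval's actual implementation), the sketch below uses the standard library's `ThreadPoolExecutor`. `BackgroundRunner` and `slow_evaluation` are hypothetical names invented for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


class BackgroundRunner:
    """Hypothetical helper: submit work without blocking, collect it later."""

    def __init__(self, max_workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = []

    def submit(self, fn, *args) -> None:
        # Returns as soon as the task is queued; the caller keeps going.
        self._futures.append(self._pool.submit(fn, *args))

    def wait_for_all(self) -> list:
        # Block until every submitted task has finished, then return the
        # results in submission order.
        wait(self._futures)
        results = [future.result() for future in self._futures]
        self._futures.clear()
        return results


def slow_evaluation(step: int) -> int:
    """Stand-in for a slow, remote evaluation (assumption for this sketch)."""
    time.sleep(0.1)
    return step


runner = BackgroundRunner()
for step in range(3):
    runner.submit(slow_evaluation, step)  # does not wait for the result

results = runner.wait_for_all()
```

Skipping the final wait in a pattern like this risks ending the run before pending results have been reported.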

## Installation

### Prerequisites

- Python 3.11+
- A [Rapidata](https://rapidata.ai/) account with API credentials
- A [Weights & Biases](https://wandb.ai/) account

### Dependencies

#### Installing uv
Install uv if you haven't already:
```bash
# For macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# For Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

#### Setup Instructions

1. Create and activate a virtual environment:
    ```bash
    uv venv

    # On Unix/macOS
    source .venv/bin/activate

    # On Windows
    .venv\Scripts\activate
    ```
2. Install dependencies:
    ```bash
    uv sync
    ```

### Environment Setup

Create a `.env` file in your project root:

```env
OPENAI_API_KEY=your_openai_api_key  # If running the example file
RAPIDATA_CLIENT_ID=your_rapidata_client_id # If running on a server
RAPIDATA_CLIENT_SECRET=your_rapidata_client_secret # If running on a server
```

## Detailed Usage

### Image Evaluation

The `ImageEvaluator` evaluates generated images on three key metrics:

1. **Preference**: Overall crowd preference for the image
2. **Alignment**: How well the image matches its text description
3. **Coherence**: Visual quality and absence of artifacts

### Image Requirements

For the evaluator to work, each image filename (without its extension) must end with `_{prompt_id}`, where `{prompt_id}` is a prompt ID from the evaluation dataset. The rest of the filename is not significant.

The evaluator automatically validates that your images match available prompts.
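
As an illustration of the naming convention, the sketch below extracts the trailing prompt ID from a filename and lists the files whose ID is not in a known set. It is not the library's own validation logic: `extract_prompt_id` and `find_invalid_images` are hypothetical helpers, and the sketch assumes prompt IDs contain no underscores.

```python
from pathlib import Path


def extract_prompt_id(image_path: str) -> str:
    """Return the text after the last underscore in the filename stem."""
    stem = Path(image_path).stem  # filename without directory or extension
    if "_" not in stem:
        raise ValueError(f"No '_<prompt_id>' suffix in {image_path!r}")
    return stem.rsplit("_", 1)[1]


def find_invalid_images(image_paths, known_prompt_ids):
    """List the paths whose prompt ID is missing or not in known_prompt_ids."""
    invalid = []
    for path in image_paths:
        try:
            prompt_id = extract_prompt_id(path)
        except ValueError:
            invalid.append(path)
            continue
        if prompt_id not in known_prompt_ids:
            invalid.append(path)
    return invalid


# Example with made-up prompt IDs "1" and "2".
bad = find_invalid_images(["a_1.png", "b_9.png", "noid.png"], {"1", "2"})
```

Running a check like this before calling `evaluate` can surface naming mistakes early.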

### Complete Example with Image Generation

To run this example, first set up the environment and add the extra dependencies:
```bash
uv venv
source .venv/bin/activate
uv sync
uv add openai python-dotenv
```

Then log in to wandb:
```bash
wandb login
```

```python
import os

import openai
import requests
import wandb
from checkpoint_evaluation.image_checkpoint_evaluator import ImageEvaluator
from dotenv import load_dotenv

load_dotenv()

# Setup
openai.api_key = os.getenv("OPENAI_API_KEY")
run = wandb.init(project="dalle-evaluation")
evaluator = ImageEvaluator(wandb_run=run, model_name="dalle-3")

def generate_and_save_image(prompt: str, file_location: str) -> str:
    """Generate image using DALL-E and save to disk."""
    os.makedirs(os.path.dirname(file_location), exist_ok=True)
    
    response = openai.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1024",
        quality="standard",
        n=1
    )
    
    # Download and save the image; fail loudly on HTTP errors
    image_url = response.data[0].url
    image_response = requests.get(image_url, timeout=60)
    image_response.raise_for_status()
    with open(file_location, "wb") as f:
        f.write(image_response.content)
    
    return file_location

if __name__ == "__main__":
    # Training simulation
    for step in range(3):
        # Simulate training
        run.log({"Some training metric": step})
        
        # Generate images for evaluation (using the first 2 prompts).
        # The filename must end with the prompt ID.
        validation_images = [
            generate_and_save_image(prompt, f"validation_images/generated_image_run_{step}_{prompt_id}.png")
            for prompt_id, prompt in list(evaluator.prompts.items())[:2]
        ]
        
        # Evaluate asynchronously
        evaluator.evaluate(validation_images)

    print("This prints immediately; the evaluations keep running in the background.")

    # Wait for all evaluations
    evaluator.wait_for_all_evaluations()
    run.finish()
```

## Troubleshooting

### Common Issues

**"Invalid prompt ids" error:**
- Ensure image filenames follow the pattern: `*_{prompt_id}.png`
- Check that `{prompt_id}` exists in the evaluation dataset

**Evaluations not appearing in wandb:**
- Call `evaluator.wait_for_all_evaluations()` before `run.finish()`
- Check your Rapidata API credentials
- Verify internet connectivity for API calls

**"Module not found" error:**
- Ensure you have the correct dependencies installed
- Ensure your example code is run from the root of the repository

### Environment Variables

Required when running on a server (not needed when running locally):
- `RAPIDATA_CLIENT_ID`: Your Rapidata client ID
- `RAPIDATA_CLIENT_SECRET`: Your Rapidata client secret

Optional:
- `OPENAI_API_KEY`: For image generation examples
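
A quick way to check these variables before starting a run is sketched below. `missing_env_vars` is a hypothetical helper, not part of crowd-eval; it only reports which names are unset or empty.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# On a server, both Rapidata credentials are expected to be set.
missing = missing_env_vars(["RAPIDATA_CLIENT_ID", "RAPIDATA_CLIENT_SECRET"])
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

If the values live in a `.env` file, load that file first (for example with `load_dotenv()`) so they are visible in `os.environ`.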
