Metadata-Version: 2.1
Name: promptstability
Version: 0.1.5
Summary: Package for generating Prompt Stability Score (PSS). PSS estimates the stability of outcomes resulting from variations in language model prompt specifications.
Home-page: https://github.com/palaiole13/promptstability
License: Apache-2.0
Author: Christopher Barrie, Elli Palaiologou, Petter Törnberg
Requires-Python: >=3.8,<3.11
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3 :: Only
Requires-Dist: matplotlib (>=3.7.2,<4.0.0)
Requires-Dist: numpy (>=1.22.0,<2.0.0)
Requires-Dist: openai (>=1.63.0,<2.0.0)
Requires-Dist: pandas (>=1.5.3,<2.0.0)
Requires-Dist: seaborn (>=0.12.2,<0.13.0)
Requires-Dist: sentence-transformers (>=2.2.2,<3.0.0)
Requires-Dist: sentencepiece (>=0.1.99,<0.2.0)
Requires-Dist: simpledorff (==0.0.2)
Requires-Dist: torch (>=1.13.1,<2.0.0)
Requires-Dist: transformers (>=4.35.0,<5.0.0)
Project-URL: Documentation, https://github.com/palaiole13/promptstability
Project-URL: Repository, https://github.com/palaiole13/promptstability
Description-Content-Type: text/markdown

# promptstability

[![PyPI](https://img.shields.io/pypi/v/promptstability.svg)](https://pypi.org/project/promptstability/)
[![Tests](https://github.com/palaiole13/promptstability/actions/workflows/test.yml/badge.svg)](https://github.com/palaiole13/promptstability/actions/workflows/test.yml)
[![Changelog](https://img.shields.io/github/v/release/palaiole13/promptstability?include_prereleases&label=changelog)](https://github.com/palaiole13/promptstability/releases)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/palaiole13/promptstability/blob/main/LICENSE)

Package for generating Prompt Stability Scores (PSS). The accompanying [paper](https://www.arxiv.org/abs/2407.02039) outlines a technique for investigating the stability of outcomes resulting from variations in language model prompt specifications. Replication material is available [here](https://github.com/cjbarrie/promptstability/tree/main).

The library currently supports two intra-PSS modes:

- cumulative intra-PSS, where repeated runs are accumulated over time
- adjacent intra-PSS, where each run is compared with the immediately previous run

It also supports post hoc rescoring from saved annotation tables, which is useful when you want to recompute stability summaries without rerunning the model.
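
The difference between the two modes can be sketched with a toy example. This is a minimal illustration rather than the library's implementation: it uses simple pairwise percent agreement as a stand-in for Krippendorff's alpha, the `runs` data is hypothetical, and the windowing shown is an assumption based on the descriptions above.

```python
from itertools import combinations

def percent_agreement(runs):
    """Mean share of items labeled identically across every pair of runs."""
    pairs = list(combinations(runs, 2))
    matches = [
        sum(a == b for a, b in zip(r1, r2)) / len(r1)
        for r1, r2 in pairs
    ]
    return sum(matches) / len(matches)

# Hypothetical labels: 4 runs over the same 5 texts
runs = [
    ["0", "1", "1", "0", "1"],
    ["0", "1", "1", "0", "1"],
    ["0", "1", "0", "0", "1"],
    ["0", "1", "0", "0", "1"],
]

# Cumulative: at step k, compare all runs seen so far (runs 0..k)
cumulative = [percent_agreement(runs[: k + 1]) for k in range(1, len(runs))]

# Adjacent: at step k, compare run k only with run k-1
adjacent = [percent_agreement(runs[k - 1 : k + 1]) for k in range(1, len(runs))]

print("cumulative:", cumulative)
print("adjacent:  ", adjacent)
```

In this toy data the third run flips one label. The adjacent series dips at that step and then recovers, while the cumulative series stays lower because the earlier disagreement remains in the window.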

## Table of Contents

- [Requirements](#requirements)
- [Installation](#installation)
- [Example Usage](#example-usage)
- [API Documentation](#api-documentation)
- [Development](#development)

## Requirements

- **Python 3.8 to 3.10** (Python 3.11 and above are not supported due to dependency limitations)
- Other dependencies are installed automatically via `pip`

## Installation

Install this library using `pip`:
```bash
pip install promptstability
```
## Example Usage

The examples below show how to use `promptstability` with OpenAI and Ollama.

```python
import os

import pandas as pd
from promptstability.core import (
    PromptStabilityAnalysis,
    get_api_key,
    load_example_data,
)

# Load data (news articles)
df = load_example_data()
print(df.head())
# Use all article bodies; slice this list to work with a subsample
example_data = list(df['body'].values)

# Define the prompt texts
original_text = 'The following are some news articles about the economy.'
prompt_postfix = 'Respond 0 for positive news, or 1 for negative news. Guess if you do not know. Respond nothing else.'
```
#### a) OpenAI Example (e.g., GPT-4o-mini)
```python
from openai import OpenAI

# Initialize OpenAI client
# First set the OPENAI_API_KEY environment variable
APIKEY = get_api_key('openai')
client = OpenAI(api_key=APIKEY)

OPENAI_MODEL = 'gpt-4o-mini'

# Define the OpenAI annotation function
def annotate_openai(text, prompt, temperature=0.1):
    try:
        response = client.chat.completions.create(
            model=OPENAI_MODEL,
            temperature=temperature,
            messages=[
                {"role": "system", "content": prompt},
                {"role": "user", "content": text}
            ]
        )
    except Exception as e:
        # Log the error, then re-raise with the original traceback
        print(f"OpenAI exception: {e}")
        raise

    return ''.join(choice.message.content for choice in response.choices)

# Instantiate the analysis class using OpenAI's annotation function.
# Note on warnings: Pegasus emits an automated warning about model weights, which you can ignore.
psa_openai = PromptStabilityAnalysis(annotation_function=annotate_openai, data=example_data)

# Run intra-prompt stability analysis using the method `intra_pss`
print("Running OpenAI intra-prompt analysis...")
ka_openai_intra, annotated_openai_intra = psa_openai.intra_pss(
    original_text,
    prompt_postfix,
    iterations=5,   # minimal iterations
    plot=True,
    save_path='news_intra.png',
    save_csv="news_intra.csv"
)
print("OpenAI intra-prompt KA scores:", ka_openai_intra)

# Optional: compute both cumulative and adjacent intra-PSS plus summary diagnostics
score_map, rescored_annotations, intra_summaries = psa_openai.intra_pss(
    original_text,
    prompt_postfix,
    iterations=5,
    analysis_modes=["cumulative_alpha", "adjacent_alpha"],
    return_summaries=True
)
print("Intra summaries:", intra_summaries)

# Run inter-prompt stability analysis using the method `inter_pss`
print("Running OpenAI inter-prompt analysis...")
temperatures = [0.1, 0.5, 2.0] # in practice, you would set more temperatures than this
ka_openai_inter, annotated_openai_inter = psa_openai.inter_pss(
    original_text,
    prompt_postfix,
    nr_variations=3,
    temperatures=temperatures,
    iterations=1,
    plot=True,
    save_path='news_inter.png',
    save_csv="news_inter.csv"
)
print("OpenAI inter-prompt KA scores:", ka_openai_inter)
print("Inter summaries:", psa_openai.summarize_inter_scores(ka_openai_inter))
```
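
The annotation function above takes `(text, prompt, temperature)` and returns a string. For a dry run without API calls, a stand-in with the same call shape can be useful. The sketch below is hypothetical: `annotate_stub` and its keyword list are illustrative names, not part of the library, and the rule-based labels are not a substitute for a model.

```python
def annotate_stub(text, prompt, temperature=0.1):
    """Hypothetical offline stand-in; no model call is made.

    Mirrors the (text, prompt, temperature) signature used in this README
    and returns '1' (negative news) or '0' (positive news) as a string.
    The prompt and temperature arguments are accepted but ignored.
    """
    negative_cues = ("recession", "layoffs", "inflation", "crisis")
    lowered = text.lower()
    return "1" if any(cue in lowered for cue in negative_cues) else "0"

print(annotate_stub("Layoffs rise amid recession fears", "Classify the article."))
print(annotate_stub("Markets rally on strong earnings", "Classify the article."))
```

Because the stub is deterministic, passing it as `annotation_function` only exercises the plumbing; it does not produce a meaningful stability score.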

#### b) Ollama Example (e.g., your local deepseek-r1:8b)
```python
import ollama

# Make sure that your Ollama server is running locally and that 'deepseek-r1:8b' is available.
OLLAMA_MODEL = 'deepseek-r1:8b'

# Define the Ollama annotation function
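def annotate_ollama(text, prompt, temperature=0.1):
    try:
        response = ollama.chat(
            model=OLLAMA_MODEL,
            messages=[
                {"role": "system", "content": prompt},
                {"role": "user", "content": text}
            ],
            # Pass the temperature through to the model; otherwise the argument is ignored
            options={"temperature": temperature}
        )
    except Exception as e:
        # Log the error, then re-raise with the original traceback
        print(f"Ollama exception: {e}")
        raise
    return response['message']['content']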

# Instantiate the analysis class using Ollama's annotation function.
# Note on warnings: Pegasus emits an automated warning about model weights, which you can ignore.
psa_ollama = PromptStabilityAnalysis(annotation_function=annotate_ollama, data=example_data)

# Run intra-prompt stability analysis using the method `intra_pss`
print("Running Ollama intra-prompt analysis...")
ka_ollama_intra, annotated_ollama_intra = psa_ollama.intra_pss(
    original_text,
    prompt_postfix,
    iterations=5,
    plot=False
)
print("Ollama intra-prompt KA scores:", ka_ollama_intra)

# Run inter-prompt stability analysis using the method `inter_pss`
temperatures = [0.1, 2.0, 5.0]  # or whichever temperatures you want to test
print("Running Ollama inter-prompt analysis...")
ka_ollama_inter, annotated_ollama_inter = psa_ollama.inter_pss(
    original_text,
    prompt_postfix,
    nr_variations=3,
    temperatures=temperatures,
    iterations=1,
    plot=False
)
print("Ollama inter-prompt KA scores:", ka_ollama_inter)
```
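
Local reasoning models such as `deepseek-r1` may return more than the bare label requested by `prompt_postfix`, for example reasoning text wrapped in `<think>` tags. A minimal sketch of cleaning the reply before it is scored follows; `extract_label` is a hypothetical helper, not a library function, and the tag format is an assumption about the model's output.

```python
import re

def extract_label(raw, valid=("0", "1")):
    """Return the first valid label found in a model reply, or None."""
    # Drop any <think>...</think> block (assumed reasoning-model output format)
    cleaned = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    # Match a label only when it stands alone as a token
    pattern = r"\b(" + "|".join(re.escape(v) for v in valid) + r")\b"
    match = re.search(pattern, cleaned)
    return match.group(1) if match else None

print(extract_label("<think>Could be 0, but the tone is bleak.</think>\n1"))
print(extract_label("unsure"))
```

One way to use it is to wrap the return value of the annotation function, for instance `extract_label(response['message']['content'])`, and decide separately how to treat `None` replies.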

### Post hoc rescoring from saved annotations

If you already have a long-format annotation table with `id`, `annotation`, and
`iteration` columns, you can rescore it directly:

```python
rescored_map, rescored_df = psa_ollama.score_intra_annotations(
    annotated_ollama_intra,
    bootstrap_samples=100,
    analysis_modes=["cumulative_alpha", "adjacent_alpha"]
)

print(psa_ollama.summarize_intra_scores(rescored_map))
```
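
Before rescoring a saved table, it can help to confirm that it has the expected long-format shape. The sketch below is an illustration using plain dictionaries: `check_long_format` is a hypothetical helper, and the rule that each `(id, iteration)` pair appears at most once is an assumption, not a documented requirement of the library.

```python
from collections import Counter

REQUIRED_COLUMNS = {"id", "annotation", "iteration"}

def check_long_format(rows):
    """Validate long-format rows and return the number of distinct iterations."""
    for row in rows:
        missing = REQUIRED_COLUMNS - row.keys()
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
    # Assumption: one annotation per text per iteration
    counts = Counter((row["id"], row["iteration"]) for row in rows)
    duplicates = [key for key, n in counts.items() if n > 1]
    if duplicates:
        raise ValueError(f"duplicate (id, iteration) pairs: {duplicates}")
    return len({row["iteration"] for row in rows})

# Hypothetical table: 2 texts annotated in 2 iterations, one row per annotation
rows = [
    {"id": 0, "annotation": "1", "iteration": 0},
    {"id": 1, "annotation": "0", "iteration": 0},
    {"id": 0, "annotation": "1", "iteration": 1},
    {"id": 1, "annotation": "1", "iteration": 1},
]
n_iterations = check_long_format(rows)
print("iterations found:", n_iterations)
```

With a pandas DataFrame, `df.to_dict("records")` produces rows in this shape.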
## API Documentation

The full API reference is hosted on Read the Docs and covers all modules, classes, and functions.

You can access the documentation here:

[PromptStability API Documentation](https://promptstability.readthedocs.io)

*This documentation is automatically updated whenever changes are pushed to the repository.*

## Development

To contribute to this library, first check out the code. Then create a new virtual environment:
```bash
cd promptstability
python -m venv venv
source venv/bin/activate
```
Now install the dependencies and test dependencies:
```bash
pip install -e '.[test]'
```
To run the tests:
```bash
pytest
```

