Metadata-Version: 2.4
Name: optimus-jbscorer
Version: 0.0.3
Summary: Optimus: A semantic and harmfulness-based metric for evaluating LLM jailbreak prompts
Author-email: Ismail Hossain <ihossain@miners.utep.edu>
License-Expression: MIT
Project-URL: Homepage, https://github.com/ismail102/optimus-pypi
Keywords: LLM,jailbreak,AI safety,adversarial prompts,robust evaluation,semantic similarity
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: torch
Requires-Dist: transformers
Requires-Dist: sentence-transformers
Requires-Dist: numpy

# Optimus: Semantic–Harmfulness-Based Jailbreak Scoring

## Overview

This repository provides an implementation of **Optimus**, a continuous metric for evaluating jailbreak prompts in large language models. The metric jointly considers **semantic similarity** to a harmful target intent and the **estimated harmfulness** of the prompt content.

Unlike binary jailbreak success metrics such as **Attack Success Rate (ASR)**, Optimus produces a real-valued score in the range **[0, 1]**. This enables finer-grained evaluation by penalizing trivial paraphrases, benign rewrites, and low-risk prompts, while highlighting prompts that are both semantically aligned with harmful intent and likely to induce unsafe behavior.

The core implementation is provided through the `JBScoreCalculator` class.
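The two ingredients described above can be illustrated with a minimal sketch. It is not the `JBScoreCalculator` API and not the Optimus formula: the function names are hypothetical, the embeddings are plain Python lists standing in for Sentence-BERT vectors, and the product used to combine the two signals is only an assumed stand-in that keeps the result in [0, 1]. In particular, a plain product does not penalize trivial paraphrases the way the metric is described as doing.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_u * norm_v)


def clamp_unit(x):
    """Clamp a value into the closed interval [0, 1]."""
    return max(0.0, min(1.0, x))


def illustrative_score(similarity, harmfulness):
    """Hypothetical combination (NOT the Optimus formula).

    The product is high only when both inputs are high, and it
    stays inside [0, 1] because both factors are clamped first.
    """
    return clamp_unit(similarity) * clamp_unit(harmfulness)
```

In practice the similarity would come from sentence embeddings and the harmfulness estimate from a classifier; here both are assumed to be ordinary floats supplied by the caller.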

---

## Key Features

- Semantic similarity computation using **Sentence-BERT** embeddings
- Harmfulness estimation using an **NLI-style sequence classification model**
- Continuous jailbreak scoring metric (**Optimus**)
- Compatible with **CPU and GPU** execution via PyTorch
- Modular design enabling replacement of encoders or classifiers

---

## Dependencies

The following libraries are required:

- Python **3.9 or higher**
- PyTorch
- Hugging Face Transformers
- Sentence-Transformers
- NumPy
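
To confirm that an environment meets these requirements, a small check along the following lines may help. It is a sketch using only the standard library; the helper name `missing_dependencies` is hypothetical, and the import names (`torch`, `transformers`, `sentence_transformers`, `numpy`) are the modules those packages install.

```python
import importlib.util
import sys

# Import names of the required packages listed above.
REQUIRED_MODULES = ["torch", "transformers", "sentence_transformers", "numpy"]


def python_version_ok(minimum=(3, 9)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found.

    find_spec locates a module without importing it, so this does
    not load heavy libraries such as PyTorch.
    """
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_version_ok():
        print("Python 3.9 or higher is required.")
    missing = missing_dependencies()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```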

### Installation

```bash
pip install torch transformers sentence-transformers numpy
