Metadata-Version: 2.4
Name: ares-redteamer
Version: 0.2.1
Summary: AI Robustness Evaluation System
Project-URL: Repository, https://github.com/IBM/ares.git
Project-URL: Issues, https://github.com/IBM/ares/issues
Author: Giandomenico Cornacchia
Author-email: Liubov Nedoshivina <liubov.nedoshivina@ibm.com>, Kieran Fraser <kieran.fraser@ibm.com>, Mark Purcell <mark.purcell@ie.ibm.com>, Ambrish Rawat <ambrish.rawat@ie.ibm.com>, Giulio Zizzo <giulio.zizzo2@ibm.com>, Stefano Braghin <stefanob@ie.ibm.com>, Anisa Halimi <anisa.halimi@ibm.com>, Naoise Holohan <naoise.holohan@ibm.com>
License-Expression: Apache-2.0
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Build Tools
Requires-Python: >=3.11
Requires-Dist: accelerate==1.12.0
Requires-Dist: click==8.3.1
Requires-Dist: datasets==4.6.1
Requires-Dist: ibm-watsonx-ai==1.5.3
Requires-Dist: markdown==3.10.2
Requires-Dist: nicegui==3.12.0
Requires-Dist: pandas==2.2.3
Requires-Dist: pip>=25.2
Requires-Dist: protobuf==6.33.6
Requires-Dist: pydantic>=2.0.0
Requires-Dist: python-dotenv==1.2.2
Requires-Dist: pyyaml==6.0.3
Requires-Dist: requests==2.33.1
Requires-Dist: seaborn>=0.13.2
Requires-Dist: sentence-transformers==5.2.3
Requires-Dist: sentencepiece~=0.2.1
Requires-Dist: streamlit>=1.51.0
Requires-Dist: tiktoken==0.12.0
Requires-Dist: torch==2.8.0
Requires-Dist: tqdm>=4.67
Requires-Dist: transformers>=4.56.2
Requires-Dist: typer==0.24.1
Provides-Extra: dev
Requires-Dist: bandit==1.9.4; extra == 'dev'
Requires-Dist: jupyter==1.1.1; extra == 'dev'
Requires-Dist: mypy-extensions==1.1.0; extra == 'dev'
Requires-Dist: mypy==1.19.1; extra == 'dev'
Requires-Dist: prek==0.3.13; extra == 'dev'
Requires-Dist: pytest-asyncio==1.3.0; extra == 'dev'
Requires-Dist: pytest-cov==7.0.0; extra == 'dev'
Requires-Dist: pytest-mock==3.15.1; extra == 'dev'
Requires-Dist: pytest==9.0.3; extra == 'dev'
Requires-Dist: requests-mock==1.12.1; extra == 'dev'
Requires-Dist: ruff==0.15.14; extra == 'dev'
Requires-Dist: scikit-learn==1.8.0; extra == 'dev'
Requires-Dist: sentencepiece==0.2.1; extra == 'dev'
Requires-Dist: sphinx-rtd-theme==3.0.2; extra == 'dev'
Requires-Dist: sphinx==8.2.3; extra == 'dev'
Description-Content-Type: text/markdown

[![Testing](https://github.com/IBM/ares/actions/workflows/testing.yml/badge.svg)](https://github.com/IBM/ares/actions/workflows/testing.yml)
[![Docs](https://img.shields.io/badge/docs-live-brightgreen)](https://ibm.github.io/ares/)

# AI Robustness Evaluation System (ARES)

**Stop wondering if your AI is secure. Know for certain.**

ARES automates LLM red-teaming so you can test your models against real attacks before deployment. Plug in your attacks, evaluators, and guardrails. Test across models. Get unified reports.

[Install ARES](#-quick-installation), run this [quickstart](example_configs/quickstart.yaml) example, and view results in chat format:
```bash
ares evaluate example_configs/quickstart.yaml -l
ares show-chat -f results/keyword_evaluation.json --open
```

```
┌───────────────────────────────────────────────────────────────────────────┐
│                         ARES Evaluation Flow                              │
└───────────────────────────────────────────────────────────────────────────┘

  📋 Define Goals          🎯 Select Strategy          📊 Evaluate Results
       │                          │                            │
       ▼                          ▼                            ▼
┌──────────────┐          ┌───────────────┐          ┌─────────────────┐
│ What to test │ ───────> │ How to attack │ ───────> │ How to measure  │
└──────────────┘          └───────────────┘          └─────────────────┘
 • PII leakage             • Prompt injection          • Keyword match
 • Data exfiltration       • Crescendo                 • LLM judges
 • Harmful content         • GCG, TAP, etc.            • Custom evals
 • Custom goals            • Your attack               • Guardrails
```

**What is ARES?** An orchestration framework that lets you plug in your own attacks, evaluators, and guardrails to test LLMs - whether you're benchmarking a new attack method for research or testing your model's security before deployment.

**Why ARES?**
* 🔬 **For Researchers**: Benchmark your novel attack against 20+ existing methods with one config
* 🛡️ **For Security Teams**: Test against OWASP top-10 vulnerabilities before production
* 🔌 **For Developers**: Integrate your custom attacks, detectors, guardrails, or evaluation methods

**Three core components you can customize:**
* **Goals**: What to test (PII leakage, prompt injection, jailbreaks, or your custom goals)
* **Strategy**: How to attack (built-in methods or your novel attack technique)
* **Evaluation**: How to measure (keyword matching, LLM judges, or your custom evaluator)

---

## 🗺️ Navigation & Quick Start

**Choose your learning path based on your experience level:**

| Experience Level | I want to... | Start Here |
|-----------------|--------------|------------|
| 🟢 **Beginner** | Try it visually (no coding) | [GUI Interface](#️-gui-optional) |
| 🟢 **Beginner** | Run my first security test | [Quickstart](#rocket-quickstart) |
| 🟢 **Beginner** | See real-world examples | [Real-World Examples](#-real-world-examples) |
| 🟡 **Intermediate** | Test with multiple attack methods | [Using Built-in Plugins](#-using-built-in-plugins) |
| 🟡 **Intermediate** | Test OWASP vulnerabilities | [OWASP Security Testing](#️-owasp-security-testing) |
| 🔴 **Advanced** | Create custom attacks/evaluators | [ADVANCED.md](ADVANCED.md#-creating-custom-plugins) |
| 🔴 **Advanced** | Fine-tune configuration | [ADVANCED.md](ADVANCED.md#️-advanced-configuration) |

**Quick Decision Tree:**
- 👉 **New to red-teaming?** Start with [GUI](#️-gui-optional) or [Quickstart](#rocket-quickstart)
- 👉 **Security professional?** Jump to [OWASP Testing](#️-owasp-security-testing)
- 👉 **Researcher?** Check [Using Plugins](#-using-built-in-plugins) then [ADVANCED.md](ADVANCED.md)
- 👉 **Just exploring?** Browse [Real-World Examples](#-real-world-examples)

**Full Documentation:** [ibm.github.io/ares](https://ibm.github.io/ares/)

---

## 🏗️ Architecture

The ARES programming model provides a flexible framework for orchestrating robustness evaluations:

![ARES Programming Model](docs/source/_static/ares-programming-model.png)

**Key Components:**
- **Plugin Catalog**: Extensible collection of target connectors, attack goals, strategies, and evaluations
- **Configuration-Driven**: Define your evaluation pipeline through YAML configuration
- **Programmatic API**: Full control through Python API (`redteamer.target()`, `redteamer.goal()`, `redteamer.strategy()`, `redteamer.evaluate()`)

---

## 🖥️ GUI (Optional)

**🟢 Complexity: Beginner** | No coding required

**Not a command-line person? No problem.** Test AI security with drag-and-drop simplicity - perfect for security teams who want quick results without writing code.

### Quick Start

1. **Clone the repository**:
   ```bash
   git clone https://github.com/IBM/ares.git
   cd ares
   ```

2. **Install ARES**:
   ```bash
   pip install .
   ```

3. **Launch the GUI**:
   ```bash
   python gui.py
   ```

4. **You'll see this interface**:

<p align="center">
  <img src="assets/images/gui_screen.jpg" 
  alt="Main GUI Screen" width="500"/>
</p>

### GUI Features

The interface has 5 tabs on the left:

- **📝 Configuration**: Upload and edit your test configuration
- **📊 Data**: Upload test prompts and view configured datasets
- **🔌 Plugins**: Browse and install available attack/evaluation plugins
- **🎯 Red Team**: Launch your configured security tests
- **📈 Reports**: View detailed results and vulnerability reports

### Example Workflow

**1. Upload Configuration**
<p align="center">
  <img src="assets/images/gui_config_upload.jpg" 
  alt="Config Upload" width="400"/>
</p>

**2. Install Required Plugins**
<p align="center">
  <img src="assets/images/gui_plugins.jpg" 
  alt="Plugin Installation" width="400"/>
</p>

**3. Run Tests & View Results**
<p align="center">
  <img src="assets/images/gui_redteam.jpg"
  alt="Test Results" width="400"/>
</p>

**4. Visualize Attack Conversations** (Optional)

ARES can visualize attacks as chat-style conversations with evaluation scores, making it easier to assess multi-turn attacks and understand how jailbreaks evolve. 
Just click `Show Chat View` from **Reports** tab.

<p align="center">
  <img src="assets/images/gui_show_chat.jpg"
  alt="Test Results" width="400"/>
</p>


> 💡 **Pro Tip:** The GUI is great for exploration, but the CLI gives you more control and is better for automation. Once you're comfortable, try the [CLI Installation](#-quick-installation) below.

---

## ⚡ Quick Installation

**🟢 Complexity: Beginner**

### Prerequisites

You'll need Python 3.11+ and either:
- **pip** (standard Python package manager)
- **[uv](https://docs.astral.sh/uv/)** (recommended - 10-100x faster): `curl -LsSf https://astral.sh/uv/install.sh | sh`

### One-Line Install

Set up a virtual environment first so your install stays clean and isolated:

```bash
# prepare a virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# install ares
curl https://raw.githubusercontent.com/IBM/ares/refs/heads/main/install.sh | bash
```

This will create the [`example_configs/`](example_configs/) and [`assets/`](assets/) directories in your current directory with the files you need. Then run the quickstart and open the chat-style results:
```bash
ares evaluate example_configs/quickstart.yaml -l
ares show-chat -f results/keyword_evaluation.json --open
```

Or try the minimal example:
```bash
ares evaluate example_configs/minimal.yaml -l
```

> **⚠️ Important:** Using a virtual environment is highly recommended.

> **💡 Note:** See [Understanding ARES_HOME](ADVANCED.md#understanding-ares_home) for details on path resolution.

> **📦 Note:** More examples and assets can be loaded from the [ARES repository](https://github.com/IBM/ares).

### Development Installation

For interactive development and customization:

**Using pip:**
```bash
# 1. Clone the repository
git clone https://github.com/IBM/ares.git
cd ares

# 2. Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# 3. Install ARES with dev dependencies
pip install ".[dev]"

# 4. Run examples
ares evaluate example_configs/quickstart.yaml -l
ares show-chat -f results/keyword_evaluation.json --open
```

**Using uv (faster):**
```bash
# 1. Clone the repository
git clone https://github.com/IBM/ares.git
cd ares

# 2. Sync dependencies with dev extras (creates venv automatically)
uv sync --extra dev

# 3. Activate virtual environment
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# 4. Run examples
ares evaluate example_configs/quickstart.yaml -l
ares show-chat -f results/keyword_evaluation.json --open
```

**What's next?** Run your [first test](#rocket-quickstart).

---

## :rocket: Quickstart

**🟢 Complexity: Beginner** | Your first security test

**Let's catch a vulnerability before your users do.** This quickstart tests a model against harmful behavior prompts - one of the most common security assessments.

### Option 1: Use the Pre-Built Config (Fastest)

```bash
ares evaluate example_configs/quickstart.yaml -l -n 10
```
**Flags explained:** `-l` limits number of goals to run (default 5), `-n 10` specifies exactly 10 goals to test

This uses our ready-to-go configuration that shows you all the components explicitly. [View the config](example_configs/quickstart.yaml) to see how it's structured.

### Option 2: Create Your Own Config (Learn by Doing)

Create a file called `my-first-test.yaml`:

```yaml
# my-first-test.yaml
target:
  huggingface:
    model_config:
      pretrained_model_name_or_path: Qwen/Qwen2-0.5B-Instruct
    tokenizer_config:
      pretrained_model_name_or_path: Qwen/Qwen2-0.5B-Instruct

red-teaming:
  prompts: assets/safety_behaviors_text_subset.csv  # Test harmful behavior prompts
```

Then run the test:

```bash
ares evaluate my-first-test.yaml -l -n 10
```
**Flags explained:** `-l` limits number of goals to run (default 5), `-n 10` specifies exactly 10 goals to test

### Understanding the Results

**What just happened?**
1. ✅ ARES loaded a small HuggingFace model (Qwen2-0.5B-Instruct)
2. ✅ Sent 5 test prompts designed to elicit harmful behaviors
3. ✅ Evaluated responses using keyword matching (checks for refusal patterns)
4. ✅ Generated a detailed report showing results

**Your saved results in the `results` folder will have:**
- A high level summary report with relevant statistics.
- Which prompts the model responded to
- Which prompts were refused
- Response patterns and safety behaviors
- Detailed conversation logs (for multi turn attacks)

> 💡 **Pro Tip:** The quickstart uses defaults for simplicity. Check [`example_configs/quickstart.yaml`](example_configs/quickstart.yaml) to see the full explicit configuration with all components (strategy, evaluation, goals) clearly defined.

### Next Steps

- 📊 **View the report**: Open the generated HTML file in your browser
- 📝 **See full config**: Check [`example_configs/quickstart.yaml`](example_configs/quickstart.yaml) to understand all components
- 📓 **Interactive learning**: Try the [Jupyter notebook](notebooks/Red%20Teaming%20with%20ARES.ipynb)
- 📁 **More examples**: Explore `example_configs/` directory
- 🎯 **Test your model**: Replace the default model with your own

**🎯 What's Next?** You've run your first test. Now see how real teams can use ARES to catch vulnerabilities before deployment → [Real-World Examples](#-real-world-examples)

---

## 🌍 Real-World Examples

**🟢 Complexity: Beginner** | See ARES in action

**Learn from real security testing scenarios.** These examples show how teams can use ARES to catch vulnerabilities before deployment.

### Example 1: Pre-Deployment Security Audit

**Scenario:** Test if your customer service chatbot leaks PII using multiple attack vectors.

**What you test:** Direct requests, crescendo and encoding attacks

**What you learn:** Which attacks extract PII, types of information leaked, target robustness

📋 [See full configuration & results](ADVANCED.md#example-1-pre-deployment-security-audit)

### Example 2: Testing Guardrail Effectiveness

**Scenario:** Measure how well Granite Guardian protects your model against various attacks.

**What you test:** Human Jailbreaks, encoding and crescendo attacks

**What you learn:** Which attacks the guardrail blocks, bypass techniques, effectiveness rates

📋 [See full configuration & results](ADVANCED.md#example-2-testing-guardrail-effectiveness)

### Example 3: Research Benchmarking

**Scenario:** Compare your novel attack against established methods for publication.

**What you test:** Your attack vs. 4 baselines with multiple evaluators

**What you learn:** Success rate comparisons, statistical significance, reproducible results

📋 [See full configuration & results](ADVANCED.md#example-3-research-benchmarking)

> 📓 **Try these interactive examples:**
> - [Red Teaming with ARES](notebooks/Red%20Teaming%20with%20ARES.ipynb) - Complete walkthrough
> - [Granite Guardian Testing](notebooks/Granite%20Guardian%203.3-8b%20with%20ARES.ipynb) - Guardrail effectiveness
> - [Multi-Agent Coalition Attacks](notebooks/Multi_Agent_Coalition_Attack_with_ARES.ipynb) - Advanced attack scenarios

**🎯 What's Next?** You've seen examples. Now discover how to combine multiple attack methods to find vulnerabilities others miss → [Using Built-in Plugins](#-using-built-in-plugins)

---

## 💡 What You Can Do

**🟡 Complexity: Intermediate** | Understanding ARES capabilities

**Now that you've seen ARES in action, here's everything you can do with it.**

### For Researchers

- 🔬 **Benchmark novel attacks**: Plug in your attack method and compare against 20+ existing techniques
- 📊 **Multi-model testing**: Test across local models and cloud APIs with one config
- 📈 **Unified metrics**: Get comparative analysis with standardized evaluation
- 📝 **Reproducible research**: Share configs for reproducible experiments

### For Security Teams

- 🛡️ **OWASP compliance**: Test against [OWASP top-10 LLM vulnerabilities](https://genai.owasp.org/llm-top-10/)
- 🔍 **Pre-deployment testing**: Catch vulnerabilities before production
- 📋 **Audit reports**: Generate detailed security assessment reports
- 🎯 **Custom test scenarios**: Define organization-specific security tests

### For Developers

- 🔌 **Guardrail integration**: Add your custom safety filters and test effectiveness
- 🎯 **Custom evaluators**: Use your own detection methods (keywords, ML models, LLM judges)
- 🔄 **CI/CD integration**: Automate security testing in your pipeline
- 📊 **Performance tracking**: Monitor security improvements over time

### Built-in Capabilities

- ✅ **Single & multi-turn attacks**: One-shot prompts and conversational strategies
- ✅ **19 ready-to-use plugins**: Garak, PyRIT, AutoDAN, CyberSecEval, and more
- ✅ **Interactive dashboard**: Explore results visually
- ✅ **One YAML config**: Orchestrate everything from a single file

**🎯 What's Next?** Ready to test with multiple attack methods simultaneously? → [Using Built-in Plugins](#-using-built-in-plugins)

---

## 🔌 Using Built-in Plugins

**🟡 Complexity: Intermediate** | Testing with multiple attack methods

**One config. 15+ attack methods. Find the weakest link.** This section shows you how to combine multiple plugins for comprehensive security testing.

### Understanding Plugin Types

Before diving into examples, here's what each plugin type does:

- **🎯 Goals**: Define what to test (e.g., "extract PII", "generate harmful content")
- **⚔️ Strategies**: Attack methods (e.g., jailbreaks, encoding, multi-turn conversations)
- **📊 Evaluators**: How to measure success (e.g., keyword matching, LLM judges)
- **🔌 Connectors**: How to connect to models (HuggingFace, OpenAI, WatsonX, etc.)
- **🛡️ Guardrails**: Safety filters to test (input/output filters)

### Plugin Installation

There are two ways to install plugins. The names of the available plugins are all under: `ares/plugins`.

1. First we can use the ares cli with the name of the plugin, for this example we will use `ares-human-jailbreak`:

`ares install-plugin ares-human-jailbreak`

2. Or, for manual installation, we can navigate to the folder with the plugin, in this example `ares-litellm`

`
cd plugins/ares-litellm
`

and then run

`
pip install .
`

to install the relevant plugin.
### Example 1: Single Attack Method

**requires `ares-human-jailbreak` plugin**

install via: `ares install-plugin ares-human-jailbreak`

**Start simple** - test one attack method against your model:
- Use known jailbreak prompts
- Check responses for harmful content patterns
- Get clear pass/fail results

📋 [See configuration](ADVANCED.md#single-attack-method)

### Example 2: Multiple Attack Methods

**Compare strategies** - test multiple attacks simultaneously:
- 3 different attack methods (crescendo, human jailbreaks, encoding)
- 2 evaluation methods (keyword matching, LLM judge)
- One unified report showing which attacks work best

📋 [See configuration](ADVANCED.md#multiple-attack-methods)


### 🎯 Which Plugin Should I Use?

**Choose based on your testing goal:**

| Your Goal | Recommended Plugins | Why |
|-----------|-------------------|-----|
| Test jailbreak resistance | `human_jailbreak`, `crescendo` | Known effective jailbreaks + multi-turn attacks |
| Test data leakage | `direct_requests` + `inject_base64` + `keyword` | Direct extraction attempts with and without encoding + pattern detection |
| Test encoding bypasses | `encoding` (base64, ROT13, etc.) | Common obfuscation techniques |
| Benchmark novel attack | Create custom plugin | Compare against baselines |
| Test guardrail effectiveness | Any strategy + your guardrail | See what gets through |

### 📦 Available Built-in Plugins

<details>
<summary><b>🔽 Click to see all 19 public plugins</b></summary>

**Core Strategies (Built-in):**
- `direct_requests` - Simple harmful prompts
- `multi_turn` - Multi-turn conversation attacks (implement your, but make it compatible to ARES pipeline)

**Plugin Attack Strategies:**
- [`ares-echo-chamber`](plugins/ares-echo-chamber) - Multi-turn attack
- [`ares-gcg`](plugins/ares-gcg) - Greedy Coordinate Gradient attacks
- [`ares-tap`](plugins/ares-tap) - Tree of Attacks with Pruning
- [`ares-human-jailbreak`](plugins/ares-human-jailbreak) - Known jailbreak prompts from research
- [`ares-autodan`](plugins/ares-autodan) - Automated jailbreak generation
- [`ares-garak`](plugins/ares-garak) - Garak vulnerability scanner integration
- [`ares-pyrit`](plugins/ares-pyrit) - PyRIT attack framework integration
- [`ares-dynamic-llm`](plugins/ares-dynamic-llm) - LLM-generated adaptive attacks

**Core Evaluators (Built-in):**
- `keyword` - Pattern matching for harmful content
- `llm_eval` - LLM-as-judge scoring
- `huggingface_eval` - HuggingFace model-based evaluation

**Plugin Evaluators:**
- [`ares-cyberseceval`](plugins/ares-cyberseceval) - Security-specific evaluations & goals
- [`ares-intrinsics`](plugins/ares-intrinsics) - Intrinsic evaluation

**Core Connectors (Built-in):**
- `huggingface` - Local HuggingFace models
- `watsonx` - IBM WatsonX models
- `restful` - Generic REST API connector

**Plugin Connectors:**
- [`ares-litellm`](plugins/ares-litellm) - Universal LLM proxy (OpenAI, Anthropic, etc.)
- [`ares-granite-io`](plugins/ares-granite-io) - IBM Granite models via Ollama
- [`ares-vllm-connector`](plugins/ares-vllm-connector) - vLLM inference server
- [`ares-watsonx-orchestrate`](plugins/ares-watsonx-orchestrate) - WatsonX Orchestrate agents
- [`ares-lora-adapter-connector`](plugins/ares-lora-adapter-connector) - LoRA adapter support
- [`ares-mcp-connector`](plugins/ares-mcp-connector) - Model Context Protocol connector
- [`ares-icarus-connector`](plugins/ares-icarus-connector) - Icarus platform integration

**Goal Plugins:**
- [`ares-cyberseceval`](plugins/ares-cyberseceval) - CyberSecEval security test goals
- [`ares-deepteam`](plugins/ares-deepteam) - Deep team-based goals generation

</details>

> 📖 [Full Plugin Documentation](https://ibm.github.io/ares/plugins.html) | 💡 [More Config Examples](example_configs/plugins/)

**🎯 What's Next?** Test against industry-standard vulnerabilities that matter to stakeholders → [OWASP Security Testing](#️-owasp-security-testing)

---

## 🛡️ OWASP Security Testing

**🟡 Complexity: Intermediate** | Industry-standard vulnerability testing

**Is your AI vulnerable to the top 10 security risks?** ARES maps directly to the [OWASP Top 10 for LLM Applications](https://genai.owasp.org/llm-top-10/), making it easy to test for industry-recognized vulnerabilities.

### Why OWASP Matters

**The OWASP Top 10 represents the most critical security risks for LLM applications**, identified by security experts worldwide. Testing against these vulnerabilities helps you:

- ✅ Meet security compliance requirements
- ✅ Identify critical risks before deployment
- ✅ Communicate security posture to stakeholders
- ✅ Prioritize security improvements

> ⚠️ **Real Impact:** Companies have found critical vulnerabilities (PII leakage, prompt injection) in production systems using OWASP testing. Don't wait for users to find them first.

### Quick OWASP Test

Test your model against a specific OWASP vulnerability. Each intent must be tested separately:

```yaml
# owasp-llm-01-test.yaml
target:
  huggingface:
    model_config:
      pretrained_model_name_or_path: Qwen/Qwen2-0.5B-Instruct
    tokenizer_config:
      pretrained_model_name_or_path: Qwen/Qwen2-0.5B-Instruct

red-teaming:
  intent: owasp-llm-01:2025  # Prompt Injection
  prompts: assets/safety_behaviors_text_subset.csv
```

**To test multiple OWASP categories:** Run separate tests for each intent (owasp-llm-01:2025, owasp-llm-02:2025, etc.)

### 🎯 Top 3 Critical Vulnerabilities to Test First

Start with these high-impact vulnerabilities:

1. **LLM01: Prompt Injection** - Can attackers override your system instructions?
   - Intent: `owasp-llm-01:2025`
   - [Example Notebook](notebooks/owasp/OWASP-LLM-01-2025_with_ARES.ipynb)

2. **LLM02: Sensitive Information Disclosure** - Does your model leak secrets?
   - Intent: `owasp-llm-02:2025`
   - [Contact us](mailto:ares@ibm.com) for examples

3. **LLM09: Misinformation** - Can attackers make your model hallucinate?
   - Intent: `owasp-llm-09:2025`
   - [Example Notebook](notebooks/owasp/OWASP-LLM-09-2025_with_ARES.ipynb)

### OWASP Mapping Table

<details>
<summary>📜 <b>Complete OWASP to ARES Mapping (Click to expand)</b></summary>

| Code | Title | What It Tests | ARES Intent | Status | Example |
| --- | --- | --- | --- | --- | --- |
| LLM01 | Prompt Injection​ | Can prompts override intended behavior or security policies? | `owasp-llm-01:2025` | ✅ Supported | [Notebook](notebooks/owasp/OWASP-LLM-01-2025_with_ARES.ipynb) |
| LLM02 | Sensitive Information Disclosure​ | Does the system leak secrets (API keys, PII) through responses? | `owasp-llm-02:2025` | ✅ Supported | [Contact us](mailto:ares@ibm.com) |
| LLM03 | Supply Chain​ | Are dependencies and model artifacts validated for integrity? | `owasp-llm-03:2025` | ⚠️ Not supported | - |
| LLM04 | Data and Model Poisoning​ | Can external inputs corrupt training data or retrieval (RAG)? | `owasp-llm-04:2025` | ✅ Supported | WIP |
| LLM05 | Improper Output Handling​ | Are outputs unsafe (injected prompts, broken deps, malformed code)? | `owasp-llm-05:2025` | ✅ Supported | WIP |
| LLM06 | Excessive Agency | Can the agent use tools beyond intended scope or be hijacked? | `owasp-llm-06:2025` | ✅ Supported | WIP |
| LLM07 | System Prompt Leakage | Are system-level instructions or sensitive context exposed? | `owasp-llm-07:2025` | ✅ Supported | WIP |
| LLM08 | Vector and Embedding Weaknesses | Is sensitive data leaked via embeddings or retrieval vectors? | `owasp-llm-08:2025` | ⚠️ See LLM02 | - |
| LLM09 | Misinformation​ | Is the model resilient against hallucinations or malicious content? | `owasp-llm-09:2025` | ✅ Supported | [Notebook](notebooks/owasp/OWASP-LLM-09-2025_with_ARES.ipynb) |
| LLM10 | Unbounded Consumption​ | Does the agent prevent resource exhaustion (DoS attacks)? | `owasp-llm-10:2025` | ✅ Supported | WIP |

</details>

> 📖 [OWASP Testing Guide](https://ibm.github.io/ares/owasp_mapping.html) | 📓 [Example Notebooks](notebooks/owasp/)

**🎯 What's Next?** Ready to extend ARES with your own tools? Explore advanced customization → [ADVANCED.md](ADVANCED.md)

---

## 🔧 Advanced Topics

**Ready to extend ARES?** Check out our [Advanced Guide](ADVANCED.md) for:

- 🔌 **[Creating Custom Plugins](ADVANCED.md#-creating-custom-plugins)** - Build your own attack strategies, evaluators, and connectors
- ⚙️ **[Advanced Configuration](ADVANCED.md#️-advanced-configuration)** - Fine-tune ARES behavior and model settings
- 📚 **[Plugin Development Resources](ADVANCED.md#-plugin-development-resources)** - Templates, examples, and guides

**Quick links:**
- [Plugin Template](plugins/new-plugin-template) - Copy-paste starting point
- [Plugin Examples](example_configs/plugins/) - Real-world configurations
- [Full Documentation](https://ibm.github.io/ares/plugins.html) - Detailed guides

---

## 🤝 Community & Support

### Get Help

- 📖 [Documentation](https://ibm.github.io/ares/) - Comprehensive guides
- 💬 [GitHub Discussions](https://github.com/IBM/ares/discussions) - Ask questions
- 🐛 [Issue Tracker](https://github.com/IBM/ares/issues) - Report bugs
- 📧 [Email](mailto:ares@ibm.com) - Direct support

### Contribute

We welcome contributions! Here's how to get started:

1. **Report Issues**: Found a bug? [Open an issue](https://github.com/IBM/ares/issues)
2. **Share Plugins**: Created a useful plugin? Submit a PR
3. **Improve Docs**: Help us make documentation better
4. **Share Examples**: Add your use cases to inspire others

### Stay Updated

- ⭐ [Star the repo](https://github.com/IBM/ares) to stay notified
- 📣 Follow releases for new features
- 🎓 Check out new example notebooks

### Feedback Welcome

📣 **Try ARES and share your feedback!** We're constantly improving based on user input.

---

## 📚 Additional Resources

### Example Configurations

The `example_configs/` directory contains ready-to-use configurations:

- **Basic Examples**: `minimal.yaml`, `strategies.yaml`, `evaluators.yaml`
- **OWASP Tests**: `owasp/` directory
- **Plugin Examples**: `plugins/` directory with 15+ plugin configs
- **Custom Scenarios**: `custom/` directory with advanced use cases

### Jupyter Notebooks

Interactive tutorials in the `notebooks/` directory:

- [Red Teaming with ARES](notebooks/Red%20Teaming%20with%20ARES.ipynb) - Complete walkthrough
- [OWASP Testing](notebooks/owasp/) - Vulnerability-specific guides
- [Plugin Development](notebooks/plugins/) - Create your own plugins
- [Multi-Agent Attacks](notebooks/Multi_Agent_Coalition_Attack_with_ARES.ipynb) - Advanced scenarios

### Research Papers

ARES is built on cutting-edge research:

- [Crescendo Attack](https://arxiv.org/abs/2404.01833) - Multi-turn jailbreaking
- [GCG Attack](https://arxiv.org/abs/2307.15043) - Gradient-based adversarial suffixes
- [TAP Attack](https://arxiv.org/abs/2312.02119) - Tree of attacks with pruning

---

## IBM ❤️ Open Source AI

ARES has been brought to you by IBM Research. We believe in open, transparent, and secure AI development.

**License:** Apache 2.0

**Citation:**
```bibtex

@software{ares2025,
  title={ARES: AI Robustness Evaluation System},
  author={Liubov Nedoshivina and
             Kieran Fraser and
             Mark Purcell and
             Ambrish Rawat and
             Giulio Zizzo and
             Muhammad Zaid Hameed and
             Stefano Braghin and
             Anisa Halimi and
             Cristian Morasso and
             Ibrahim Malik and
             Naoise Holohan and
             Giandomenico Cornacchia},
  organization={IBM Research},
  year={2025},
  url={https://github.com/IBM/ares}
}
