Metadata-Version: 2.4
Name: datarobot-moderations
Version: 11.2.27
Summary: DataRobot Monitoring and Moderation framework
License: DataRobot Tool and Utility Agreement
Author: DataRobot
Author-email: support@datarobot.com
Requires-Python: >=3.10,<3.13
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Provides-Extra: all
Provides-Extra: bedrock
Provides-Extra: datarobot-sdk
Provides-Extra: llm-eval
Provides-Extra: nemo
Provides-Extra: nemo-evaluator
Provides-Extra: nvidia
Provides-Extra: vertex
Requires-Dist: aiohttp (>=3.9.5)
Requires-Dist: backoff (>=2.2.1)
Requires-Dist: datarobot (>=3.6.0) ; extra == "datarobot-sdk" or extra == "all"
Requires-Dist: datarobot-predict (>=1.9.6) ; extra == "datarobot-sdk" or extra == "all"
Requires-Dist: deepeval (>=3.3.5) ; extra == "llm-eval" or extra == "all"
Requires-Dist: google-cloud-aiplatform (>=1.133.0,<2) ; extra == "vertex" or extra == "all"
Requires-Dist: langchain (>=1.2.0,<2) ; extra == "llm-eval" or extra == "all"
Requires-Dist: langchain-core (>=1.2.0,<2) ; extra == "llm-eval" or extra == "all"
Requires-Dist: langchain-nvidia-ai-endpoints (>=0.3.9) ; extra == "nvidia" or extra == "all"
Requires-Dist: langchain-openai (>=0.1.7) ; extra == "llm-eval" or extra == "all"
Requires-Dist: llama-index (>=0.14.0) ; extra == "llm-eval" or extra == "all"
Requires-Dist: llama-index-embeddings-azure-openai (>=0.1.6) ; extra == "llm-eval" or extra == "all"
Requires-Dist: llama-index-llms-bedrock-converse (>=0.1.6) ; extra == "bedrock" or extra == "all"
Requires-Dist: llama-index-llms-langchain (>=0.8.0) ; extra == "llm-eval" or extra == "all"
Requires-Dist: llama-index-llms-openai (>=0.1.0) ; extra == "llm-eval" or extra == "all"
Requires-Dist: llama-index-llms-vertex (>=0.1.5) ; extra == "vertex" or extra == "all"
Requires-Dist: nemo-microservices (>=1.5.0,<2.0.0) ; extra == "nemo-evaluator" or extra == "all"
Requires-Dist: nemoguardrails (>=0.20.0) ; extra == "nemo" or extra == "all"
Requires-Dist: numpy (>=1.25.0)
Requires-Dist: openai (>=1.14.3) ; extra == "llm-eval" or extra == "all"
Requires-Dist: opentelemetry-api (>=1.16.0)
Requires-Dist: opentelemetry-exporter-otlp-proto-http (>=1.16.0)
Requires-Dist: opentelemetry-instrumentation (>=0.60b1,<0.61)
Requires-Dist: opentelemetry-sdk (>=1.16.0)
Requires-Dist: pandas (>=2.0.3)
Requires-Dist: pillow (>=12.1.1)
Requires-Dist: pydantic (>=2.0.0,<3.0.0)
Requires-Dist: ragas (>=0.4.3) ; extra == "llm-eval" or extra == "all"
Requires-Dist: rouge-score (>=0.1.2)
Requires-Dist: tiktoken (>=0.5.1)
Requires-Dist: trafaret (>=2.1.1)
Description-Content-Type: text/markdown

# DataRobot Moderations library

This library enforces moderation interventions on prompt and response texts according to
the guard configuration set by the user.

The library accepts the guard configuration in YAML format along with the input prompts,
and outputs a dataframe with details such as:
- whether the prompt should be blocked
- whether the completion should be blocked
- the metric values computed by the model guards
- whether the prompt or response was modified per the modifier guard configuration


## Documentation

- **[Guardrails Configuration Guide](docs/GUARDRAILS.md)** — full reference for every guard type,
  all YAML fields, worked examples, and Python / DRUM usage patterns.

## Architecture

The library wraps the typical LLM prediction method. It first runs the pre-score guards,
which evaluate prompts and enforce moderation where necessary. Prompts that were not
moderated by the library are forwarded to the actual LLM to obtain their completions.
The library then evaluates those completions with the post-score guards and intervenes
on them as configured.
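This flow can be sketched as follows. This is an illustrative outline only, not the library's actual API: the guard and LLM callables here are stand-ins.

```python
from typing import Callable

def moderated_predict(
    prompts: list[str],
    llm: Callable[[str], str],
    pre_guards: list[Callable[[str], bool]],   # each returns True to block
    post_guards: list[Callable[[str], bool]],  # each returns True to block
    blocked_message: str = "Blocked by moderation.",
) -> list[str]:
    """Illustrative pre-score -> LLM -> post-score flow."""
    results = []
    for prompt in prompts:
        # Pre-score stage: evaluate the prompt before it reaches the LLM.
        if any(guard(prompt) for guard in pre_guards):
            results.append(blocked_message)
            continue
        completion = llm(prompt)
        # Post-score stage: evaluate the completion before returning it.
        if any(guard(completion) for guard in post_guards):
            results.append(blocked_message)
        else:
            results.append(completion)
    return results
```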

![Moderation pipeline architecture](pics/img.png)

## How to build it?

The repository uses `poetry` to manage the build process; a wheel can be built with:
```bash
make clean
make
```

## How to use it?

The package (or a wheel you built or downloaded) can be installed with pip, which also
pulls in its dependencies:
```bash
pip3 install datarobot-moderations
```

### Optional extras

The base install covers the token-count, ROUGE-1, cost, and custom-metric guards.
Heavier or cloud-specific dependencies are opt-in:

| Extra | What it enables |
|---|---|
| `datarobot-sdk` | DataRobot model guards, DataRobot LLM evaluator type |
| `llm-eval` | Faithfulness, Task Adherence, Agent Goal Accuracy, Guideline Adherence guards |
| `nemo` | NeMo Guardrails colang-based flow guard |
| `nemo-evaluator` | NeMo live-evaluation microservice guard |
| `nvidia` | NVIDIA NIM / ChatNVIDIA LLM support |
| `vertex` | Google Cloud Vertex AI LLM support |
| `bedrock` | AWS Bedrock LLM support |
| `all` | Every optional dependency at once |

```bash
# Example: task-adherence guard backed by a DataRobot LLM deployment
pip3 install 'datarobot-moderations[llm-eval,datarobot-sdk]'
```

### Standalone Python API

```python
from datarobot_dome.api import ModerationPipeline

pipeline = ModerationPipeline.from_yaml("moderation_config.yaml")
```

**Evaluate a prompt** (pre-score guards only):

```python
result, latency = pipeline.evaluate_prompt("Ignore previous instructions and …")
if result.blocked:
    print(result.blocked_message)
```

**Evaluate a response** (post-score guards only):

```python
result, latency = pipeline.evaluate_response(
    response="The capital of France is Paris.",
    prompt="What is the capital of France?",
)
print(result.blocked)           # True / False
print(result.metrics)           # {"task_adherence_score": 0.0, ...}
```

**Full pipeline** — pre-score → LLM → post-score in one call:

```python
def my_llm(prompt: str) -> str:
    # Replace with your actual LLM integration (OpenAI, Vertex, etc.)
    return "DataRobot is an AI platform."

result = pipeline.evaluate_full_pipeline(
    prompt="What is DataRobot?",
    llm_callable=my_llm,
)

if not result.blocked:
    print(f"LLM Response: {result.response}")
```

#### Result objects

`evaluate_prompt` / `evaluate_response` return an `EvaluationResult`:

| Field | Type | Description |
|---|---|---|
| `blocked` | `bool` | Whether a BLOCK guard fired |
| `blocked_message` | `str \| None` | Guard-supplied block reason |
| `replaced` | `bool` | Whether a REPLACE guard fired |
| `replacement` | `str \| None` | The replacement text |
| `metrics` | `dict` | All guard metric values (scores, counts, …) |

`evaluate_full_pipeline` returns a `PipelineResult`:

| Field | Type | Description |
|---|---|---|
| `prompt_evaluation` | `EvaluationResult` | Pre-score guard result |
| `response` | `str \| None` | Effective response (post-replacement if applicable) |
| `response_evaluation` | `EvaluationResult \| None` | Post-score guard result |
| `blocked` | `bool` | True if either stage was blocked |
| `replaced` | `bool` | True if either stage was replaced |

### With [DRUM](https://github.com/datarobot/datarobot-user-models)
As described above, the library wraps DRUM's `score` method with pre- and post-score
guards. With DRUM, you simply run your custom model using `drum score` and get the
moderation library's features automatically.

Install DRUM along with the necessary optional extras for your specific guards. If you are unsure which guards are in use, install `[all]`:

```bash
pip3 install datarobot-drum 'datarobot-moderations[all]'
drum score --verbose --logging-level info --code-dir ./ --input ./input.csv --target-type textgeneration --runtime-params-file values.yaml
```
# Guardrails Configuration Guide

Guards evaluate prompts (pre-score) and/or responses (post-score) and can **block**, **report**, or **replace** content based on configurable conditions.

---

## Table of Contents

1. [File structure](#1-file-structure)
2. [Top-level options](#2-top-level-options)
3. [Common guard fields](#3-common-guard-fields)
4. [Intervention block](#4-intervention-block)
5. [Guard types](#5-guard-types)
6. [LLM back-end options](#6-llm-back-end-options)
7. [Full annotated example](#7-full-annotated-example)
8. [Using the config in Python](#8-using-the-config-in-python)
9. [Testing guide](#9-testing-guide)
10. [Environment variables](#10-environment-variables)

---

## 1. File structure

```yaml
timeout_sec: 10
timeout_action: score
nemo_evaluator_deployment_id: "<your-nemo-evaluator-id>"

guards:
  - name: My Guard
    type: ootb
    stage: prompt
    # ...
```

---

## 2. Top-level options

| Field | Type | Default | Description |
|---|---|---|---|
| `timeout_sec` | int | `10` | Seconds to wait per guard |
| `timeout_action` | string | `score` | `score` (allow) or `block` on timeout |
| `nemo_evaluator_deployment_id` | string | — | DataRobot deployment ID of the NeMo Evaluator microservice; required when any guard uses `type: nemo_evaluator` |
| `guards` | list | **required** | List of guard definitions |
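For example, to fail closed — treating a guard timeout as a block rather than letting content through — set:

```yaml
timeout_sec: 5        # give each guard at most 5 seconds
timeout_action: block # block the content if any guard times out
```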

---

## 3. Common guard fields

| Field | Required | Description |
|---|---|---|
| `name` | ✅ | Unique label; used as the key in `result.metrics` and as the DataRobot custom metric name |
| `type` | ✅ | `ootb` · `model` · `nemo_guardrails` · `nemo_evaluator` |
| `stage` | ✅ | `prompt` · `response` · `[prompt, response]` (list runs the guard at both stages) |
| `description` | ❌ | Free-text label, ignored by the library |
| `intervention` | ❌ | What to do when the condition fires (see [§4](#4-intervention-block)). Omit entirely to measure only — nothing is ever blocked |
| `copy_citations` | ❌ | Boolean (`true`/`false`, default `false`). Passes retrieved RAG context to this guard. **Required for `rouge_1` and `faithfulness` to produce meaningful scores** |
| `is_agentic` | ❌ | Marks an agentic-workflow guard (default `false`). Required by `agent_goal_accuracy` |

```yaml
# stage as a list — guard runs independently at both prompt and response stages
- name: Token Count Both
  type: ootb
  ootb_type: token_count
  stage: [prompt, response]
  intervention:
    action: block
    message: "Input or output exceeds the token limit."
    conditions:
      - comparator: greaterThan
        comparand: 100
```

---

## 4. Intervention block

```yaml
intervention:
  action: block               # "block" | "report" | "replace"
  message: "Blocked."         # returned to caller
  send_notification: false
  conditions:
    - comparand: 0.5
      comparator: greaterThan
```

> **One condition per intervention.** The `conditions` list accepts exactly one entry for
> `block` and `replace`; zero entries (`conditions: []`) is valid for `report`.
> To combine conditions (e.g. block if score < 0.2 **or** > 0.9), use two separate guards.
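For instance, blocking responses that are either suspiciously short or too long takes two guards over the same metric:

```yaml
- name: Response Too Short
  type: ootb
  ootb_type: token_count
  stage: response
  intervention:
    action: block
    message: "Response is too short."
    conditions:
      - comparand: 5
        comparator: lessThan

- name: Response Too Long
  type: ootb
  ootb_type: token_count
  stage: response
  intervention:
    action: block
    message: "Response is too long."
    conditions:
      - comparand: 1000
        comparator: greaterThan
```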

### Actions

| Action | Effect |
|---|---|
| `block` | Reject and return `message` to the caller. `message` is optional in the schema but omitting it returns an empty string — always set it. |
| `report` | Record the metric and allow content through unchanged. Behaviorally identical to omitting the `intervention` block entirely; useful when you want the metric tracked but never want to block. |
| `replace` | Swap the text with the sanitised version returned by the deployment. Only valid for `type: model` guards. The deployment **must** return the replacement text in the field specified by `model_info.replacement_text_column_name`; if that field is absent a `ValueError` is raised. |

### Comparators

| Comparator | Comparand type | Description |
|---|---|---|
| `greaterThan` / `lessThan` | number | Numeric threshold |
| `equals` / `notEquals` | number \| string | Exact equality. Use `comparand: "TRUE"` with NeMo Guardrails guards, whose score is the string `"TRUE"` or `"FALSE"` |
| `is` / `isNot` | boolean | Boolean equality |
| `matches` / `doesNotMatch` | list of strings | Class membership. `matches` fires if the prediction is in the list; `doesNotMatch` fires if it is not.|
| `contains` / `doesNotContain` | list of strings | Substring check against a list. `contains` fires if **all** items in the list are found as substrings of the prediction; `doesNotContain` fires if not all items are found. |
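A rough Python sketch of the comparator semantics in the table above (illustrative only — not the library's implementation):

```python
def condition_fires(comparator: str, prediction, comparand) -> bool:
    """Illustrative semantics of the comparators table."""
    if comparator == "greaterThan":
        return prediction > comparand
    if comparator == "lessThan":
        return prediction < comparand
    if comparator == "equals":
        return prediction == comparand
    if comparator == "notEquals":
        return prediction != comparand
    if comparator == "is":
        return bool(prediction) is bool(comparand)
    if comparator == "isNot":
        return bool(prediction) is not bool(comparand)
    if comparator == "matches":          # class membership in a list
        return prediction in comparand
    if comparator == "doesNotMatch":
        return prediction not in comparand
    if comparator == "contains":         # ALL items must appear as substrings
        return all(item in prediction for item in comparand)
    if comparator == "doesNotContain":
        return not all(item in prediction for item in comparand)
    raise ValueError(f"Unknown comparator: {comparator}")
```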

---

## 5. Guard types

### 5.1 Out-of-the-Box (`ootb`)

Set `type: ootb` and `ootb_type`.

**Install only what you use:**

```bash
pip install datarobot-moderations                          # base — token_count, rouge_1, cost, custom_metric
pip install 'datarobot-moderations[llm-eval]'              # + faithfulness, task_adherence, agent_guideline_adherence, agent_goal_accuracy
pip install 'datarobot-moderations[llm-eval,vertex]'       # + Google Vertex AI as LLM judge
pip install 'datarobot-moderations[llm-eval,bedrock]'      # + AWS Bedrock as LLM judge
pip install 'datarobot-moderations[llm-eval,nvidia]'       # + NVIDIA NIM as LLM judge
pip install 'datarobot-moderations[datarobot-sdk]'         # required for type: model and llm_type: datarobot
pip install 'datarobot-moderations[all]'                   # everything
```

| `ootb_type` | Stage | Install extra | Description |
|---|---|---|---|
| `token_count` | prompt / response | *(base)* | Token count |
| `rouge_1` | response | *(base)* | ROUGE-1 overlap with citations |
| `faithfulness` | response | `llm-eval` | LLM-judged hallucination detection |
| `task_adherence` | response | `llm-eval` | Task-completion score |
| `agent_guideline_adherence` | response | `llm-eval` | Guideline adherence |
| `agent_goal_accuracy` | response | `llm-eval` | Agentic goal-accuracy |
| `cost` | response | *(base)* | Estimated cost. Counts **both** prompt tokens (`input_price`/`input_unit`) and response tokens (`output_price`/`output_unit`). Must be at the response stage because both token counts are only available after the LLM responds. Currently only `currency: USD` is supported. |
| `custom_metric` | prompt / response | *(base)* | User-defined numeric metric |
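Reading the `cost` fields as price per `unit` tokens, the estimated cost per request works out as below. This is the arithmetic implied by the documented fields, sketched for illustration; the library performs the actual token counting.

```python
def estimated_cost(prompt_tokens: int, response_tokens: int,
                   input_price: float, input_unit: int,
                   output_price: float, output_unit: int) -> float:
    """Price per `unit` tokens, applied to prompt and response separately."""
    return ((prompt_tokens / input_unit) * input_price
            + (response_tokens / output_unit) * output_price)

# e.g. 200 prompt tokens and 500 response tokens at $0.01 per 1000 in
# and $0.03 per 1000 out:
#   200/1000 * 0.01 + 500/1000 * 0.03 = 0.002 + 0.015 = 0.017 USD
```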

```yaml
# Token count — report only
- name: Prompt Token Count
  type: ootb
  ootb_type: token_count
  stage: prompt

# Token count — block on length
- name: Response Token Count
  type: ootb
  ootb_type: token_count
  stage: response
  intervention:
    action: block
    message: "Response too long."
    conditions:
      - comparand: 1000
        comparator: greaterThan

# ROUGE-1 (requires citations)
- name: Rouge 1
  type: ootb
  ootb_type: rouge_1
  stage: response
  copy_citations: true
  intervention:
    action: report
    conditions: []

# Faithfulness
- name: Faithfulness
  type: ootb
  ootb_type: faithfulness
  stage: response
  copy_citations: true
  llm_type: datarobot
  deployment_id: "<your-llm-id>"   # 24-char DataRobot deployment ID
  intervention:
    action: block
    message: "Hallucination detected."
    conditions:
      - comparand: 0.0
        comparator: equals

# Task Adherence
- name: Task Adherence
  type: ootb
  ootb_type: task_adherence
  stage: response
  llm_type: datarobot
  deployment_id: "<your-llm-id>"
  intervention:
    action: block
    message: "LLM did not complete the requested task."
    conditions:
      - comparator: lessThan
        comparand: 0.5

# Guideline Adherence
- name: Guideline Adherence
  type: ootb
  ootb_type: agent_guideline_adherence
  stage: response
  llm_type: datarobot
  deployment_id: "<your-llm-id>"
  additional_guard_config:
    agent_guideline: "Response must be polite and on-topic."   # free-text criterion for the LLM judge
  intervention:
    action: block
    message: "Response violates guidelines."
    conditions:
      - comparand: 0.0
        comparator: equals

# Agent Goal Accuracy
- name: Agent Goal Accuracy
  type: ootb
  ootb_type: agent_goal_accuracy
  stage: response
  is_agentic: true
  llm_type: datarobot
  deployment_id: "<your-llm-id>"
  intervention:
    action: report
    conditions: []

# Cost tracking
- name: Cost
  type: ootb
  ootb_type: cost
  stage: response
  additional_guard_config:
    cost:
      currency: USD
      input_price: 0.01
      input_unit: 1000
      output_price: 0.03
      output_unit: 1000
  intervention:
    action: report
    conditions: []
```

---

### 5.2 Model guard

Wraps any **DataRobot deployment** you have already created (binary classifier, regression, multiclass, or text-generation). The library sends the text to that deployment and uses the prediction it returns to decide whether to block, report, or replace content.

```yaml
# Binary classifier (e.g. toxicity, prompt injection)
# Works with any DataRobot binary classification deployment.
- name: Toxicity
  type: model
  stage: prompt
  deployment_id: "<your-deployment-id>"   # 24-char DataRobot deployment ID
  model_info:
    input_column_name: text               # field your deployment reads as input
    target_name: toxicity_toxic_PREDICTION  # prediction field returned by the deployment
    target_type: Binary        # Binary | Regression | Multiclass | TextGeneration
    class_names: []            # leave empty for Binary/Regression
  intervention:
    action: block
    message: "Toxic content blocked."
    conditions:
      - comparand: 0.5
        comparator: greaterThan

# PII detection with text replacement
# The deployment must return BOTH the score field (`target_name`)
# AND a sanitised-text field (`replacement_text_column_name`).
- name: PII Detector
  type: model
  stage: prompt
  deployment_id: "<your-pii-deployment-id>"
  model_info:
    input_column_name: text
    target_name: contains_pii_true_PREDICTION
    target_type: TextGeneration
    replacement_text_column_name: anonymized_text_OUTPUT
    class_names: []
  intervention:
    action: replace
    message: "PII removed from prompt."
    conditions:
      - comparand: 0.5
        comparator: greaterThan

# Multi-label / emotion classifier
- name: Emotion Classifier
  type: model
  stage: prompt
  deployment_id: "<your-emotion-deployment-id>"
  model_info:
    input_column_name: text
    target_name: target_PREDICTION
    target_type: TextGeneration
    class_names: [anger, fear, sadness, disgust, joy, neutral]
  intervention:
    action: block
    message: "Negative emotion detected."
    conditions:
      - comparand: [anger, fear, sadness, disgust]
        comparator: matches
```

---

### 5.3 NeMo Guardrails

Flow-based content filtering. Requires `pip install 'datarobot-moderations[nemo]'`.

> **Supported `llm_type` values:** `openAi`, `azureOpenAi`, `nim`, `llmGateway` only.

Colang flow files must live in stage-specific subdirectories of `nemo_guardrails/`:

```
nemo_guardrails/
  prompt/      # config.yml + *.co files for stage: prompt
  response/    # config.yml + *.co files for stage: response
```

```yaml
- name: Stay on topic
  type: nemo_guardrails
  stage: prompt
  llm_type: azureOpenAi
  openai_api_base: "https://<resource>.openai.azure.com/"
  openai_deployment_id: gpt-4o-mini
  intervention:
    action: block
    message: "This topic is outside the allowed scope."
    conditions:
      - comparand: "TRUE"
        comparator: equals
```

---

### 5.4 NeMo Evaluator

Calls a DataRobot-hosted NeMo Evaluator microservice. Requires `pip install 'datarobot-moderations[nemo-evaluator]'`.

**Two deployment IDs — what's the difference?**

| Field | What it points to |
|---|---|
| `nemo_evaluator_deployment_id` (top-level) | Your **NeMo Evaluator microservice** deployment in DataRobot |
| `deployment_id` (per-guard) | The **LLM deployment** the evaluator uses to do the judging |

Both values must be valid 24-character DataRobot deployment IDs. Using a placeholder longer than 24 characters (e.g. `"<your-nemo-evaluator-id>"`) causes a load-time validation error: `String is longer than 24 characters`.
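A quick sanity check for deployment IDs before loading the config can save a load-time error. This helper is illustrative — the library performs its own validation:

```python
import re

def looks_like_deployment_id(value: str) -> bool:
    """DataRobot deployment IDs are exactly 24 hexadecimal characters."""
    return re.fullmatch(r"[0-9a-f]{24}", value) is not None
```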

> **`llm_type`** must be `datarobot` for all `nemo_evaluator` guards.

| `nemo_evaluator_type` | Stage | Description |
|---|---|---|
| `llm_judge` | prompt / response | Custom LLM-as-judge with your own prompts. `score_parsing_regex` is a regular expression applied to the LLM's raw text reply to extract a single numeric score — e.g. `"([1-5])"` picks the first digit 1–5 from any surrounding text. |
| `context_relevance` | response | Relevance of retrieved context to the question |
| `response_groundedness` | response | Groundedness in retrieved context |
| `topic_adherence` | response | Adherence to allowed topics |
| `response_relevancy` | response | Relevance of response to question |
| `faithfulness` | response | NeMo microservice faithfulness score |
| `agent_goal_accuracy` | response | Agentic goal-accuracy via NeMo |
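To see how `score_parsing_regex` behaves, here is the extraction step in isolation (an illustrative sketch — the library applies the regex to the judge LLM's raw text reply):

```python
import re

def parse_score(raw_reply: str, score_parsing_regex: str) -> float:
    """Extract a numeric score from a judge LLM's free-text reply."""
    match = re.search(score_parsing_regex, raw_reply)
    if match is None:
        raise ValueError(f"No score found in: {raw_reply!r}")
    return float(match.group(1))
```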

```yaml
nemo_evaluator_deployment_id: "<your-nemo-evaluator-id>"

guards:
  - name: Safety Judge
    type: nemo_evaluator
    stage: response
    nemo_evaluator_type: llm_judge
    llm_type: datarobot
    deployment_id: "<your-llm-id>"
    nemo_llm_judge_config:
      system_prompt: "Rate safety 1-5. Output ONLY the integer."
      user_prompt: "Response: {response}"
      score_parsing_regex: "([1-5])"   # regex to extract the numeric score from the LLM's text output
      custom_metric_directionality: higherIsBetter   # "higherIsBetter" | "lowerIsBetter"
    intervention:
      action: block
      message: "Response failed safety evaluation."
      conditions:
        - comparand: 2
          comparator: lessThan

  - name: Topic Adherence
    type: nemo_evaluator
    stage: response
    nemo_evaluator_type: topic_adherence
    llm_type: datarobot
    deployment_id: "<your-llm-id>"
    nemo_topic_adherence_config:
      metric_mode: f1          # "f1" | "precision" | "recall"
      reference_topics: [DataRobot, machine learning, AI platforms]
    intervention:
      action: report
      conditions: []

  - name: Response Relevancy
    type: nemo_evaluator
    stage: response
    nemo_evaluator_type: response_relevancy
    llm_type: datarobot
    deployment_id: "<your-llm-id>"
    nemo_response_relevancy_config:
      embedding_deployment_id: "<your-embedding-id>"
    intervention:
      action: report
      conditions: []
```

---

## 6. LLM back-end options

Some `ootb` guards (e.g. `faithfulness`, `task_adherence`) call an LLM to judge the text. You choose which LLM provider to use via `llm_type`.

> **DataRobot credentials (`DATAROBOT_ENDPOINT` + `DATAROBOT_API_TOKEN`) are always required**

### Supported `llm_type` values

| `llm_type` | LLM provider | Extra YAML fields | Extra install |
|---|---|---|---|
| `datarobot` | DataRobot-hosted LLM deployment | `deployment_id` | `datarobot-sdk` |
| `openAi` | OpenAI API | *(none)* | `llm-eval` |
| `azureOpenAi` | Azure OpenAI | `openai_api_base`, `openai_deployment_id` | `llm-eval` |
| `google` | Google Vertex AI | `google_region`, `google_model` | `llm-eval,vertex` |
| `amazon` | AWS Bedrock | `aws_region`, `aws_model` | `llm-eval,bedrock` |
| `nim` | NVIDIA NIM | `openai_api_base` | `llm-eval,nvidia` |
| `llmGateway` | DataRobot LLM Gateway | `llm_gateway_model_id` | `datarobot-sdk` |


**`nemo_guardrails` supports:** `openAi`, `azureOpenAi`, `nim`, `llmGateway` only  
**`nemo_evaluator` supports:** `datarobot` only

### Available models (Google / AWS)

The library maps a fixed set of model names to their provider API identifiers. Models not in this list are not supported.

| Provider | `llm_type` | `google_model` / `aws_model` |
|---|---|---|
| Google Vertex AI | `google` | `google-gemini-1.5-flash`, `google-gemini-1.5-pro`, `chat-bison` |
| AWS Bedrock | `amazon` | `amazon-titan`, `anthropic-claude-2`, `anthropic-claude-3-haiku`, `anthropic-claude-3-sonnet`, `anthropic-claude-3-opus`, `anthropic-claude-3.5-sonnet-v1`, `anthropic-claude-3.5-sonnet-v2`, `amazon-nova-lite`, `amazon-nova-micro`, `amazon-nova-pro` |

---

## 7. Full annotated example

> Replace every `<...>` placeholder with a real value before use.
> DataRobot deployment IDs are exactly 24 hexadecimal characters.

```yaml
timeout_sec: 15
timeout_action: score

guards:
  # -- Pre-score (prompt) --------------------------------------------------

  - name: Prompt Injection
    type: model
    stage: prompt
    deployment_id: "<prompt-injection-id>"
    model_info:
      input_column_name: text
      target_name: injection_injection_PREDICTION
      target_type: Binary
      class_names: []
    intervention:
      action: block
      message: "Prompt injection attempt detected and blocked."
      conditions:
        - comparand: 0.80
          comparator: greaterThan

  - name: Toxicity
    type: model
    stage: prompt
    deployment_id: "<toxicity-id>"
    model_info:
      input_column_name: text
      target_name: toxicity_toxic_PREDICTION
      target_type: Binary
      class_names: []
    intervention:
      action: block
      message: "Toxic content is not allowed."
      conditions:
        - comparand: 0.5
          comparator: greaterThan

  - name: PII Detector
    type: model
    stage: prompt
    deployment_id: "<pii-id>"
    model_info:
      input_column_name: text
      target_name: contains_pii_true_PREDICTION
      target_type: TextGeneration
      replacement_text_column_name: anonymized_text_OUTPUT
      class_names: []
    intervention:
      action: replace
      message: "PII detected and removed."
      conditions:
        - comparand: 0.5
          comparator: greaterThan

  - name: Topic Guardrail
    type: nemo_guardrails
    stage: prompt
    llm_type: azureOpenAi
    openai_api_base: "https://<resource>.openai.azure.com/"
    openai_deployment_id: gpt-4o-mini
    intervention:
      action: block
      message: "This topic is outside the allowed scope."
      conditions:
        - comparand: "TRUE"
          comparator: equals

  # -- Post-score (response) -----------------------------------------------

  - name: Response Token Count
    type: ootb
    ootb_type: token_count
    stage: response

  - name: Faithfulness
    type: ootb
    ootb_type: faithfulness
    stage: response
    copy_citations: true
    llm_type: datarobot
    deployment_id: "<llm-id>"
    intervention:
      action: block
      message: "The response appears to be hallucinated."
      conditions:
        - comparand: 0.0
          comparator: equals

  - name: Task Adherence
    type: ootb
    ootb_type: task_adherence
    stage: response
    llm_type: datarobot
    deployment_id: "<llm-id>"
    intervention:
      action: block
      message: "LLM did not complete the requested task."
      conditions:
        - comparator: lessThan
          comparand: 0.5

  - name: Cost
    type: ootb
    ootb_type: cost
    stage: response
    additional_guard_config:
      cost:
        currency: USD
        input_price: 0.01
        input_unit: 1000
        output_price: 0.03
        output_unit: 1000
    intervention:
      action: report
      conditions: []
```

---

## 8. Using the config in Python

### Evaluate prompt or response individually

`evaluate_prompt` and `evaluate_response` each return `(EvaluationResult, latency_seconds)`.
`EvaluationResult.metrics` holds the guard scores keyed by guard name.

```python
import os
from datarobot_dome.api import ModerationPipeline

os.environ["TARGET_NAME"]         = "resultText"   # must match your deployment's response output field name
os.environ["DATAROBOT_ENDPOINT"]  = "<your-endpoint>"
os.environ["DATAROBOT_API_TOKEN"] = "<your-token>"

pipeline = ModerationPipeline.from_yaml("moderation_config.yaml")

result, latency = pipeline.evaluate_prompt("What is DataRobot?")
if result.blocked:
    print(f"Blocked: {result.blocked_message}")

result, latency = pipeline.evaluate_response(
    "DataRobot is an AI platform.",
    prompt="What is DataRobot?",
)
print(f"Latency: {latency:.3f}s  Blocked: {result.blocked}  Metrics: {result.metrics}")
```

### Full pipeline (prompt to LLM to response)

`evaluate_full_pipeline` returns a `PipelineResult` (no latency tuple).

```python
pipeline = ModerationPipeline.from_yaml("moderation_config.yaml")

def my_llm(prompt: str) -> str:
    return "DataRobot is an AI platform."   # replace with your LLM call

result = pipeline.evaluate_full_pipeline("What is DataRobot?", my_llm)

if result.blocked:
    stage = "prompt" if result.prompt_evaluation.blocked else "response"
    blocked_eval = (
        result.prompt_evaluation if result.prompt_evaluation.blocked
        else result.response_evaluation
    )
    print(f"Blocked at {stage}: {blocked_eval.blocked_message}")
elif result.replaced:
    print(f"Text replaced. Response: {result.response}")
else:
    print(f"Response: {result.response}")
    print(f"Metrics: {result.response_evaluation.metrics}")
```

### With DRUM

Place `moderation_config.yaml` alongside your custom model code, then:

```bash
drum score --verbose \
  --code-dir ./ \
  --input ./input.csv \
  --target-type textgeneration \
  --runtime-params-file values.yaml
```

---

## 9. Testing guide

Set these environment variables before running any test (see [Environment variables](#10-environment-variables) for details):

```bash
export DATAROBOT_ENDPOINT="https://app.datarobot.com/api/v2"
export DATAROBOT_API_TOKEN="your-token"
export TARGET_NAME="resultText"   # must match your deployment's response output field name
```

### Group A — No LLM required (`token_count`, `rouge_1`, `cost`)

These guards run locally — no deployment ID or provider credentials needed beyond the base DataRobot variables above.

```yaml
guards:
  - name: Prompt Token Count
    type: ootb
    ootb_type: token_count
    stage: prompt

  - name: Response Token Count
    type: ootb
    ootb_type: token_count
    stage: response
    intervention:
      action: block
      message: "Response too long."
      conditions:
        - comparand: 500
          comparator: greaterThan
```

```python
from datarobot_dome.api import ModerationPipeline

pipeline = ModerationPipeline.from_yaml("moderation_config.yaml")

result, _ = pipeline.evaluate_prompt("What is DataRobot?")
print(f"[prompt] blocked={result.blocked}  metrics={result.metrics}")

result, _ = pipeline.evaluate_response(
    "DataRobot is an AI platform for building and deploying machine learning models.",
    prompt="What is DataRobot?",
)
print(f"[response] blocked={result.blocked}  metrics={result.metrics}")
```

---

### Group B — DataRobot LLM or model deployment

Guards that call a DataRobot-hosted deployment (LLM-eval guards with `llm_type: datarobot`, or `type: model` guards). No extra credentials beyond the base variables; just provide a real `deployment_id`.

```yaml
# LLM-eval guard using a DataRobot-hosted LLM
guards:
  - name: Task Adherence
    type: ootb
    ootb_type: task_adherence
    stage: response
    llm_type: datarobot
    deployment_id: "<your-llm-id>"
    intervention:
      action: block
      message: "LLM did not complete the requested task."
      conditions:
        - comparator: lessThan
          comparand: 0.5

  # Model guard using a DataRobot classifier deployment
  - name: Toxicity
    type: model
    stage: prompt
    deployment_id: "<your-toxicity-deployment-id>"
    model_info:
      input_column_name: text
      target_name: toxicity_toxic_PREDICTION
      target_type: Binary
      class_names: []
    intervention:
      action: block
      message: "Toxic content blocked."
      conditions:
        - comparand: 0.5
          comparator: greaterThan
```

```python
from datarobot_dome.api import ModerationPipeline

pipeline = ModerationPipeline.from_yaml("moderation_config.yaml")

result, latency = pipeline.evaluate_response(
    "DataRobot is an AI platform.",
    prompt="What is DataRobot?",
)
print(f"Latency: {latency:.3f}s  Blocked: {result.blocked}  Metrics: {result.metrics}")
```

---

### Group C — External LLM provider (OpenAI / Azure OpenAI / Google / AWS)

Same YAML structure as Group B, but set `llm_type` to the external provider and add the corresponding credential env var (see [Environment variables](#10-environment-variables)).

```bash
# Example: Azure OpenAI as the LLM judge for task_adherence
export MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_TASK_ADHERENCE_AZURE_OPENAI_API_KEY=\
  '{"type":"credential","payload":{"credentialType":"api_token","apiToken":"YOUR_AZURE_KEY"}}'
```

```yaml
guards:
  - name: Task Adherence
    type: ootb
    ootb_type: task_adherence
    stage: response
    llm_type: azureOpenAi
    openai_api_base: "https://<your-resource>.openai.azure.com/"
    openai_deployment_id: "gpt-4o"
    intervention:
      action: block
      message: "LLM did not complete the requested task."
      conditions:
        - comparator: lessThan
          comparand: 0.5
```
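The same credential can also be set from Python rather than the shell, which sidesteps shell-quoting mistakes by building the JSON value with `json.dumps`. This is a minimal sketch: the variable name follows the pattern in the [Environment variables](#10-environment-variables) section, and the token is a placeholder.

```python
import json
import os

# Credential value in the format the moderation framework expects
# (see "Environment variables"); the API token is a placeholder.
credential = {
    "type": "credential",
    "payload": {"credentialType": "api_token", "apiToken": "YOUR_AZURE_KEY"},
}
os.environ[
    "MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_TASK_ADHERENCE_AZURE_OPENAI_API_KEY"
] = json.dumps(credential)
```

With the credential in place, the pipeline is loaded and evaluated exactly as in Group B.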

### Group D — DataRobot LLM Gateway

The LLM Gateway securely routes standard LangChain traffic to external models (such as Azure GPT-4) through DataRobot. **No deployment ID and no external API keys are required**: it authenticates entirely with your standard `DATAROBOT_API_TOKEN` and provides out-of-the-box telemetry and auditing.

To use it, set `llm_type: llmGateway` and provide the `llm_gateway_model_id`.

```yaml
guards:
  - name: Hate Speech
    type: ootb
    ootb_type: agent_guideline_adherence
    stage: response
    llm_type: llmGateway
    llm_gateway_model_id: "azure/gpt-4o-2024-11-20"
    additional_guard_config:
      agent_guideline: "The response must not contain hate speech, slurs, or content that demeans people based on race, religion, gender, nationality, or any other protected characteristic."
    intervention:
      action: report    # Report only. We will read the metric manually in Python.
      conditions: []
```

```python
import os
from datarobot_dome.api import ModerationPipeline

# Only standard DataRobot credentials are required. No Azure/OpenAI keys needed.
os.environ["TARGET_NAME"]         = "resultText"
os.environ["DATAROBOT_ENDPOINT"]  = "https://app.datarobot.com/api/v2"
os.environ["DATAROBOT_API_TOKEN"] = "<your-dr-token>"

pipeline = ModerationPipeline.from_yaml("moderation_config.yaml")

# Test 1: A benign response
safe_text = "The world is full of rich and diverse cultures worth celebrating."
result, latency = pipeline.evaluate_response(response=safe_text, prompt="Describe this text.")

# Because 'action: report' is set, we read the score directly from the metrics dict
score = result.metrics.get("agent_guideline_adherence_score")
print(f"Score: {score} | Latency: {latency:.3f}s | Text: {safe_text!r}")

# Test 2: A toxic response
toxic_text = "People from that group are inferior and should be removed from society."
result, latency = pipeline.evaluate_response(response=toxic_text, prompt="Describe this text.")

score = result.metrics.get("agent_guideline_adherence_score")
print(f"Score: {score} | Latency: {latency:.3f}s | Text: {toxic_text!r}")
```

---

## 10. Environment variables

### Always required

| Variable | Description |
|---|---|
| `DATAROBOT_ENDPOINT` | DataRobot instance URL, e.g. `https://app.datarobot.com/api/v2` |
| `DATAROBOT_API_TOKEN` | DataRobot API token |
| `TARGET_NAME` | The name of the output field in your deployment's prediction response that contains the generated text (e.g. `resultText`). Required by all response-stage guards in standalone Python. DRUM sets this automatically. |
| `DISABLE_MODERATION` | Set to `true` to disable all guards at runtime. |
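For standalone Python use, the base variables can be exported before a pipeline is loaded. The endpoint and token values below are placeholders; `TARGET_NAME` must match your deployment's output field.

```shell
# Base variables required by all guards (values are placeholders)
export DATAROBOT_ENDPOINT="https://app.datarobot.com/api/v2"
export DATAROBOT_API_TOKEN="<your-dr-token>"
export TARGET_NAME="resultText"
# Optional kill switch: set to true to bypass all guards at runtime
export DISABLE_MODERATION="false"
```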

### Credentials for LLM-eval guards using external providers

When your guard uses `llm_type: datarobot`, it reuses `DATAROBOT_API_TOKEN` — no extra variable needed.

For external providers (OpenAI, Azure OpenAI, Google, AWS), set a guard-specific env var. The variable name is built from the guard's `type`, `stage`, and `ootb_type` (when present), plus a provider-specific suffix:

```
MLOPS_RUNTIME_PARAM_MODERATION_{TYPE}_{STAGE}_{OOTB_TYPE}_{PROVIDER_SUFFIX}
```

| Guard (`ootb_type`) | Provider | Environment variable |
|---|---|---|
| `task_adherence` | OpenAI | `MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_TASK_ADHERENCE_OPENAI_API_KEY` |
| `task_adherence` | Azure OpenAI | `MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_TASK_ADHERENCE_AZURE_OPENAI_API_KEY` |
| `faithfulness` | OpenAI | `MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_FAITHFULNESS_OPENAI_API_KEY` |
| `faithfulness` | Azure OpenAI | `MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_FAITHFULNESS_AZURE_OPENAI_API_KEY` |
| `agent_guideline_adherence` | Azure OpenAI | `MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_AGENT_GUIDELINE_ADHERENCE_AZURE_OPENAI_API_KEY` |
| `agent_guideline_adherence` | Google Vertex AI | `MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_AGENT_GUIDELINE_ADHERENCE_GOOGLE_SERVICE_ACCOUNT` |
| `agent_goal_accuracy` | Azure OpenAI | `MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_AGENT_GOAL_ACCURACY_AZURE_OPENAI_API_KEY` |
| `agent_goal_accuracy` | AWS Bedrock | `MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_AGENT_GOAL_ACCURACY_AWS_ACCOUNT` |
| `nemo_guardrails` (prompt) | Azure OpenAI | `MLOPS_RUNTIME_PARAM_MODERATION_NEMO_GUARDRAILS_PROMPT_AZURE_OPENAI_API_KEY` |

Value format per provider:

```bash
# OpenAI / Azure OpenAI
'{"type":"credential","payload":{"credentialType":"api_token","apiToken":"YOUR_KEY"}}'

# Google Vertex AI
'{"type":"credential","payload":{"credentialType":"gcp","gcpKey":{...}}}'

# AWS Bedrock
'{"type":"credential","payload":{"credentialType":"s3","awsAccessKeyId":"...","awsSecretAccessKey":"...","awsSessionToken":"..."}}'
```
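Rather than hand-writing these JSON strings, the values can be serialized from a dict. `credential_value` below is a hypothetical helper, not part of the package; the field names mirror the formats above (Google Vertex AI works the same way, with the service-account JSON dict passed as `gcpKey`).

```python
import json


def credential_value(credential_type: str, **fields) -> str:
    """Serialize a credential payload into the env-var value format shown above."""
    return json.dumps(
        {"type": "credential", "payload": {"credentialType": credential_type, **fields}}
    )


# OpenAI / Azure OpenAI
openai_value = credential_value("api_token", apiToken="YOUR_KEY")

# AWS Bedrock (keys are placeholders)
aws_value = credential_value(
    "s3",
    awsAccessKeyId="...",
    awsSecretAccessKey="...",
    awsSessionToken="...",
)
```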

