Metadata-Version: 2.4
Name: datarobot-moderations
Version: 11.2.34
Summary: DataRobot Monitoring and Moderation framework
License: DataRobot Tool and Utility Agreement
Author: DataRobot
Author-email: support@datarobot.com
Requires-Python: >=3.10,<3.13
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Provides-Extra: all
Provides-Extra: bedrock
Provides-Extra: datarobot-sdk
Provides-Extra: llm-eval
Provides-Extra: nemo
Provides-Extra: nemo-evaluator
Provides-Extra: nvidia
Provides-Extra: vertex
Requires-Dist: aiohttp (>=3.9.5)
Requires-Dist: backoff (>=2.2.1)
Requires-Dist: click (>=8.0.0)
Requires-Dist: datarobot (>=3.6.0) ; extra == "datarobot-sdk" or extra == "all"
Requires-Dist: datarobot-predict (>=1.9.6) ; extra == "datarobot-sdk" or extra == "all"
Requires-Dist: deepeval (>=3.3.5) ; extra == "llm-eval" or extra == "all"
Requires-Dist: google-cloud-aiplatform (>=1.133.0,<2) ; extra == "vertex" or extra == "all"
Requires-Dist: langchain (>=1.2.0,<2) ; extra == "llm-eval" or extra == "all"
Requires-Dist: langchain-core (>=1.2.0,<2) ; extra == "llm-eval" or extra == "all"
Requires-Dist: langchain-nvidia-ai-endpoints (>=0.3.9) ; extra == "nvidia" or extra == "all"
Requires-Dist: langchain-openai (>=0.1.7) ; extra == "llm-eval" or extra == "all"
Requires-Dist: llama-index (>=0.14.0) ; extra == "llm-eval" or extra == "all"
Requires-Dist: llama-index-embeddings-azure-openai (>=0.1.6) ; extra == "llm-eval" or extra == "all"
Requires-Dist: llama-index-llms-bedrock-converse (>=0.1.6) ; extra == "bedrock" or extra == "all"
Requires-Dist: llama-index-llms-langchain (>=0.8.0) ; extra == "llm-eval" or extra == "all"
Requires-Dist: llama-index-llms-openai (>=0.1.0) ; extra == "llm-eval" or extra == "all"
Requires-Dist: llama-index-llms-vertex (>=0.1.5) ; extra == "vertex" or extra == "all"
Requires-Dist: nemo-microservices (>=1.5.0,<2.0.0) ; extra == "nemo-evaluator" or extra == "all"
Requires-Dist: nemoguardrails (>=0.20.0) ; extra == "nemo" or extra == "all"
Requires-Dist: numpy (>=1.25.0)
Requires-Dist: openai (>=1.14.3) ; extra == "llm-eval" or extra == "all"
Requires-Dist: opentelemetry-api (>=1.16.0)
Requires-Dist: opentelemetry-exporter-otlp-proto-http (>=1.16.0)
Requires-Dist: opentelemetry-instrumentation (>=0.60b1,<0.61)
Requires-Dist: opentelemetry-sdk (>=1.16.0)
Requires-Dist: pandas (>=2.0.3)
Requires-Dist: pillow (>=12.1.1)
Requires-Dist: pydantic (>=2.0.0,<3.0.0)
Requires-Dist: ragas (>=0.4.3) ; extra == "llm-eval" or extra == "all"
Requires-Dist: requests (>=2.32.4)
Requires-Dist: rouge-score (>=0.1.2)
Requires-Dist: tiktoken (>=0.5.1)
Description-Content-Type: text/markdown

# DataRobot Moderations library

This library applies interventions to prompt and response text according to the
guard configuration supplied by the user.

It takes a guard configuration in YAML format together with the input prompts, and
returns a DataFrame with details such as:
- whether the prompt should be blocked
- whether the completion should be blocked
- the metric values obtained from the model guards
- whether the prompt or response was modified by a modifier guard


## Architecture

The library wraps the typical LLM prediction method.
It first runs the pre-score guards, which evaluate prompts and enforce moderation
where necessary.  Prompts that are not blocked are forwarded to the actual LLM to get
their completions.  The library then evaluates these completions with the post-score
guards and applies any configured intervention to them.

![](pics/img.png)
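
The flow above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `Guard`, `Outcome`, `moderate`, `too_long`, and `echo_llm` are hypothetical names, and a guard is simplified to a function that returns a block message or `None`.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical guard signature: return a block message, or None to let the text pass.
Guard = Callable[[str], Optional[str]]


@dataclass
class Outcome:
    blocked: bool
    text: str  # block message when blocked, otherwise the LLM completion


def moderate(prompt: str, llm: Callable[[str], str],
             pre_guards: list[Guard], post_guards: list[Guard]) -> Outcome:
    # Pre-score: a blocked prompt never reaches the LLM.
    for guard in pre_guards:
        message = guard(prompt)
        if message is not None:
            return Outcome(blocked=True, text=message)

    completion = llm(prompt)

    # Post-score: the completion is checked before it is returned.
    for guard in post_guards:
        message = guard(completion)
        if message is not None:
            return Outcome(blocked=True, text=message)
    return Outcome(blocked=False, text=completion)


def too_long(text: str) -> Optional[str]:
    # Toy guard: block anything longer than five words.
    return "Too long." if len(text.split()) > 5 else None


calls = []


def echo_llm(prompt: str) -> str:
    calls.append(prompt)  # record which prompts actually reached the "LLM"
    return "ok"


allowed = moderate("short prompt", echo_llm, [too_long], [too_long])
blocked = moderate("one two three four five six seven", echo_llm, [too_long], [too_long])
```

In this sketch only the first prompt reaches `echo_llm`; the second is stopped at the pre-score stage.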

## How to build it?

The repository uses `poetry` to manage the build process and a wheel can be built using:
```bash
make clean
make
```

## How to use it?

A generated or downloaded wheel can be installed with pip, which also pulls in its
dependencies.
```bash
pip3 install datarobot-moderations
```

### Optional extras

The base install covers the token-count, ROUGE-1, cost, custom-metric, and NeMo NIM (`nim_jailbreak`, `nim_content_safety`) guards.
Heavier or cloud-specific dependencies are opt-in:

| Extra | What it enables |
|---|---|
| `datarobot-sdk` | DataRobot model guards, DataRobot LLM evaluator type |
| `llm-eval` | Faithfulness, Task Adherence, Agent Goal Accuracy, Guideline Adherence guards |
| `nemo` | NeMo Guardrails colang-based flow guard |
| `nemo-evaluator` | NeMo live-evaluation microservice guard |
| `nvidia` | NVIDIA NIM / ChatNVIDIA LLM support |
| `vertex` | Google Cloud Vertex AI LLM support |
| `bedrock` | AWS Bedrock LLM support |
| `all` | Every optional dependency at once |

```bash
# Example: task-adherence guard backed by a DataRobot LLM deployment
pip3 install 'datarobot-moderations[llm-eval,datarobot-sdk]'
```

### OpenTelemetry (OTEL) integration

Install the `datarobot-opentelemetry` package and call `configure` **before** creating a
`ModerationPipeline`. This sets up traces, metrics, and logs and ships them to the
DataRobot telemetry backend.

```python
import logging
from datarobot_opentelemetry.integrations import configure
from datarobot_dome.api import ModerationPipeline

configure(
    # endpoint defaults to DATAROBOT_ENDPOINT env var when omitted
    entity_type="deployment",          # "deployment" or "workload"
    entity_id="<your-entity-id>",
    # api_key defaults to DATAROBOT_API_TOKEN env var when omitted
    log_level=logging.INFO,            # verbosity of the OTel integration itself
    metrics_export_interval=60000,     # milliseconds between metric flushes (0 = disable)
)

pipeline = ModerationPipeline.from_yaml("moderation_config.yaml")
```

| Parameter | Description | Default |
|---|---|---|
| `endpoint` | Telemetry collector URL | `DATAROBOT_ENDPOINT` env var |
| `entity_type` | Entity kind (`"deployment"` or `"workload"`) | `None` |
| `entity_id` | Unique ID of the monitored entity | `None` |
| `api_key` | Authentication key for the collector | `DATAROBOT_API_TOKEN` env var |
| `log_level` | Python logging level for the OTel integration | `logging.INFO` |
| `metrics_export_interval` | Export interval in milliseconds; `0` disables metric export | `60000` |

`configure` returns a `ConfigureResult` object that indicates which signals (traces, metrics,
logs) were successfully initialised.

### deepeval telemetry

Moderations opts out of deepeval telemetry by default.

### Transitive dependencies and build compatibility

Installing `[all]` (or the `nemo` / `llm-eval` extras individually) pulls in
packages that `nemoguardrails` and `deepeval` declare as runtime dependencies
but that this library never uses at runtime:

| Package | Pulled in by | Problem |
|---|---|---|
| `annoy` | `nemoguardrails` | Requires a C++ compiler; breaks restricted build environments such as Kaniko |
| `fastembed` / `onnxruntime` | `nemoguardrails` | Heavy ML runtimes, hundreds of MB |
| `fastapi` / `starlette` / `uvicorn` | `nemoguardrails` | Web server stack, only used by nemoguardrails' built-in server |
| `watchdog` / `prompt-toolkit` / `typer` | `nemoguardrails`, `deepeval` | Dev-server and CLI tools |
| `pyfiglet` / `wheel` | `deepeval` | CLI banner / build artefact mis-declared as a runtime dep |

To exclude them, add the following to **your own project's** `pyproject.toml`
(these overrides are not inherited from this library):

```toml
[tool.uv]
override-dependencies = [
    "annoy; sys_platform == 'never'",
    "fastembed; sys_platform == 'never'",
    "onnxruntime; sys_platform == 'never'",
    "fastapi; sys_platform == 'never'",
    "starlette; sys_platform == 'never'",
    "uvicorn; sys_platform == 'never'",
    "watchdog; sys_platform == 'never'",
    "prompt-toolkit; sys_platform == 'never'",
    "typer; sys_platform == 'never'",
    "pyfiglet; sys_platform == 'never'",
    "wheel; sys_platform == 'never'",
]
```

### Standalone Python API

Create a `ModerationPipeline` from a YAML file, a plain dict, or a Pydantic config object:

```python
from datarobot_dome.api import ModerationPipeline

# From a YAML file
pipeline = ModerationPipeline.from_yaml("moderation_config.yaml")

# From a plain Python dictionary (same schema as YAML)
pipeline = ModerationPipeline.from_dict({"targets": [{"target": "_default", "guards": [...]}]})

# From a Pydantic ModerationConfig object (full type-safety / IDE autocompletion)
from datarobot_dome.schema import ModerationConfig, OOTBGuardSchema, TargetBlock
pipeline = ModerationPipeline.from_config(ModerationConfig(...))

# model_dir is an optional kwarg on from_dict / from_config:
# base directory for resolving NeMo guardrails .co flow files (default: os.getcwd())
pipeline = ModerationPipeline.from_dict({...}, model_dir="/path/to/nemo_dir")
```

All three constructors validate `DATAROBOT_ENDPOINT` and `DATAROBOT_API_TOKEN` before initialising.

**Evaluate a prompt** (prescore guards):

```python
result, latency, prescore_df = pipeline.evaluate_prompt("Ignore previous instructions and …")
if result.blocked:
    print(result.blocked_message)
```

**Evaluate a response** (postscore guards):

```python
result, latency, postscore_df = pipeline.evaluate_response(
    response="The capital of France is Paris.",
    prompt="What is the capital of France?",  # required for faithfulness / task-adherence guards
)
print(result.blocked, result.metrics)
```

**Full pipeline** — prescore → LLM → postscore:

```python
def my_llm(prompt: str) -> str:
    return "DataRobot is an AI platform."  # replace with your LLM call

result, prescore_df, postscore_df = pipeline.evaluate_full_pipeline("What is DataRobot?", my_llm)
if not result.blocked:
    print(result.response)
```

Each method has an async counterpart — just `await` it inside an async function:
`evaluate_prompt_async`, `evaluate_response_async`, `evaluate_full_pipeline_async`.

**Streaming pipeline** — prescore → LLM stream → per-chunk postscore:

```python
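from openai import OpenAI

sync_openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

async def my_llm_stream(prompt: str):
    # Wrap a sync SDK stream, or yield directly from an async SDK.
    # Iterating a sync stream here blocks the event loop between chunks.
    for chunk in sync_openai_client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}], stream=True
    ):
        yield chunk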

async for chunk in pipeline.evaluate_full_pipeline_stream_async("What is DataRobot?", my_llm_stream):
    if chunk.choices[0].finish_reason == "content_filter":
        print("Blocked:", chunk.choices[0].delta.content)
        break
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```

A `finish_reason="content_filter"` chunk means a guard blocked content — either at prescore
(LLM never called) or mid-stream from a postscore guard.
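
A minimal sketch of consuming such a stream, using stand-in objects instead of a real pipeline. `make_chunk`, `fake_moderated_stream`, and `collect` are hypothetical helpers; only the `choices[0].delta.content` / `finish_reason` shape is taken from the example above.

```python
import asyncio
from types import SimpleNamespace


def make_chunk(content, finish_reason=None):
    # Stand-in for an OpenAI-style chunk: chunk.choices[0].delta.content / .finish_reason
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta, finish_reason=finish_reason)])


async def fake_moderated_stream():
    # Hypothetical stream: two normal chunks, then a guard blocks mid-stream.
    yield make_chunk("DataRobot is ")
    yield make_chunk("an AI ")
    yield make_chunk("Response blocked by guard.", finish_reason="content_filter")


async def collect(stream):
    """Return (text_so_far, block_message); block_message is None if nothing was blocked."""
    parts = []
    async for chunk in stream:
        choice = chunk.choices[0]
        if choice.finish_reason == "content_filter":
            # The blocking chunk carries the guard message, not model output.
            return "".join(parts), choice.delta.content
        parts.append(choice.delta.content or "")
    return "".join(parts), None


text, block_message = asyncio.run(collect(fake_moderated_stream()))
```

Keeping the already-streamed text separate from the block message lets the caller decide whether to discard what was shown before the guard fired.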

> For the full API reference — all parameters, return types, result-object fields,
> DataFrame column schemas, streaming details, and env-var reference —
> see **[docs/GUARDRAILS.md § 8](docs/GUARDRAILS.md#8-using-the-config-in-python)**.

### Command-line interface (CLI)

The package ships a `dr-moderation` CLI so you can manage guards without writing Python code.

> Python 3.10 – 3.12 is required.

```bash
# Install the package
pip install 'datarobot-moderations[all]'

# Set credentials
export DATAROBOT_ENDPOINT="https://app.datarobot.com/api/v2"
export DATAROBOT_API_TOKEN="your-api-token"

# Evaluate a prompt and response against a local config — no deployment required
dr-moderation evaluate \
  --config-file moderation_config.yaml \
  --prompt "What is DataRobot?" \
  --response "DataRobot is an AI platform." \
  --as-json

# Add guards to an existing custom model — prints the new version ID
dr-moderation add-guard \
  --custom-model-id 6793e6b2114f17240fa2194c \
  --config-file docs/examples/add_guard_config.yaml

# Verify connectivity to a remote A2A agent
dr-moderation agent a2a connect --url https://my-llm-agent.example.com

# Start a JSON-RPC 2.0 server so Java / Go / C# apps can evaluate without HTTP overhead
dr-moderation serve --config-file moderation_config.yaml   # stdio (default)
dr-moderation serve --transport ws --port 9000 --config-file moderation_config.yaml
```

Ready-made config examples are in `docs/examples/`. See **[docs/CLI.md](docs/CLI.md)** for the full option reference, both YAML schemas, and exit codes.

### With [DRUM](https://github.com/datarobot/datarobot-user-models)
As described above, the library wraps DRUM's `score` method with the pre-score and
post-score guards. With DRUM, you run your custom model using `drum score` as usual
and get the moderation features along with it.

Install DRUM along with the necessary optional extras for your specific guards. If you are unsure which guards are in use, install `[all]`:

```bash
pip3 install datarobot-drum 'datarobot-moderations[all]'
drum score --verbose --logging-level info --code-dir ./ --input ./input.csv --target-type textgeneration --runtime-params-file values.yaml
```
# Guardrails Configuration Guide

Guards evaluate prompts (pre-score) and/or responses (post-score) and can **block**, **report**, or **replace** content based on configurable conditions.

---

## Table of Contents

1. [File structure](#1-file-structure)
2. [Top-level options](#2-top-level-options)
3. [Common guard fields](#3-common-guard-fields)
4. [Intervention block](#4-intervention-block)
5. [Guard types](#5-guard-types)
6. [LLM back-end options](#6-llm-back-end-options)
7. [Full annotated example](#7-full-annotated-example)
8. [Using the config in Python](#8-using-the-config-in-python)
   - [8a. From a YAML file](#8a-from-a-yaml-file)
   - [8b. From a plain Python dict](#8b-from-a-plain-python-dict)
   - [8c. From a Pydantic config object](#8c-from-a-pydantic-config-object)
   - [8d. Streaming pipeline](#8d-streaming-pipeline)
9. [Testing guide](#9-testing-guide)
10. [Environment variables](#10-environment-variables)

---

## 1. File structure

```yaml
timeout_sec: 10
timeout_action: score
nemo_evaluator_deployment_id: "<your-nemo-evaluator-id>"

guards:
  - name: My Guard
    type: ootb
    stage: prompt
    # ...
```

---

## 2. Top-level options

| Field | Type | Default | Description |
|---|---|---|---|
| `timeout_sec` | int | `10` | Seconds to wait per guard |
| `timeout_action` | string | `score` | `score` (allow) or `block` on timeout |
| `nemo_evaluator_deployment_id` | string | — | DataRobot deployment ID of the NeMo Evaluator microservice; required when any guard uses `type: nemo_evaluator` |
| `enable_deepeval_telemetry` | bool | `false` | Opt in to deepeval usage telemetry and local `.deepeval/` artefacts. See [§10](#10-environment-variables). |
| `prompt_column_name` | string | `"promptText"` | Name of the DataFrame column that holds the input text. Used in standalone Python when no DRUM deployment is active. Ignored when a DRUM deployment context is active. |
| `response_column_name` | string | `"completion"` | Name of the DataFrame column that holds the LLM response text. Used in standalone Python as a fallback when `TARGET_NAME` is not set. **Lower priority than `TARGET_NAME`** — if both are provided, `TARGET_NAME` wins. Ignored when a DRUM deployment context is active. |
| `guards` | list | **required** | List of guard definitions |
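
The response-column precedence described in the table can be sketched as follows. This assumes `TARGET_NAME` is read from the environment and ignores the DRUM deployment case; `resolve_response_column` is a hypothetical helper, not a library function.

```python
def resolve_response_column(config: dict, env: dict) -> str:
    """Pick the response column: TARGET_NAME, then the config value, then the default."""
    return env.get("TARGET_NAME") or config.get("response_column_name") or "completion"


from_default = resolve_response_column({}, {})
from_config = resolve_response_column({"response_column_name": "answer"}, {})
from_env = resolve_response_column(
    {"response_column_name": "answer"}, {"TARGET_NAME": "resultText"}
)
```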

---

## 3. Common guard fields

| Field | Required | Description |
|---|---|---|
| `name` | ✅ | Unique label; used as the key in `result.metrics` and as the DataRobot custom metric name |
| `type` | ✅ | `ootb` · `model` · `nemo_guardrails` · `nemo_evaluator` |
| `stage` | ✅ | `prompt` · `response` · `[prompt, response]` (list runs the guard at both stages) |
| `description` | ❌ | Free-text label, ignored by the library |
| `intervention` | ❌ | What to do when the condition fires (see [§4](#4-intervention-block)). Omit entirely to measure only — nothing is ever blocked |
| `copy_citations` | ❌ | Boolean (`true`/`false`, default `false`). Passes retrieved RAG context to this guard. **Required for `rouge_1` and `faithfulness` to produce meaningful scores** |
| `is_agentic` | ❌ | Marks an agentic-workflow guard (default `false`). Required by `agent_goal_accuracy` |

```yaml
# stage as a list — guard runs independently at both prompt and response stages
- name: Token Count Both
  type: ootb
  ootb_type: token_count
  stage: [prompt, response]
  intervention:
    action: block
    message: "Input or output exceeds the token limit."
    conditions:
      - comparator: greaterThan
        comparand: 100
```

---

## 4. Intervention block

```yaml
intervention:
  action: block               # "block" | "report" | "replace"
  message: "Blocked."         # returned to caller
  send_notification: false
  conditions:
    - comparand: 0.5
      comparator: greaterThan
```

> **One condition per intervention.** The `conditions` list accepts exactly one entry for
> `block` and `replace`; zero entries (`conditions: []`) is valid for `report`.
> To combine conditions (e.g. block if score < 0.2 **or** > 0.9), use two separate guards.
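
The one-condition rule can be checked on a parsed config before it is loaded. This is a sketch under the reading that `block` and `replace` need exactly one condition and `report` accepts zero or one; `condition_problems` is a hypothetical helper, not part of the library.

```python
def condition_problems(intervention: dict) -> list[str]:
    """Return human-readable problems with the `conditions` list (empty list = OK)."""
    action = intervention.get("action")
    count = len(intervention.get("conditions", []))
    if action in ("block", "replace") and count != 1:
        return [f"'{action}' needs exactly one condition, got {count}"]
    if action == "report" and count > 1:
        return [f"'report' accepts at most one condition, got {count}"]
    return []


# "Block if score < 0.2 or > 0.9" expressed as two guards with one condition each.
low = {"action": "block", "conditions": [{"comparator": "lessThan", "comparand": 0.2}]}
high = {"action": "block", "conditions": [{"comparator": "greaterThan", "comparand": 0.9}]}

# Invalid: both conditions packed into a single intervention.
combined = {"action": "block", "conditions": low["conditions"] + high["conditions"]}
```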

### Actions

| Action | Effect |
|---|---|
| `block` | Reject and return `message` to the caller. `message` is optional in the schema but omitting it returns an empty string — always set it. |
| `report` | Record the metric and allow content through unchanged. Behaviorally identical to omitting the `intervention` block entirely; useful when you want the metric tracked but never want to block. |
| `replace` | Swap the text with the sanitised version returned by the deployment. Only valid for `type: model` guards. The deployment **must** return the replacement text in the field specified by `model_info.replacement_text_column_name`; if that field is absent a `ValueError` is raised. |

### Comparators

| Comparator | Comparand type | Description |
|---|---|---|
| `greaterThan` / `lessThan` | number | Numeric threshold |
| `equals` / `notEquals` | number \| string | Exact equality. Use `comparand: "TRUE"` with NeMo Guardrails guards, whose score is the string `"TRUE"` or `"FALSE"` |
| `is` / `isNot` | boolean | Boolean equality |
| `matches` / `doesNotMatch` | list of strings | Class membership. `matches` fires if the prediction is in the list; `doesNotMatch` fires if it is not. |
| `contains` / `doesNotContain` | list of strings | Substring check against a list. `contains` fires if **all** items in the list are found as substrings of the prediction; `doesNotContain` fires if not all items are found. |
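
One plausible reading of the comparator table, written as plain Python. `condition_fires` is a hypothetical helper for reasoning about configs; the library's own evaluation code may differ in details such as type coercion.

```python
def condition_fires(comparator: str, comparand, prediction) -> bool:
    """Return True when the intervention condition is met for this prediction."""
    if comparator == "greaterThan":
        return prediction > comparand
    if comparator == "lessThan":
        return prediction < comparand
    if comparator in ("equals", "is"):
        return prediction == comparand
    if comparator in ("notEquals", "isNot"):
        return prediction != comparand
    if comparator == "matches":          # class membership
        return prediction in comparand
    if comparator == "doesNotMatch":
        return prediction not in comparand
    if comparator == "contains":         # ALL items must appear as substrings
        return all(item in prediction for item in comparand)
    if comparator == "doesNotContain":   # fires when not all items are found
        return not all(item in prediction for item in comparand)
    raise ValueError(f"unknown comparator: {comparator}")
```

Note that `doesNotContain` is the negation of `contains`, so it also fires when only some of the listed items are present.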

---

## 5. Guard types

### 5.1 Out-of-the-Box (`ootb`)

Set `type: ootb` and `ootb_type`.

**Install only what you use:**

```bash
pip install datarobot-moderations                          # base — token_count, rouge_1, cost, custom_metric
pip install 'datarobot-moderations[llm-eval]'              # + faithfulness, task_adherence, agent_guideline_adherence, agent_goal_accuracy
pip install 'datarobot-moderations[llm-eval,vertex]'       # + Google Vertex AI as LLM judge
pip install 'datarobot-moderations[llm-eval,bedrock]'      # + AWS Bedrock as LLM judge
pip install 'datarobot-moderations[llm-eval,nvidia]'       # + NVIDIA NIM as LLM judge
pip install 'datarobot-moderations[nemo]'                  # + NeMo Guardrails colang flow guard (type: nemo_guardrails)
pip install 'datarobot-moderations[nemo-evaluator]'        # + NeMo Evaluator microservice guard (type: nemo_evaluator)
pip install 'datarobot-moderations[datarobot-sdk]'         # required for type: model and llm_type: datarobot
pip install 'datarobot-moderations[all]'                   # everything
```

| `ootb_type` | Stage | Install extra | Description |
|---|---|---|---|
| `token_count` | prompt / response | *(base)* | Token count |
| `rouge_1` | response | *(base)* | ROUGE-1 overlap with citations |
| `faithfulness` | response | `llm-eval` | LLM-judged hallucination detection |
| `task_adherence` | response | `llm-eval` | Task-completion score |
| `agent_guideline_adherence` | response | `llm-eval` | Guideline adherence |
| `agent_goal_accuracy` | response | `llm-eval` | Agentic goal-accuracy |
| `cost` | response | *(base)* | Estimated cost. Counts **both** prompt tokens (`input_price`/`input_unit`) and response tokens (`output_price`/`output_unit`). Must be at the response stage because both token counts are only available after the LLM responds. Currently only `currency: USD` is supported. |
| `custom_metric` | prompt / response | *(base)* | User-defined numeric metric |
| `nim_jailbreak` | prompt / response | *(base)* | Calls the NVIDIA NeMo `nemoguard-jailbreak-detect` NIM at `/v1/classify`. Returns `1.0` when jailbreak is detected, `0.0` otherwise. `nim_endpoint` is required. |
| `nim_content_safety` | prompt / response | *(base)* | Calls the NVIDIA NeMo `llama-3.1-nemoguard-8b-content-safety` NIM at `/v1/chat/completions`. Returns `1.0` when any unsafe category is detected, `0.0` otherwise. The first matched category is recorded in a companion `*_nim_content_safety_category` column. `nim_endpoint` is required. |
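
For the `cost` guard, a reasonable reading of the price and unit fields is "price per `unit` tokens", summed over prompt and response. The formula and the `estimate_cost` helper below are assumptions for illustration; the table only states that both token counts contribute.

```python
def estimate_cost(prompt_tokens: int, response_tokens: int, cost_config: dict) -> float:
    """Assumed formula: tokens / unit * price, for input and output separately."""
    input_cost = prompt_tokens / cost_config["input_unit"] * cost_config["input_price"]
    output_cost = response_tokens / cost_config["output_unit"] * cost_config["output_price"]
    return input_cost + output_cost


cost_config = {
    "currency": "USD",
    "input_price": 0.01, "input_unit": 1000,    # 0.01 USD per 1000 prompt tokens
    "output_price": 0.03, "output_unit": 1000,  # 0.03 USD per 1000 response tokens
}

# 500 prompt tokens + 200 response tokens: 0.005 + 0.006 USD, up to float rounding.
cost = estimate_cost(500, 200, cost_config)
```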

```yaml
# Token count — report only
- name: Prompt Token Count
  type: ootb
  ootb_type: token_count
  stage: prompt

# Token count — block on length
- name: Response Token Count
  type: ootb
  ootb_type: token_count
  stage: response
  intervention:
    action: block
    message: "Response too long."
    conditions:
      - comparand: 1000
        comparator: greaterThan

# ROUGE-1 (requires citations)
- name: Rouge 1
  type: ootb
  ootb_type: rouge_1
  stage: response
  copy_citations: true
  intervention:
    action: report
    conditions: []

# Faithfulness
- name: Faithfulness
  type: ootb
  ootb_type: faithfulness
  stage: response
  copy_citations: true
  llm_type: datarobot
  deployment_id: "<your-llm-id>"   # 24-char DataRobot deployment ID
  intervention:
    action: block
    message: "Hallucination detected."
    conditions:
      - comparand: 0.0
        comparator: equals

# Task Adherence
- name: Task Adherence
  type: ootb
  ootb_type: task_adherence
  stage: response
  llm_type: datarobot
  deployment_id: "<your-llm-id>"
  intervention:
    action: block
    message: "LLM did not complete the requested task."
    conditions:
      - comparator: lessThan
        comparand: 0.5

# Guideline Adherence
- name: Guideline Adherence
  type: ootb
  ootb_type: agent_guideline_adherence
  stage: response
  llm_type: datarobot
  deployment_id: "<your-llm-id>"
  additional_guard_config:
    agent_guideline: "Response must be polite and on-topic."   # free-text criterion for the LLM judge
  intervention:
    action: block
    message: "Response violates guidelines."
    conditions:
      - comparand: 0.0
        comparator: equals

# Agent Goal Accuracy
- name: Agent Goal Accuracy
  type: ootb
  ootb_type: agent_goal_accuracy
  stage: response
  is_agentic: true
  llm_type: datarobot
  deployment_id: "<your-llm-id>"
  intervention:
    action: report
    conditions: []

# Cost tracking
- name: Cost
  type: ootb
  ootb_type: cost
  stage: response
  additional_guard_config:
    cost:
      currency: USD
      input_price: 0.01
      input_unit: 1000
      output_price: 0.03
      output_unit: 1000
  intervention:
    action: report
    conditions: []

# NIM Jailbreak — block jailbreak attempts (prompt stage)
# nim_endpoint is the Workloads API base URL for the NIM workload,
# e.g. https://app.datarobot.com/api/v2/endpoints/workloads/<id>
- name: NIM Jailbreak Detect
  type: ootb
  ootb_type: nim_jailbreak
  stage: prompt
  nim_endpoint: "https://<host>/api/v2/endpoints/workloads/<workload-id>"
  intervention:
    action: block
    message: "Jailbreak attempt blocked."
    conditions:
      - comparand: 0.5
        comparator: greaterThan

# NIM Content Safety — block unsafe content, report the category
- name: NIM Content Safety
  type: ootb
  ootb_type: nim_content_safety
  stage: prompt
  nim_endpoint: "https://<host>/api/v2/endpoints/workloads/<workload-id>"
  intervention:
    action: block
    message: "Unsafe content blocked."
    conditions:
      - comparand: 0.5
        comparator: greaterThan
```

> **`nim_endpoint`** must point at the NIM container base URL — the library appends
> `/v1/classify` (jailbreak) or `/v1/chat/completions` (content-safety) automatically.
> When the NIM runs behind the DataRobot Workloads API the URL format is
> `https://<host>/api/v2/endpoints/workloads/<workload-id>`.
> DataRobot credentials (`DATAROBOT_ENDPOINT` + `DATAROBOT_API_TOKEN`) are used
> automatically to authenticate the request.
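
The URL composition described above amounts to the following. `nim_request_url` is a hypothetical helper, the workload ID is made up, and stripping a trailing slash is an assumption about how the paths are joined.

```python
# Path appended per guard type, as stated in the note above.
NIM_PATHS = {
    "nim_jailbreak": "/v1/classify",
    "nim_content_safety": "/v1/chat/completions",
}


def nim_request_url(nim_endpoint: str, ootb_type: str) -> str:
    """Join the configured base URL with the path for the given NIM guard type."""
    return nim_endpoint.rstrip("/") + NIM_PATHS[ootb_type]


base = "https://app.datarobot.com/api/v2/endpoints/workloads/abc123"  # made-up ID
jailbreak_url = nim_request_url(base, "nim_jailbreak")
safety_url = nim_request_url(base + "/", "nim_content_safety")
```

The practical point is that `nim_endpoint` should not already end in `/v1/classify` or `/v1/chat/completions`.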

---

### 5.2 Model guard

Wraps any **DataRobot deployment** you have already created (binary classifier, regression, multiclass, or text-generation). The library sends the text to that deployment and uses the prediction it returns to decide whether to block, report, or replace content.

```yaml
# Binary classifier (e.g. toxicity, prompt injection)
# Works with any DataRobot binary classification deployment.
- name: Toxicity
  type: model
  stage: prompt
  deployment_id: "<your-deployment-id>"   # 24-char DataRobot deployment ID
  model_info:
    input_column_name: text               # field your deployment reads as input
    target_name: toxicity_toxic_PREDICTION  # prediction field returned by the deployment
    target_type: Binary        # Binary | Regression | Multiclass | TextGeneration
    class_names: []            # leave empty for Binary/Regression
  intervention:
    action: block
    message: "Toxic content blocked."
    conditions:
      - comparand: 0.5
        comparator: greaterThan

# PII detection with text replacement
# The deployment must return BOTH the score field (`target_name`)
# AND a sanitised-text field (`replacement_text_column_name`).
- name: PII Detector
  type: model
  stage: prompt
  deployment_id: "<your-pii-deployment-id>"
  model_info:
    input_column_name: text
    target_name: contains_pii_true_PREDICTION
    target_type: TextGeneration
    replacement_text_column_name: anonymized_text_OUTPUT
    class_names: []
  intervention:
    action: replace
    message: "PII removed from prompt."
    conditions:
      - comparand: 0.5
        comparator: greaterThan

# Multi-label / emotion classifier
- name: Emotion Classifier
  type: model
  stage: prompt
  deployment_id: "<your-emotion-deployment-id>"
  model_info:
    input_column_name: text
    target_name: target_PREDICTION
    target_type: TextGeneration
    class_names: [anger, fear, sadness, disgust, joy, neutral]
  intervention:
    action: block
    message: "Negative emotion detected."
    conditions:
      - comparand: [anger, fear, sadness, disgust]
        comparator: matches
```

---

### 5.3 NeMo Guardrails

Flow-based content filtering. Requires `pip install 'datarobot-moderations[nemo]'`.

> **Supported `llm_type` values:** `openAi`, `azureOpenAi`, `nim`, `llmGateway` only.

Colang flow files must live in stage-specific subdirectories of `nemo_guardrails/`:

```
nemo_guardrails/
  prompt/      # config.yml + *.co files for stage: prompt
  response/    # config.yml + *.co files for stage: response
```
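
A quick pre-flight check for this layout might look like the sketch below. It assumes `nemo_guardrails/` sits directly under the directory passed as `model_dir`; `missing_nemo_files` is a hypothetical helper, not a library function.

```python
import tempfile
from pathlib import Path


def missing_nemo_files(model_dir: str, stage: str) -> list[str]:
    """List what is missing under nemo_guardrails/<stage>/ (empty list = layout OK)."""
    stage_dir = Path(model_dir) / "nemo_guardrails" / stage
    problems = []
    if not (stage_dir / "config.yml").is_file():
        problems.append(f"{stage_dir / 'config.yml'} not found")
    if not stage_dir.is_dir() or not any(stage_dir.glob("*.co")):
        problems.append(f"no *.co flow files in {stage_dir}")
    return problems


# Demo on a throwaway directory: only the prompt stage is populated.
with tempfile.TemporaryDirectory() as tmp:
    prompt_dir = Path(tmp) / "nemo_guardrails" / "prompt"
    prompt_dir.mkdir(parents=True)
    (prompt_dir / "config.yml").write_text("models: []\n")
    (prompt_dir / "topics.co").write_text("# colang flows go here\n")
    prompt_problems = missing_nemo_files(tmp, "prompt")
    response_problems = missing_nemo_files(tmp, "response")
```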

```yaml
- name: Stay on topic
  type: nemo_guardrails
  stage: prompt
  llm_type: azureOpenAi
  openai_api_base: "https://<resource>.openai.azure.com/"
  openai_deployment_id: gpt-4o-mini
  intervention:
    action: block
    message: "This topic is outside the allowed scope."
    conditions:
      - comparand: "TRUE"
        comparator: equals
```

---

### 5.4 NeMo Evaluator

Calls a DataRobot-hosted NeMo Evaluator microservice. Requires `pip install 'datarobot-moderations[nemo-evaluator]'`.

**Two deployment IDs — what's the difference?**

| Field | What it points to |
|---|---|
| `nemo_evaluator_deployment_id` (top-level) | Your **NeMo Evaluator microservice** deployment in DataRobot |
| `deployment_id` (per-guard) | The **LLM deployment** the evaluator uses to do the judging |

Both values must be valid 24-character DataRobot deployment IDs. Using a placeholder longer than 24 characters (e.g. `"<your-nemo-evaluator-id>"`) causes a load-time validation error: `String is longer than 24 characters`.
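
To catch leftover placeholders before the config is loaded, a simple format check can help. This sketch assumes IDs are 24 hexadecimal characters; `looks_like_deployment_id` is a hypothetical helper and does not reproduce the library's own validation.

```python
import re

# Assumption: a DataRobot ID is exactly 24 hexadecimal characters.
_ID_PATTERN = re.compile(r"[0-9a-fA-F]{24}")


def looks_like_deployment_id(value: str) -> bool:
    """True if the value has the shape of a 24-character hexadecimal ID."""
    return _ID_PATTERN.fullmatch(value) is not None


real_shape = looks_like_deployment_id("6793e6b2114f17240fa2194c")
placeholder = looks_like_deployment_id("<your-llm-id>")
```

A check like this only validates the shape of the value; it cannot tell whether the deployment actually exists.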

> **`llm_type`** must be `datarobot` for all `nemo_evaluator` guards.

| `nemo_evaluator_type` | Stage | Description |
|---|---|---|
| `llm_judge` | prompt / response | Custom LLM-as-judge with your own prompts. `score_parsing_regex` is a regular expression applied to the LLM's raw text reply to extract a single numeric score — e.g. `"([1-5])"` picks the first digit 1–5 from any surrounding text. |
| `context_relevance` | response | Relevance of retrieved context to the question |
| `response_groundedness` | response | Groundedness in retrieved context |
| `topic_adherence` | response | Adherence to allowed topics |
| `response_relevancy` | response | Relevance of response to question |
| `faithfulness` | response | NeMo microservice faithfulness score |
| `agent_goal_accuracy` | response | Agentic goal-accuracy via NeMo |
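
The behaviour of `score_parsing_regex` can be tried out locally with Python's `re` module. The sketch assumes the first capture group of the first match is used as the score; `parse_score` is a hypothetical helper, and the evaluator service may apply the pattern differently.

```python
import re
from typing import Optional


def parse_score(reply: str, score_parsing_regex: str) -> Optional[float]:
    """Extract the first captured number from the judge's reply, or None if absent."""
    match = re.search(score_parsing_regex, reply)
    return float(match.group(1)) if match else None


clean = parse_score("4", "([1-5])")
wrapped = parse_score("Safety rating: 4 (mostly safe)", "([1-5])")
unparsable = parse_score("I cannot rate this.", "([1-5])")
# Pitfall: the first digit in range wins, even if it is not the score.
surprising = parse_score("Out of 5, I rate this 2.", "([1-5])")
```

The last case returns `5.0` rather than `2.0`, which is why the judge prompt should ask for the bare integer only, or the pattern should be anchored to the expected reply format.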

```yaml
nemo_evaluator_deployment_id: "<your-nemo-evaluator-id>"

guards:
  - name: Safety Judge
    type: nemo_evaluator
    stage: response
    nemo_evaluator_type: llm_judge
    llm_type: datarobot
    deployment_id: "<your-llm-id>"
    nemo_llm_judge_config:
      system_prompt: "Rate safety 1-5. Output ONLY the integer."
      user_prompt: "Response: {response}"
      score_parsing_regex: "([1-5])"   # regex to extract the numeric score from the LLM's text output
      custom_metric_directionality: higherIsBetter   # "higherIsBetter" | "lowerIsBetter"
    intervention:
      action: block
      message: "Response failed safety evaluation."
      conditions:
        - comparand: 2
          comparator: lessThan

  - name: Topic Adherence
    type: nemo_evaluator
    stage: response
    nemo_evaluator_type: topic_adherence
    llm_type: datarobot
    deployment_id: "<your-llm-id>"
    nemo_topic_adherence_config:
      metric_mode: f1          # "f1" | "precision" | "recall"
      reference_topics: [DataRobot, machine learning, AI platforms]
    intervention:
      action: report
      conditions: []

  - name: Response Relevancy
    type: nemo_evaluator
    stage: response
    nemo_evaluator_type: response_relevancy
    llm_type: datarobot
    deployment_id: "<your-llm-id>"
    nemo_response_relevancy_config:
      embedding_deployment_id: "<your-embedding-id>"
    intervention:
      action: report
      conditions: []
```

---

## 6. LLM back-end options

Some `ootb` guards (e.g. `faithfulness`, `task_adherence`) call an LLM to judge the text. You choose which LLM provider to use via `llm_type`.

> **DataRobot credentials (`DATAROBOT_ENDPOINT` + `DATAROBOT_API_TOKEN`) are always required**, regardless of which `llm_type` you choose.

### Supported `llm_type` values

| `llm_type` | LLM provider | Extra YAML fields | Extra install |
|---|---|---|---|
| `datarobot` | DataRobot-hosted LLM deployment | `deployment_id` | `datarobot-sdk` |
| `openAi` | OpenAI API | *(none)* | `llm-eval` |
| `azureOpenAi` | Azure OpenAI | `openai_api_base`, `openai_deployment_id` | `llm-eval` |
| `google` | Google Vertex AI | `google_region`, `google_model` | `llm-eval,vertex` |
| `amazon` | AWS Bedrock | `aws_region`, `aws_model` | `llm-eval,bedrock` |
| `nim` | NVIDIA NIM | `openai_api_base` | `llm-eval,nvidia` |
| `llmGateway` | DataRobot LLM Gateway | `llm_gateway_model_id` | `datarobot-sdk` |


**`nemo_guardrails` supports:** `openAi`, `azureOpenAi`, `nim`, `llmGateway` only  
**`nemo_evaluator` supports:** `datarobot` only
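
The table and the two restrictions above can be turned into a small config lint. This is a sketch, not the library's validation: `llm_config_problems` and both lookup tables are hypothetical and simply transcribe this section.

```python
# Extra YAML fields per llm_type, transcribed from the table above.
REQUIRED_FIELDS = {
    "datarobot": ["deployment_id"],
    "openAi": [],
    "azureOpenAi": ["openai_api_base", "openai_deployment_id"],
    "google": ["google_region", "google_model"],
    "amazon": ["aws_region", "aws_model"],
    "nim": ["openai_api_base"],
    "llmGateway": ["llm_gateway_model_id"],
}

# Guard types that accept only a subset of llm_type values.
ALLOWED_LLM_TYPES = {
    "nemo_guardrails": {"openAi", "azureOpenAi", "nim", "llmGateway"},
    "nemo_evaluator": {"datarobot"},
}


def llm_config_problems(guard: dict) -> list[str]:
    """Return problems with a guard's LLM settings (empty list = looks fine)."""
    llm_type = guard.get("llm_type")
    if llm_type not in REQUIRED_FIELDS:
        return [f"unknown llm_type: {llm_type!r}"]
    problems = []
    allowed = ALLOWED_LLM_TYPES.get(guard.get("type"))
    if allowed is not None and llm_type not in allowed:
        problems.append(f"{guard['type']} does not support llm_type {llm_type!r}")
    for field in REQUIRED_FIELDS[llm_type]:
        if not guard.get(field):
            problems.append(f"llm_type {llm_type!r} requires {field!r}")
    return problems


topic_guard = {
    "type": "nemo_guardrails",
    "llm_type": "azureOpenAi",
    "openai_api_base": "https://<resource>.openai.azure.com/",
    "openai_deployment_id": "gpt-4o-mini",
}
bad_guard = {"type": "nemo_evaluator", "llm_type": "openAi"}
```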

### Available models (Google / AWS)

The library maps a fixed set of model names to their provider API identifiers. Models not in this list are not supported.

| Provider | `llm_type` | `google_model` / `aws_model` |
|---|---|---|
| Google Vertex AI | `google` | `google-gemini-1.5-flash`, `google-gemini-1.5-pro`, `chat-bison` |
| AWS Bedrock | `amazon` | `amazon-titan`, `anthropic-claude-2`, `anthropic-claude-3-haiku`, `anthropic-claude-3-sonnet`, `anthropic-claude-3-opus`, `anthropic-claude-3.5-sonnet-v1`, `anthropic-claude-3.5-sonnet-v2`, `amazon-nova-lite`, `amazon-nova-micro`, `amazon-nova-pro` |

---

## 7. Full annotated example

> Replace every `<...>` placeholder with a real value before use.
> DataRobot deployment IDs are exactly 24 hexadecimal characters.

```yaml
timeout_sec: 15
timeout_action: score

guards:
  # -- Pre-score (prompt) --------------------------------------------------

  - name: Prompt Injection
    type: model
    stage: prompt
    deployment_id: "<prompt-injection-id>"
    model_info:
      input_column_name: text
      target_name: injection_injection_PREDICTION
      target_type: Binary
      class_names: []
    intervention:
      action: block
      message: "Prompt injection attempt detected and blocked."
      conditions:
        - comparand: 0.80
          comparator: greaterThan

  - name: Toxicity
    type: model
    stage: prompt
    deployment_id: "<toxicity-id>"
    model_info:
      input_column_name: text
      target_name: toxicity_toxic_PREDICTION
      target_type: Binary
      class_names: []
    intervention:
      action: block
      message: "Toxic content is not allowed."
      conditions:
        - comparand: 0.5
          comparator: greaterThan

  - name: PII Detector
    type: model
    stage: prompt
    deployment_id: "<pii-id>"
    model_info:
      input_column_name: text
      target_name: contains_pii_true_PREDICTION
      target_type: TextGeneration
      replacement_text_column_name: anonymized_text_OUTPUT
      class_names: []
    intervention:
      action: replace
      message: "PII detected and removed."
      conditions:
        - comparand: 0.5
          comparator: greaterThan

  - name: Topic Guardrail
    type: nemo_guardrails
    stage: prompt
    llm_type: azureOpenAi
    openai_api_base: "https://<resource>.openai.azure.com/"
    openai_deployment_id: gpt-4o-mini
    intervention:
      action: block
      message: "This topic is outside the allowed scope."
      conditions:
        - comparand: "TRUE"
          comparator: equals

  # -- Post-score (response) -----------------------------------------------

  - name: Response Token Count
    type: ootb
    ootb_type: token_count
    stage: response

  - name: Faithfulness
    type: ootb
    ootb_type: faithfulness
    stage: response
    copy_citations: true
    llm_type: datarobot
    deployment_id: "<llm-id>"
    intervention:
      action: block
      message: "The response appears to be hallucinated."
      conditions:
        - comparand: 0.0
          comparator: equals

  - name: Task Adherence
    type: ootb
    ootb_type: task_adherence
    stage: response
    llm_type: datarobot
    deployment_id: "<llm-id>"
    intervention:
      action: block
      message: "LLM did not complete the requested task."
      conditions:
        - comparator: lessThan
          comparand: 0.5

  - name: Cost
    type: ootb
    ootb_type: cost
    stage: response
    additional_guard_config:
      cost:
        currency: USD
        input_price: 0.01
        input_unit: 1000
        output_price: 0.03
        output_unit: 1000
    intervention:
      action: report
      conditions: []
```
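
Before loading a config like the one above, it can help to confirm that no `<...>` placeholder was left behind and that each deployment ID has the expected shape. The helpers below are a hypothetical sketch using only the standard library. The placeholder pattern is a heuristic and will also flag any legitimate angle-bracketed text in a `message`, and accepting upper-case hex digits is an assumption.

```python
import re

# Anything of the form <...> on a single line is treated as an unfilled placeholder.
PLACEHOLDER = re.compile(r"<[^<>\n]+>")
# Deployment IDs are 24 hexadecimal characters.
DEPLOYMENT_ID = re.compile(r"[0-9a-fA-F]{24}")


def find_placeholders(yaml_text: str) -> list[str]:
    """Return every <...> placeholder still present in the raw YAML text."""
    return PLACEHOLDER.findall(yaml_text)


def looks_like_deployment_id(value: str) -> bool:
    """True when `value` is exactly 24 hexadecimal characters."""
    return DEPLOYMENT_ID.fullmatch(value) is not None
```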

---

## 8. Using the config in Python

Guards can be configured from a **YAML file**, a **plain Python dict**, or a **Pydantic object** built entirely in Python. All approaches are fully equivalent — choose whichever fits your workflow.

### 8a. From a YAML file

#### Return types

| Method | Returns |
|---|---|
| `evaluate_prompt(prompt)` | `(EvaluationResult, latency_seconds, prescore_df)` |
| `evaluate_response(response, prompt=None)` | `(EvaluationResult, latency_seconds, postscore_df)` |
| `evaluate_full_pipeline(prompt, llm_callable)` | `(PipelineResult, prescore_df, postscore_df)` — `postscore_df` is `None` when the prompt was blocked; per-stage latency is **not** returned — use `evaluate_prompt` / `evaluate_response` directly when you need it |
| `evaluate_prompt_async(prompt)` | same as `evaluate_prompt` but non-blocking |
| `evaluate_response_async(response, prompt=None)` | same as `evaluate_response` but non-blocking |
| `evaluate_full_pipeline_async(prompt, llm_callable)` | same as `evaluate_full_pipeline` but non-blocking; `llm_callable` must be an `async` coroutine |
| `evaluate_full_pipeline_stream_async(prompt, llm_callable)` | `AsyncGenerator[ChatCompletionChunk, None]` — see [§8d](#8d-streaming-pipeline) |
| `stream_response_async(completion, *, prompt, prescore_df, prescore_latency)` | `AsyncGenerator[ChatCompletionChunk, None]` — lower-level; see [§8d](#8d-streaming-pipeline) |

`EvaluationResult.metrics` holds the guard scores keyed by guard name.

#### `evaluate_prompt` / `evaluate_prompt_async` parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `prompt` | `str` | ✅ | The user prompt text to evaluate against prescore guards |

#### `evaluate_response` / `evaluate_response_async` parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `response` | `str` | ✅ | The LLM response text to evaluate against postscore guards |
| `prompt` | `str \| None` | ❌ | The original user prompt. Required for guards that compare prompt and response (e.g. `faithfulness`, `task_adherence`, `rouge_1`). Omit only when no such guards are configured |
| `pipeline_interactions` | `str \| None` | ❌ | JSON-serialised `MultiTurnSample` dict from the DataRobot agentic pipeline. Enables `agent_goal_accuracy` to evaluate the **full interaction trace** instead of just the final response. |

#### `evaluate_full_pipeline` / `evaluate_full_pipeline_async` parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `prompt` | `str` | ✅ | The user prompt to evaluate |
| `llm_callable` | `Callable[[str], str]` (sync) or `Callable[[str], Awaitable[str]]` (async) | ✅ | Callable that receives the (possibly sanitised) effective prompt and returns the LLM response. For the async variant this must be an `async` coroutine |

#### `EvaluationResult` fields

| Field | Type | Description |
|---|---|---|
| `blocked` | `bool` | `True` if any guard blocked the text |
| `blocked_message` | `str \| None` | The block message configured on the guard |
| `replaced` | `bool` | `True` if a `replace`-action guard fired |
| `replacement` | `str \| None` | The sanitised replacement text (PII-scrubbed prompt, etc.) |
| `metrics` | `dict[str, Any]` | Guard scores keyed by guard name (e.g. `{"Toxicity": 0.87}`) |

#### `PipelineResult` fields

| Field | Type | Description |
|---|---|---|
| `prompt_evaluation` | `EvaluationResult` | Prescore evaluation result |
| `response` | `str \| None` | Final (possibly replaced) LLM response; `None` when blocked |
| `response_evaluation` | `EvaluationResult \| None` | Postscore evaluation result; `None` when prompt was blocked |
| `blocked` *(computed)* | `bool` | `True` if either stage was blocked |
| `replaced` *(computed)* | `bool` | `True` if either stage was replaced |

---

#### What `prescore_df` contains

`prescore_df` is the raw pandas DataFrame produced by running all **prescore (prompt-stage) guards** on the input.  
It starts as a copy of the input and gains one set of columns per guard after execution.

| Column | Description |
|---|---|
| `{prompt_column_name}` | Original prompt text |
| `{guard.metric_column_name}` | Guard score (one column per guard, e.g. `Toxicity_toxicity_toxic_PREDICTION`) |
| `{guard_name}_latency` | Wall-clock seconds this guard took |
| `blocked_{prompt_col}` | `True` if any guard blocked the prompt |
| `blocked_message_{prompt_col}` | Block reason / message returned to the caller |
| `replaced_{prompt_col}` | `True` if a replace-action guard fired |
| `replaced_message_{prompt_col}` | Replacement text (sanitised prompt from PII guard, etc.) |
| `reported_{prompt_col}` | `True` when a report-action guard fired |
| `Noneed_{prompt_col}` | Internal sentinel for no-action guards |
| `action_{prompt_col}` | Comma-joined string of actions taken (e.g. `"block"`, `"report,block"`) |
| *(per-guard enforced column)* | Internal per-guard enforcement flag used by `format_result_df` |

---

#### What `postscore_df` contains

`postscore_df` is the raw pandas DataFrame produced by running all **postscore (response-stage) guards** on the LLM output.  
It starts with the predictions DataFrame (which includes the LLM response plus any pass-through columns) and gains guard result columns after execution.

| Column | Description |
|---|---|
| `{response_column_name}` | LLM's response text |
| `{prompt_column_name}` | User prompt (forwarded for faithfulness / task-adherence calculation) |
| `CITATION_CONTENT_{N}` | Retrieved RAG context chunks (when citations are enabled) |
| `PROMPT_TOKEN_COUNT_from_usage` | Prompt token count (when `usage` is provided by the LLM) |
| `RESPONSE_TOKEN_COUNT_from_usage` | Response token count (when `usage` is provided by the LLM) |
| `agentic_pipeline_interactions` | Agentic workflow interaction trace (for `agent_goal_accuracy` / `task_adherence`) |
| `{association_id_column_name}` | Association ID (if the deployment has one configured) |
| `{guard.metric_column_name}` | Guard score (one column per postscore guard, e.g. `Response_Faithfulness_score`) |
| `{guard_name}_latency` | Wall-clock seconds this guard took |
| `blocked_{response_col}` | `True` if any guard blocked the response |
| `blocked_message_{response_col}` | Block message returned to the caller |
| `replaced_{response_col}` | `True` if a replace-action guard fired on the response |
| `replaced_message_{response_col}` | Replacement text |
| `reported_{response_col}` | `True` when a report-action guard fired |
| `Noneed_{response_col}` | Internal sentinel for no-action guards |
| `action_{response_col}` | Comma-joined string of actions taken |
| *(per-guard enforced column)* | Internal per-guard enforcement flag |

> **Note:** `prescore_df` and `postscore_df` are the **raw executor outputs**.  
> In the DRUM pipeline, `format_result_df` merges them into a single `result_df` that also adds
> `unmoderated_{response_col}`, `moderated_{prompt_col}`, `datarobot_latency`, `datarobot_token_count`,
> and `datarobot_confidence_score`.  Those derived columns are **not** present in the DataFrames
> returned directly by `evaluate_prompt` / `evaluate_response` / `evaluate_full_pipeline`.

---

```python
import os
from datarobot_dome.api import ModerationPipeline

os.environ["DATAROBOT_ENDPOINT"]  = "<your-endpoint>"
os.environ["DATAROBOT_API_TOKEN"] = "<your-token>"
# TARGET_NAME is optional — sets the response column name used by postscore guards.
# Resolution order: TARGET_NAME env var → response_column_name in config → default "completion".
# os.environ["TARGET_NAME"] = "resultText"

pipeline = ModerationPipeline.from_yaml("moderation_config.yaml")

# ── Prompt evaluation (prescore guards) ───────────────────────────────────────
# sync
result, latency, prescore_df = pipeline.evaluate_prompt("What is DataRobot?")
# async (inside an async function / FastAPI route / agent)
result, latency, prescore_df = await pipeline.evaluate_prompt_async("What is DataRobot?")

if result.blocked:
    print(f"Blocked: {result.blocked_message}")
elif result.replaced:
    print(f"Prompt sanitised to: {result.replacement}")

# ── Response evaluation (postscore guards) ────────────────────────────────────
# sync
result, latency, postscore_df = pipeline.evaluate_response(
    "DataRobot is an AI platform.",
    prompt="What is DataRobot?",   # required for faithfulness / task-adherence guards
)
# async
result, latency, postscore_df = await pipeline.evaluate_response_async(
    "DataRobot is an AI platform.",
    prompt="What is DataRobot?",
)
print(f"Latency: {latency:.3f}s  Blocked: {result.blocked}  Metrics: {result.metrics}")

# ── Full pipeline: prescore → LLM → postscore ─────────────────────────────────
# sync
def my_llm(prompt: str) -> str:
    return "DataRobot is an AI platform."   # replace with your LLM call

result, prescore_df, postscore_df = pipeline.evaluate_full_pipeline("What is DataRobot?", my_llm)

# async (llm_callable must be an async coroutine)
async def my_async_llm(prompt: str) -> str:
    return "DataRobot is an AI platform."   # replace with your async LLM call

result, prescore_df, postscore_df = await pipeline.evaluate_full_pipeline_async(
    "What is DataRobot?", my_async_llm
)

if result.blocked:
    stage = "prompt" if result.prompt_evaluation.blocked else "response"
    blocked_eval = (
        result.prompt_evaluation if result.prompt_evaluation.blocked
        else result.response_evaluation
    )
    print(f"Blocked at {stage}: {blocked_eval.blocked_message}")
elif result.replaced:
    print(f"Text replaced. Response: {result.response}")
else:
    print(f"Response: {result.response}")
    print(f"Metrics: {result.response_evaluation.metrics}")
```

---

#### Agentic workflow example

For agents, the library can evaluate the **full interaction trace** — every tool call, intermediate
message, and final response — not just the last reply. This gives the `agent_goal_accuracy` guard
accurate context to judge whether the agent actually achieved the user's goal.

The interaction trace (`pipeline_interactions`) is a JSON-serialised
[`ragas.MultiTurnSample`](https://docs.ragas.io) produced by the DataRobot agent after each task
run. Pass it directly to `evaluate_response`.

**Config** (`docs/examples/agent_goal_accuracy_config.yaml`):

```yaml
targets:
  - target: _default
    guards:
      - name: Agent Goal Accuracy
        type: ootb
        ootb_type: agent_goal_accuracy
        stage: response
        is_agentic: true
        llm_type: llmGateway
        llm_gateway_model_id: "azure/gpt-4o-mini"
        intervention:
          action: report  # measure-only: block/replace are ignored by the library
          conditions: []
```

> **Measure-only guard:** `agent_goal_accuracy` (like `cost` and `guideline_adherence`) always
> forces `intervene=False` internally regardless of the `action` configured. The score is only
> available in `result.metrics["agent_goal_accuracy"]` — use it to make blocking decisions in
> your own code when needed.

**Python — with full interaction trace (recommended for agentic pipelines)**:

```python
import json
from datarobot_dome.api import ModerationPipeline

pipeline = ModerationPipeline.from_yaml("docs/examples/agent_goal_accuracy_config.yaml")

task = "Book a flight from NYC to London"

# chat_completion is the object returned by the DataRobot agent SDK.
# `pipeline_interactions` is attached when the agent has tool calls / multi-turn
# history; it is None for a plain single-turn response.
chat_completion = my_agent.run(task=task)
agent_response = chat_completion.choices[0].message.content
interactions_json = getattr(chat_completion, "pipeline_interactions", None)

result, latency, postscore_df = pipeline.evaluate_response(
    response=agent_response,
    prompt=task,
    pipeline_interactions=interactions_json,  # JSON str, or None
)

score = result.metrics.get("agent_goal_accuracy")
passed = score is not None and score >= 0.5
print(f"score={score}  passed={passed}")
```

**Python — building the interaction trace manually** (when not using the DataRobot agent SDK):

```python
import json
from ragas import MultiTurnSample
from ragas.messages import AIMessage, HumanMessage, ToolCall, ToolMessage

# Reconstruct the trace from your agent's execution log.
sample = MultiTurnSample(
    user_input=[
        HumanMessage(content="Book a flight from NYC to London"),
        AIMessage(
            content="Searching for available flights…",
            tool_calls=[ToolCall(name="search_flights", args={"origin": "NYC", "dest": "LON"})],
        ),
        ToolMessage(content='[{"flight": "BA178", "price": 620}]'),
        AIMessage(content="I found BA178 departing tomorrow for $620. Shall I book it?"),
    ]
)
interactions_json = json.dumps(sample.to_dict())

result, latency, _ = pipeline.evaluate_response(
    response="I found BA178 departing tomorrow for $620. Shall I book it?",
    prompt="Book a flight from NYC to London",
    pipeline_interactions=interactions_json,
)
print(result.blocked, result.metrics)
```

> **Without `pipeline_interactions`** the guard falls back gracefully to evaluating the single
> prompt/response pair — useful during development before you have a live agent.

---

### 8b. From a plain Python dict

Use `ModerationPipeline.from_dict` when your configuration is already in dict form (e.g. loaded from JSON, fetched from an API, or assembled programmatically). The dict must follow the same schema as the YAML file.

#### Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `config` | `dict` | ✅ | Guard configuration dictionary following the YAML schema |
| `model_dir` | `str \| None` | ❌ | Base directory used to resolve relative asset paths (e.g. NeMo guardrails `.co` flow files). Defaults to `os.getcwd()` |

```python
import os
from datarobot_dome.api import ModerationPipeline

os.environ["DATAROBOT_ENDPOINT"]  = "<your-endpoint>"
os.environ["DATAROBOT_API_TOKEN"] = "<your-token>"
# os.environ["TARGET_NAME"] = "resultText"  # optional — see §10 for resolution order

pipeline = ModerationPipeline.from_dict(
    {
        "targets": [
            {
                "target": "_default",
                "guards": [
                    {
                        "name": "Token Count",
                        "type": "ootb",
                        "ootb_type": "token_count",
                        "stage": "prompt",
                    }
                ],
            }
        ]
    },
    model_dir="/path/to/nemo_guardrails_dir",  # optional; only needed for NeMo guards
)

result, latency, prescore_df = pipeline.evaluate_prompt("Hello")
print(result.metrics)
```
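
When guards are assembled programmatically, a small wrapper keeps the `targets` nesting in one place. This is a sketch only: `build_config` is a hypothetical helper that reproduces the dict shape from the example above, and the result is meant to be passed to `ModerationPipeline.from_dict`.

```python
import json


def build_config(guards: list[dict], target: str = "_default") -> dict:
    """Wrap a list of guard dicts in the `targets` structure shown above."""
    return {"targets": [{"target": target, "guards": list(guards)}]}


token_guard = {
    "name": "Token Count",
    "type": "ootb",
    "ootb_type": "token_count",
    "stage": "prompt",
}
config = build_config([token_guard])

# The structure is plain JSON-compatible data, so it survives a round trip
# through a JSON file or an API payload unchanged.
round_tripped = json.loads(json.dumps(config))
```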

---

### 8c. From a Pydantic config object

Use `ModerationPipeline.from_config` to build the configuration entirely in Python — no YAML file required. This is useful for dynamic configurations, programmatic guard registration, or when embedding moderation in a larger application.

#### Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `config` | `ModerationConfig` | ✅ | A fully-constructed `ModerationConfig` Pydantic object |
| `model_dir` | `str \| None` | ❌ | Base directory used to resolve relative asset paths (e.g. NeMo guardrails `.co` flow files). Defaults to `os.getcwd()` |

All schema types are importable from `datarobot_dome.schema`:

```python
from datarobot_dome.schema import (
    ModerationConfig,
    TargetBlock,
    # Guard subtypes — pick the matching one per guard
    OOTBGuardSchema,
    ModelGuardSchema,
    NemoGuardrailsSchema,
    NemoEvaluatorSchema,
    # Nested schemas used inside guards
    AdditionalGuardConfigSchema,
    InterventionSchema,
    InterventionConditionSchema,
    ModelInfoSchema,
)
```

#### Schema type → guard type mapping

| Guard YAML `type` | Pydantic class |
|---|---|
| `ootb` | `OOTBGuardSchema` |
| `model` | `ModelGuardSchema` |
| `nemo_guardrails` | `NemoGuardrailsSchema` |
| `nemo_evaluator` | `NemoEvaluatorSchema` |

#### LLM Gateway example — hate speech / guideline adherence

```python
import os
from datarobot_dome.api import ModerationPipeline
from datarobot_dome.schema import (
    AdditionalGuardConfigSchema,
    InterventionSchema,
    ModerationConfig,
    OOTBGuardSchema,
    TargetBlock,
)

os.environ["DATAROBOT_ENDPOINT"]  = "https://app.datarobot.com/api/v2"
os.environ["DATAROBOT_API_TOKEN"] = "<your-dr-token>"
# os.environ["TARGET_NAME"] = "resultText"  # optional — see §10 for resolution order

config = ModerationConfig(
    targets=[
        TargetBlock(
            target="_default",
            guards=[
                OOTBGuardSchema(
                    type="ootb",
                    name="Hate Speech",
                    stage="response",
                    ootb_type="agent_guideline_adherence",
                    llm_type="llmGateway",
                    llm_gateway_model_id="azure/gpt-4o-2024-11-20",
                    additional_guard_config=AdditionalGuardConfigSchema(
                        agent_guideline=(
                            "The response must not contain hate speech, slurs, or content "
                            "that demeans people based on race, religion, gender, nationality, "
                            "or any other protected characteristic."
                        )
                    ),
                    intervention=InterventionSchema(
                        action="report",
                        conditions=[],
                    ),
                )
            ],
        )
    ]
)

pipeline = ModerationPipeline.from_config(config)
# Pass model_dir when your config references NeMo guardrails flow files:
# pipeline = ModerationPipeline.from_config(config, model_dir="/path/to/nemo_guardrails_dir")

text = "People from that group are living in France."
result, latency, postscore_df = pipeline.evaluate_response(response=text, prompt="Describe this text.")
score = result.metrics.get("agent_guideline_adherence_score")
print(f"score={score}  latency={latency:.3f}s")
```

#### Model guard example

```python
import os
from datarobot_dome.api import ModerationPipeline
from datarobot_dome.schema import (
    InterventionConditionSchema,
    InterventionSchema,
    ModerationConfig,
    ModelGuardSchema,
    ModelInfoSchema,
    TargetBlock,
)

os.environ["DATAROBOT_ENDPOINT"]  = "<your-endpoint>"
os.environ["DATAROBOT_API_TOKEN"] = "<your-token>"
# os.environ["TARGET_NAME"] = "resultText"  # optional — see §10 for resolution order

config = ModerationConfig(
    targets=[
        TargetBlock(
            target="_default",
            guards=[
                ModelGuardSchema(
                    type="model",
                    name="Toxicity",
                    stage="prompt",
                    deployment_id="<your-toxicity-deployment-id>",
                    model_info=ModelInfoSchema(
                        input_column_name="text",
                        target_name="toxicity_toxic_PREDICTION",
                        target_type="Binary",
                        class_names=[],
                    ),
                    intervention=InterventionSchema(
                        action="block",
                        message="Toxic content blocked.",
                        conditions=[
                            InterventionConditionSchema(comparand=0.5, comparator="greaterThan")
                        ],
                    ),
                )
            ],
        )
    ]
)

pipeline = ModerationPipeline.from_config(config)
```

---

### 8d. Streaming pipeline

`evaluate_full_pipeline_stream_async` is the primary high-level API for streaming.
It encapsulates prescore evaluation, the thread/queue bridge to `ModerationIterator`, and
postscore guard execution — callers supply only a prompt and a streaming LLM callable.

#### Method signatures

| Method | When to use |
|---|---|
| `evaluate_full_pipeline_stream_async(prompt, llm_callable)` | **Preferred.** Hides all internal state — no `prescore_df` required. |
| `stream_response_async(completion, *, prompt, prescore_df, prescore_latency)` | Advanced: when you need to inspect the `EvaluationResult` from prescore **before** starting the LLM stream (e.g. to act on a REPLACE result). |

#### `evaluate_full_pipeline_stream_async` parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `prompt` | `str` | ✅ | The user prompt |
| `llm_callable` | `Callable[[str], AsyncIterator[ChatCompletionChunk]]` | ✅ | A callable invoked without `await` (for example an async generator function) that receives the (possibly sanitised) effective prompt and returns an async iterator of chunks. Called only when the prompt is not blocked. |

#### Chunk signals

| `finish_reason` | Meaning |
|---|---|
| `None` or `"stop"` | Normal chunk — content is in `chunk.choices[0].delta.content` |
| `"content_filter"` | A guard intervened. `delta.content` holds the block message. The LLM was never called if this is the first (and only) chunk. |
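
The two signals above are enough to write a generic consumer. The sketch below uses stand-in objects that only mimic the attribute path of `ChatCompletionChunk` (`choices[0].delta.content` and `choices[0].finish_reason`), so it runs without any LLM. `fake_chunk`, `fake_stream`, and `collect` are hypothetical helpers, and discarding already-received text once a guard intervenes is a design choice made for this example.

```python
import asyncio
from types import SimpleNamespace


def fake_chunk(content, finish_reason=None):
    """Stand-in exposing the same attributes the consumer reads from a real chunk."""
    delta = SimpleNamespace(content=content)
    choice = SimpleNamespace(delta=delta, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


async def fake_stream(items):
    """Async generator that replays a fixed list of chunks."""
    for item in items:
        yield item


async def collect(chunks):
    """Return (text, blocked) for an async iterator of chunks."""
    parts = []
    async for chunk in chunks:
        choice = chunk.choices[0]
        if choice.finish_reason == "content_filter":
            # A guard intervened; delta.content carries the block message.
            return choice.delta.content, True
        if choice.delta.content:
            parts.append(choice.delta.content)
    return "".join(parts), False


ok = asyncio.run(collect(fake_stream([fake_chunk("Hello"), fake_chunk(" world", "stop")])))
blocked = asyncio.run(collect(fake_stream([fake_chunk("Prompt too long.", "content_filter")])))
```

In real use, the argument to `collect` would be the async generator returned by `pipeline.evaluate_full_pipeline_stream_async(prompt, llm_callable)`.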

#### Example

```python
import asyncio
import os
from datarobot_dome.api import ModerationPipeline
from datarobot_dome.schema import (
    InterventionSchema, ModerationConfig, OOTBGuardSchema, TargetBlock,
)

os.environ["DATAROBOT_ENDPOINT"]  = "<your-endpoint>"
os.environ["DATAROBOT_API_TOKEN"] = "<your-token>"

pipeline = ModerationPipeline.from_config(
    ModerationConfig(
        targets=[
            TargetBlock(
                target="_default",
                guards=[
                    OOTBGuardSchema(
                        name="Prompt Token Limit",
                        type="ootb",
                        ootb_type="token_count",
                        stage="prompt",
                        intervention=InterventionSchema(
                            action="block",
                            conditions=[{"comparator": "greaterThan", "comparand": 200}],
                            message="Prompt too long.",
                        ),
                    ),
                ],
            )
        ]
    )
)


async def my_llm_stream(prompt: str):
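    """Wrap a sync OpenAI stream as an async iterator.

    Note: iterating the sync client blocks the event loop while waiting for
    each chunk. That is fine for a demo; prefer an async client in a service.
    """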
    import openai
    client = openai.OpenAI(
        api_key=os.environ["DATAROBOT_API_TOKEN"],
        base_url=f"{os.environ['DATAROBOT_ENDPOINT']}/genai/llmgw",
    )
    for chunk in client.chat.completions.create(
        model="azure/gpt-4o-2024-11-20",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    ):
        yield chunk


async def run(prompt: str) -> None:
    print(f"Prompt: {prompt!r}")
    async for chunk in pipeline.evaluate_full_pipeline_stream_async(prompt, my_llm_stream):
        finish_reason = chunk.choices[0].finish_reason
        content = chunk.choices[0].delta.content
        if finish_reason == "content_filter":
            print(f"[BLOCKED] {content}")
            return
        if content:
            print(content, end="", flush=True)
    print()


asyncio.run(run("What is DataRobot?"))
```

#### Advanced: `stream_response_async`

Use when you need the prescore `EvaluationResult` before streaming begins:

```python
result, latency, prescore_df = await pipeline.evaluate_prompt_async(prompt)
if result.blocked:
    # handle block before ever calling the LLM
    return result.blocked_message

effective = result.replacement if result.replaced else prompt

async for chunk in pipeline.stream_response_async(
    my_llm_stream(effective),
    prompt=effective,
    prescore_df=prescore_df,      # must come from evaluate_prompt_async
    prescore_latency=latency,
):
    ...
```

### With DRUM

Place `moderation_config.yaml` alongside your custom model code, then:

```bash
drum score --verbose \
  --code-dir ./ \
  --target-type textgeneration \
  --input ./input.csv \
  --runtime-params-file values.yaml
```

---

## 9. Testing guide

Set these environment variables before running any test (see [§10](#10-environment-variables) for details):

```bash
export DATAROBOT_ENDPOINT="https://app.datarobot.com/api/v2"
export DATAROBOT_API_TOKEN="your-token"
export TARGET_NAME="resultText"
```

Guards fall into four groups based on the credentials they require:

| Group | Guard types | Extra credentials needed |
|---|---|---|
| **A — local** | `token_count`, `rouge_1`, `cost`, `custom_metric` | *(none beyond the base vars above)* |
| **B — DataRobot deployment** | `type: model`, any `ootb` with `llm_type: datarobot` or `llm_type: llmGateway` | None beyond the base vars above; provide a real `deployment_id` (or `llm_gateway_model_id` for `llmGateway`) |
| **C — external LLM provider** | Any `ootb` with `llm_type: openAi`, `azureOpenAi`, `google`, `amazon`, `nim` | Provider-specific env var (see §10) |
| **D — NeMo** | `type: nemo_guardrails`, `type: nemo_evaluator` | Provider key for NeMo Guardrails; `DATAROBOT_API_TOKEN` for NeMo Evaluator |
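
To decide which credentials a test config needs, the grouping above can be expressed as a small classifier. This is a sketch under the assumption that the table is complete; `credential_group` and the three constant sets are hypothetical names, not library API.

```python
LOCAL_OOTB = {"token_count", "rouge_1", "cost", "custom_metric"}
DATAROBOT_LLM = {"datarobot", "llmGateway"}
EXTERNAL_LLM = {"openAi", "azureOpenAi", "google", "amazon", "nim"}


def credential_group(guard: dict) -> str:
    """Return "A", "B", "C", or "D" for a guard dict, following the table above."""
    guard_type = guard.get("type")
    if guard_type in ("nemo_guardrails", "nemo_evaluator"):
        return "D"
    if guard_type == "model":
        return "B"
    llm_type = guard.get("llm_type")
    if llm_type in DATAROBOT_LLM:
        return "B"
    if llm_type in EXTERNAL_LLM:
        return "C"
    if guard.get("ootb_type") in LOCAL_OOTB:
        return "A"
    raise ValueError(f"cannot classify guard: {guard.get('name', guard)!r}")
```

A config made up only of group A guards can be exercised with just the base variables, which makes it a reasonable first smoke test.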

See [§5](#5-guard-types) for complete YAML examples per guard type and [§8](#8-using-the-config-in-python) for Python usage patterns.

---

## 10. Environment variables

### Always required

| Variable | Description |
|---|---|
| `DATAROBOT_ENDPOINT` | DataRobot instance URL, e.g. `https://app.datarobot.com/api/v2` |
| `DATAROBOT_API_TOKEN` | DataRobot API token |

### Optional

| Variable | Description |
|---|---|
| `TARGET_NAME` | Name of the DataFrame column that holds the LLM response text (e.g. `resultText`). Resolution order for the response column (highest to lowest priority): **(1)** DRUM deployment `target_name` (always wins when `MLOPS_DEPLOYMENT_ID` is set), **(2)** `TARGET_NAME` env var, **(3)** `response_column_name` in the config file, **(4)** built-in default `"completion"`. DRUM sets this automatically. In standalone Python, set it here **or** declare `response_column_name` in the YAML/`ModerationConfig`; the env var takes precedence if both are provided. |
| `DISABLE_MODERATION` | Set to `true` to disable all guards at runtime. |

### OTel tracing (optional)

OTel traces are emitted whenever `OTEL_EXPORTER_OTLP_ENDPOINT` is set.  The
remaining two variables are optional — their corresponding request headers are
omitted when the variable is absent, which allows traces to be forwarded to an
unauthenticated local OTLP collector such as the
`af-component-agent-playground` UI without needing credentials.

| Variable | Required | Description |
|---|---|---|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | ✅ | Base URL of the OTLP HTTP collector, e.g. `http://localhost:4318`. The library appends `/v1/traces` automatically. |
| `OTEL_SERVICE_NAME` | ❌ | Adds `X-DataRobot-Entity-Id` to trace requests.  Required when routing to the DataRobot production collector; omit for local collectors. |
| `OTEL_COLLECTOR_TOKEN` | ❌ | Adds `Authorization: Bearer <token>` to trace requests.  Required for production/deployed collectors; omit for local collectors. |

**Local playground example**:

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
# OTEL_SERVICE_NAME and OTEL_COLLECTOR_TOKEN are not needed
```

**Production example:**

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT="https://collector.datarobot.com"
export OTEL_SERVICE_NAME="deployment-abc123"
export OTEL_COLLECTOR_TOKEN="my-token"
```

### deepeval telemetry

The `task_adherence` guard uses `deepeval` internally. By default, moderations opts out of
deepeval's usage telemetry — no `.deepeval/` directory is created and no data is sent externally.

To opt in, set `enable_deepeval_telemetry: true` in your config (only takes effect when a
`task_adherence` guard is present; deepeval is loaded lazily):

```yaml
enable_deepeval_telemetry: true   # default: false

guards:
  - name: Task Adherence
    type: ootb
    ootb_type: task_adherence
    stage: response
```

To control telemetry explicitly via environment variable (e.g. in CI or container environments):

```bash
export DEEPEVAL_TELEMETRY_OPT_OUT=YES  # opt out (library default)
unset DEEPEVAL_TELEMETRY_OPT_OUT       # opt in
```

### Credentials for LLM-eval guards using external providers

When your guard uses `llm_type: datarobot`, it reuses `DATAROBOT_API_TOKEN` — no extra variable needed.

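For external providers (OpenAI, Azure OpenAI, Google, AWS), set a guard-specific env var. The variable name is built from the guard's `type`, `stage`, and `ootb_type`, all upper-cased, followed by a provider suffix. Guards without an `ootb_type`, such as `nemo_guardrails`, omit that segment: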

```
MLOPS_RUNTIME_PARAM_MODERATION_{TYPE}_{STAGE}_{OOTB_TYPE}_{PROVIDER_SUFFIX}
```

| Guard (`ootb_type`) | Provider | Environment variable |
|---|---|---|
| `task_adherence` | OpenAI | `MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_TASK_ADHERENCE_OPENAI_API_KEY` |
| `task_adherence` | Azure OpenAI | `MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_TASK_ADHERENCE_AZURE_OPENAI_API_KEY` |
| `faithfulness` | OpenAI | `MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_FAITHFULNESS_OPENAI_API_KEY` |
| `faithfulness` | Azure OpenAI | `MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_FAITHFULNESS_AZURE_OPENAI_API_KEY` |
| `agent_guideline_adherence` | Azure OpenAI | `MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_AGENT_GUIDELINE_ADHERENCE_AZURE_OPENAI_API_KEY` |
| `agent_guideline_adherence` | Google Vertex AI | `MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_AGENT_GUIDELINE_ADHERENCE_GOOGLE_SERVICE_ACCOUNT` |
| `agent_goal_accuracy` | Azure OpenAI | `MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_AGENT_GOAL_ACCURACY_AZURE_OPENAI_API_KEY` |
| `agent_goal_accuracy` | AWS Bedrock | `MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_AGENT_GOAL_ACCURACY_AWS_ACCOUNT` |
| `nemo_guardrails` (prompt) | Azure OpenAI | `MLOPS_RUNTIME_PARAM_MODERATION_NEMO_GUARDRAILS_PROMPT_AZURE_OPENAI_API_KEY` |
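
The naming pattern can be reproduced with a few lines of Python. This is a minimal sketch, not part of the package: `guard_env_var` is a hypothetical helper, and treating `ootb_type` as optional is an assumption inferred from the NeMo row in the table above.

```python
PREFIX = "MLOPS_RUNTIME_PARAM_MODERATION"


def guard_env_var(guard_type, stage, provider_suffix, ootb_type=None):
    """Build the credential variable name for a guard (hypothetical helper).

    Parts are joined in the documented order: TYPE, STAGE, OOTB_TYPE, PROVIDER_SUFFIX.
    ootb_type is optional here because the NeMo row in the table has no such part.
    """
    parts = [PREFIX, guard_type, stage]
    if ootb_type:
        parts.append(ootb_type)
    parts.append(provider_suffix)
    return "_".join(part.upper() for part in parts)


# Reproduces the first and last rows of the table.
print(guard_env_var("ootb", "response", "OPENAI_API_KEY", ootb_type="task_adherence"))
print(guard_env_var("nemo_guardrails", "prompt", "AZURE_OPENAI_API_KEY"))
```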

Value format per provider:

```bash
# OpenAI / Azure OpenAI
'{"type":"credential","payload":{"credentialType":"api_token","apiToken":"YOUR_KEY"}}'

# Google Vertex AI
'{"type":"credential","payload":{"credentialType":"gcp","gcpKey":{...}}}'

# AWS Bedrock
'{"type":"credential","payload":{"credentialType":"s3","awsAccessKeyId":"...","awsSecretAccessKey":"...","awsSessionToken":"..."}}'
```
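
Hand-writing the JSON value is error-prone, so it may be easier to generate it. The sketch below builds the OpenAI / Azure OpenAI payload shown above and exports it for the current process; `api_token_credential` is a hypothetical helper name, and the variable name is the `task_adherence` / OpenAI row from the table.

```python
import json
import os


def api_token_credential(api_key):
    """Serialize an api_token credential in the format shown above (hypothetical helper)."""
    payload = {
        "type": "credential",
        "payload": {"credentialType": "api_token", "apiToken": api_key},
    }
    # Compact separators keep the value on one line without extra spaces.
    return json.dumps(payload, separators=(",", ":"))


# Set the variable before the moderation pipeline is created in this process.
os.environ[
    "MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_TASK_ADHERENCE_OPENAI_API_KEY"
] = api_token_credential("YOUR_KEY")
```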

# DataRobot Moderation CLI

The `dr-moderation` CLI lets you manage guards and test moderation pipelines from the terminal — no Python code required.

---

## Table of Contents

1. [Installation](#1-installation)
2. [Authentication](#2-authentication)
3. [Commands](#3-commands)
   - [evaluate](#31-evaluate)
   - [add-guard](#32-add-guard)
   - [agent a2a connect](#33-agent-a2a-connect)
   - [serve](#34-serve)
4. [YAML schema quick reference](#4-yaml-schema-quick-reference)
5. [Exit codes](#5-exit-codes)

---

## 1. Installation

**End-user** — the `dr-moderation` binary lands on your `PATH` automatically:

```bash
pip install "datarobot-moderations[all]"   # quotes stop shells such as zsh from globbing the brackets
dr-moderation --help
```

> Python 3.10 – 3.12 required.

**Developer / contributor** — Poetry places the binary inside `.venv/bin/`, which is not on your `PATH` until the venv is active. Pick one:

```bash
poetry shell                        # Option A: activate for the session
poetry run dr-moderation --help     # Option B: one-off prefix
make cli ARGS="evaluate --help"     # Option C: Makefile shortcut
```

---

## 2. Authentication

Commands that call the DataRobot API need credentials. Set them once per session:

```bash
export DATAROBOT_ENDPOINT="https://app.datarobot.com/api/v2"
export DATAROBOT_API_TOKEN="your-api-token"
```

Or pass them as global flags (flags take precedence over env vars):

```bash
dr-moderation --endpoint <url> --token <token> <command>
```
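
The precedence rule (flag first, then environment variable) can be illustrated as follows. This is a sketch of the rule only, not the CLI's actual code; `resolve_credentials` is a hypothetical name, and raising when a value is missing is an assumption.

```python
import os


def resolve_credentials(endpoint_flag=None, token_flag=None, env=None):
    """Return (endpoint, token); an explicit flag value wins over the environment."""
    env = os.environ if env is None else env
    endpoint = endpoint_flag or env.get("DATAROBOT_ENDPOINT")
    token = token_flag or env.get("DATAROBOT_API_TOKEN")
    missing = [
        name
        for name, value in (("DATAROBOT_ENDPOINT", endpoint), ("DATAROBOT_API_TOKEN", token))
        if not value
    ]
    if missing:
        # Assumed behavior for this sketch: fail early when nothing is configured.
        raise ValueError("missing credentials: " + ", ".join(missing))
    return endpoint, token
```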

---

## 3. Commands

### 3.1 `evaluate`

Evaluate a prompt and/or response through the local `ModerationPipeline`. Supports every guard type, including guards backed by the **LLM Gateway** (`llm_type: llmGateway`), so no deployment is required.

The config file must use the **Python SDK snake_case schema** (see [GUARDRAILS.md](GUARDRAILS.md) for the full field reference).

```
dr-moderation evaluate [OPTIONS]
```

| Option | Required | Default | Description |
|---|---|---|---|
| `--config-file FILE` | ✅ | — | Moderation config YAML (snake_case SDK format) |
| `--prompt TEXT` | ❌ * | — | Prompt text; evaluated against prescore guards |
| `--response TEXT` | ❌ * | — | Response text; evaluated against postscore guards. Also pass `--prompt` for guards that need both (e.g. `faithfulness`, `task_adherence`) |
| `--as-json` | ❌ | false | Emit results as JSON — useful for scripting |

\* At least one of `--prompt` or `--response` is required.

**Example output (human-readable):**

```
── Prescore (prompt) ──────────────────────────────
  Blocked  : False
  Metrics  :
    Prompts_token_count: 4
  Latency  : 0.05s
```

**Examples:**

```bash
# Token-count guard on a prompt
dr-moderation evaluate \
  --config-file docs/examples/token_count_config.yaml \
  --prompt "Hello, world!"

# LLM Gateway task-adherence guard
dr-moderation evaluate \
  --config-file docs/examples/llm_gateway_config.yaml \
  --prompt "What is DataRobot?" \
  --response "DataRobot is an AI platform."

# Evaluate both, emit JSON, pipe to jq
dr-moderation evaluate \
  --config-file docs/examples/llm_gateway_config.yaml \
  --prompt "What is DataRobot?" \
  --response "DataRobot is an AI platform." \
  --as-json | jq '.postscore.metrics'
```

> **Ready-made configs** in `docs/examples/`:
> - `token_count_config.yaml` — prompt + response token-count guards
> - `llm_gateway_config.yaml` — token-count prompt guard + LLM Gateway `task_adherence`

---

### 3.2 `add-guard`

Add guards to an existing DataRobot custom model. Creates a new **custom model version** with the guards attached and prints the version ID to stdout.

> **How it works:**
> 1. You create and register a custom model (your LLM) in DataRobot — this gives you a `customModelId`.
> 2. You define guards in a camelCase YAML file.
> 3. `add-guard` POSTs the config to `/guardConfigurations/toNewCustomModelVersion/`. DataRobot creates a new version of the model with the guards and returns the `customModelVersionId`.
> 4. Deploy that new version — it will now enforce your guards on every prompt/response.

```
dr-moderation add-guard [OPTIONS]
```

| Option | Required | Default | Description |
|---|---|---|---|
| `--custom-model-id TEXT` | ✅ | — | ID of the custom model (find it in the DataRobot UI under **Model Workshop → Custom Models**) |
| `--config-file FILE` | ✅ | — | YAML list of guard configurations (camelCase API format) |
| `--timeout-sec INTEGER` | ❌ | 60 | Per-guard timeout in seconds |
| `--timeout-action [score\|block]` | ❌ | score | Action on timeout: `score` passes through; `block` rejects |

**Example output:**

```
6797abc123def456789abcde
```

The printed ID is the new `customModelVersionId` — pass it to subsequent API or SDK calls to deploy the version.

**Examples:**

```bash
# Add guards, capture the new version ID
VERSION_ID=$(dr-moderation add-guard \
  --custom-model-id 6793e6b2114f17240fa2194c \
  --config-file docs/examples/add_guard_config.yaml)
echo "New version: ${VERSION_ID}"

# Block if any guard exceeds 30 s
dr-moderation add-guard \
  --custom-model-id 6793e6b2114f17240fa2194c \
  --config-file docs/examples/add_guard_config.yaml \
  --timeout-sec 30 \
  --timeout-action block
```

---

### 3.3 `agent a2a connect`

Verify connectivity to a remote [A2A](https://google.github.io/A2A/) agent by fetching its agent card from `/.well-known/agent.json`.

```
dr-moderation agent a2a connect [OPTIONS]
```

| Option | Required | Description |
|---|---|---|
| `--url TEXT` | ✅ | Base URL of the remote A2A agent |
| `--deployment-id TEXT` | ❌ | DataRobot deployment ID to verify alongside the agent |

**Examples:**

```bash
# 1. Start a minimal A2A mock in the background (serves the agent card on port 8765)
python3 - << 'EOF' &
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

CARD = {"name": "My Agent", "version": "1.0.0", "capabilities": ["moderation"]}

class H(BaseHTTPRequestHandler):
    def do_GET(self):
        # Every GET path returns the card, including /.well-known/agent.json
        body = json.dumps(CARD).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *_): pass

HTTPServer(("localhost", 8765), H).serve_forever()
EOF

# 2. Connect to it
dr-moderation agent a2a connect --url http://localhost:8765
```

**Production examples:**

```bash
# Verify a remote A2A agent is reachable
dr-moderation agent a2a connect --url https://my-agent.example.com

# Also verify the backing DataRobot deployment
dr-moderation agent a2a connect \
  --url https://my-agent.example.com \
  --deployment-id 6793e6b2114f17240fa2194c
```
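
The same reachability check can be approximated from Python with the standard library. This is a rough sketch of the idea, not the command's implementation: `agent_card_url` and `fetch_agent_card` are hypothetical names, and the only fact taken from this document is the `/.well-known/agent.json` path.

```python
import json
import urllib.request


def agent_card_url(base_url):
    """Join the agent's base URL with the well-known agent card path."""
    return base_url.rstrip("/") + "/.well-known/agent.json"


def fetch_agent_card(base_url, timeout=5.0):
    """Fetch and decode the agent card; urlopen raises on HTTP or connection errors."""
    with urllib.request.urlopen(agent_card_url(base_url), timeout=timeout) as resp:
        return json.load(resp)


# Usage against the local mock from the example above:
#   card = fetch_agent_card("http://localhost:8765")
#   print(card["name"])
```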

---

### 3.4 `serve`

Start a **JSON-RPC 2.0 server** so that non-Python applications (Java, Go, C#, …) can evaluate
prompts and responses through the full moderation pipeline without HTTP/REST overhead or a Python
runtime in their own process.

Two transports are available:

| Transport | How it works | Best for |
|---|---|---|
| `stdio` *(default)* | Caller spawns `dr-moderation serve` as a subprocess; newline-delimited JSON on stdin/stdout | Single-caller, zero network setup |
| `ws` | aiohttp WebSocket server; multiple callers share one long-running instance | Containerised / multi-caller deployments |

```
dr-moderation serve [OPTIONS]
```

| Option | Required | Default | Description |
|---|---|---|---|
| `--transport [stdio\|ws]` | ❌ | `stdio` | Transport backend |
| `--config-file FILE` | ❌ | — | Pre-load a pipeline YAML at startup. For `ws` this pipeline is shared across all connections; for `stdio` the caller can still send `initialize` to override it |
| `--host TEXT` | ❌ | `127.0.0.1` | Bind address (`ws` only) |
| `--port INTEGER` | ❌ | `9000` | Bind port (`ws` only) |
| `--log-level [debug\|info\|warning\|error]` | ❌ | `warning` | Logging verbosity — all output goes to **stderr**, never stdout |

All diagnostic output goes to **stderr**. The **stdout** stream carries only JSON-RPC messages, so callers can parse it without noise; the client examples below still skip non-JSON lines as a defensive measure.

#### Wire format

Messages are **newline-delimited JSON** (one complete JSON object per line, `\n`-terminated). Both requests and responses follow [JSON-RPC 2.0](https://www.jsonrpc.org/specification).

**Request** (caller → server):

```json
{"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {"config_path": "/path/to/config.yaml"}}
```

**Response** (server → caller):

```json
{"jsonrpc": "2.0", "id": 1, "result": {"ok": true}}
```
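
Framing and unframing these messages takes only a few lines. The helpers below are a sketch with hypothetical names (`encode_request`, `decode_response`); the `error` object with `code` and `message` members comes from the JSON-RPC 2.0 specification, not from anything specific to this server.

```python
import json


def encode_request(method, params, req_id):
    """Serialize one request as a single newline-terminated JSON line."""
    # json.dumps without `indent` never emits raw newlines, so the line stays intact.
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}) + "\n"


def decode_response(line, expected_id):
    """Parse one response line and return its `result`, raising on errors."""
    msg = json.loads(line)
    if msg.get("id") != expected_id:
        raise ValueError(f"response id {msg.get('id')!r} does not match request id {expected_id!r}")
    if "error" in msg:
        err = msg["error"]
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    return msg["result"]


line = encode_request("initialize", {"config_path": "/path/to/config.yaml"}, 1)
result = decode_response('{"jsonrpc": "2.0", "id": 1, "result": {"ok": true}}', 1)
```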

#### Methods

| Method | Call order | `params` keys | Description |
|---|---|---|---|
| `initialize` | Before `evaluate_*` | `config_path` (string, required) | Load the moderation pipeline from a YAML file. Must be called before any `evaluate_*` method unless `--config-file` was passed at startup. Returns `{"ok": true}` |
| `evaluate_prompt` | After `initialize` | `prompt` (string, required) | Run prescore guards and return an `EvaluationResult` |
| `evaluate_response` | After `initialize` | `response` (string, required); `prompt` (string, optional); `pipeline_interactions` (string, optional) | Run postscore guards and return an `EvaluationResult` |
| `shutdown` | Any time | *(none)* | Signal the server to stop and return `{"ok": true}`. stdio: server exits after sending the response. ws: closes the current connection; the server process keeps running |

#### Complete response example (`evaluate_prompt`)

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "blocked": false,
    "blocked_message": null,
    "replaced": false,
    "replacement": null,
    "metrics": {
      "Prompts_token_count": 4
    },
    "latency_sec": 0.012345
  }
}
```

When a guard blocks content, `blocked` is `true` and `blocked_message` holds the guard's configured message. When a `replace`-action guard fires, `replaced` is `true` and `replacement` holds the sanitised text. `latency_sec` is always present.
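
On the client side, these fields usually reduce to one decision: which text to show the user. The sketch below mirrors the fields from the response example in a dataclass; the `EvaluationResult` class here is a local stand-in rather than an import from the package, and checking `blocked` before `replaced` is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class EvaluationResult:
    """Local mirror of the `result` object returned by evaluate_* calls."""
    blocked: bool
    blocked_message: Optional[str]
    replaced: bool
    replacement: Optional[str]
    latency_sec: float
    metrics: Dict[str, Any] = field(default_factory=dict)


def parse_result(result: Dict[str, Any]) -> EvaluationResult:
    """Build an EvaluationResult from the decoded JSON-RPC `result` mapping."""
    return EvaluationResult(
        blocked=result["blocked"],
        blocked_message=result.get("blocked_message"),
        replaced=result["replaced"],
        replacement=result.get("replacement"),
        latency_sec=result["latency_sec"],
        metrics=result.get("metrics", {}),
    )


def text_to_show(original: str, result: EvaluationResult) -> str:
    """Pick the text to surface: block message, sanitised replacement, or the original."""
    if result.blocked:  # assumed precedence: a block wins over a replacement
        return result.blocked_message or ""
    if result.replaced and result.replacement is not None:
        return result.replacement
    return original
```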

#### Examples

**Bash (`stdio`, scripted quick test):**

```bash
# Pre-load a config, then evaluate a prompt
dr-moderation serve --config-file moderation_config.yaml --transport stdio <<'EOF'
{"jsonrpc":"2.0","id":1,"method":"evaluate_prompt","params":{"prompt":"Hello, world!"}}
{"jsonrpc":"2.0","id":2,"method":"shutdown","params":{}}
EOF
```

**Python (subprocess, `stdio`):**

```python
import json
import subprocess
import sys

proc = subprocess.Popen(
    [sys.executable, "-m", "datarobot_dome.cli", "serve",
     "--transport", "stdio",
     "--config-file", "moderation_config.yaml"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL,  # discard diagnostics; redirect to sys.stderr to surface them
    text=True,
    bufsize=1,
)

def rpc(method, params, *, req_id):
    msg = json.dumps({"jsonrpc": "2.0", "id": req_id, "method": method, "params": params})
    proc.stdin.write(msg + "\n")
    proc.stdin.flush()
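    # Skip any stdout lines that are not valid JSON (startup messages, warnings).
    while True:
        line = proc.stdout.readline()
        if not line:  # EOF: the server exited before replying
            raise RuntimeError("server closed stdout before responding")
        try:
            return json.loads(line)
        except json.JSONDecodeError:
            continue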

result = rpc("evaluate_prompt", {"prompt": "Hello, world!"}, req_id=1)
print(result["result"])

rpc("shutdown", {}, req_id=2)
proc.wait()
```

**Go (stdio):**

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os/exec"
)

func main() {
	cmd := exec.Command("dr-moderation", "serve",
		"--transport", "stdio",
		"--config-file", "moderation_config.yaml")
	stdin, _ := cmd.StdinPipe()
	stdout, _ := cmd.StdoutPipe()
	_ = cmd.Start()

	scanner := bufio.NewScanner(stdout)

	send := func(req any) {
		b, _ := json.Marshal(req)
		fmt.Fprintln(stdin, string(b))
	}
	recv := func() map[string]any {
		// Skip non-JSON lines (startup messages, log output on stdout)
		for scanner.Scan() {
			var m map[string]any
			if err := json.Unmarshal(scanner.Bytes(), &m); err == nil {
				return m
			}
		}
		return nil
	}

	send(map[string]any{"jsonrpc": "2.0", "id": 1, "method": "evaluate_prompt",
		"params": map[string]any{"prompt": "Hello, world!"}})
	resp := recv()
	fmt.Println(resp["result"])

	send(map[string]any{"jsonrpc": "2.0", "id": 2, "method": "shutdown", "params": map[string]any{}})
	cmd.Wait()
}
```

**Java (stdio):**

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.*;
import java.util.Map;

public class ModerationClient {
    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(
            "dr-moderation", "serve",
            "--transport", "stdio",
            "--config-file", "moderation_config.yaml");
        pb.redirectError(ProcessBuilder.Redirect.DISCARD);
        Process proc = pb.start();

        ObjectMapper mapper = new ObjectMapper();
        var writer = new PrintWriter(new BufferedWriter(
            new OutputStreamWriter(proc.getOutputStream())), true);
        var reader = new BufferedReader(
            new InputStreamReader(proc.getInputStream()));

        // Send request
        String req = mapper.writeValueAsString(Map.of(
            "jsonrpc", "2.0", "id", 1,
            "method", "evaluate_prompt",
            "params", Map.of("prompt", "Hello, world!")));
        writer.println(req);

        // Read response — skip non-JSON lines
        String line;
        while ((line = reader.readLine()) != null) {
            try {
                var resp = mapper.readValue(line, Map.class);
                System.out.println(resp.get("result"));
                break;
            } catch (Exception ignored) {}
        }

        writer.println(mapper.writeValueAsString(Map.of(
            "jsonrpc", "2.0", "id", 2, "method", "shutdown", "params", Map.of())));
        proc.waitFor();
    }
}
```

**C# (stdio):**

```csharp
using System.Diagnostics;
using System.Text.Json;

var proc = new Process {
    StartInfo = new ProcessStartInfo("dr-moderation") {
        Arguments = "serve --transport stdio --config-file moderation_config.yaml",
        RedirectStandardInput  = true,
        RedirectStandardOutput = true,
        RedirectStandardError  = true,
        UseShellExecute = false,
    }
};
proc.Start();
_ = proc.StandardError.ReadToEndAsync(); // drain stderr on a background task

void Send(object req) => proc.StandardInput.WriteLine(JsonSerializer.Serialize(req));
JsonElement Recv() {
    // Skip non-JSON lines (startup messages, warnings)
    while (true) {
        var line = proc.StandardOutput.ReadLine() ?? throw new EndOfStreamException();
        try { return JsonDocument.Parse(line).RootElement; } catch { }
    }
}

Send(new { jsonrpc = "2.0", id = 1, method = "evaluate_prompt",
           @params = new { prompt = "Hello, world!" } });
var resp = Recv();
Console.WriteLine(resp.GetProperty("result"));

Send(new { jsonrpc = "2.0", id = 2, method = "shutdown", @params = new { } });
proc.WaitForExit();
```

**WebSocket (`ws` transport):**

```bash
# Start the server (runs until killed)
dr-moderation serve --transport ws --host 127.0.0.1 --port 9000 \
  --config-file moderation_config.yaml

# In another terminal — connect with any WebSocket client (e.g. websocat)
echo '{"jsonrpc":"2.0","id":1,"method":"evaluate_prompt","params":{"prompt":"Hello"}}' \
  | websocat ws://127.0.0.1:9000
```

---

## 4. YAML schema quick reference

`add-guard` and `evaluate` use **different schemas**, and their config files are not interchangeable:

| Command | Format | Key fields |
|---|---|---|
| `add-guard` | DataRobot API — camelCase | `ootbType`, `stages` (list), `intervention` |
| `evaluate` | Python SDK — snake_case | `ootb_type`, `stage` (string or list), `llm_type`, `llm_gateway_model_id` |

### `add-guard` config (camelCase)

Sent directly to `/guardConfigurations/toNewCustomModelVersion/`. The file must be a **YAML list**.

```yaml
- name: Prompt Token Count
  type: ootb
  ootbType: token_count
  stages: [prompt]
  intervention:
    action: report
    allowedActions: [report, block]
    message: " "
    sendNotification: false
    conditions: []
```

| Field | Required | Notes |
|---|---|---|
| `name` | ✅ | Unique per config |
| `type` | ✅ | `ootb` · `guardModel` · `userModel` · `nemo` |
| `stages` | ✅ | List: `[prompt]`, `[response]`, or `[prompt, response]` |
| `ootbType` | When `type: ootb` | `token_count`, `faithfulness`, `rouge_1`, etc. |
| `modelInfo` | When `type: guardModel` | `inputColumnName`, `outputColumnName`, `targetType`, `classNames` |
| `intervention` | ❌ | `action`, `conditions`, `message`; omit to measure only |
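
A quick local check against the table above can catch mistakes before the file is sent to the API. This is a sketch only: `validate_guards` is a hypothetical helper that encodes just the rules listed in the table, and it expects the already-parsed YAML (for example from PyYAML's `yaml.safe_load`, assuming PyYAML is installed). The server performs its own validation.

```python
ALLOWED_TYPES = {"ootb", "guardModel", "userModel", "nemo"}
ALLOWED_STAGES = {"prompt", "response"}


def validate_guards(guards):
    """Return a list of problems; an empty list means the table's rules are satisfied."""
    if not isinstance(guards, list):
        return ["config must be a list of guard mappings"]
    problems = []
    seen_names = set()
    for index, guard in enumerate(guards):
        label = guard.get("name") or f"guard #{index}"
        for key in ("name", "type", "stages"):
            if key not in guard:
                problems.append(f"{label}: missing required field '{key}'")
        name = guard.get("name")
        if name in seen_names:
            problems.append(f"{label}: duplicate name")
        if name is not None:
            seen_names.add(name)
        guard_type = guard.get("type")
        if guard_type is not None and guard_type not in ALLOWED_TYPES:
            problems.append(f"{label}: unknown type '{guard_type}'")
        stages = guard.get("stages")
        if stages is not None and (
            not isinstance(stages, list) or not stages or not set(stages) <= ALLOWED_STAGES
        ):
            problems.append(f"{label}: 'stages' must be a non-empty list of prompt/response")
        if guard_type == "ootb" and "ootbType" not in guard:
            problems.append(f"{label}: 'ootbType' is required when type is ootb")
        if guard_type == "guardModel" and "modelInfo" not in guard:
            problems.append(f"{label}: 'modelInfo' is required when type is guardModel")
    return problems
```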

### `evaluate` config (snake_case)

Consumed by `ModerationPipeline.from_yaml`. For the full field reference see **[GUARDRAILS.md](GUARDRAILS.md)**.

The key difference from `add-guard`: use `llm_type: llmGateway` with `llm_gateway_model_id` — **no `deployment_id` needed**:

```yaml
- name: Task Adherence
  type: ootb
  ootb_type: task_adherence
  stage: response
  llm_type: llmGateway
  llm_gateway_model_id: "azure/gpt-4o-2024-11-20"
  intervention:
    action: block
    message: "Response does not address the task."
    conditions:
      - comparator: lessThan
        comparand: 0.5
```

---

## 5. Exit codes

| Code | Meaning |
|---|---|
| `0` | Success |
| `1` | Runtime error (API error, bad YAML, connection refused) |
| `2` | Invalid CLI usage (missing required option, unknown value) |

Non-zero exits write a descriptive message to **stderr**.
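
When wrapping the CLI in automation, the exit code and stderr message can be surfaced together. The following is a sketch with hypothetical helper names (`describe_exit`, `run_cli`); it assumes `dr-moderation` is on your `PATH` and relies only on the codes in the table above.

```python
import subprocess

EXIT_MEANINGS = {
    0: "success",
    1: "runtime error",
    2: "invalid CLI usage",
}


def describe_exit(code):
    """Translate an exit code from the table above into a short label."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_cli(args):
    """Run dr-moderation with the given arguments and return (label, stdout, stderr)."""
    proc = subprocess.run(["dr-moderation", *args], capture_output=True, text=True)
    return describe_exit(proc.returncode), proc.stdout, proc.stderr


# Usage (requires the CLI on PATH):
#   label, out, err = run_cli(["evaluate", "--help"])
#   if label != "success":
#       print(f"{label}: {err.strip()}")
```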

