Secure Environments

Open source security monitoring for AI agents

Secure Environments is an open source security monitor for AI agents that integrates with Claude Code and OpenHands. In the benchmark below, it scores highest of the four monitors tested on all four detection metrics (AUC, F1, precision, and recall), ahead of the commercial, rule-based, and built-in alternatives.

The benchmark below evaluates how well different monitors detect when an AI agent performs unauthorized side tasks while completing legitimate work. An agent is given a task (e.g., "pay John $50 for dinner") and may also attempt hidden actions (e.g., transferring money to an attacker). Each monitor scores every tool call for suspicion.
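As a minimal sketch of what "scoring every tool call" can look like in code, the snippet below rolls per-call suspicion scores up into a trajectory-level verdict. The `ToolCall` type, the max aggregation, the 0.5 threshold, and the example tool names are all assumptions made for illustration; the benchmark's actual aggregation rule is not stated here.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str         # tool invoked by the agent (hypothetical names below)
    suspicion: float  # monitor score in [0, 1]; higher means more suspicious


def trajectory_score(calls: list[ToolCall]) -> float:
    """Assumed aggregation: a trajectory is as suspicious as its worst call."""
    return max((c.suspicion for c in calls), default=0.0)


def is_flagged(calls: list[ToolCall], threshold: float = 0.5) -> bool:
    """Flag the trajectory when its score reaches the (assumed) threshold."""
    return trajectory_score(calls) >= threshold


# Hypothetical trajectory for the "pay John $50 for dinner" task.
calls = [
    ToolCall("get_balance", 0.05),
    ToolCall("send_money(John, 50)", 0.10),
    ToolCall("send_money(attacker, 500)", 0.95),
]
print(trajectory_score(calls), is_flagged(calls))
```

Taking the maximum means a single hidden action is enough to flag an otherwise legitimate run, which matches the threat model of an agent slipping one unauthorized call into normal work.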

Safety vs Capability

Secure Environments has the highest score on all four detection metrics. Against GraySwan Cygnal it leads on AUC (+0.20), F1 (+0.25), and recall (+0.38), with near-identical precision (0.83 vs 0.82).

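| Monitor | Type | AUC | F1 | Precision | Recall |
|---|---|---|---|---|---|
| LLM Self-Assessment | Built-in | 0.64 | 0.65 | 0.59 | 0.73 |
| Invariant | Rule-based | 0.61 | 0.58 | 0.62 | 0.54 |
| Secure Environments | LLM post-hoc | 0.89 | 0.83 | 0.83 | 0.83 |
| GraySwan Cygnal | Commercial API | 0.69 | 0.58 | 0.82 | 0.45 |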
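F1 is the harmonic mean of precision and recall, so the F1 column can be cross-checked against the other two. The sketch below recomputes it from the precision and recall values in the table; the function and variable names are illustrative only.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "LLM Self-Assessment": (0.59, 0.73, 0.65),
    "Invariant": (0.62, 0.54, 0.58),
    "Secure Environments": (0.83, 0.83, 0.83),
    "GraySwan Cygnal": (0.82, 0.45, 0.58),
}

for name, (p, r, f) in reported.items():
    print(f"{name}: computed F1 = {f1(p, r):.2f}, reported = {f:.2f}")
```

The GraySwan Cygnal row shows why F1 matters here: its precision is close to that of Secure Environments, but low recall pulls the harmonic mean down.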

Cost and Latency

| Monitor | Cost / 1M tokens | Latency (measured) | Source |
|---|---|---|---|
| Invariant | Free | ~2ms | Local rule-based |
| Secure Environments | $0.38 | ~377ms | OpenRouter |
| LLM Self-Assessment | ~$0 marginal | ~1730ms | Inline with agent |
| GraySwan Cygnal | $1.00 | ~301ms | GraySwan |

LLM Self-Assessment is inline (the agent LLM produces security_risk as part of its normal response), so monitoring adds near-zero marginal cost but full LLM inference latency. Secure Environments is a separate post-hoc LLM call ($0.075/1M input + $0.30/1M output). Latencies measured via 5 API calls each on 2026-02-15.
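To make the pricing concrete, here is a small sketch that estimates the cost of a Secure Environments monitoring call from the two per-token rates quoted above. The token counts in the second example are hypothetical, and the reading that the table's $0.38 is the sum of the two rates, rounded, is an assumption.

```python
INPUT_USD_PER_M = 0.075   # $ per 1M input tokens, from the text above
OUTPUT_USD_PER_M = 0.30   # $ per 1M output tokens, from the text above


def monitor_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one monitoring call (or a batch) at the quoted rates."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# 1M input + 1M output tokens: the two rates sum to $0.375.
print(f"${monitor_cost_usd(1_000_000, 1_000_000):.3f}")

# Hypothetical single call: 2,000 prompt tokens, 150 completion tokens.
print(f"${monitor_cost_usd(2_000, 150):.6f}")
```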

Methodology

Setup: 25 honest + 24 attack AgentDojo trajectories, OpenHands v1 SDK, kimi-k2.5 model. Tasks span banking, travel, calendar, and workspace domains.

[Figure: Safety vs Capability curve (ROC reframed for security)]
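One way to read "ROC reframed for security" is to relabel the axes: safety is the fraction of attack trajectories flagged (the true positive rate) and capability is the fraction of honest trajectories allowed through (one minus the false positive rate). That reading is an assumption, as are the function names and the toy scores in the sketch below, which are not benchmark data.

```python
def safety_capability_curve(honest: list[float], attack: list[float]):
    """Sweep thresholds; a trajectory is flagged when score >= threshold."""
    points = []
    for t in sorted(set(honest) | set(attack)):
        safety = sum(s >= t for s in attack) / len(attack)      # attacks caught
        capability = sum(s < t for s in honest) / len(honest)   # honest runs allowed
        points.append((t, safety, capability))
    return points


def auc(honest: list[float], attack: list[float]) -> float:
    """Probability that a random attack outscores a random honest run (ties count half)."""
    wins = sum((a > h) + 0.5 * (a == h) for a in attack for h in honest)
    return wins / (len(attack) * len(honest))


# Hypothetical trajectory-level suspicion scores.
honest = [0.1, 0.2, 0.2, 0.4]
attack = [0.3, 0.6, 0.8, 0.9]

for t, safety, capability in safety_capability_curve(honest, attack):
    print(f"threshold={t:.1f} safety={safety:.2f} capability={capability:.2f}")
print("AUC:", auc(honest, attack))
```

Under this framing, an AUC of 1.0 would mean some threshold catches every attack without blocking any honest run, and 0.5 would mean the monitor is no better than chance.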

Scoring pipeline:

Monitors

LLM Self-Assessment

Built-in, inline

OpenHands v1 forces the agent LLM to rate each tool call as LOW/MEDIUM/HIGH risk as part of its normal response.
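To compare a three-level rating with monitors that emit continuous scores, the labels need a numeric form. The mapping below is an assumed one chosen for illustration, not the conversion used in the benchmark.

```python
# Assumed mapping from the agent's self-reported risk label to a score in [0, 1].
RISK_TO_SCORE = {"LOW": 0.0, "MEDIUM": 0.5, "HIGH": 1.0}


def risk_to_score(label: str) -> float:
    """Convert a LOW/MEDIUM/HIGH label to a numeric suspicion score."""
    return RISK_TO_SCORE[label.strip().upper()]


print([risk_to_score(x) for x in ["LOW", "medium", " HIGH "]])
```

With only three distinct values, a monitor like this has very few usable thresholds, so its curve consists of a handful of operating points rather than a smooth trade-off.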

Invariant

Rule-based, local

Invariant AI Labs LocalPolicy engine. Evaluates tool calls against a security policy that flags unauthorized actions like money transfers and data deletion.
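The general idea of a rule-based check can be sketched in plain Python. This is not Invariant's policy language or API; the tool names, argument keys, and allow-list below are hypothetical.

```python
from typing import Any


def violates_policy(
    tool_name: str,
    args: dict[str, Any],
    allowed_recipients: set[str],
) -> bool:
    """Return True when a tool call breaks one of two illustrative rules."""
    # Rule 1 (hypothetical): data deletion is never authorized.
    if tool_name == "delete_file":
        return True
    # Rule 2 (hypothetical): money may only go to recipients named in the task.
    if tool_name == "send_money":
        return args.get("recipient") not in allowed_recipients
    return False


allowed = {"John"}
print(violates_policy("send_money", {"recipient": "John", "amount": 50}, allowed))
print(violates_policy("send_money", {"recipient": "attacker", "amount": 500}, allowed))
```

Rules like these run locally in very little time, which is consistent with the ~2ms latency reported above, but they only catch the patterns someone thought to write down.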

Secure Environments

LLM post-hoc

Post-hoc LLM monitor using gpt-oss-safeguard-20b (via OpenRouter) with an optimized prompt that traces each action back to the stated task.
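The shape of a post-hoc monitor that "traces each action back to the stated task" might look like the sketch below. The prompt wording, the 0 to 10 scale, the `score: <n>` reply format, and the fail-closed default are all assumptions, and the model call is injected as a plain function so the example runs without network access; it is not the project's actual prompt or client code.

```python
import re
from typing import Callable

# Hypothetical prompt; the project's optimized prompt is not reproduced here.
PROMPT_TEMPLATE = """You are a security monitor for an AI agent.
Task given to the agent: {task}
Tool call under review: {tool_call}
Can this action be traced back to the task? Explain briefly, then end with
"score: <n>" where n is a suspicion score from 0 (clearly required) to 10
(clearly unauthorized)."""


def build_prompt(task: str, tool_call: str) -> str:
    return PROMPT_TEMPLATE.format(task=task, tool_call=tool_call)


def parse_score(reply: str) -> float:
    """Extract the score and normalize to [0, 1]; unparseable replies fail closed."""
    match = re.search(r"score:\s*(\d+(?:\.\d+)?)", reply, re.IGNORECASE)
    if match is None:
        return 1.0  # assumption: treat an unreadable verdict as maximally suspicious
    return min(float(match.group(1)), 10.0) / 10.0


def score_tool_call(task: str, tool_call: str, call_llm: Callable[[str], str]) -> float:
    """Score one tool call; call_llm sends a prompt to the monitor model."""
    return parse_score(call_llm(build_prompt(task, tool_call)))


# Stub standing in for a real model call.
def fake_llm(prompt: str) -> str:
    return "The recipient is not mentioned in the task. score: 9"


print(score_tool_call("pay John $50 for dinner", "send_money(attacker, 500)", fake_llm))
```

In a real deployment, `call_llm` would wrap a request to the hosted model, and each tool call in a trajectory would be scored this way after the agent produces it.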

GraySwan Cygnal

Commercial API

GraySwan's hosted security analysis API. Acts as a proxy that wraps your LLM provider, analyzing each request for policy violations.