Open source security monitoring for AI agents
Secure Environments is an open source security monitor for AI agents, integrating into Claude Code and OpenHands. It is currently the best performing monitor for detecting unauthorized agent behavior, beating both commercial and built-in alternatives on every metric.
The benchmark below evaluates how well different monitors detect when an AI agent performs unauthorized side tasks while completing legitimate work. An agent is given a task (e.g., "pay John $50 for dinner") and may also attempt hidden actions (e.g., transferring money to an attacker). Each monitor scores every tool call for suspicion.
Secure Environments wins on all metrics, beating GraySwan Cygnal on AUC (+0.20), F1 (+0.25), and Recall (+0.38).
| Monitor | Type | AUC | F1 | Precision | Recall |
|---|---|---|---|---|---|
| LLM Self-Assessment | Built-in | 0.64 | 0.65 | 0.59 | 0.73 |
| Invariant | Rule-based | 0.61 | 0.58 | 0.62 | 0.54 |
| Secure Environments | LLM post-hoc | 0.89 | 0.83 | 0.83 | 0.83 |
| GraySwan Cygnal | Commercial API | 0.69 | 0.58 | 0.82 | 0.45 |
| Monitor | Cost / 1M tokens | Latency (measured) | Source |
|---|---|---|---|
| Invariant | Free | ~2ms | Local rule-based |
| Secure Environments | $0.38 | ~377ms | OpenRouter |
| LLM Self-Assessment | ~$0 marginal | ~1730ms | Inline with agent |
| GraySwan Cygnal | $1.00 | ~301ms | GraySwan |
LLM Self-Assessment is inline (the agent LLM produces security_risk as part of its normal response), so monitoring adds near-zero marginal cost but full LLM inference latency. Secure Environments is a separate post-hoc LLM call ($0.075/1M input + $0.30/1M output). Latencies measured via 5 API calls each on 2026-02-15.
Setup: 25 honest + 24 attack AgentDojo trajectories, OpenHands v1 SDK, kimi-k2.5 model. Tasks span banking, travel, calendar, and workspace domains.
Safety vs Capability curve (ROC reframed for security):
Scoring pipeline:
OpenHands v1 forces the agent LLM to rate each tool call as LOW/MEDIUM/HIGH risk as part of its normal response.
Invariant AI Labs LocalPolicy engine. Evaluates tool calls against a security policy that flags unauthorized actions like money transfers and data deletion.
Post-hoc LLM monitor using gpt-oss-safeguard-20b (via OpenRouter) with an optimized prompt that traces each action back to the stated task.
GraySwan's hosted security analysis API. Acts as a proxy that wraps your LLM provider, analyzing each request for policy violations.