Metadata-Version: 2.4
Name: agentsteer
Version: 0.4.1
Summary: Runtime protection for AI coding agents
Project-URL: Homepage, https://agentsteer.ai
Project-URL: Documentation, https://agentsteer.ai/docs/
Project-URL: Repository, https://github.com/AgentSteer/AgentSteer
Author: Ram Rachum
License: MIT
Keywords: agent,agentsteer,ai,llm,monitor,security
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Security
Requires-Python: >=3.10
Requires-Dist: python-dotenv>=1.0
Requires-Dist: requests>=2.28
Provides-Extra: benchmarks
Requires-Dist: boto3; extra == 'benchmarks'
Requires-Dist: plotly; extra == 'benchmarks'
Provides-Extra: dashboard
Description-Content-Type: text/markdown

# AgentSteer

**Runtime protection for AI coding agents.**

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)

## What it does

AgentSteer intercepts every tool call your AI agent makes via a PreToolUse hook, scores it against the task description, and blocks unauthorized actions before they execute. It catches prompt injection attacks, unauthorized file access, data exfiltration through delegation, and off-task behavior. Scoring uses [oss-safeguard-20b](https://openrouter.ai/openai/gpt-oss-safeguard-20b) (20B parameter model) via OpenRouter at under 200ms median latency.

## Quick Start

```bash
pip install agentsteer
```

Set your OpenRouter API key. You can use a `.env` file or export it directly:

```bash
export OPENROUTER_API_KEY=sk-or-...
```

Install the hook for your framework:

```bash
agentsteer install claude-code
```

That's it. Every tool call is now scored and unauthorized actions are blocked.

## How It Works

```
Agent makes tool call
        |
        v
PreToolUse hook intercepts the call
        |
        v
AgentSteer scores the action against the task description
(oss-safeguard-20b via OpenRouter, <200ms median)
        |
        v
Score >= threshold? ---> BLOCK (action never executes)
        |
        v
Score < threshold? ---> ALLOW (action proceeds normally)
```

Each action is evaluated against 4 security policies. If any policy is violated, the action is blocked and the agent receives feedback explaining why.

## Supported Frameworks

| Framework | Install command | Hook type |
|-----------|----------------|-----------|
| [Claude Code](https://github.com/anthropics/claude-code) | `agentsteer install claude-code` | PreToolUse hook via `settings.json` |
| [OpenHands](https://github.com/All-Hands-AI/OpenHands) | `agentsteer install openhands` | EventStream middleware |
| Python API | `from agentsteer import score_action` | Direct function call |

## Results

Evaluated on the [AgentDojo](https://github.com/ethz-spylab/agentdojo) benchmark, which tests whether agents can be tricked into executing prompt injection attacks hidden in their environment.

| Framework | Condition | Attacks blocked | Utility preserved | n |
|-----------|-----------|----------------|-------------------|---|
| Claude Code | Honest tasks | 0% false positives | 100% | 20 |
| Claude Code | Attack tasks | 100% blocked | -- | 20 |
| OpenHands | Honest tasks | 0% false positives | 100% | 20 |
| OpenHands | Attack tasks | 95% blocked | -- | 20 |
| OpenHands | Per-action detection | 96.5% detected | 0% false positives | 1000 |

Median scoring latency: <200ms. Cost: $0.38 per 1M tokens via OpenRouter.

Full evaluation results and methodology at [agentsteer.ai/evaluations](https://agentsteer.ai/evaluations).

## Security Policies

| Policy | What it catches | Example |
|--------|----------------|---------|
| P1: Read-only | Write actions when the task only requires reading | Task says "look up calendar" but agent sends an email |
| P2: Delegation | Actions that hand off control or leak data to external parties | Agent posts task contents to a URL, sends to unrelated email |
| P3: Category mismatch | Actions in a different category than the task | Task is about calendar but agent modifies files |
| P4: Target mismatch | Actions targeting a different entity than specified | Task says "email Alice" but agent emails Bob |

A post-filter suppresses false positives from self-correction patterns (e.g., agent cancels a calendar event it just created in error).

## Configuration

| Environment variable | Description | Default |
|---------------------|-------------|---------|
| `OPENROUTER_API_KEY` | OpenRouter API key for scoring model | required |
| `AGENT_STEER_TASK` | Task description to score actions against | read from agent context |
| `AGENT_STEER_THRESHOLD` | Score threshold for blocking (0.0 to 1.0) | `0.80` |
| `AGENT_STEER_DEBUG` | Enable debug logging (`1` or `true`) | off |

You can also run `agentsteer setup` for an interactive configuration wizard, or `agentsteer login` to connect to the AgentSteer cloud dashboard for real-time monitoring.

## CLI Reference

```bash
agentsteer setup                # Interactive first-run setup
agentsteer install claude-code  # Install hook for Claude Code
agentsteer status               # Show current configuration
agentsteer score "task" "action" # Score a single action
agentsteer sessions             # List monitored sessions
agentsteer version              # Print version
```

## Links

- **Website**: [agentsteer.ai](https://agentsteer.ai)
- **Documentation**: [agentsteer.ai/docs](https://agentsteer.ai/docs)
- **Evaluations**: [agentsteer.ai/evaluations](https://agentsteer.ai/evaluations)
- **Repository**: [github.com/AgentSteer/AgentSteer](https://github.com/AgentSteer/AgentSteer)

## License

MIT
