Metadata-Version: 2.4
Name: safe-agent-cli
Version: 0.4.4
Summary: An AI coding agent you can actually trust - with built-in impact preview
Project-URL: Homepage, https://github.com/agent-polis/safe-agent
Project-URL: Repository, https://github.com/agent-polis/safe-agent
Author: Agent Polis Contributors
License: MIT
License-File: LICENSE
Keywords: agent,ai,autonomous,coding,preview,safety
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.11
Requires-Dist: anthropic>=0.40.0
Requires-Dist: click>=8.1.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: impact-preview>=0.2.2
Requires-Dist: mcp>=1.0.0
Requires-Dist: rich>=13.0.0
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

# 🛡️ Safe Agent

<!-- HERO_START -->
**Guardrails for AI code agents.**

Safe Agent previews every file edit with [impact-preview](https://github.com/agent-polis/impact-preview) so AI helpers can’t quietly ship risky changes. Drop it into CI or run locally and require approvals before writes.

```bash
pip install safe-agent-cli
safe-agent "add error handling to api.py" --dry-run
```
<!-- HERO_END -->

### ✨ New in v0.4.4

- 🔓 **API-keyless diff gate** - Run `safe-agent --diff-gate` to analyze Git changes with no LLM/API key
- 🧷 **Fork PR coverage** - PR workflow now falls back to diff-gate mode when secrets are unavailable
- 📊 **Same CI artifacts, more contexts** - summary/scorecard/policy JSON now work in both task mode and diff mode
- 🛡️ **Input hardening** - `--diff-ref` validation prevents unsafe ref injection patterns

## Project Map

- **impact-preview (Agent Polis)**: the guardrail layer that previews and scores risky actions.
- **safe-agent-cli (this repo)**: a reference coding agent that uses impact-preview for approvals.
- **Roadmap**: staged execution plan in [`ROADMAP.md`](ROADMAP.md).
- **Compatibility Matrix**: version contract in [`docs/compatibility-matrix.md`](docs/compatibility-matrix.md).
- **What's New (v0.4.4)**: release summary in [`docs/whats-new-v0.4.4.md`](docs/whats-new-v0.4.4.md).
- **Monday Packet**: current assignment bundle in [`docs/monday-assignment-packet.md`](docs/monday-assignment-packet.md).

## The Problem

AI coding agents are powerful but dangerous:
- **Replit Agent** deleted a production database
- **Cursor YOLO mode** deleted an entire system
- You can't see what's about to happen until it's too late

## The Solution

Safe Agent previews every change before execution:

```
$ safe-agent "update database config to use production"

📋 Task: update database config to use production

📝 Planned Changes
┌────────┬─────────────────┬─────────────────────────┐
│ Action │ File            │ Description             │
├────────┼─────────────────┼─────────────────────────┤
│ MODIFY │ config/db.yaml  │ Update database URL     │
└────────┴─────────────────┴─────────────────────────┘

Step 1/1

╭─────────────── Impact Preview ───────────────╮
│ Update database URL                          │
│                                              │
│ **File:** `config/db.yaml`                   │
│ **Action:** MODIFY                           │
│ **Risk:** 🔴 CRITICAL                        │
│ **Policy:** REQUIRE_APPROVAL [builtin]       │
│ **Scanner:** LOW                             │
╰──────────────────────────────────────────────╯

Risk Factors:
  ⚠️  Production pattern detected: production
  ⚠️  Database configuration change

Diff:
- url: postgresql://localhost:5432/dev
+ url: postgresql://prod-server:5432/production

⚠️  CRITICAL RISK - Please review carefully!
Apply this change? [y/N]: 
```

## Installation

```bash
pip install safe-agent-cli
```

Set your Anthropic API key:
```bash
export ANTHROPIC_API_KEY=your-key-here
```

## Usage

### Basic Usage

```bash
# Run a coding task
safe-agent "add input validation to user registration"

# Preview only (no execution)
safe-agent "refactor auth module" --dry-run

# Auto-approve low-risk changes
safe-agent "add docstrings" --auto-approve-low
```

### CI / Non-interactive mode

Use `--non-interactive` to run without prompts: changes the policy allows are auto-approved, and changes that
would require approval are skipped. Combine it with `--fail-on-risk` to exit non-zero when a proposed change meets or exceeds the given risk level:

```bash
safe-agent "scan repository for risky config changes" --dry-run --non-interactive --fail-on-risk high
```
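
The `--fail-on-risk` threshold can be read as an ordered comparison. The sketch below is a minimal illustration, assuming the four risk levels named in this README are ordered LOW < MEDIUM < HIGH < CRITICAL. `RiskLevel` and `should_fail` are hypothetical names for illustration, not part of Safe Agent's API:

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    """Hypothetical ordering of the risk levels named in this README."""

    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def should_fail(proposed: list[RiskLevel], threshold: RiskLevel) -> bool:
    """Return True if any proposed change meets or exceeds the threshold."""
    return any(level >= threshold for level in proposed)
```

With a `high` threshold, a run that proposes one LOW and one HIGH change would fail under this rule, while a run with only MEDIUM changes would pass.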

Need an API-keyless gate for forks or locked-down CI? Use diff mode:

```bash
# Analyze current HEAD + working tree diff, no ANTHROPIC_API_KEY needed
safe-agent --diff-gate --non-interactive --fail-on-risk high

# Analyze diff against a base ref (typical PR gate)
safe-agent --diff-gate --diff-ref origin/main --non-interactive --fail-on-risk high
```
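
Since v0.4.4, `--diff-ref` values are validated to block unsafe ref injection patterns. The exact rules are not documented in this README, so the following is only a sketch of one conservative approach: an allow-list of characters plus an argument list that never passes through a shell. `is_safe_ref` and `git_diff_command` are hypothetical helpers, not Safe Agent internals:

```python
import re

# Hypothetical allow-list: must start with a letter or digit, then letters,
# digits, and a few separators that are common in branch and remote names.
_SAFE_REF = re.compile(r"[A-Za-z0-9][A-Za-z0-9._/-]*")


def is_safe_ref(ref: str) -> bool:
    """Reject option-like refs, shell metacharacters, and Git range syntax."""
    if not _SAFE_REF.fullmatch(ref):
        return False
    # ".." is Git range syntax and is never valid inside a ref name.
    return ".." not in ref


def git_diff_command(ref: str) -> list[str]:
    """Build an argument list (no shell) for diffing against a validated ref."""
    if not is_safe_ref(ref):
        raise ValueError(f"unsafe git ref: {ref!r}")
    return ["git", "diff", ref, "--"]
```

This allow-list is deliberately stricter than Git itself (for example, it rejects `HEAD~1`), trading flexibility for a smaller attack surface.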

For CI artifacts, emit a markdown summary, safety scorecard, and machine-readable report:

```bash
safe-agent "scan repository for risky config changes" \
  --dry-run \
  --non-interactive \
  --fail-on-risk high \
  --ci-summary-file .safe-agent-ci/summary.md \
  --safety-scorecard-file .safe-agent-ci/safety-scorecard.md \
  --policy-report .safe-agent-ci/policy-report.json
```

### Adversarial Evaluation (Stage 3 trust signal)

Run the built-in adversarial fixture suite and emit markdown/JSON reports:

```bash
safe-agent \
  --adversarial-suite docs/adversarial-suite-v1.json \
  --adversarial-markdown-out .safe-agent-ci/adversarial.md \
  --adversarial-json-out .safe-agent-ci/adversarial.json
```

### Policy (allow/deny/require approval)

By default Safe Agent enforces a built-in policy that:
- denies obvious secret/key targets (e.g. `.env`, `.ssh`, `.pem`)
- allows LOW/MEDIUM risk actions
- requires approval for HIGH/CRITICAL risk actions
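
The three default rules above can be sketched as a single decision function. This is an illustration only: `decide`, `SECRET_NAMES`, and `SECRET_SUFFIXES` are hypothetical names, and the secret patterns are limited to the examples listed in this README rather than the full built-in policy:

```python
from pathlib import PurePosixPath

# Secret-like targets taken from the README's examples (.env, .ssh, .pem).
SECRET_NAMES = {".env", ".ssh"}
SECRET_SUFFIXES = {".pem"}


def decide(path: str, risk: str) -> str:
    """Return "DENY", "ALLOW", or "REQUIRE_APPROVAL" for one planned change."""
    p = PurePosixPath(path)
    # Deny if any path component is a secret name or the file has a secret suffix.
    if SECRET_NAMES.intersection(p.parts) or p.suffix in SECRET_SUFFIXES:
        return "DENY"
    if risk in {"LOW", "MEDIUM"}:
        return "ALLOW"
    # HIGH and CRITICAL fall through to human approval.
    return "REQUIRE_APPROVAL"
```

Under these assumptions, the `config/db.yaml` change from the preview above (CRITICAL risk) maps to `REQUIRE_APPROVAL`, matching the policy line shown in that output.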

Override with a bundled preset:

```bash
safe-agent --list-policy-presets
safe-agent "update auth flow" --policy-preset fintech
```

Preset guide:

| Preset | Best for | Tradeoff |
|---|---|---|
| `startup` | Fast-moving product teams | Balanced safety; fewer automatic blocks |
| `fintech` | Regulated or security-sensitive repos | Slower flow due to stricter approvals |
| `games` | Content/asset-heavy iteration | More permissive for rapid iteration |

CI quickstarts (one per preset):

```bash
# Startup (balanced)
safe-agent "scan repo for risky config edits" \
  --dry-run --non-interactive --policy-preset startup \
  --ci-summary-file .safe-agent-ci/startup-summary.md \
  --safety-scorecard-file .safe-agent-ci/startup-safety-scorecard.md \
  --policy-report .safe-agent-ci/startup-policy-report.json

# Fintech (strict)
safe-agent "scan repo for risky config edits" \
  --dry-run --non-interactive --policy-preset fintech --fail-on-risk high \
  --ci-summary-file .safe-agent-ci/fintech-summary.md \
  --safety-scorecard-file .safe-agent-ci/fintech-safety-scorecard.md \
  --policy-report .safe-agent-ci/fintech-policy-report.json

# Games (iterative)
safe-agent "scan repo for risky config edits" \
  --dry-run --non-interactive --policy-preset games \
  --ci-summary-file .safe-agent-ci/games-summary.md \
  --safety-scorecard-file .safe-agent-ci/games-safety-scorecard.md \
  --policy-report .safe-agent-ci/games-policy-report.json
```

See [docs/policy-presets.md](docs/policy-presets.md) for detailed guidance.

Or load a policy file (JSON/YAML):

```bash
safe-agent "update auth flow" --policy ./policy.json
```

### Interactive Mode

```bash
safe-agent --interactive
```

### From File

```bash
safe-agent --file task.md
```

## How It Works

1. **Plan** - Claude analyzes your task and plans file changes
2. **Preview** - Each change runs through impact-preview for risk analysis
3. **Approve** - You see the diff and risk level before anything executes
4. **Execute** - Only approved changes are applied
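
The four steps above can be sketched as a small loop. This is a simplified illustration, not Safe Agent's implementation: `Change` and `run_pipeline` are hypothetical names, planning is assumed to have already produced the list of changes, and the preview is reduced to a single printed line:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Change:
    path: str
    description: str
    risk: str  # e.g. "LOW", "MEDIUM", "HIGH", "CRITICAL"


def run_pipeline(
    planned: list[Change],
    approve: Callable[[Change], bool],
    apply: Callable[[Change], None],
) -> list[Change]:
    """Preview each planned change, ask for approval, and apply only approved ones."""
    applied = []
    for change in planned:
        # Preview: show what is about to happen before anything executes.
        print(f"[{change.risk}] {change.path}: {change.description}")
        # Approve, then execute: rejected changes are never applied.
        if approve(change):
            apply(change)
            applied.append(change)
    return applied
```

Passing `approve` and `apply` as callables keeps the loop independent of how approval is collected (a prompt, a policy, or a CI rule) and of how changes are written.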

## Enterprise & Compliance Features

Safe Agent now includes features for insurance partnerships, regulatory compliance, and enterprise deployments.

### Audit Export for Insurance

Export complete audit trails for insurance underwriting and claims:

```bash
safe-agent "update production config" --audit-export audit.json
```

The audit export includes:
- Complete task history with timestamps
- Risk assessments for all operations
- Approval/rejection records (human oversight)
- Change execution status
- Compliance flags for regulatory requirements

The export is intended for sharing with AI liability insurance carriers such as [AIUC](https://www.aiunderwritingconsortium.com/), [Armilla AI](https://www.armilla.ai/), and [Beazley](https://www.beazley.com/).

See [docs/insurance-integration.md](docs/insurance-integration.md) for details on insurance partnerships and premium rate factors.

### EU AI Act Compliance Mode

Enable strict compliance mode for EU AI Act requirements:

```bash
safe-agent "modify user data" --compliance-mode --audit-export audit.json
```

Compliance mode:
- Disables all auto-approve features (Article 14: Human Oversight)
- Requires explicit approval for every operation
- Records all compliance flags in audit exports
- Supports Article 12 (Record-Keeping) requirements

Ready for the **August 2, 2026 enforcement deadline**.

See [docs/eu-ai-act-compliance.md](docs/eu-ai-act-compliance.md) for complete compliance guide and requirements mapping.
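
The interaction between compliance mode and auto-approval can be sketched as follows. This models only the two flags `--compliance-mode` and `--auto-approve-low` as described in this README; `requires_prompt` is a hypothetical helper, and policy rules are left out:

```python
def requires_prompt(risk: str, auto_approve_low: bool, compliance_mode: bool) -> bool:
    """Return True when a human must explicitly approve the change."""
    if compliance_mode:
        # Compliance mode disables every auto-approve path.
        return True
    # Outside compliance mode, only LOW-risk changes can skip the prompt,
    # and only when auto-approval was requested.
    return not (auto_approve_low and risk == "LOW")
```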

### Incident Documentation

We maintain a database of AI agent incidents to raise awareness and demonstrate prevention mechanisms:

- [Replit SaaStr Database Deletion](docs/incident-reports/2025-07-replit-saastr.md) - Production database deleted during demo
- [Cursor YOLO Mode Bypass](docs/incident-reports/2025-07-cursor-yolo-mode.md) - Security controls circumvented

[Submit an incident report](.github/ISSUE_TEMPLATE/incident-report.md) to help the community.

## Options

| Flag | Description |
|------|-------------|
| `--dry-run` | Preview changes without executing |
| `--auto-approve-low` | Auto-approve low-risk changes |
| `--non-interactive` | Run without prompts (CI-friendly) |
| `--fail-on-risk` | Exit non-zero if any change meets/exceeds risk level |
| `--policy` | Path to a policy file (JSON/YAML) for deterministic allow/deny/approval |
| `--policy-preset` | Use a bundled policy preset (startup, fintech, games) |
| `--list-policy-presets` | List available policy presets and exit |
| `--adversarial-suite` | Run adversarial fixture suite from JSON and exit |
| `--adversarial-json-out` | Write adversarial evaluation JSON report |
| `--adversarial-markdown-out` | Write adversarial evaluation markdown report |
| `--diff-gate` | Analyze Git diff directly (no LLM / no API key) |
| `--diff-ref` | Base Git ref used by `--diff-gate` (for PR comparisons) |
| `--interactive`, `-i` | Interactive mode |
| `--file`, `-f` | Read task from file |
| `--version` | Print installed safe-agent version and exit |
| `--model` | Claude model to use (default: claude-sonnet-4-20250514) |
| `--audit-export` | Export audit trail to JSON file (insurance/compliance) |
| `--compliance-mode` | Enable strict compliance mode (disables auto-approve) |
| `--ci-summary` | Print a concise markdown CI summary block |
| `--ci-summary-file` | Write CI summary markdown to a file |
| `--safety-scorecard` | Print a markdown safety scorecard block |
| `--safety-scorecard-file` | Write markdown safety scorecard to a file |
| `--policy-report` | Write machine-readable policy/scanner report JSON |
| `--json-out` | Write machine-readable run result JSON (status + summary + policy report) |

## MCP Server (For Other AI Agents)

Safe Agent can be used as an MCP server, letting other AI agents delegate coding tasks safely.

```bash
# Start the MCP server
safe-agent-mcp
```

### Claude Desktop Integration

Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (the macOS location):

```json
{
  "mcpServers": {
    "safe-agent": {
      "command": "safe-agent-mcp"
    }
  }
}
```
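
If the config file already lists other MCP servers, the entry needs to be merged rather than pasted over the file. A minimal sketch using only the standard library, assuming the file is valid JSON. `add_safe_agent_server` is a hypothetical helper, not something shipped with Safe Agent:

```python
import json
from pathlib import Path


def add_safe_agent_server(config_path: Path) -> dict:
    """Add the safe-agent MCP entry to a Claude Desktop config, keeping other entries."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["safe-agent"] = {"command": "safe-agent-mcp"}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Call it with the path for your platform, for example `add_safe_agent_server(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json")` on macOS.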

### Available MCP Tools

| Tool | Description | Safety |
|------|-------------|--------|
| `run_coding_task` | Execute a coding task with preview | 🔴 Destructive |
| `preview_coding_task` | Preview changes without executing | 🟢 Read-only |
| `get_agent_status` | Check agent status and capabilities | 🟢 Read-only |

## Cursor Plugin (Beta)

This repo now includes a Cursor plugin layout:

- `.cursor-plugin/plugin.json`
- `.mcp.json`
- `rules/`, `skills/`, `commands/`, `agents/`

The plugin is aimed at PR safety workflows (risk preview + policy artifacts) and can be submitted to the Cursor Marketplace.

## Moltbook Integration

Safe Agent is available as a [Moltbook](https://moltbook.com) skill for AI agent networks.

See `moltbook-skill.json` for the skill definition.

## GitHub PR Risk Gate

This repo ships a production workflow and local composite action for PR gating:

- Workflow: `.github/workflows/safe-agent-pr-review.yml`
- Action: `.github/actions/safe-agent-review/action.yml`

The workflow runs on PRs and manual dispatch, then uploads:
- `safe-agent-summary.md` (human-readable markdown summary)
- `safety-scorecard.md` (risk/policy/scanner metrics for trust reviews)
- `policy-report.json` (machine-readable report with rule IDs/outcomes)
- `run-result.json` (machine-readable run status for automation adapters)
- `safe-agent.log` (full run log)

If `ANTHROPIC_API_KEY` is unavailable (for example, fork PRs), the workflow automatically falls back to
`--diff-gate` mode using the PR base ref.
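
The fallback can be reproduced in a custom CI script. The sketch below is an assumption about how such a script might choose a mode, not a copy of the shipped workflow. `build_gate_command` is a hypothetical helper; the flags it emits are the ones documented in this README:

```python
import os


def build_gate_command(base_ref: str, task: str, env=None) -> list[str]:
    """Pick task mode when an API key is present, otherwise diff-gate mode."""
    env = os.environ if env is None else env
    common = ["--non-interactive", "--fail-on-risk", "high"]
    if env.get("ANTHROPIC_API_KEY"):
        # A key is available: run the LLM-backed task in preview-only mode.
        return ["safe-agent", task, "--dry-run", *common]
    # No key (for example, a fork PR): analyze the Git diff directly.
    return ["safe-agent", "--diff-gate", "--diff-ref", base_ref, *common]
```

The returned list can be passed to `subprocess.run(..., check=True)` so that a non-zero exit from `--fail-on-risk` fails the job.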

## For AI Agents

If you're an AI agent wanting to use Safe Agent programmatically:

```python
import asyncio

from safe_agent import SafeAgent


async def main():
    agent = SafeAgent(
        auto_approve_low_risk=True,      # Skip approval for low-risk changes
        dry_run=False,                   # Set True to preview only
        audit_export_path="audit.json",  # Export audit trail for compliance
        compliance_mode=False,           # Enable for EU AI Act compliance
    )
    # agent.run() is awaited, so it has to be called from inside a coroutine
    return await agent.run("add error handling to api.py")


result = asyncio.run(main())
```

For insurance and compliance use cases:

```python
from safe_agent import SafeAgent

# EU AI Act compliant configuration
agent = SafeAgent(
    compliance_mode=True,              # Strict compliance mode
    audit_export_path="audit.json",    # Required for Article 12
    non_interactive=False,             # Human oversight required
)
```

## Powered By

- [impact-preview](https://github.com/agent-polis/impact-preview) - Impact analysis and diff generation
- [Claude](https://anthropic.com) - AI planning and code generation
- [Rich](https://github.com/Textualize/rich) - Beautiful terminal output
- [MCP](https://modelcontextprotocol.io) - Model Context Protocol for agent interoperability

## Known Incidents

AI coding agents without proper safeguards have caused real damage. We document these incidents to raise awareness and demonstrate why preview-before-execute architecture matters.

### Recent Incidents

- **[Replit SaaStr Database Deletion (July 2025)](docs/incident-reports/2025-07-replit-saastr.md)** - Production database deleted, 1,200+ executives affected
- **[Cursor YOLO Mode Bypass (July 2025)](docs/incident-reports/2025-07-cursor-yolo-mode.md)** - Security controls bypassed, arbitrary command execution possible

### Submit an Incident

Experienced an AI agent incident? Help the community by [submitting an incident report](.github/ISSUE_TEMPLATE/incident-report.md).

Browse all documented incidents in [docs/incident-reports/](docs/incident-reports/).

## License

MIT License - see [LICENSE](LICENSE) for details.

---

Built by developers who want AI agents they can actually trust.
