Metadata-Version: 2.4
Name: stateset-cua
Version: 2.0.3
Summary: Production-grade AI automation platform powered by Claude Opus 4.6
Author-email: StateSet <support@stateset.io>
License-Expression: LicenseRef-Proprietary
Project-URL: Homepage, https://github.com/stateset/stateset-computer-use-agent
Project-URL: Documentation, https://github.com/stateset/stateset-computer-use-agent#readme
Project-URL: Repository, https://github.com/stateset/stateset-computer-use-agent
Project-URL: Issues, https://github.com/stateset/stateset-computer-use-agent/issues
Keywords: ai,automation,claude,computer-use,agent
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: anthropic[bedrock,vertex]<1.0.0,>=0.39.0
Requires-Dist: jsonschema<5.0.0,>=4.22.0
Requires-Dist: boto3<2.0.0,>=1.28.57
Requires-Dist: google-auth<3,>=2
Requires-Dist: tenacity<10.0.0,>=8.2.2
Requires-Dist: httpx<1.0.0,>=0.25.3
Requires-Dist: PyYAML<7.0.0,>=6.0
Requires-Dist: pillow<12.0.0,>=10.0.0
Requires-Dist: PyAutoGUI<1.0.0,>=0.9.54
Requires-Dist: imagehash<5.0.0,>=4.3.1
Requires-Dist: requests<3.0.0,>=2.31.0
Requires-Dist: urllib3<3.0.0,>=2.0.0
Requires-Dist: psutil<7.0.0,>=5.9.0
Requires-Dist: opentelemetry-api<2.0.0,>=1.20.0
Requires-Dist: opentelemetry-sdk<2.0.0,>=1.20.0
Requires-Dist: opentelemetry-exporter-otlp<2.0.0,>=1.20.0
Requires-Dist: prometheus-client<1.0.0,>=0.19.0
Requires-Dist: typer[all]<1.0.0,>=0.12.0
Requires-Dist: rich<14.0.0,>=13.7.0
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.1; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: pytest-timeout>=2.1.0; extra == "dev"
Requires-Dist: pytest-mock>=3.11.1; extra == "dev"
Requires-Dist: pytest-benchmark>=4.0.0; extra == "dev"
Requires-Dist: pytest-xdist>=3.3.1; extra == "dev"
Requires-Dist: hypothesis>=6.92.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: mypy>=1.7.0; extra == "dev"
Requires-Dist: bandit>=1.7.5; extra == "dev"
Requires-Dist: safety>=2.3.5; extra == "dev"
Requires-Dist: detect-secrets>=1.4.0; extra == "dev"
Provides-Extra: dashboard
Requires-Dist: streamlit>=1.38.0; extra == "dashboard"
Provides-Extra: security
Requires-Dist: cryptography>=41.0.0; extra == "security"
Provides-Extra: all
Requires-Dist: stateset-cua[dashboard,dev,security]; extra == "all"
Dynamic: license-file

# StateSet Computer Use Agent

<div align="center">

**Advanced AI Agent System for Autonomous Computer Operation**

[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![Claude Opus 4.6](https://img.shields.io/badge/claude-opus--4.6-purple.svg)](https://www.anthropic.com/claude)
[![License: Proprietary](https://img.shields.io/badge/license-Proprietary-red.svg)](LICENSE)

[Features](#-features) • [Quick Start](#-quick-start) • [Documentation](#-documentation) • [Agents](#-available-agents) • [Architecture](#-architecture)

</div>

---

## 🎯 Overview

StateSet Computer Use Agent is a **production-grade AI automation platform** powered by Claude Opus 4.6, designed for autonomous computer operation with human-level reliability and intelligence. The system uses multiple specialized AI agents that can see, understand, and interact with desktop environments to complete complex, long-running tasks.

### Why StateSet Computer Use Agent?

- **🧠 State-of-the-Art AI**: Powered by Claude Opus 4.6 with extended thinking capabilities
- **⚡ 30-50% Faster**: Automatic parallel tool execution for independent operations
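- **💰 95% Cost Savings**: Research-based context engineering cuts average token usage per task by about 95% (from 150,000 to 7,500 tokens)
- **🔄 Indefinite Conversations**: Dynamic compaction keeps attention quality at EXCELLENT as conversations grow, with no fixed cap on conversation length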
- **🏢 Production-Ready**: Includes monitoring, billing, security, and graceful failure handling
- **🎨 Multi-Agent**: 7 specialized agents for different business workflows

## 🆕 What's New: Claude Opus 4.6 and the Claude 4.5 Family

### Updated model lineup
- **Claude Opus 4.6** – most intelligent Anthropic model ever shipped, now priced for day-to-day production agents.
- **Claude Sonnet 4.5** – best balance of capability and cost for complex coding or orchestration.
- **Claude Haiku 4.5** – fastest Haiku yet with near-frontier reasoning and the first Haiku model that supports extended thinking.

### Opus 4.6 enhancements
- **Maximum intelligence** across reasoning, debugging, and strategic planning tasks.
- **Thinking block preservation** keeps the model’s reasoning context intact across turns for better long-running workflows (no extra flags required).
- **Computer-use excellence** with the new `zoom` action in `computer_20251124`, enabling pixel-level inspection of dense UI or fine print before taking action.
- **Practical performance** thanks to lower pricing plus automatic prompt caching, so advanced agents stay affordable.

### Effort parameter (beta)
- Claude Opus 4.6 is the only model that accepts an `effort` setting (`low`, `medium`, `high`).
- We automatically attach the required `effort-2025-11-24` beta header and forward the setting via `output_config`.
- Configure it per run with:
  ```bash
  python main.py --effort medium "triage support tickets and summarize blockers"
  ```
  Use `low` for high-volume automations, `medium` for balanced cost/performance, and `high` (default) for maximal quality.
- Works alongside the thinking token budget: effort controls overall token appetite, while `thinking_budget` still caps extended-thinking tokens.

## 🌟 Core Innovation: Context Engineering

**Based on Anthropic's research**: implements all 5 patterns from ["Effective Context Engineering for AI Agents"](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents):

| Pattern | Implementation | Result |
|---------|---------------|--------|
| **Just-in-Time Retrieval** | grep/head/tail instead of full file reads | 60-99% token savings |
| **Dynamic Compaction** | Adaptive clearing based on attention budget | Maintains quality as context grows |
| **Structured Note-Taking** | Persistent memory outside context window | Unlimited task complexity |
| **Sub-Agent Compression** | Exploration agents return concise summaries | 50k → 2k token summaries |
| **Attention Budget Monitoring** | Real-time quality tracking (EXCELLENT→CRITICAL) | Prevents quality degradation |

**Impact**: Enables indefinite conversations with **EXCELLENT** attention quality while achieving **95% cost reduction** compared to naive implementations.
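Just-in-time retrieval can be illustrated with a small grep-style helper that returns only the matching lines (plus a little surrounding context) instead of the whole file. This is a sketch of the idea, not the project's `ContextOptimizer`; the function name `grep_context` and its parameters are hypothetical.

```python
import re
from pathlib import Path

def grep_context(path: Path, pattern: str, context: int = 1, max_hits: int = 20) -> str:
    """Return matching lines (with `context` lines around each) rather than the full file."""
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    regex = re.compile(pattern)
    keep: set[int] = set()
    hits = 0
    for i, line in enumerate(lines):
        if regex.search(line):
            hits += 1
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
            if hits >= max_hits:
                break
    # Prefix each kept line with its 1-based line number, like `grep -n`
    return "\n".join(f"{i + 1}:{lines[i]}" for i in sorted(keep))

if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "app.log"
        log.write_text("\n".join(f"line {n} ok" for n in range(1000)) + "\nline 1000 ERROR disk full\n")
        print(grep_context(log, r"ERROR"))  # 2 lines reach the model instead of 1,001
```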

[Read More: Context Engineering Details →](CONTEXT_ENGINEERING.md)

## 🚀 Quick Start

### Prerequisites

- Ubuntu Linux 20.04+ (kernel 5.15.0+)
- Python 3.10 or higher
- Anthropic API key ([get one here](https://console.anthropic.com/))
- X11 virtual display (we'll set this up)
- *(Optional)* Node.js + npm if you want to use the StateSet CLI (`@stateset-cli`) from within agent tasks.

### Installation

**Option 1: pip install (recommended)**

```bash
pip install stateset-cua

# Configure API key
export ANTHROPIC_API_KEY='your-key-here'

# Start virtual display (required for GUI automation)
Xvfb :1 -screen 0 1920x1080x24 &
export DISPLAY=:1

# Run your first agent
stateset-cua run "auto-close resolved tickets"
```

**Option 2: From source**

```bash
git clone https://github.com/stateset/stateset-computer-use-agent.git
cd stateset-computer-use-agent
pip install -e ".[dev]"

export ANTHROPIC_API_KEY='your-key-here'
Xvfb :1 -screen 0 1920x1080x24 &
export DISPLAY=:1

stateset-cua run "auto-close resolved tickets"
```

**Need more help?** See the comprehensive [Getting Started Guide →](GETTING_STARTED.md)

## 🤖 Available Agents

StateSet includes 7 specialized agents optimized for different business workflows:

| Agent | Purpose | Example Use Case |
|-------|---------|------------------|
| **AUTO_CLOSE** | Support ticket automation | "auto-close all resolved tickets from last 24 hours" |
| **SOCIAL_MEDIA** | Content moderation & engagement | "social media hide inappropriate comments on Facebook" |
| **LINKEDIN_MESSENGER** | Professional outreach | "linkedin send connection requests to AI engineers in SF" |
| **SLACK_SUPPORT** | Customer support automation | "slack respond to all unanswered questions in #support" |
| **SHOPIFY** | E-commerce management | "shopify update inventory for out-of-stock products" |
| **ONBOARDING** | User onboarding workflows | "onboard new enterprise customer with custom rules" |
| **STATESET_AGENTIC** | General-purpose automation | "organize desktop files and create summary report" |

### Multi-Agent Orchestration

Run multiple agents in parallel for complex workflows:

```bash
python main.py "auto-close tickets and social media monitoring and slack support"
```

**How it works**:
- Automatic keyword detection selects appropriate agents
- Parallel execution (not sequential) for independent tasks
- Unified logging with `[AGENT_TYPE]` prefixes
- Aggregated metrics and billing
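The keyword routing and parallel dispatch described above can be sketched as follows. The keyword table, `select_agents`, and `run_agent_stub` are hypothetical stand-ins (the real logic lives in `get_active_agents()` and `run_agent()` in `main.py`); the point is the shape: match keywords, then launch one `asyncio` task per matched agent and gather the results.

```python
import asyncio

# Hypothetical keyword table; the real mapping in main.py may differ
AGENT_KEYWORDS: dict[str, tuple[str, ...]] = {
    "AUTO_CLOSE": ("auto-close", "close tickets"),
    "SOCIAL_MEDIA": ("social media",),
    "LINKEDIN_MESSENGER": ("linkedin",),
    "SLACK_SUPPORT": ("slack",),
    "SHOPIFY": ("shopify",),
    "ONBOARDING": ("onboard",),
}
DEFAULT_AGENT = "STATESET_AGENTIC"

def select_agents(instruction: str) -> list[str]:
    """Return every agent type whose keywords appear in the instruction."""
    text = instruction.lower()
    matched = [agent for agent, words in AGENT_KEYWORDS.items()
               if any(word in text for word in words)]
    return matched or [DEFAULT_AGENT]

async def run_agent_stub(agent_type: str, instruction: str) -> str:
    """Placeholder for the per-agent sampling loop."""
    await asyncio.sleep(0.01)
    return f"[{agent_type}] done"

async def orchestrate(instruction: str) -> list[str]:
    agents = select_agents(instruction)
    # Independent agents run concurrently, not one after another
    tasks = [asyncio.create_task(run_agent_stub(a, instruction)) for a in agents]
    return list(await asyncio.gather(*tasks))

print(asyncio.run(orchestrate("auto-close tickets and social media monitoring and slack support")))
```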

## 🛠️ Key Features

### 1. Computer Vision & Control

Agents can see and interact with any desktop application:

- **Screenshot Analysis**: High-resolution screen capture with caching
- **Mouse Control**: Click, drag, scroll with pixel-perfect precision
- **Keyboard Input**: Type text, keyboard shortcuts, special keys
- **Adaptive Delays**: Smart waiting based on action type (0.0s-0.6s)
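One way to implement action-dependent waiting is a lookup from action type to post-action delay, clamped to the 0.0s-0.6s range mentioned above. The individual delay values and the `delay_for`/`settle` helpers are illustrative assumptions, not the agent's actual tuning.

```python
import time

# Illustrative settle times (seconds) after each action; real values may differ
ACTION_DELAYS: dict[str, float] = {
    "screenshot": 0.0,    # read-only, nothing to wait for
    "mouse_move": 0.05,
    "type": 0.1,
    "key": 0.2,
    "scroll": 0.3,
    "left_click": 0.4,    # clicks often trigger UI transitions
    "double_click": 0.5,
}
MIN_DELAY, MAX_DELAY, DEFAULT_DELAY = 0.0, 0.6, 0.3

def delay_for(action: str) -> float:
    """Look up the settle time for an action, clamped to [MIN_DELAY, MAX_DELAY]."""
    delay = ACTION_DELAYS.get(action, DEFAULT_DELAY)
    return min(MAX_DELAY, max(MIN_DELAY, delay))

def settle(action: str) -> None:
    """Sleep just long enough for the screen to update before the next screenshot."""
    time.sleep(delay_for(action))
```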

### 2. Intelligent Parallel Execution

**Automatic dependency analysis** for safe parallelization:

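```python
import asyncio
import time

async def web_search(query: str) -> str:
    """Stand-in for a tool call that takes ~2.5 s (simulated with asyncio.sleep)."""
    await asyncio.sleep(2.5)
    return f"results for {query!r}"

QUERIES = ["Claude AI", "Anthropic", "computer use"]

async def before_optimization() -> list[str]:
    # Each call waits for the previous one: ~7.5 s total
    return [await web_search(q) for q in QUERIES]

async def after_optimization() -> list[str]:
    # Independent calls run concurrently: ~2.5 s total (about 3x faster)
    return list(await asyncio.gather(*(web_search(q) for q in QUERIES)))

for run in (before_optimization, after_optimization):
    start = time.perf_counter()
    asyncio.run(run())
    print(f"{run.__name__}: {time.perf_counter() - start:.1f}s")
```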

**Performance**: 30-50% speed improvement on real-world tasks

[Read More: Parallel Execution →](PARALLEL_EXECUTION.md)

### 3. Advanced Tool Suite

Agents have access to powerful tools across multiple categories:

| Category | Tools | Description |
|----------|-------|-------------|
| **Computer** | click, type, scroll, zoom, screenshot | Desktop interaction (computer_20251124) |
| **Web** | web_search, web_fetch | Internet access with citations |
| **Code** | code_execution | Sandboxed Python/Bash execution |
| **Files** | create, read, edit, search | File management with path protection |
| **Memory** | view, create, edit, delete, rename | Persistent agent memory with injection protection |
| **Text Editor** | str_replace, insert | Advanced file editing |
| **Subagents** | spawn_subagent | Spawn isolated sub-agents for task decomposition |
| **MCP** | mcp__*__* | External tools via Model Context Protocol |
| **CLI** | stateset_cli | StateSet Node CLI integration |

[Read More: Tool Reference →](TOOL_REFERENCE.md)

### 4. Subagent Spawning & MCP Integration

**Subagents** implement Anthropic's sub-agent compression pattern -- the main agent can spawn specialized sub-agents that operate in isolated contexts and return compressed summaries (50k exploration down to 2k), achieving 95% context savings.

**MCP** connects external services (Slack, GitHub, Postgres, and more) as tools via the Model Context Protocol. Supports stdio, SSE, and HTTP transports with 8 pre-configured presets.

### 5. Structured JSON Output

Force Claude to return valid JSON matching a specified schema for reliable automation pipelines. Includes pre-defined schemas for ticket analysis, task results, code review, and entity extraction.
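A simplified sketch of the extract-then-validate step: pull the JSON object out of a model reply (tolerating a Markdown code fence), parse it, and check required keys and types. The ticket-analysis field set is a hypothetical example, and the hand-rolled type check stands in for full JSON Schema validation.

```python
import json
import re
from typing import Any

# Hypothetical "ticket analysis" shape: required field -> expected Python type
TICKET_ANALYSIS_FIELDS: dict[str, type] = {
    "ticket_id": str,
    "status": str,
    "should_close": bool,
    "confidence": float,
}

# Matches a fenced block (three backticks, optional "json" tag) around one object
FENCE_RE = re.compile(r"`{3}(?:json)?\s*(\{.*?\})\s*`{3}", re.DOTALL)

def extract_json(reply: str) -> dict[str, Any]:
    """Parse a JSON object from a reply, preferring a fenced json block if present."""
    match = FENCE_RE.search(reply)
    if match:
        candidate = match.group(1)
    else:
        start, end = reply.find("{"), reply.rfind("}")
        if start == -1 or end < start:
            raise ValueError("no JSON object found in reply")
        candidate = reply[start:end + 1]
    data = json.loads(candidate)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data

def validate_fields(data: dict[str, Any], fields: dict[str, type]) -> list[str]:
    """Return a list of problems; an empty list means the object passed."""
    problems = []
    for name, expected in fields.items():
        if name not in data:
            problems.append(f"missing field: {name}")
            continue
        value = data[name]
        ok = isinstance(value, expected)
        if expected is float:
            # JSON integers such as 1 are acceptable where a number is expected
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not ok:
            problems.append(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
    return problems
```

Since the package already depends on `jsonschema`, a production version would more likely describe the shape as a JSON Schema and call `jsonschema.validate(instance=data, schema=schema)`, which raises `jsonschema.ValidationError` on failure.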

### 6. Production-Grade Observability

Unified observability system combining all monitoring concerns:

- **Structured Logging**: JSON-formatted with automatic request ID correlation
- **Prometheus Metrics**: Agent duration, tool execution counts, API latency, cost tracking
- **OpenTelemetry Tracing**: Distributed tracing with automatic span creation
- **Real-time Streaming**: SSE and WebSocket endpoints for dashboard integration
- **Budget Warnings**: Automatic alerts when token/cost budgets approach limits
- **Health Monitoring**: API connectivity, display, memory, disk checks with circuit breakers
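Request-ID correlation in JSON logs is commonly done with a `contextvars.ContextVar` plus a custom `logging.Formatter`, so every line emitted while handling one request carries the same ID, even across `asyncio` tasks. This sketch uses only the standard library; the logger name, JSON field names, and helper functions are illustrative assumptions rather than the code in `agent/observability/`.

```python
import contextvars
import io
import json
import logging
import uuid
from typing import TextIO

# Current request ID; asyncio tasks each run in a copy of the context, so IDs don't leak
request_id_var = contextvars.ContextVar("request_id", default="-")

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object that includes the request ID."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "request_id": request_id_var.get(),
        })

def make_logger(stream: TextIO, name: str = "stateset.demo") -> logging.Logger:
    """Create a logger that writes JSON lines to `stream` (the name is illustrative)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers[:] = [handler]  # replace handlers left over from earlier calls
    return logger

def handle_request(logger: logging.Logger, task: str) -> str:
    """Tag every log line emitted while handling `task` with a fresh request ID."""
    rid = uuid.uuid4().hex[:8]
    token = request_id_var.set(rid)
    try:
        logger.info("starting %s", task)
        logger.info("finished %s", task)
    finally:
        request_id_var.reset(token)  # restore the previous ID
    return rid

if __name__ == "__main__":
    buffer = io.StringIO()
    handle_request(make_logger(buffer), "auto-close")
    print(buffer.getvalue())
```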

### 7. Security-First Design

Multiple layers of security hardened across the stack:

- **Prompt Injection Protection**: 11-pattern content sanitizer in memory tool
- **Directory Traversal Prevention**: `resolve()` + `relative_to()` + symlink detection
- **Agent Isolation**: Separate memory directories per agent ID
- **Safe Tool Execution**: Pre-execution validation via `ToolExecutionGuard`
- **Dashboard Auth**: JWT-based with tenant isolation, rate limiting, security headers
- **Circuit Breakers**: Fault tolerance for external API calls (5 failures → open → 60s recovery)
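The traversal check listed above (`resolve()` + `relative_to()` + symlink detection) can be sketched with `pathlib`. The function name `safe_path`, the `PathEscapeError` type, and the choice to reject absolute paths and any symlink below the root are assumptions for illustration.

```python
from pathlib import Path

class PathEscapeError(ValueError):
    """Raised when a requested path would leave the agent's sandbox root."""

def safe_path(root: Path, requested: str) -> Path:
    """Resolve `requested` under `root`, rejecting absolute paths, symlinks, and traversal."""
    root = root.resolve()
    if Path(requested).is_absolute():
        raise PathEscapeError(f"absolute paths are not allowed: {requested}")
    # Symlink detection: check each component before resolving follows any links
    current = root
    for part in Path(requested).parts:
        current = current / part
        if current.is_symlink():
            raise PathEscapeError(f"symlink not allowed: {current}")
    resolved = (root / requested).resolve()
    try:
        resolved.relative_to(root)  # raises ValueError if the result is outside root
    except ValueError:
        raise PathEscapeError(f"path escapes sandbox: {requested}") from None
    return resolved
```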

[Read More: Security Considerations →](USERGUIDE.md#security-considerations)

## 📋 Example Usage

### Basic Agent Execution

```bash
# Using convenience scripts
./start-autoclose-agent.sh
./start-socialmedia-agent.sh
./start-linkedin-agent.sh

# Custom instructions
python main.py "auto-close all tickets marked as resolved"
python main.py "social media hide comments containing profanity"
python main.py "linkedin message CTOs at Series A startups"
```

### Advanced Workflows

```bash
# Multi-step workflow
python main.py "auto-close resolved tickets, then generate summary report"

# Conditional logic
python main.py "social media hide inappropriate comments only if flagged by 2+ users"

# Complex automation
python main.py "shopify find products with inventory < 10 and create reorder report"
```

### Tool search & effort controls

- Defer heavyweight tool schemas until Claude actually needs them:
  ```bash
  python main.py --tool-search bm25 --defer-tool agi_agent --defer-tool memory "run a quarterly revenue analysis"
  ```
- Dial Claude Opus 4.6’s token appetite up or down with the `--effort` flag (we add the `effort-2025-11-24` beta header for you):
  ```bash
  python main.py --effort low "gather 10 competitor pricing snapshots"
  ```

### Monitoring & Debugging

```bash
# Real-time log filtering
python main.py "your task" 2>&1 | grep "\[AUTO_CLOSE\]"

# Save complete logs
python main.py "your task" 2>&1 | tee logs/run_$(date +%Y%m%d_%H%M%S).log

# View screenshots
ls -lh screenshots/AUTO_CLOSE/
eog screenshots/AUTO_CLOSE/screenshot_*.png
```

## 🏗️ Architecture

### System Overview

```
┌─────────────────────────────────────────────────────────────────────┐
│                            CLI / Entry Points                        │
│  main.py (orchestrator)    start-*-agent.sh    stateset-cua CLI      │
│  --tool-version  --effort  --tool-search  --defer-tool  --agent-type │
└────────────────────────────────┬────────────────────────────────────┘
                                 │
                    ┌────────────┴────────────┐
                    ▼                         ▼
          ┌──────────────────┐      ┌──────────────────┐
          │  Agent Selection  │      │  Health Check     │
          │  (keyword match)  │      │  (API, display,   │
          │  get_active_agents│      │   memory, disk)   │
          └────────┬─────────┘      └──────────────────┘
                   │
        ┌──────────┼──────────┐         Parallel asyncio.Tasks
        ▼          ▼          ▼
  ┌───────────┐┌───────────┐┌───────────┐
  │ Agent 1   ││ Agent 2   ││ Agent N   │   run_agent() per agent type
  │ Loop      ││ Loop      ││ Loop      │   with adaptive config +
  │           ││           ││           │   model routing + skills
  └─────┬─────┘└─────┬─────┘└─────┬─────┘
        │            │            │
        └────────────┼────────────┘
                     ▼
  ┌─────────────────────────────────────────────────────────────┐
  │                    sampling_loop()                           │
  │                    agent/loop.py                             │
  │                                                             │
  │  ┌─────────────┐  ┌──────────────┐  ┌────────────────────┐ │
  │  │ System      │  │ Circuit      │  │ Context            │ │
  │  │ Prompt Init │  │ Breaker      │  │ Optimizer          │ │
  │  │ (StateSet   │  │ (API fault   │  │ (5 Anthropic       │ │
  │  │  APIs)      │  │  tolerance)  │  │  patterns)         │ │
  │  └─────────────┘  └──────────────┘  └────────────────────┘ │
  │                                                             │
  │  ┌──────────────────────────────────────────────────────┐   │
  │  │              Claude Opus 4.6 API Call                 │   │
  │  │  Providers: Anthropic | AWS Bedrock | Google Vertex   │   │
  │  │  Betas: prompt caching, context management,           │   │
  │  │         tool search, effort, web/code/files           │   │
  │  └──────────────────────┬───────────────────────────────┘   │
  │                         │                                   │
  │  ┌──────────────────────▼───────────────────────────────┐   │
  │  │           Parallel Tool Executor                      │   │
  │  │  DependencyAnalyzer → group independent calls         │   │
  │  │  asyncio.gather() for parallel, sequential otherwise  │   │
  │  ├───────────────────────────────────────────────────────┤   │
  │  │                                                       │   │
  │  │  Client-Side Tools        Server-Side Tools           │   │
  │  │  ┌──────────────────┐     ┌────────────────────┐      │   │
  │  │  │ ComputerTool     │     │ web_search         │      │   │
  │  │  │ BashTool         │     │ web_fetch          │      │   │
  │  │  │ EditTool         │     │ code_execution     │      │   │
  │  │  │ MemoryTool       │     │ files_api          │      │   │
  │  │  │ AGITool          │     │ tool_search        │      │   │
  │  │  │ SubagentTool     │     └────────────────────┘      │   │
  │  │  │ StateSetCLITool  │                                 │   │
  │  │  │ AskUserTool      │     MCP Tools (External)        │   │
  │  │  └──────────────────┘     ┌────────────────────┐      │   │
  │  │                           │ mcp__slack__*      │      │   │
  │  │                           │ mcp__github__*     │      │   │
  │  │                           │ mcp__postgres__*   │      │   │
  │  │                           └────────────────────┘      │   │
  │  └───────────────────────────────────────────────────────┘   │
  │                                                             │
  │  ┌─────────────┐  ┌──────────────┐  ┌────────────────────┐ │
  │  │ Stuck       │  │ Checkpoint   │  │ Subagent           │ │
  │  │ Detector    │  │ Manager      │  │ Manager            │ │
  │  │ (loop/cycle │  │ (resume      │  │ (isolated context, │ │
  │  │  detection) │  │  long tasks) │  │  Haiku compress)   │ │
  │  └─────────────┘  └──────────────┘  └────────────────────┘ │
  └─────────────────────────────────────────────────────────────┘
                   │
                   ▼
  ┌─────────────────────────────────────────────────────────────┐
  │                   Observability Layer                        │
  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────┐ │
  │  │Structured│  │Prometheus│  │OpenTel   │  │Event Bus   │ │
  │  │Logging   │  │Metrics   │  │Tracing   │  │(SSE + WS)  │ │
  │  └──────────┘  └──────────┘  └──────────┘  └────────────┘ │
  └─────────────────────────────────────────────────────────────┘
                   │
                   ▼
  ┌─────────────────────────────────────────────────────────────┐
  │                    Dashboard (Web UI)                        │
  │  ┌─────────────────┐  ┌─────────┐  ┌────────────────────┐ │
  │  │ Next.js Frontend │  │ FastAPI │  │ Celery Worker      │ │
  │  │ React Query + SSE│  │ Backend │  │ (invokes           │ │
  │  │ :3000            │  │ :8000   │  │  sampling_loop)    │ │
  │  └─────────────────┘  └─────────┘  └────────────────────┘ │
  │  ┌──────────┐  ┌──────────┐  ┌──────────────────────────┐ │
  │  │PostgreSQL│  │  Redis   │  │ MinIO (S3 artifacts)     │ │
  │  └──────────┘  └──────────┘  └──────────────────────────┘ │
  └─────────────────────────────────────────────────────────────┘
```

### How It Works: Request Flow

1. **CLI Invocation** -- `python main.py "auto-close tickets"` parses runtime options (tool version, effort level, tool search, deferred tools) and enters `continuous_loop()`.
2. **Agent Selection** -- `get_active_agents()` matches keywords in the instruction to agent types. Multiple matches run in parallel as independent `asyncio.Task` instances.
3. **Adaptive Configuration** -- Each agent gets a tailored config: thinking budget, max tokens, complexity score, and model selection via `select_model_for_task()`.
4. **Sampling Loop** -- `sampling_loop()` is the core conversation engine. It initializes tools from `TOOL_GROUPS_BY_VERSION`, fetches the system prompt from StateSet APIs, creates the API client, and enters the message loop.
5. **API Call** -- Each iteration sends the conversation to Claude Opus 4.6 through a `CircuitBreaker`, with prompt caching on the 3 most recent turns and dynamic context management adapting compaction aggressiveness to current token usage.
6. **Tool Execution** -- Claude's tool calls are analyzed by `DependencyAnalyzer`. Read-only, independent calls execute in parallel via `asyncio.gather()`; dependent calls execute sequentially. MCP tools are dispatched to their respective server connections.
7. **Context Optimization** -- After each iteration, the `ContextOptimizer` tracks attention quality (EXCELLENT through CRITICAL) and applies compaction strategies. Messages are compressed when history exceeds 20 turns.
8. **Completion** -- When Claude responds with no tool calls, the loop returns `SamplingLoopResult`. The orchestrator runs `analyze_task_completion()`, records metrics, and sends a Stripe billing event.
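Step 6 can be illustrated with a simplified grouping pass: consecutive read-only calls are batched and run with `asyncio.gather()`, while any call that might change state runs alone, preserving order. `ToolCall`, the `READ_ONLY_TOOLS` set, `plan_batches`, and `execute` are hypothetical; the real `DependencyAnalyzer` may apply finer-grained rules.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable

# Assumed read-only tools; anything else is treated as potentially state-changing
READ_ONLY_TOOLS = {"web_search", "web_fetch", "screenshot"}

@dataclass
class ToolCall:
    name: str
    args: dict[str, Any] = field(default_factory=dict)

def plan_batches(calls: list[ToolCall]) -> list[list[ToolCall]]:
    """Group consecutive read-only calls; every other call gets its own batch."""
    batches: list[list[ToolCall]] = []
    for call in calls:
        if (call.name in READ_ONLY_TOOLS and batches
                and all(c.name in READ_ONLY_TOOLS for c in batches[-1])):
            batches[-1].append(call)
        else:
            batches.append([call])
    return batches

async def execute(calls: list[ToolCall],
                  run: Callable[[ToolCall], Awaitable[str]]) -> list[str]:
    """Run batches in order; calls inside a batch run concurrently."""
    results: list[str] = []
    for batch in plan_batches(calls):
        results.extend(await asyncio.gather(*(run(c) for c in batch)))
    return results
```

Because `asyncio.gather()` returns results in argument order, the output lines up with the original call order even when a batch finishes out of order.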

### Core Components

| Component | Responsibility | Location |
|-----------|---------------|----------|
| **Orchestrator** | CLI parsing, agent selection, parallel dispatch, billing | `main.py` |
| **Agent Loop** | Conversation loop, API calls, message management | `agent/loop.py` |
| **Tool Collection** | Tool registration, dispatch, deferred loading | `agent/tools/collection.py` |
| **Tool Groups** | Version-specific tool bundles (20241022, 20250124, 20251124, cli) | `agent/tools/groups.py` |
| **Parallel Executor** | Dependency analysis, safe parallel tool execution | `agent/parallel_executor.py` |
| **Context Optimizer** | JIT retrieval, compaction, attention budget, sub-agent compression | `agent/context_optimizer.py` |
| **Subagent Manager** | Spawn isolated sub-agents (explore, analyze, code, research) | `agent/subagent.py` |
| **MCP Client** | Connect external tools via stdio/SSE/HTTP transports | `agent/mcp_client.py` |
| **Structured Output** | JSON schema validation, extraction, pre-defined schemas | `agent/structured_output.py` |
| **Stuck Detector** | Repeating actions, no visual changes, cycling detection | `agent/stuck_detection.py` |
| **Checkpoint Manager** | Save/resume long-running tasks, heartbeat monitoring | `agent/checkpoint.py` |
| **Skill Manager** | Skill profiles, agent-skill mapping, container resolution | `agent/skill_manager.py` |
| **Health Checker** | API connectivity, display, memory, disk checks | `agent/health.py` |
| **Circuit Breaker** | Fault tolerance for API calls (closed/open/half-open) | `agent/health.py` |
| **Config** | Centralized settings with YAML loading and env substitution | `agent/config.py` |
| **Observability** | Unified logging, Prometheus metrics, OpenTelemetry tracing, event streaming | `agent/observability/` |
| **Exception Hierarchy** | Typed errors (retryable, non-retryable, budget, tool, resource) | `agent/exceptions.py` |
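The circuit breaker's closed/open/half-open cycle can be sketched as a small class using the thresholds quoted under Security-First Design (5 consecutive failures open the circuit; after 60 s a trial call is allowed). Class and method names are assumptions, not the API of `agent/health.py`; the clock is injectable so state transitions can be tested without sleeping.

```python
import time
from typing import Any, Callable

class CircuitOpenError(RuntimeError):
    """Raised when a call is attempted while the circuit is open."""

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: float | None = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.recovery_timeout:
            return "half-open"  # a trial call is allowed through
        return "open"

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial in half-open, or reaching the threshold, (re)opens the circuit
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.opened_at = None  # any success closes the circuit
        return result
```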

### Tool Versions

The system ships with 4 tool bundles, selected via `--tool-version`:

| Version | Tools | Beta Flag | Display Required |
|---------|-------|-----------|-----------------|
| `computer_use_20251124` (default) | Computer (with zoom), Edit, Bash, Memory, AGI, CLI, AskUser | `computer-use-2025-11-24` | Yes |
| `computer_use_20250124` | Computer, Edit, Bash, Memory, AGI, CLI, AskUser | `computer-use-2025-01-24` | Yes |
| `computer_use_20241022` | Computer, Edit, Bash | `computer-use-2024-10-22` | Yes |
| `cli_20250124` | Bash, Edit, Memory, AGI, CLI, AskUser | None | No |

Additionally, `SubagentTool` is loaded dynamically at runtime (requires API key), and MCP tools are added from connected servers.

### Subagent System

Implements Anthropic's sub-agent compression pattern. The main agent spawns isolated sub-agents that return compressed summaries instead of raw output (50k tokens of exploration compressed into a 2k-token summary):

| Type | Model | Use Case | Max Turns | Timeout |
|------|-------|----------|-----------|---------|
| `explore` | Haiku 4.5 | Fast codebase/data exploration | 5 | 60s |
| `analyze` | Sonnet 4.5 | Deep analysis with thinking | 8 | 90s |
| `execute` | Sonnet 4.5 | Task execution with verification | 15 | 180s |
| `research` | Haiku 4.5 | Web search and synthesis | 8 | 120s |
| `code` | Sonnet 4.5 | Code generation and modification | 12 | 180s |
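The table above maps naturally onto a small, immutable registry. This sketch copies each type's model tier, turn limit, and timeout from the table; `SubagentSpec`, `spec_for`, and `within_budget` are hypothetical names, and the tier strings are labels rather than API model IDs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SubagentSpec:
    model_tier: str   # model family from the table, not an API model ID
    max_turns: int
    timeout_s: float

SUBAGENT_SPECS: dict[str, SubagentSpec] = {
    "explore": SubagentSpec("haiku-4.5", 5, 60),
    "analyze": SubagentSpec("sonnet-4.5", 8, 90),
    "execute": SubagentSpec("sonnet-4.5", 15, 180),
    "research": SubagentSpec("haiku-4.5", 8, 120),
    "code": SubagentSpec("sonnet-4.5", 12, 180),
}

def spec_for(kind: str) -> SubagentSpec:
    """Look up limits for a sub-agent type, failing loudly on unknown types."""
    try:
        return SUBAGENT_SPECS[kind]
    except KeyError:
        raise ValueError(f"unknown subagent type {kind!r}; expected one of {sorted(SUBAGENT_SPECS)}") from None

def within_budget(kind: str, turns_used: int, elapsed_s: float) -> bool:
    """True while a sub-agent is under both its turn limit and its timeout."""
    spec = spec_for(kind)
    return turns_used < spec.max_turns and elapsed_s < spec.timeout_s
```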

### MCP Integration

Connect external tools via [Model Context Protocol](https://modelcontextprotocol.io/) with 3 transport types (stdio, SSE, HTTP) and 8 pre-configured presets:

```bash
# Available presets: slack, github, postgres, filesystem, memory, brave-search, puppeteer, sqlite
```

Tools appear in the conversation as `mcp__<server>__<tool>` (e.g., `mcp__slack__send_message`).

### Dashboard

The web dashboard provides job management, real-time monitoring, and artifact storage:

```
Frontend (Next.js 14)          Backend (FastAPI)           Worker (Celery)
─────────────────────          ─────────────────           ───────────────
Dashboard home                 POST /api/jobs              Receives job from
Launch Task form         ──►   GET  /api/jobs              Redis queue
Live Runs (SSE)          ◄──   GET  /api/events/jobs       Calls sampling_loop()
Outputs browser                GET  /api/artifacts         Stores artifacts in S3
Template management            CRUD /api/templates         Records billing via Stripe
Usage metrics                  GET  /api/metrics/overview
                               CRUD /api/agi              PostgreSQL (persistence)
                               CRUD /api/skills           Redis (broker/backend)
                               GET  /api/observability    MinIO (S3 artifacts)
```

Deploy with Docker Compose (`docker compose up -d`): frontend on `:3000`, backend on `:8000`, with Postgres, Redis, and MinIO. Optional monitoring profile adds Prometheus, Grafana, and OpenTelemetry Collector.

[Read More: Architecture Documentation →](ARCHITECTURE.md)

## 📚 Documentation

### Getting Started

- **[Getting Started Guide](GETTING_STARTED.md)** - Step-by-step setup for beginners (10 min)
- **[Quick Start](QUICKSTART.md)** - Common commands and usage patterns (5 min)
- **[User Guide](USERGUIDE.md)** - Comprehensive reference (30 min)

### Technical Deep-Dives

- **[Architecture](ARCHITECTURE.md)** - System design and component interaction
- **[Context Engineering](CONTEXT_ENGINEERING.md)** - How we achieve 95% cost savings
- **[Parallel Execution](PARALLEL_EXECUTION.md)** - Automatic tool parallelization
- **[Memory System](MEMORY_TOOL.md)** - Persistent agent memory
- **[Metrics & Billing](METRICS_GUIDE.md)** - Usage tracking and cost management

### Feature Documentation

- **[Tool Reference](TOOL_REFERENCE.md)** - Complete tool catalog
- **[Web Search](WEB_SEARCH.md)** - Internet search capabilities
- **[Web Fetch](WEB_FETCH.md)** - HTTP requests and scraping
- **[Code Execution](CODE_EXECUTION.md)** - Running Python/Bash code
- **[Files API](FILES_API.md)** - Document upload and management

### Advanced Topics

- **[Long-Running Tasks](LONG_RUNNING_TASKS.md)** - Multi-hour agent operations
- **[Skills System](SKILLS.md)** - Extending agents with custom skills
- **[Dashboard](DASHBOARD_BUILD_SUMMARY.md)** - Web-based monitoring UI
- **[AGI Integration](AGI_INTEGRATION.md)** - Advanced AI capabilities

## 📊 Performance & Cost

### Real-World Metrics

Based on production usage across 1,000+ agent runs:

| Metric | Before Optimization | After Optimization | Improvement |
|--------|-------------------|-------------------|-------------|
| **Avg Tokens/Task** | 150,000 | 7,500 | **95% reduction** |
| **Avg Cost/Task** | $2.25 | $0.11 | **95% savings** |
| **Avg Task Duration** | 45s | 30s | **33% less time** |
| **Context Quality** | Degrades >50k tokens | EXCELLENT at 500k+ | **Indefinite** |
| **Parallel Speed** | Sequential (baseline) | 30-50% faster | **1.5x speedup** |

### Cost Breakdown (per 1M tokens)

| Operation | Input Cost | Output Cost | Typical Usage |
|-----------|-----------|-------------|---------------|
| Claude Opus 4.6 | $3.00 | $15.00 | Main model |
| Extended Thinking | $3.00 | $15.00 | Complex tasks only |
| Prompt Caching (hit) | $0.30 | $15.00 | 90% cost reduction |

**Pro Tip**: Enable prompt caching for system prompts. Cache hits are billed at $0.30 instead of $3.00 per 1M input tokens, a 90% saving on every cached input token.
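The per-token arithmetic is easy to check by hand. This sketch uses the per-1M-token rates from the table above and bills cached input at the cache-hit rate; it ignores cache-write surcharges and other pricing details, so treat the result as an estimate. The `estimate_cost` helper is hypothetical.

```python
# Rates in USD per 1M tokens, taken from the table above
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00
CACHE_HIT_RATE = 0.30

def estimate_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    """Estimate request cost; cached_input_tokens is the part of input_tokens served from cache."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_input_tokens
    total = (uncached * INPUT_RATE
             + cached_input_tokens * CACHE_HIT_RATE
             + output_tokens * OUTPUT_RATE) / 1_000_000
    return round(total, 4)

# A 100k-token prompt with a 1k-token answer, without and with an 80k-token cached prefix
print(estimate_cost(100_000, 1_000))                              # 0.315
print(estimate_cost(100_000, 1_000, cached_input_tokens=80_000))  # 0.099
```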

## 🔧 Configuration

### Environment Variables

```bash
# Required
ANTHROPIC_API_KEY=sk-ant-api03-...     # Claude API access
DISPLAY=:1                              # X11 display server

# Optional
STRIPE_API_KEY=sk_live_...             # Usage-based billing
WORKSPACE_PATH=/path/to/workspace       # Working directory
```

### Agent Configuration

Agents are configured via StateSet API or directly in `main.py`:

```python
AGENT_CONFIGS = {
    "AUTO_CLOSE": AgentConfig(
        agent_id="stateset_auto_close",
        agent_type="AUTO_CLOSE",
        name="Auto-Close Agent",
        description="Automatically closes support tickets",
        stripe_customer_id="cus_..."  # Optional: for billing
    ),
    # ... more agents
}
```

### Provider Selection

Support for multiple Claude providers:

```python
# Anthropic (default)
provider = APIProvider.ANTHROPIC

# AWS Bedrock
provider = APIProvider.BEDROCK

# Google Vertex AI
provider = APIProvider.VERTEX
```

[Read More: Configuration Guide →](USERGUIDE.md#configuration)

## 🛡️ Security Best Practices

### Critical Security Considerations

1. **Never commit API keys**: Use environment variables or `.env` files
   ```bash
   echo '.env' >> .gitignore
   export ANTHROPIC_API_KEY='...'
   ```

2. **Secure screenshot storage**: May contain sensitive user data
   ```bash
   chmod 700 screenshots/
   # Implement automatic cleanup policy
   ```

3. **Validate agent actions**: Use tool guard for risky operations
   ```python
   # Pre-execution validation in agent/tool_guard.py
   ```

4. **Agent memory isolation**: Separate storage per agent
   ```python
   # Memory stored in /tmp/agent_memories/{agent_id}/
   ```

5. **Prompt injection protection**: Sanitize user inputs
   ```python
   # Implemented in agent/tools/memory.py
   ```
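Pattern-based sanitizing of text before it is written to memory might look like the sketch below. The project's sanitizer uses 11 patterns that are not listed here; the four regular expressions, the `[REDACTED]` placeholder, and `sanitize_memory_content` are illustrative assumptions.

```python
import re

# Illustrative patterns only; the memory tool's actual 11 patterns are not reproduced here
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.IGNORECASE),
    re.compile(r"you\s+are\s+now\s+", re.IGNORECASE),
    re.compile(r"</?\s*system\s*>", re.IGNORECASE),
    re.compile(r"disregard\s+(the\s+)?(system|developer)\s+prompt", re.IGNORECASE),
]
PLACEHOLDER = "[REDACTED]"

def sanitize_memory_content(text: str) -> tuple[str, int]:
    """Replace instruction-like spans with a placeholder; return cleaned text and hit count."""
    hits = 0
    for pattern in INJECTION_PATTERNS:
        text, n = pattern.subn(PLACEHOLDER, text)
        hits += n
    return text, hits
```

Returning the hit count lets the caller log or reject suspicious writes instead of silently storing the cleaned text.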

[Read More: Security Considerations →](USERGUIDE.md#security-considerations)

## 🐛 Troubleshooting

### Common Issues

<details>
<summary><b>Display not found error</b></summary>

**Error**: `Error: Can't open display :1`

**Solution**:
```bash
# Start virtual display
Xvfb :1 -screen 0 1920x1080x24 &
export DISPLAY=:1

# Verify
xdpyinfo | grep dimensions
```
</details>

<details>
<summary><b>Import errors for anthropic/pyautogui</b></summary>

**Error**: `ModuleNotFoundError: No module named 'anthropic'`

**Solution**:
```bash
# Ensure virtual environment is activated
source venv/bin/activate

# Reinstall dependencies
pip install -r requirements.txt

# Verify installation
pip list | grep anthropic
```
</details>

<details>
<summary><b>API authentication failures</b></summary>

**Error**: `401 Unauthorized`

**Solution**:
```bash
# Verify the API key format (it should start with sk-ant-api03-)
# Print only the prefix so the full key does not end up in logs
echo "${ANTHROPIC_API_KEY:0:13}"

# Test API connection
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-opus-4-6-20260131","max_tokens":1024,"messages":[{"role":"user","content":"test"}]}'
```
</details>
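The three issues above can be caught before a run starts. The following is a minimal preflight sketch. `preflight` is a hypothetical helper, not a project command. The key check only uses the `sk-ant-` prefix from the key format noted above and applies only when calling the Anthropic API directly, not through Bedrock or Vertex.

```python
import importlib.util
import os


def preflight(
    required_modules: tuple[str, ...] = ("anthropic", "pyautogui"),
    check_api_key: bool = True,
) -> list[str]:
    """Return human-readable problems matching the common issues above."""
    problems: list[str] = []
    if not os.environ.get("DISPLAY"):
        problems.append("DISPLAY is not set (start Xvfb and export DISPLAY=:1)")
    for name in required_modules:
        # find_spec locates a top-level module without importing it, so
        # pyautogui does not try to connect to the display here.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module {name!r} is not installed")
    if check_api_key and not os.environ.get("ANTHROPIC_API_KEY", "").startswith("sk-ant-"):
        problems.append("ANTHROPIC_API_KEY is missing or not an Anthropic API key")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print(f"- {issue}")
    raise SystemExit(1 if issues else 0)
```

A non-zero exit code lets the check gate a container entrypoint or CI job before any API tokens are spent.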

[Read More: Troubleshooting Guide →](GETTING_STARTED.md#troubleshooting)

## 🚧 Roadmap

### Implemented ✅

- [x] Multi-agent architecture with parallel execution
- [x] All 5 Anthropic context engineering patterns
- [x] Automatic tool parallelization (30-50% speedup)
- [x] Extended thinking support with effort control
- [x] Production monitoring and billing (Stripe)
- [x] Security hardening (prompt injection, path traversal, memory isolation)
- [x] Skills system for extensibility
- [x] Web dashboard with real-time SSE updates
- [x] Distributed tracing (OpenTelemetry integration)
- [x] Circuit breakers for API fault tolerance
- [x] Comprehensive exception hierarchy with typed errors
- [x] Subagent spawning for isolated task decomposition
- [x] MCP client integration (Slack, GitHub, Postgres, etc.)
- [x] Structured JSON output with schema validation
- [x] Stuck detection and recovery (repeating actions, cycling, stale loops)
- [x] Checkpoint system for resumable long-running tasks
- [x] Unified observability (logging, metrics, tracing, event streaming)
- [x] Tool search for deferred tool loading
- [x] Centralized configuration with YAML and env var support
- [x] 2,700+ unit tests

### Planned 📋

- [ ] Agent marketplace for community contributions
- [ ] Kubernetes deployment templates
- [ ] ML-based screenshot delay prediction
- [ ] Enhanced cost optimization (LZ4 compression)

[Read More: Future Improvements →](FUTURE_IMPROVEMENTS.md)

## 📈 Metrics & Monitoring

### Built-in Metrics

Every agent run records a metrics summary like the following:

```json
{
  "task_id": "task_20250324_142301",
  "agent_type": "AUTO_CLOSE",
  "duration_seconds": 32.5,
  "tokens_used": 8234,
  "estimated_cost": 0.12,
  "tools_executed": 15,
  "parallel_executions": 3,
  "success": true,
  "completion_indicators": ["ticket closed", "task finished"]
}
```
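Because each run emits a record in this shape, aggregating runs takes only a few lines. The sketch below assumes the records are stored one JSON object per line (JSON Lines); that storage format, and the helper names `load_records` and `summarize_runs`, are hypothetical. Only the field names come from the record above.

```python
import json
from pathlib import Path
from statistics import mean


def load_records(path: Path) -> list[dict]:
    """Read one JSON object per line, skipping blank lines (assumed JSON Lines format)."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def summarize_runs(records: list[dict]) -> dict:
    """Aggregate per-run metric records shaped like the JSON example above."""
    if not records:
        return {"runs": 0}
    return {
        "runs": len(records),
        # Booleans sum as 0/1, so this is the fraction of successful runs.
        "success_rate": sum(r["success"] for r in records) / len(records),
        "avg_duration_seconds": mean(r["duration_seconds"] for r in records),
        "total_tokens": sum(r["tokens_used"] for r in records),
        "total_estimated_cost": round(sum(r["estimated_cost"] for r in records), 4),
    }
```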

### Stripe Billing Integration

Token usage is reported to Stripe automatically for usage-based billing:

```python
# Billing meter configured in Stripe
meter_id = "computer_use_tokens"

# Meter event sent on task completion
meter_event = {
    "event_name": "computer_use_tokens",
    "payload": {
        "stripe_customer_id": "cus_...",
        "value": 8234,  # tokens used
    },
}
```
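Building that event from a run's metrics record is a small pure function. The sketch below uses the event and payload field names shown above. The helper `build_meter_event` is hypothetical, not part of the project's API. It returns `None` when an agent has no `stripe_customer_id`, because that field is optional in the agent configuration.

```python
from typing import Optional


def build_meter_event(
    metrics: dict,
    customer_id: Optional[str],
    event_name: str = "computer_use_tokens",
) -> Optional[dict]:
    """Turn one run's metrics record into a meter-event body (hypothetical helper)."""
    if not customer_id:
        # The agent has no billing customer configured, so there is nothing to report.
        return None
    return {
        "event_name": event_name,
        "payload": {
            "stripe_customer_id": customer_id,
            "value": int(metrics["tokens_used"]),  # tokens used in this run
        },
    }
```

Keeping event construction separate from the network call makes billing logic easy to unit-test. Sending the event would then be one call through Stripe's Python SDK (for example `stripe.billing.MeterEvent.create(**event)`), which this sketch deliberately leaves out.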

[Read More: Metrics Guide →](METRICS_GUIDE.md)

## 🤝 Contributing

This is proprietary software. For internal contributors:

1. Follow the [development guidelines](https://internal-docs.stateset.com/dev-guide)
2. All changes require review from 2+ team members
3. Ensure tests pass and coverage stays above 80%
4. Update documentation for user-facing changes

## 📞 Support

### Resources

- **Documentation**: Start with [Getting Started Guide](GETTING_STARTED.md)
- **Bug Reports**: Create detailed issue reports with logs and screenshots
- **Feature Requests**: Submit to product team with use cases

### Contact

For support, contact the StateSet team:
- **Email**: support@stateset.com
- **Slack**: #computer-use-agents (internal)
- **Emergency**: On-call rotation (internal)

## 📄 License

This project is **proprietary software**. All rights reserved.

Unauthorized copying, modification, distribution, or use of this software is strictly prohibited.

For licensing inquiries, contact: legal@stateset.com

## 🙏 Acknowledgments

Built with:
- [Claude Opus 4.6](https://www.anthropic.com/claude) - Anthropic's most intelligent AI model
- [PyAutoGUI](https://pyautogui.readthedocs.io/) - Desktop automation
- [httpx](https://www.python-httpx.org/) - Modern HTTP client
- [FastAPI](https://fastapi.tiangolo.com/) - Dashboard backend
- [Next.js](https://nextjs.org/) - Dashboard frontend
- [OpenTelemetry](https://opentelemetry.io/) - Distributed tracing
- [Prometheus](https://prometheus.io/) - Metrics collection
- [Model Context Protocol](https://modelcontextprotocol.io/) - External tool integration
- Research from Anthropic's ["Effective Context Engineering"](https://www.anthropic.com/research)

---

<div align="center">

**StateSet Computer Use Agent**

Autonomous AI agents for the modern enterprise

[Documentation](GETTING_STARTED.md) • [Architecture](ARCHITECTURE.md) • [Support](mailto:support@stateset.com)

Made with ❤️ by the StateSet team

</div>
