Metadata-Version: 2.4
Name: orion_browser
Version: 0.2.0
Summary: Browser proxy server
Home-page: https://github.com/bantouyan/orion
Author: BTY Team
Author-email: qianhai@bantouyan.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: aiohappyeyeballs==2.4.8
Requires-Dist: aiohttp==3.11.18
Requires-Dist: aiosignal==1.3.2
Requires-Dist: annotated-types==0.7.0
Requires-Dist: anthropic==0.49.0
Requires-Dist: anyio==4.8.0
Requires-Dist: argparse==1.4.0
Requires-Dist: attrs==25.1.0
Requires-Dist: babel==2.17.0
Requires-Dist: backoff==2.2.1
Requires-Dist: bashlex==0.18
Requires-Dist: beautifulsoup4==4.13.3
Requires-Dist: boto3==1.37.6
Requires-Dist: botocore==1.37.6
Requires-Dist: cachetools==5.5.2
Requires-Dist: certifi==2025.1.31
Requires-Dist: charset-normalizer==3.4.1
Requires-Dist: click==8.1.8
Requires-Dist: courlan==1.3.2
Requires-Dist: dateparser==1.2.1
Requires-Dist: python-dateutil==2.9.0.post0
Requires-Dist: defusedxml==0.7.1
Requires-Dist: Deprecated==1.2.18
Requires-Dist: distro==1.9.0
Requires-Dist: fastapi==0.115.11
Requires-Dist: filetype==1.2.0
Requires-Dist: fireworks-ai==0.15.12
Requires-Dist: frozenlist==1.5.0
Requires-Dist: google-ai-generativelanguage==0.6.15
Requires-Dist: google-api-core==2.24.1
Requires-Dist: google-api-python-client==2.162.0
Requires-Dist: googleapis-common-protos==1.69.0
Requires-Dist: google-auth==2.38.0
Requires-Dist: google-auth-httplib2==0.2.0
Requires-Dist: google-generativeai==0.8.4
Requires-Dist: greenlet==3.1.1
Requires-Dist: grpcio==1.67.1
Requires-Dist: grpcio-status==1.67.1
Requires-Dist: h11==0.14.0
Requires-Dist: html2text==2024.2.26
Requires-Dist: htmldate==1.9.3
Requires-Dist: httpcore==1.0.7
Requires-Dist: httplib2==0.22.0
Requires-Dist: httpx==0.28.1
Requires-Dist: httpx-sse==0.4.0
Requires-Dist: httpx-ws==0.7.1
Requires-Dist: idna==3.10
Requires-Dist: importlib-metadata==8.5.0
Requires-Dist: jiter==0.8.2
Requires-Dist: jmespath==1.0.1
Requires-Dist: jsonpatch==1.33
Requires-Dist: jsonpointer==3.0.0
Requires-Dist: justext==3.0.2
Requires-Dist: langchain==0.3.14
Requires-Dist: langchain_anthropic==0.3.3
Requires-Dist: langchain_aws==0.2.15
Requires-Dist: langchain_core==0.3.41
Requires-Dist: langchain_fireworks==0.2.7
Requires-Dist: langchain_google_genai==2.0.8
Requires-Dist: langchain_ollama==0.2.2
Requires-Dist: langchain_openai==0.3.1
Requires-Dist: langchain_text_splitters==0.3.6
Requires-Dist: langsmith==0.2.11
Requires-Dist: lmnr==0.4.62
Requires-Dist: lxml==5.3.1
Requires-Dist: MainContentExtractor==0.0.4
Requires-Dist: markdownify==0.14.1
Requires-Dist: monotonic==1.6
Requires-Dist: multidict==6.1.0
Requires-Dist: numpy==1.26.4
Requires-Dist: ollama==0.4.7
Requires-Dist: openai==1.65.3
Requires-Dist: opentelemetry-api==1.30.0
Requires-Dist: opentelemetry-exporter-otlp-proto-common==1.30.0
Requires-Dist: opentelemetry-exporter-otlp-proto-grpc==1.30.0
Requires-Dist: opentelemetry-exporter-otlp-proto-http==1.30.0
Requires-Dist: opentelemetry-instrumentation==0.51b0
Requires-Dist: opentelemetry-instrumentation-langchain==0.38.10
Requires-Dist: opentelemetry-instrumentation-requests==0.51b0
Requires-Dist: opentelemetry-instrumentation-sqlalchemy==0.51b0
Requires-Dist: opentelemetry-instrumentation-threading==0.51b0
Requires-Dist: opentelemetry-instrumentation-urllib3==0.51b0
Requires-Dist: opentelemetry-proto==1.30.0
Requires-Dist: opentelemetry-sdk==1.30.0
Requires-Dist: opentelemetry-semantic-conventions==0.51b0
Requires-Dist: opentelemetry-semantic-conventions-ai==0.4.3
Requires-Dist: opentelemetry-util-http==0.51b0
Requires-Dist: orjson==3.10.15
Requires-Dist: packaging==24.2
Requires-Dist: pexpect==4.9.0
Requires-Dist: pillow==11.1.0
Requires-Dist: playwright==1.50.0
Requires-Dist: posthog==3.18.1
Requires-Dist: propcache==0.3.0
Requires-Dist: protobuf==5.29.3
Requires-Dist: proto-plus==1.26.0
Requires-Dist: ptyprocess==0.7.0
Requires-Dist: pyasn1==0.6.1
Requires-Dist: pyasn1-modules==0.4.1
Requires-Dist: pydantic==2.10.6
Requires-Dist: pydantic-core==2.27.2
Requires-Dist: pyee==12.1.1
Requires-Dist: pyparsing==3.2.1
Requires-Dist: python-dotenv==1.0.1
Requires-Dist: pytz==2025.1
Requires-Dist: PyYAML==6.0.2
Requires-Dist: regex==2024.11.6
Requires-Dist: requests==2.32.3
Requires-Dist: requests-toolbelt==1.0.0
Requires-Dist: rsa==4.9
Requires-Dist: s3transfer==0.11.4
Requires-Dist: six==1.17.0
Requires-Dist: sniffio==1.3.1
Requires-Dist: soupsieve==2.6
Requires-Dist: SQLAlchemy==2.0.38
Requires-Dist: starlette==0.46.0
Requires-Dist: tenacity==9.0.0
Requires-Dist: tiktoken==0.9.0
Requires-Dist: tld==0.13
Requires-Dist: tqdm==4.67.1
Requires-Dist: trafilatura==2.0.0
Requires-Dist: typing-extensions==4.12.2
Requires-Dist: tzlocal==5.3
Requires-Dist: uritemplate==4.1.1
Requires-Dist: urllib3==2.3.0
Requires-Dist: uvicorn==0.34.0
Requires-Dist: websockets==15.0
Requires-Dist: wrapt==1.17.2
Requires-Dist: wsproto==1.2.0
Requires-Dist: yarl==1.18.3
Requires-Dist: zipp==3.21.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Orion Sandbox

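Orion is a server that provides terminal emulator and browser proxy functionality. It can run as a standalone service or be imported as a Python package.

## Features

* https://gist.github.com/jlia0/db0a9695b3ca7609c9b1a08dcbf872c9

## Installation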

Orion Sandbox (this repo) is a container-based environment that provides a secure, isolated space for AI agents (particularly LLMs like Claude) to interact with terminal environments and web browsers. It acts as a bridge between the AI system and computing resources, allowing the AI to execute real-world tasks like:

- Running terminal commands
- Automating browser actions
- Managing files and directories
- Editing text files

This sandbox creates a controlled environment where AI systems can safely perform actions without having direct access to the host system.

## Architecture

```
┌───────────────────────────┐                ┌─────────────────┐      ┌────────────────────────────────────────────┐
│                           │                │                 │      │              Sandbox Container             │
│    AI Agent (e.g. Claude) │                │  API Proxy      │      │                                            │
│                           │                │                 │      │ ┌──────────┐  ┌─────────┐  ┌────────────┐  │
│         Orion             │  API Requests  │  - Auth check   │      │ │          │  │         │  │            │  │
│                           │◄──────────────►│  - Rate limiting├─────►│ │ Terminal │  │ Browser │  │ File/Text  │  │
│                           │  & Responses   │  - Routing      │      │ │ Service  │  │ Service │  │ Operations │  │
│                           │                │                 │      │ │          │  │         │  │            │  │
│                           │                │                 │      │ └────┬─────┘  └────┬────┘  └─────┬──────┘  │
└───────────────────────────┘                └─────────────────┘      │      │             │             │         │
                                             x-sandbox-token          │      │             │             │         │
                                             authentication           │      v             v             v         │
                                                                      │ ┌──────────────────────────────────────┐   │
                                                                      │ │               FastAPI                │   │
                                                                      │ │      (app/server.py + router.py)     │   │
                                                                      │ └──────────────────────────────────────┘   │
                                                                      │                                            │
                                                                      └────────────────────────────────────────────┘
```

## Key Components

1. **AI Agent**: The LLM (e.g., Claude) that sends API requests to the sandbox to perform tasks.

2. **API Proxy**: An intermediary service (`https://api.manus.im/apiproxy.v1.ApiProxyService/CallApi`) that:
   - Authenticates requests using the `x-sandbox-token` header
   - Routes requests to the appropriate sandbox instance
   - Handles rate limiting and access control

3. **Sandbox Container**: A Docker container that isolates the execution environment and provides:
   - FastAPI server (`app/server.py`) - The main entry point for HTTP requests
   - WebSocket server (`app/terminal_socket_server.py`) - For real-time terminal interaction
   - File and text editing capabilities (`app/tools/text_editor.py`)

4. **browser_use Library**: A modified version of the browser-use library that:
   - Provides browser automation via Playwright
   - Has been specifically adapted to work with Claude API (via `browser_use/agent/service.py`)
   - Handles browser actions, DOM interactions, and browser session management

## browser_use Integration

The browser_use library is a key component of Orion Sandbox that enables browser automation. It provides a clean API for the AI to interact with web browsers programmatically.

It is MIT licensed, although the license file was missing from the original source code.

### Key Classes and Components:

#### Agent Class (browser_use/agent/service.py)

The `Agent` class is the main entry point for browser automation. It handles:

- Initializing browser sessions
- Processing LLM outputs into actions
- Managing state history
- Handling errors and retries

```python
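from __future__ import annotations  # keeps `Browser | None` valid on Python 3.8/3.9

# Import paths follow upstream browser-use and may differ in this modified copy.
from browser_use.agent.views import AgentHistoryList
from browser_use.browser.browser import Browser
from langchain_core.language_models.chat_models import BaseChatModel


class Agent:
    def __init__(
        self,
        task: str,
        llm: BaseChatModel,
        browser: Browser | None = None,
        # Many other parameters...
    ):
        # Initialize all components
        ...

    async def run(self, max_steps: int = 100) -> AgentHistoryList:
        # Main execution loop
        # Process LLM outputs and execute actions
        ...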
```
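The responsibilities listed above (turning model output into actions, keeping a history, retrying on errors, stopping at `max_steps`) can be illustrated with a toy loop. This is a minimal sketch, not the `browser_use` implementation: `ToyAgent`, `StepResult`, `scripted_step`, and the `max_failures` parameter are hypothetical names introduced only for illustration, and the "LLM" is a scripted stub.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, List


@dataclass
class StepResult:
    """Outcome of one step: the action names chosen and whether the task is finished."""
    actions: List[str]
    is_done: bool = False


@dataclass
class ToyAgent:
    """Hypothetical stand-in for Agent; it only models the control flow."""
    task: str
    next_step: Callable[[str, List[StepResult]], Awaitable[StepResult]]
    max_failures: int = 3  # assumed retry budget, not a documented parameter
    history: List[StepResult] = field(default_factory=list)

    async def run(self, max_steps: int = 100) -> List[StepResult]:
        failures = 0
        for _ in range(max_steps):
            try:
                result = await self.next_step(self.task, self.history)
            except Exception:
                # Count consecutive failures and give up once the budget is spent.
                failures += 1
                if failures >= self.max_failures:
                    break
                continue
            failures = 0
            self.history.append(result)
            if result.is_done:
                break
        return self.history


async def scripted_step(task: str, history: List[StepResult]) -> StepResult:
    """Fake model: navigate first, then click, then report completion."""
    script = [["go_to_url"], ["click_element"], ["done"]]
    actions = script[len(history)]
    return StepResult(actions=actions, is_done=(actions == ["done"]))


history = asyncio.run(ToyAgent("demo task", scripted_step).run(max_steps=10))
print([step.actions for step in history])
```

The loop ends as soon as a step reports completion, so `max_steps` acts only as an upper bound.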

#### Browser Context (browser_use/browser/context.py)

The `BrowserContext` class manages the browser state and provides methods for interacting with web pages:

```python
class BrowserContext:
    async def navigate_to(self, url: str):
        """Navigate to a URL"""
        
    async def click_element(self, index: int):
        """Click an element using its index"""
        
    async def input_text_to_element(self, index: int, text: str, delay: float = 0):
        """Input text into an element"""
```

#### System Prompts (browser_use/agent/prompts.py)

The `SystemPrompt` class defines the instructions given to the LLM about how to interact with the browser:

```python
class SystemPrompt:
    def important_rules(self) -> str:
        """
        Returns the important rules for the agent.
        """
        rules = """
1. RESPONSE FORMAT: You must ALWAYS respond with valid JSON in this exact format:
   {
     "current_state": {
        "page_summary": "Quick detailed summary of new information from the current page which is not yet in the task history memory. Be specific with details which are important for the task. This is not on the meta level, but should be facts. If all the information is already in the task history memory, leave this empty.",
        "evaluation_previous_goal": "Success|Failed|Unknown - Analyze the current elements and the image to check if the previous goals/actions are successful like intended by the task. Ignore the action result. The website is the ground truth. Also mention if something unexpected happened like new suggestions in an input field. Shortly state why/why not",
       "memory": "Description of what has been done and what you need to remember. Be very specific. Count here ALWAYS how many times you have done something and how many remain. E.g. 0 out of 10 websites analyzed. Continue with abc and xyz",
       "next_goal": "What needs to be done with the next actions"
     },
     "action": [
       {
         "one_action_name": {
           // action-specific parameter
         }
       },
       // ... more actions in sequence
     ]
   }
        """
        # More rules follow...
        return rules
```

The prompt instructs the LLM on:

- How to format its responses (JSON structure)
- Rules for interacting with browser elements
- Navigation and error handling
- Task completion criteria
- Element interaction guidelines

#### Controller Registry (browser_use/controller/registry/service.py)

The `Registry` class provides a way to register and execute actions:

```python
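from typing import Any, Optional, Type

from pydantic import BaseModel

# Import path follows the file location named above; adjust if the modified copy differs.
from browser_use.browser.context import BrowserContext


class Registry:
    def action(
        self,
        description: str,
        param_model: Optional[Type[BaseModel]] = None,
    ):
        """Decorator for registering actions"""

    async def execute_action(
        self,
        action_name: str,
        params: dict,
        browser: Optional[BrowserContext] = None,
        # Other parameters
    ) -> Any:
        """Execute a registered action"""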
```
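To make the decorator-plus-dispatch pattern concrete, here is a small self-contained sketch. It is not the `browser_use` `Registry`: `ToyRegistry` is a hypothetical class that omits `param_model` validation and the browser argument, and the registered `go_to_url` function only returns a string instead of driving a browser.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class ToyRegistry:
    """Hypothetical mini registry that maps action names to async functions."""

    def __init__(self) -> None:
        self._actions: Dict[str, Callable[..., Awaitable[Any]]] = {}
        self.descriptions: Dict[str, str] = {}

    def action(self, description: str):
        """Decorator that stores an async function under its own name."""
        def decorator(func: Callable[..., Awaitable[Any]]):
            self._actions[func.__name__] = func
            self.descriptions[func.__name__] = description
            return func
        return decorator

    async def execute_action(self, action_name: str, params: dict) -> Any:
        """Look up a registered action and call it with keyword parameters."""
        if action_name not in self._actions:
            raise ValueError(f"Action {action_name!r} is not registered")
        return await self._actions[action_name](**params)


registry = ToyRegistry()


@registry.action("Navigate to a specific URL")
async def go_to_url(url: str) -> str:
    # A real action would drive the browser; this one only reports intent.
    return f"navigated to {url}"


result = asyncio.run(
    registry.execute_action("go_to_url", {"url": "https://example.com"})
)
print(result)  # navigated to https://example.com
```

Keeping the description next to each function is what would let an agent list the available actions in its prompt.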

## How AI-Sandbox Communication Works

The communication between an AI agent (like Claude) and the sandbox follows this flow:

1. **AI Agent Formulates a Request**:
   - The AI decides on an action to perform (e.g., run a terminal command, navigate a browser)
   - It constructs an appropriate API request following the sandbox API specification

2. **Request Transmission**:
   - The AI sends an HTTP request either:
     - Directly to the sandbox container (if exposed)
     - Through an API proxy service (`https://api.manus.im/apiproxy.v1.ApiProxyService/CallApi`)

3. **Authentication**:
   - The request includes an API token (`x-sandbox-token` header)
   - The token is verified against the value stored in `$HOME/.secrets/sandbox_api_token`

4. **Request Processing**:
   - The sandbox FastAPI server receives and processes the request
   - It routes the request to the appropriate service (terminal, browser, file operations)
   - The requested action is performed within the isolated container environment

5. **Response Return**:
   - Results of the action are formatted as JSON or binary data (for file downloads)
   - The response is sent back to the AI agent

6. **Real-time Communication** (for terminal):
   - Terminal sessions use WebSockets for bidirectional, real-time communication
   - The AI can receive terminal output as it's generated and send new commands
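The authentication step can be sketched as follows. The header name and token location come from the description above; everything else is an assumption made for illustration: the helper names (`load_token`, `build_headers`, `is_authorized`), the plain-text token file format, and the JSON content type. The Security Considerations section notes that token checking is not implemented in this code, so this shows one possible shape rather than the actual behavior.

```python
import hmac
import os
from pathlib import Path

# Header name and token location are taken from the description above.
TOKEN_HEADER = "x-sandbox-token"
DEFAULT_TOKEN_PATH = Path(os.path.expanduser("~")) / ".secrets" / "sandbox_api_token"


def load_token(path: Path = DEFAULT_TOKEN_PATH) -> str:
    """Read the stored token, dropping surrounding whitespace (assumed plain-text file)."""
    return Path(path).read_text(encoding="utf-8").strip()


def build_headers(token: str) -> dict:
    """Client side: attach the token to an outgoing request."""
    return {TOKEN_HEADER: token, "Content-Type": "application/json"}


def is_authorized(headers: dict, expected_token: str) -> bool:
    """Server side: constant-time comparison of the presented and stored tokens."""
    presented = headers.get(TOKEN_HEADER, "")
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not reveal how many leading characters of a guessed token were correct.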

### Example Flow: AI Running a Shell Command

```
┌─────────────┐                 ┌───────────────┐              ┌──────────────────┐
│             │ 1. HTTP Request │               │ 2. Route to  │                  │
│  AI Agent   │────────────────►│ Sandbox API   │─────────────►│ Terminal Service │
│             │                 │ (FastAPI)     │              │                  │
│             │◄────────────────│               │◄─────────────│                  │
└─────────────┘ 4. JSON Response└───────────────┘ 3. Execute   └──────────────────┘
                                                    Command
```

## API Client Usage

The sandbox includes a Python API client (`data_api.py`) that communicates with the proxy service:

```python
from data_api import ApiClient

# Initialize the client
api_client = ApiClient()

# Call a terminal command
response = api_client.call_api(
    "terminal_execute",
    body={
        "command": "ls -la",
        "terminal_id": "main"
    }
)

print(response)
```

## LLM Response Format for Browser Automation

When interacting with browser_use, the LLM (like Claude) must format its responses as JSON according to the schema defined in the system prompt:

```json
{
  "current_state": {
    "page_summary": "Found search page with 10 results for 'electric cars'",
    "evaluation_previous_goal": "Success - successfully navigated to search page and performed search as intended",
    "memory": "Completed search for 'electric cars'. Need to extract information from first 3 results (0 of 3 done)",
    "next_goal": "Extract detailed information from first search result"
  },
  "action": [
    {
      "click_element": {
        "index": 12
      }
    }
  ]
}
```

This response structure allows the Agent to:

1. Track the LLM's understanding of the current page
2. Evaluate the success of previous actions
3. Maintain memory across interactions
4. Execute the next action(s)
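A consumer of this format has to check the shape before acting on it. The sketch below validates the four `current_state` keys and flattens the `action` list into `(name, parameters)` pairs. `ParsedResponse` and `parse_llm_response` are hypothetical names; the real library may well validate responses differently (for example with pydantic models).

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

REQUIRED_STATE_KEYS = ("page_summary", "evaluation_previous_goal", "memory", "next_goal")


@dataclass
class ParsedResponse:
    current_state: Dict[str, str]
    actions: List[Tuple[str, Dict[str, Any]]]  # (action name, parameters)


def parse_llm_response(raw: str) -> ParsedResponse:
    """Validate the JSON shape shown above and flatten the action list."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    state = data.get("current_state")
    if not isinstance(state, dict):
        raise ValueError("'current_state' must be an object")
    missing = [key for key in REQUIRED_STATE_KEYS if key not in state]
    if missing:
        raise ValueError(f"current_state is missing keys: {missing}")
    raw_actions = data.get("action")
    if not isinstance(raw_actions, list) or not raw_actions:
        raise ValueError("'action' must be a non-empty list")
    actions = []
    for item in raw_actions:
        # Each entry is an object with exactly one key: the action name.
        if not isinstance(item, dict) or len(item) != 1:
            raise ValueError(f"malformed action entry: {item!r}")
        (name, params), = item.items()
        actions.append((name, params))
    return ParsedResponse(current_state=state, actions=actions)


raw = """{"current_state": {"page_summary": "", "evaluation_previous_goal": "Unknown",
  "memory": "0 of 3 done", "next_goal": "Open first result"},
  "action": [{"click_element": {"index": 12}}]}"""
parsed = parse_llm_response(raw)
print(parsed.actions)  # [('click_element', {'index': 12})]
```

Rejecting malformed responses early gives the agent a clear error to feed back to the model instead of a failure deep inside action execution.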

## Available Browser Actions

The browser_use library provides a wide range of actions for web automation:

### Navigation Actions

- `go_to_url`: Navigate to a specific URL
- `search_google`: Perform a Google search
- `go_back`: Navigate back in browser history
- `open_tab`: Open a new browser tab
- `switch_tab`: Switch between browser tabs

### Element Interaction

- `click_element`: Click on a page element by its index
- `input_text`: Type text into a form field
- `scroll_down`/`scroll_up`: Scroll the page
- `scroll_to_text`: Scroll to find specific text
- `select_dropdown_option`: Select from dropdown menus

### Content Extraction

- `extract_content`: Extract and process page content
- `get_dropdown_options`: Get all options from a dropdown

### Task Completion

- `done`: Mark the task as complete and return results

## Integration with LLM Systems

To integrate an LLM with this sandbox:

1. **API Client Implementation**: Create an API client in the LLM's execution environment

2. **Task Planning**: The LLM should break down user requests into specific API calls

3. **Sequential Operations**: Complex tasks often require multiple API calls in sequence

4. **Error Handling**: The LLM should interpret error responses and adjust its approach

5. **State Management**: For multi-step operations, the LLM needs to track the state of the environment
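Points 3 to 5 can be combined in a small driver: run the planned steps in order, keep their results in a shared state dictionary, and stop at the first failure so the LLM can re-plan. This is a sketch under assumptions: `run_steps` and the two example steps are hypothetical, and the steps here return canned values instead of calling the sandbox API.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# A step is a name plus a callable that receives the state accumulated so far.
Step = Tuple[str, Callable[[Dict[str, Any]], Any]]


def run_steps(steps: List[Step]) -> Tuple[Dict[str, Any], Optional[str]]:
    """Run steps in order, sharing a state dict; stop at the first failure."""
    state: Dict[str, Any] = {}
    for name, step in steps:
        try:
            state[name] = step(state)
        except Exception as exc:
            # Report which step failed so the caller can re-plan from here.
            return state, f"{name}: {exc}"
    return state, None


def create_file(state: Dict[str, Any]) -> str:
    return "/tmp/weather.py"  # hypothetical path, stands in for an API call


def run_script(state: Dict[str, Any]) -> str:
    # Simulate a failing command; a real step would call the terminal API.
    raise RuntimeError("exit code 1")


state, error = run_steps([("create_file", create_file), ("run_script", run_script)])
print(state)  # {'create_file': '/tmp/weather.py'}
print(error)  # run_script: exit code 1
```

Returning the partial state together with the error message gives the model both pieces it needs: what already succeeded and what to fix.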

## Example Workflow: LLM Using the Sandbox

1. User asks the LLM to "Create a Python script that fetches weather data and saves it"

2. LLM plans the steps:
   - Create a new Python file
   - Write the code to fetch weather data
   - Save the file
   - Run the script to test it
   - Show the results to the user

3. LLM executes each step by making API calls to the sandbox:
   - `POST /text_editor` with `command: "create"` to create a new file
   - `POST /text_editor` with `command: "write"` to write the code
   - `POST /terminal/{id}/write` to run the script
   - `GET /terminal/{id}` to get the output
   - Return the results to the user
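The call sequence in step 3 can be written down as data before anything is sent. The endpoints and the `command` values come from the list above; the remaining body fields (`path`, `file_text`, `text`) and the `ApiCall` and `plan_weather_script` names are assumptions for illustration, since the request schemas are not shown here.

```python
from typing import Any, Dict, List, NamedTuple, Optional


class ApiCall(NamedTuple):
    """One planned HTTP request: method, path, and optional JSON body."""
    method: str
    path: str
    body: Optional[Dict[str, Any]] = None


def plan_weather_script(terminal_id: str, file_path: str, code: str) -> List[ApiCall]:
    """Return the ordered HTTP calls for the workflow above (nothing is sent)."""
    return [
        # Body fields other than "command" are assumed, not taken from the API spec.
        ApiCall("POST", "/text_editor", {"command": "create", "path": file_path}),
        ApiCall("POST", "/text_editor",
                {"command": "write", "path": file_path, "file_text": code}),
        ApiCall("POST", f"/terminal/{terminal_id}/write",
                {"text": f"python {file_path}\n"}),
        ApiCall("GET", f"/terminal/{terminal_id}"),
    ]


plan = plan_weather_script("main", "weather.py", "print('sunny')\n")
for call in plan:
    print(call.method, call.path)
```

Separating planning from sending makes the sequence easy to inspect or log before any command runs in the sandbox.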

## Security Considerations

1. **Multi-layered Authentication**:

   - API token authentication using the `x-sandbox-token` header (not implemented in this code)
   - Token verification happens at the proxy layer before requests reach the FastAPI application (not implemented in this code)
   - Tokens are stored securely in `$HOME/.secrets/sandbox_api_token`

2. **Proxy Service Protection**:
   - The proxy service provides an additional layer of security
   - Acts as a gatekeeper for all requests to the sandbox
   - Can implement rate limiting, request validation, and access control

3. **Isolation**:
   - The Docker container provides isolation from the host system
   - Prevents the AI from affecting the host machine directly

4. **Resource Limitations**:

   - The sandbox can be configured with resource constraints (CPU, memory) at the Docker level
   - Prevents resource exhaustion attacks

5. **Action Restrictions**:

   - The API can be configured to restrict certain dangerous operations
   - Browser automation is contained within the sandbox environment
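As an illustration of point 4, the helper below assembles a `docker run` command with CPU, memory, and process-count limits. `--cpus`, `--memory`, and `--pids-limit` are standard `docker run` options; the function name `docker_run_args` and the specific limit values are assumptions, not settings shipped with this project.

```python
import shlex
from typing import List


def docker_run_args(
    image: str,
    port: int,
    cpus: float = 1.0,
    memory: str = "1g",
    pids_limit: int = 256,
) -> List[str]:
    """Build an argv list for `docker run` with basic resource limits."""
    return [
        "docker", "run", "--rm",
        f"--cpus={cpus}",              # cap CPU time
        f"--memory={memory}",          # cap RAM
        f"--pids-limit={pids_limit}",  # cap the number of processes
        "-p", f"{port}:{port}",
        image,
    ]


args = docker_run_args("orion-sandbox", 8080)
print(shlex.join(args))
# docker run --rm --cpus=1.0 --memory=1g --pids-limit=256 -p 8080:8080 orion-sandbox
```

The list form can be passed directly to `subprocess.run(args)` without going through a shell.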

## Deployment with Docker

The sandbox is designed to run in a Docker container. The provided Dockerfile was not part of the original code; it illustrates what the container could look like and includes:

1. A Python 3.12 environment
2. Chromium browser for web automation
3. All necessary dependencies
4. API token initialization

To build and run the container:

```bash
# Build the container
docker build -t orion-sandbox .

# Run the container
docker run -p 8080:8080 orion-sandbox
```

### Method 2: Install from source (recommended for development)

```bash
git clone https://github.com/yourusername/orion.git
cd orion
pip install -e .
```

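## Usage

### Running as a service

#### Method 1: Using the command-line tool

After installation, you can start the service directly with the command-line tool: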

```bash
# Start with the default configuration
orion-server

# Specify the port and log level
orion-server --port 8888 --log-level debug

# Development mode (auto-reload)
orion-server --reload
```

#### Method 2: Starting with a Python script

```bash
python start_server.py --port 8330
```

### Importing as a library

Orion can also be imported and used as a Python library. Example:

```python
import asyncio
from app import BrowserManager, terminal_manager, text_editor

# Initialize the browser manager
async def browser_example():
    browser = BrowserManager(headless=False)
    await browser.initialize()
    # Perform browser operations...
    await browser.close()

# Use the terminal manager
async def terminal_example():
    terminal = await terminal_manager.create_or_get_terminal("my_terminal")
    await terminal.execute_command("ls -la")
    history = terminal.get_history(True, True)
    # Process the terminal output...

# Run the example
asyncio.run(browser_example())
```

For more examples, see `examples/use_as_package.py`.

## Docker Deployment

```bash
# Build the container
docker build -t orion-server .

# Run the container
docker run -p 8330:8330 orion-server
```

## API Documentation

After starting the service, visit `http://localhost:8330/docs` to view the API documentation.

## License

MIT
