Metadata-Version: 2.4
Name: website-agent-server
Version: 0.1.5
Summary: A server-side browser proxy that lets clients operate websites without direct remote connections.
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastapi>=0.115
Requires-Dist: uvicorn[standard]>=0.30
Requires-Dist: playwright>=1.48
Requires-Dist: python-multipart>=0.0.9
Dynamic: license-file

# Website Agent Server

English | [中文](README.zh-CN.md)

Website Agent Server is a Python server-side browser proxy. The client never loads the target website directly. It connects only to this server, receives rendered browser frames, and sends mouse, keyboard, input method, clipboard, wheel, file upload, and navigation events back to the server.

## How It Works

- FastAPI serves the local control UI, HTTP API, and WebSocket endpoint.
- Playwright launches Chromium on the server.
- The target website runs inside the server-side browser context.
- The client receives binary WebSocket image frames: JPEG screenshots by default, or PNG frames when `--screenshot-quality 100` is used.
- User actions are replayed into Chromium by the server.
- IME text, paste, copy, cut, downloads, file chooser actions, cookie management, and ordinary media element audio are brokered through local server endpoints.

Because the remote page is never embedded as HTML in the client, page scripts, link clicks, images, XHR/fetch calls, WebSocket connections, and form submissions are performed by the server-side browser.

## Setup

Create or reuse the repository-root virtual environment:

```powershell
python -m venv venv
venv\Scripts\python.exe -m pip install -r requirements.txt
venv\Scripts\python.exe -m playwright install chromium
```

## Run

```powershell
venv\Scripts\python.exe -m website_agent_server
```

Open [http://127.0.0.1:8000](http://127.0.0.1:8000), enter a website URL, and operate the remote site through the rendered viewport. By default the server listens on all interfaces, so LAN clients can also connect with the server machine's LAN IP.

If a target URL is entered without an `http://` or `https://` prefix, the server first probes HTTPS. It uses HTTPS when the TLS service is available, otherwise it falls back to HTTP. The same rule applies to `--lock-url` and `/lock_url/...` paths.
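
The HTTPS-first rule can be sketched as follows. This is a minimal illustration, not the server's actual code: the names `tls_available` and `normalize_target` are hypothetical, and the probe is assumed to be a plain TLS handshake against the target host.

```python
import socket
import ssl
from typing import Callable
from urllib.parse import urlsplit


def tls_available(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TLS handshake with host:port succeeds (hypothetical probe)."""
    context = ssl.create_default_context()
    # Only reachability of the TLS service matters here, not certificate validity.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                return True
    except OSError:  # covers refused connections, timeouts, and ssl.SSLError
        return False


def normalize_target(
    target: str, probe: Callable[[str, int], bool] = tls_available
) -> str:
    """Add a scheme to a schemeless target, preferring HTTPS when the probe succeeds."""
    target = target.strip()
    if target.lower().startswith(("http://", "https://")):
        return target  # an explicit scheme is kept as entered
    parts = urlsplit("//" + target)
    host = parts.hostname or ""
    port = parts.port or 443
    scheme = "https" if host and probe(host, port) else "http"
    return f"{scheme}://{target}"
```

Passing the probe as a parameter keeps the scheme decision testable without network access.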

To lock a single client to one target, put the target URL in the server URL path:

```text
http://127.0.0.1:8000/lock_url/https/example.com/path
```

That client opens the target immediately and hides the browser option controls. Other clients that open `/` keep the normal URL picker. Query strings and fragments are preserved, for example `/lock_url/https/example.com/path?x=1#section`.

## Command-Line Configuration

```powershell
venv\Scripts\python.exe -m website_agent_server --port 8080 --headed
```

Require a PIN before clients can use the proxy:

```powershell
venv\Scripts\python.exe -m website_agent_server --pin 123456
```

| Option | Default | Description |
| --- | --- | --- |
| `--host` | `0.0.0.0` | Server bind host. Use `127.0.0.1` to restrict access to this machine. |
| `--port` | `8000` | Server port. |
| `--headed` | disabled | Run Chromium with a visible browser window. |
| `--ignore-https-errors` | disabled | Ignore remote TLS certificate errors. |
| `--allow-private-hosts` | disabled | Allow navigation and resource requests to private, local, or reserved networks. |
| `--locale` | `zh-CN` | Browser locale exposed to remote sites. |
| `--timezone-id` | `Asia/Shanghai` | Browser timezone ID exposed to remote sites. |
| `--accept-language` | `zh-CN,zh;q=0.9,en;q=0.8` | `Accept-Language` header sent by browser contexts. |
| `--user-agent` | auto | Desktop browser User-Agent. By default the server derives a normal Chrome UA from the bundled Chromium version instead of exposing `HeadlessChrome`. |
| `--session-ttl-seconds` | `600` | Lifetime of a disconnected client session and its browser context. A client can reconnect to its cached browser session during this window. |
| `--shutdown-timeout-seconds` | `3.0` | Maximum graceful shutdown wait for active HTTP/WebSocket connections after Ctrl+C. |
| `--navigation-timeout-ms` | `30000` | Navigation timeout. |
| `--frame-interval-seconds` | `0.18` | Screenshot streaming interval. |
| `--screenshot-quality` | `95` | Frame quality from 1 to 100. Values below 100 use JPEG; 100 uses PNG. |
| `--media-frame-interval-seconds` | `0.35` | Screenshot streaming interval while remote media is playing. |
| `--media-screenshot-quality` | `80` | JPEG quality while remote media is playing. Ignored when `--screenshot-quality 100` enables PNG. |
| `--min-viewport-width` | `320` | Minimum remote viewport width. |
| `--min-viewport-height` | `240` | Minimum remote viewport height. |
| `--max-viewport-width` | `1920` | Maximum remote viewport width. |
| `--max-viewport-height` | `1600` | Maximum remote viewport height. |
| `--data-dir` | `.agent-data` | Runtime downloads and temporary uploads directory. |
| `--pin` | disabled | Require this PIN before clients can access the proxy UI, API, or WebSocket. |
| `--lock-url` | disabled | Open this URL automatically and hide/disable browser option controls such as Back, Forward, Cookie, Quit, and address navigation. PIN authentication still applies when configured. |
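
As an illustration of how the four viewport limits could interact, the sketch below clamps a client-requested size to the configured range. The helper `clamp_viewport` is hypothetical, and clamping is an assumption about how the server applies these options.

```python
def clamp_viewport(
    width: int,
    height: int,
    *,
    min_width: int = 320,
    min_height: int = 240,
    max_width: int = 1920,
    max_height: int = 1600,
) -> tuple[int, int]:
    """Clamp a requested viewport size to the configured minimum and maximum."""
    clamped_width = max(min_width, min(width, max_width))
    clamped_height = max(min_height, min(height, max_height))
    return clamped_width, clamped_height
```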

Private and local network targets are blocked by default to reduce SSRF risk. Use `--allow-private-hosts` only when you trust the users who can access the proxy.
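
A check of this kind can be built on the standard `ipaddress` module. The sketch below is an assumption about the approach, not the project's implementation, and `is_blocked_address` is a hypothetical name. It only classifies IP literals. A real guard would also resolve hostnames and check every resolved address.

```python
import ipaddress


def is_blocked_address(address: str) -> bool:
    """Return True for private, loopback, link-local, or otherwise reserved IPs."""
    ip = ipaddress.ip_address(address)  # raises ValueError for non-IP input
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )
```

With this sketch, `127.0.0.1`, `192.168.1.10`, and `169.254.169.254` are blocked, while a public address such as `8.8.8.8` is allowed.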

Because LAN access is enabled by default, prefer using a PIN:

```powershell
venv\Scripts\python.exe -m website_agent_server --pin 123456
```

If the proxy itself also needs to open LAN or localhost target URLs, enable private hosts explicitly:

```powershell
venv\Scripts\python.exe -m website_agent_server --allow-private-hosts --pin 123456
```

Each client receives exactly one server-side Playwright `BrowserContext`, keyed only by its local `session-uuid` cookie. Contexts are never shared by IP address, target host, port, URL path, or device class. If the same client opens another target URL before its UUID expires, the old page is closed and the same context is reused, including storage partitions, service workers, permissions, and other browser context state.

Desktop clients use a regular Chrome-style User-Agent, locale, timezone, `Accept-Language`, and Client Hints by default. This improves compatibility with sites that reject obviously headless browser metadata, but it does not bypass account checks, rate limits, CAPTCHA, or other site access controls.

Mobile clients use a mobile Playwright browser profile with a narrow viewport, touch support, a mobile Chromium user agent, language headers, and mobile Client Hints so upstream responsive sites can select their mobile layout.

The local `session-uuid` cookie identifies only the Website Agent client, not the remote site. If the local page refreshes or the WebSocket drops, the server uses that cookie to reconnect the same client to its existing browser session. Disconnected browser sessions and idle client contexts are removed after `--session-ttl-seconds`, which is 10 minutes by default. When a UUID is removed, its BrowserContext, browsing history, cookies, localStorage, IndexedDB, downloaded files, uploaded files, and in-memory session records are removed together.
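
The reconnect-and-expire behavior can be modeled with a small registry keyed only by the session UUID. This is a simplified sketch: `SessionRegistry`, `SessionRecord`, and the injected `clock` are hypothetical names, and the context is a stand-in object. A real implementation would also close the Playwright `BrowserContext` and delete the session's files when a UUID expires.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class SessionRecord:
    context: object   # stand-in for a Playwright BrowserContext
    last_seen: float  # clock reading when the client was last active


class SessionRegistry:
    def __init__(
        self, ttl_seconds: float = 600.0, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._sessions: dict[str, SessionRecord] = {}

    def get_or_create(self, session_uuid: str, factory: Callable[[], object]) -> object:
        """Return the context for this UUID, creating it on first use."""
        record = self._sessions.get(session_uuid)
        if record is None:
            record = SessionRecord(context=factory(), last_seen=self._clock())
            self._sessions[session_uuid] = record
        else:
            record.last_seen = self._clock()  # a reconnect refreshes the window
        return record.context

    def purge_expired(self) -> list[str]:
        """Drop sessions idle for at least the TTL and return their UUIDs."""
        now = self._clock()
        expired = [
            uuid
            for uuid, record in self._sessions.items()
            if now - record.last_seen >= self._ttl
        ]
        for uuid in expired:
            # Real cleanup (closing the context, removing files) would go here.
            del self._sessions[uuid]
        return expired
```

Injecting the clock keeps the expiry logic testable without waiting for real time to pass.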

Uvicorn's WebSocket ping keepalive is disabled because mobile browsers may suspend sockets while loading, switching apps, or sleeping. The client reconnect path and session TTL handle those drops without printing keepalive tracebacks.

Audio from ordinary server-side `<audio>` and `<video>` elements is captured in the page with `captureStream()` or a WebAudio fallback and forwarded over a dedicated WebSocket as WebM/Opus chunks, separate from screenshot frames. Audible playback does not depend on the server-side page. When remote media is playing, screenshot streaming automatically switches to `--media-frame-interval-seconds` and `--media-screenshot-quality` to reduce CPU and network pressure. This does not cover DRM media, WebRTC calls, pure WebAudio graphs, browser UI sounds, or system-level mixed audio.

## Limitations

This project proxies interaction by streaming rendered browser frames, not by rewriting HTML. That keeps remote network access on the server, but it also means the client sees a bitmap viewport rather than native DOM nodes. Browser extension APIs, local client certificates, DRM-protected media, and some system dialogs are outside the current scope.
