Metadata-Version: 2.4
Name: lionscraper
Version: 1.0.2
Summary: LionScraper bridge daemon, thin MCP stdio, and CLI — local HTTP + WebSocket to Chrome extension (Python)
Author: LionScraper
License: MIT
Keywords: lionscraper,mcp,cli,web-scraping,chrome-extension
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: aiohttp>=3.9.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: pydantic>=2.6.0
Requires-Dist: mcp>=1.8.0
Requires-Dist: websockets>=12.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.24.0; extra == "dev"

# LionScraper (Python)

Python edition of **LionScraper**: a local **HTTP + WebSocket** daemon that talks to the **Chrome/Edge LionScraper extension**, plus a **thin MCP server** over **stdio** that forwards tool calls to the daemon. Behavior is intended to match the npm/`packages/node` implementation in this repository.

## Requirements

- Python **3.10+**
- The LionScraper browser extension installed and configured to use the same **PORT** as the daemon (default **13808**).

## Install

From PyPI (when published):

```bash
pip install lionscraper
```

From a checkout of this repo:

```bash
cd packages/python
pip install -e ".[dev]"
```

## Commands

| Command | Role |
|--------|------|
| `lionscraper` | Full CLI (`daemon`, `stop`, `scrape`, `ping`, …). |
| `lionscraper-mcp` | If **no** extra arguments: thin MCP over stdio. **Any** extra argument delegates to the same CLI as `lionscraper`. |

Equivalent module entry:

```bash
python -m lionscraper --help
python -m lionscraper daemon
```

## MCP client configuration

Use **`lionscraper-mcp`** as the MCP server command (no arguments) so the host spawns thin MCP stdio:

```json
{
  "mcpServers": {
    "lionscraper": {
      "command": "lionscraper-mcp",
      "env": {
        "PORT": "13808",
        "LANG": "en-US"
      }
    }
  }
}
```

To get the same routing through the Python module entry (stdio MCP when **no** extra arguments follow the module name), use these `command` and `args` values in the server entry instead:

```json
"command": "python",
"args": ["-m", "lionscraper"]
```

Thin stdio mode is selected only when no arguments follow the program entry, i.e. `sys.argv[1:]` is empty (same as Node). Any extra token (including `--debug`) selects the CLI path instead.
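
The routing rule above can be sketched in a few lines. This is an illustration only, not the package's actual entry point: the function name `select_mode` and the mode labels `"mcp-stdio"` and `"cli"` are hypothetical.

```python
from typing import Sequence


def select_mode(argv: Sequence[str]) -> str:
    """Pick the entry mode from an argv list shaped like sys.argv.

    Hypothetical helper: returns "mcp-stdio" when nothing follows the
    program entry, and "cli" as soon as any extra token is present.
    """
    extra = list(argv[1:])  # everything after the program entry
    return "mcp-stdio" if len(extra) == 0 else "cli"
```

Calling `select_mode(sys.argv)` at startup would reproduce the documented behavior: even a lone flag such as `--debug` counts as an extra token and selects the CLI path.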

## Environment variables

| Variable | Meaning |
|----------|---------|
| `PORT` | HTTP + WebSocket listen port (default **13808**). Must match the extension bridge port. |
| `TOKEN` | Optional bearer token for `Authorization` on loopback HTTP. |
| `DAEMON` | Set to `0` to disable auto-start of the daemon from CLI / thin MCP. |
| `TIMEOUT` | Default timeout hints (see root repo docs). |
| `LANG` | Locale for log and tool metadata (`en-US` / `zh-CN`). |
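
As a rough sketch of how these variables might be read, the snippet below collects them into one settings object. The names `BridgeSettings` and `load_settings` are hypothetical, and the `en-US` fallback for `LANG` is an assumption (the table does not state a default). `TIMEOUT` is left out because its format is documented elsewhere.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_PORT = 13808  # default listen port from the table above


@dataclass(frozen=True)
class BridgeSettings:
    """Hypothetical container for the documented environment variables."""
    port: int
    token: Optional[str]
    auto_start_daemon: bool
    lang: str


def load_settings(env: Mapping[str, str] = os.environ) -> BridgeSettings:
    """Read PORT, TOKEN, DAEMON and LANG from an environment mapping."""
    port = int(env.get("PORT", DEFAULT_PORT))
    token = env.get("TOKEN") or None           # unset or empty means no token
    auto_start = env.get("DAEMON", "1") != "0"  # only "0" disables auto-start
    lang = env.get("LANG", "en-US")             # assumed fallback, see lead-in
    return BridgeSettings(port, token, auto_start, lang)
```

Passing the mapping explicitly (for example `load_settings({"PORT": "14000"})`) keeps the helper easy to test without touching the real process environment.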

## PyPI release (maintainers)

```bash
cd packages/python
python -m pip install build twine
python -m build
python -m twine upload dist/*
```

Project name on PyPI: **`lionscraper`** (single package providing both console scripts).

## Parity with Node

This package mirrors `packages/node`: bridge protocol, daemon HTTP API (`/v1/health`, `/v1/daemon/shutdown`, `/v1/tools/call` with optional NDJSON progress), port probing, and tool input validation (Pydantic here, Zod in Node). Locale JSON is copied under `src/locale/` so the Python wheel does not depend on the Node tree.
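
NDJSON means one JSON document per line, so a client can decode progress events as they arrive. A minimal sketch of such a decoder follows; the helper name `iter_ndjson` is hypothetical, and the shape of the progress events is not specified here, so the code makes no assumption about their fields.

```python
import json
from typing import Any, Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one decoded JSON value per non-empty line.

    Works on any iterable of text lines, e.g. a streamed HTTP body
    that has already been split on newlines.
    """
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank keep-alive or trailing lines
        yield json.loads(line)
```

Because it is a generator, each event can be handled as soon as its line is received rather than after the whole response completes.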

## Development tests

```bash
cd packages/python
pytest
```

## Manual smoke test

1. Start the daemon: `lionscraper daemon` (or let the CLI / `lionscraper-mcp` auto-start it, which is the default unless `DAEMON=0`).
2. Connect the extension to `ws://127.0.0.1:<PORT>` (the same port as HTTP).
3. Run `curl -s http://127.0.0.1:13808/v1/health` (substitute your `PORT` if it is not the default). Expect JSON with `"ok": true` and `identity` **lionscraper** when ready.
4. Run `lionscraper-mcp` with no args under an MCP host and list tools. Expect `ping`, `scrape`, `scrape_article`, etc.
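
The health check in step 3 can also be scripted. The sketch below uses only the standard library; `fetch_health` and `is_daemon_ready` are hypothetical helper names, and the check covers only the two fields mentioned above (`ok` and `identity`), since the rest of the response body is not described here.

```python
import json
import urllib.request
from typing import Any


def fetch_health(port: int = 13808, timeout: float = 2.0) -> str:
    """GET /v1/health on the loopback daemon and return the raw body."""
    url = f"http://127.0.0.1:{port}/v1/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def is_daemon_ready(body: str) -> bool:
    """Return True when the body is JSON with ok == true and identity == "lionscraper"."""
    try:
        payload: Any = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    return payload.get("ok") is True and payload.get("identity") == "lionscraper"
```

With the daemon running, `is_daemon_ready(fetch_health())` should mirror the manual `curl` check. If `TOKEN` is set, the request may also need an `Authorization` header, which this sketch does not add.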

Chinese documentation: [README_cn.md](README_cn.md).
