Metadata-Version: 2.4
Name: www-search-mcp
Version: 1.0.2
Summary: MCP server providing web search (DuckDuckGo), HTTP fetch, browser fetch (Playwright), and file download.
Project-URL: Homepage, https://github.com/naifs/www-search-mcp
Project-URL: Repository, https://github.com/naifs/www-search-mcp
Project-URL: Issues, https://github.com/naifs/www-search-mcp/issues
Author-email: Naifs <naifs.rage@gmail.com>
License: MIT
License-File: LICENSE
Keywords: duckduckgo,httpx,mcp,playwright,www-search-mcp
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: beautifulsoup4>=4.12
Requires-Dist: ddgs>=9.11.3
Requires-Dist: httpx>=0.27.0
Requires-Dist: markdownify>=0.13.1
Requires-Dist: mcp>=1.26.0
Requires-Dist: playwright>=1.55.0
Description-Content-Type: text/markdown

# www-search-mcp

MCP (Model Context Protocol) server providing **web and image search** (DuckDuckGo), **HTTP fetch**, **browser-based fetch** (Playwright), **file download**, and **package and repository search** (PyPI, GitHub).

Gives AI assistants (Qoder, Claude Desktop, Cursor, etc.) the ability to search the web, read web pages, download files, and discover Python packages — all through a single MCP server with 7 tools.

## System Requirements

| Requirement | Version |
|---|---|
| **uv** | [Install guide](https://docs.astral.sh/uv/getting-started/installation/) |
| **Python** | 3.10+ (managed automatically by `uv`) |
| **Playwright** | Chromium browser (installed automatically the first time `web_fetch_browser` runs) |

### Install uv

**macOS / Linux:**
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

**Windows (PowerShell):**
```powershell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

**macOS (Homebrew):**
```bash
brew install uv
```

## Installation

### Option 1: Run directly with `uvx` (recommended)

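No clone needed: `uvx` fetches the package from PyPI and runs it in an isolated environment. Later runs reuse the cached copy; use `uvx www-search-mcp@latest` to pick up a newer release. To start the server: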

```bash
uvx www-search-mcp
```

MCP client config:

```json
{
  "mcpServers": {
    "www-search-mcp": {
      "command": "uvx",
      "args": ["www-search-mcp"]
    }
  }
}
```

### Option 2: Install as a `uv tool`

```bash
# From PyPI
uv tool install www-search-mcp

# After installation, the command is available globally:
www-search-mcp
```

To update or reinstall:

```bash
uv tool upgrade www-search-mcp
# or force reinstall latest:
uv tool install --force www-search-mcp@latest
```

MCP client config (global tool):

```json
{
  "mcpServers": {
    "www-search-mcp": {
      "command": "www-search-mcp"
    }
  }
}
```

### Option 3: Run from local source (for development)

```bash
git clone https://github.com/naifs/www-search-mcp.git
cd www-search-mcp
uv sync
uv run www-search-mcp
```

To update:

```bash
git pull && uv sync
```

MCP client config (local source):

```json
{
  "mcpServers": {
    "www-search-mcp": {
      "command": "uv",
      "args": [
        "run",
        "--project",
        "/absolute/path/to/www-search-mcp",
        "www-search-mcp"
      ]
    }
  }
}
```
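The three client configs above differ only in how the server process is launched. As a hedged sketch, the helper below (hypothetical, not part of the package) produces each shape as a Python dict that you could dump with `json.dumps` into your client's config file. The mode names `"uvx"`, `"tool"` and `"source"` are invented for this example.

```python
import json
import os


def build_mcp_config(mode: str, project_path: str = "") -> dict:
    """Return an ``mcpServers`` dict for one install option (hypothetical helper).

    mode: "uvx" (Option 1), "tool" (Option 2) or "source" (Option 3).
    """
    if mode == "uvx":
        server = {"command": "uvx", "args": ["www-search-mcp"]}
    elif mode == "tool":
        # `uv tool install` puts the console script on PATH.
        server = {"command": "www-search-mcp"}
    elif mode == "source":
        # The client may launch the server from any working directory,
        # so the project root must be an absolute path.
        if not os.path.isabs(project_path):
            raise ValueError("source mode needs an absolute project root path")
        server = {
            "command": "uv",
            "args": ["run", "--project", project_path, "www-search-mcp"],
        }
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return {"mcpServers": {"www-search-mcp": server}}


if __name__ == "__main__":
    print(json.dumps(build_mcp_config("uvx"), indent=2))
```

For example, `build_mcp_config("source", "/home/me/www-search-mcp")` yields the Option 3 entry; a relative path is rejected because the client may start the server from an unrelated working directory.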

### Option 4: Install from built wheel

```bash
cd /path/to/www-search-mcp
uv build
uv tool install dist/*.whl
```

To update:

```bash
uv build && uv tool install --force dist/*.whl
```

> **Note:** Playwright's Chromium browser is installed automatically the first time `web_fetch_browser` is called. If the automatic install fails (e.g. no network access), install it manually:
> ```bash
> uv run python -m playwright install chromium
> ```
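The install-on-first-use behaviour can be pictured as "try to launch, install once on failure, retry". The sketch below assumes exactly that flow; `launch_with_auto_install` and `install_chromium` are hypothetical names, and the real server may detect a missing browser differently. The install step runs the same `python -m playwright install chromium` command shown in the note.

```python
import subprocess
import sys
from typing import Callable, TypeVar

T = TypeVar("T")


def install_chromium() -> None:
    """Install Playwright's Chromium into the current interpreter's environment."""
    subprocess.run(
        [sys.executable, "-m", "playwright", "install", "chromium"],
        check=True,  # raises CalledProcessError if the install fails (e.g. offline)
    )


def launch_with_auto_install(
    launch: Callable[[], T],
    install: Callable[[], None] = install_chromium,
) -> T:
    """Try to launch the browser; on failure, install Chromium once and retry.

    Hypothetical helper: a second launch failure propagates to the caller.
    """
    try:
        return launch()
    except Exception:
        install()
        return launch()
```

Injecting `launch` and `install` as callables keeps the retry logic independent of Playwright, so it can be exercised without a browser.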

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `WEB_TIMEOUT` | HTTP request timeout in seconds | `30` |
| `WEB_MAX_RESULTS` | Default max search results per query (1..10) | `5` |
| `WEB_MAX_FETCH_CHARS` | Max characters returned in fetch body | `200000` |
| `WEB_RETRIES` | Retry attempts on timeout/rate-limit (0..5) | `2` |
| `WEB_MIN_INTERVAL` | Minimum seconds between outbound requests (throttle) | `1.0` |
| `WEB_MAX_DOWNLOAD_BYTES` | Max bytes per downloaded file | `50000000` |
| `WEB_DEBUG` | Enable debug logging (`1`/`true`/`yes`/`on`) | `false` |
| `WEB_SESSION_ENABLED` | Enable persistent cookies/session by default | `false` |
| `WEB_PROXY` | HTTP proxy URL (e.g. `http://proxy:8080`) | — |
| `WEB_GITHUB_TOKEN` | GitHub token used by `web_search_github` to raise the API rate limit | — |
| `WEB_USER_AGENT` | Custom User-Agent string for HTTP requests | Chrome 135 UA |
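To illustrate how these variables could be read, here is a minimal sketch that applies the defaults and ranges from the table. It assumes out-of-range numbers are clamped and malformed values fall back to the default; the actual server may validate differently. `load_settings` and the helper names are hypothetical, and `WEB_USER_AGENT` is left out because its exact default string is not given here.

```python
import os
from typing import Optional

TRUTHY = {"1", "true", "yes", "on"}


def env_int(name: str, default: int, lo: Optional[int] = None, hi: Optional[int] = None) -> int:
    """Read an int env var, falling back to ``default`` and clamping to [lo, hi]."""
    raw = os.environ.get(name, "").strip()
    try:
        value = int(raw) if raw else default
    except ValueError:
        value = default  # assumption: ignore malformed values instead of failing
    if lo is not None:
        value = max(lo, value)
    if hi is not None:
        value = min(hi, value)
    return value


def env_float(name: str, default: float) -> float:
    """Read a float env var, falling back to ``default`` if unset or malformed."""
    raw = os.environ.get(name, "").strip()
    try:
        return float(raw) if raw else default
    except ValueError:
        return default


def env_bool(name: str, default: bool = False) -> bool:
    """Treat 1/true/yes/on (any case) as True, anything else as False."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


def load_settings() -> dict:
    return {
        "timeout": env_float("WEB_TIMEOUT", 30.0),
        "max_results": env_int("WEB_MAX_RESULTS", 5, 1, 10),
        "max_fetch_chars": env_int("WEB_MAX_FETCH_CHARS", 200_000, 1),
        "retries": env_int("WEB_RETRIES", 2, 0, 5),
        "min_interval": max(0.0, env_float("WEB_MIN_INTERVAL", 1.0)),
        "max_download_bytes": env_int("WEB_MAX_DOWNLOAD_BYTES", 50_000_000, 1),
        "debug": env_bool("WEB_DEBUG"),
        "session_enabled": env_bool("WEB_SESSION_ENABLED"),
        "proxy": os.environ.get("WEB_PROXY") or None,
    }
```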

## Provided Tools

### Search Tools

- `web_search`
  - Input:
    - `query: str`
    - `max_results: int = 5` (1..10)
  - Output:
    - `status`, `query`, `result_count`
    - `results[]` with `title`, `url`, `snippet`
  - Note: safe search is always disabled

- `web_search_images`
  - Input:
    - `query: str`
    - `max_results: int = 5` (1..10)
  - Output:
    - `status`, `query`, `result_count`
    - `results[]` with `title`, `image`, `url`, `thumbnail`, `height`, `width`, `source`
  - Note: safe search is always disabled

- `web_search_github`
  - Input:
    - `query: str`
    - `max_results: int = 5` (1..10)
  - Output:
    - `status`, `query`, `result_count`
    - `results[]` with `title`, `url`, `description`, `stars`, `forks`, `language`
  - Note: uses the GitHub REST API; no token is required, but unauthenticated search is limited to about 10 requests per minute (set `WEB_GITHUB_TOKEN` to raise it)

- `web_search_pypi`
  - Input:
    - `query: str`
    - `max_results: int = 5` (1..10)
  - Output:
    - `status`, `query`, `result_count`
    - `results[]` with `name`, `version`, `summary`, `author`, `license`, `requires_python`, `url`, `repository`, `py_versions`, `dependencies`
  - Note: uses DuckDuckGo discovery + PyPI JSON API for enriched metadata
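The PyPI enrichment step relies on PyPI's JSON API (`https://pypi.org/pypi/<name>/json`), whose `info` object carries most of the fields listed above. The sketch below shows one plausible mapping; how `repository`, `py_versions` and `dependencies` are derived (project URL keys, `Programming Language :: Python :: X` classifiers, `requires_dist` minus extras) is an assumption, not taken from the server's source. `summarize_pypi`, `pypi_json_url` and `fetch_pypi` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

PYPI_JSON = "https://pypi.org/pypi/{name}/json"
PY_PREFIX = "Programming Language :: Python :: "


def pypi_json_url(name: str) -> str:
    return PYPI_JSON.format(name=urllib.parse.quote(name, safe=""))


def summarize_pypi(info: dict) -> dict:
    """Map the ``info`` object of a PyPI JSON response to web_search_pypi-style fields."""
    urls = info.get("project_urls") or {}
    # Assumption: the repository is whichever project URL is labelled like a source link.
    repository = next(
        (u for key, u in urls.items() if key.lower() in {"repository", "source", "source code"}),
        None,
    )
    # Keep version-only classifiers such as "3" or "3.12"; skip "3 :: Only" etc.
    py_versions = [
        c[len(PY_PREFIX):]
        for c in info.get("classifiers") or []
        if c.startswith(PY_PREFIX) and c[len(PY_PREFIX):].replace(".", "").isdigit()
    ]
    # Assumption: optional extras are dropped from the dependency list.
    deps = [d for d in info.get("requires_dist") or [] if "extra ==" not in d]
    name = info.get("name", "")
    return {
        "name": name,
        "version": info.get("version"),
        "summary": info.get("summary"),
        "author": info.get("author"),
        "license": info.get("license"),
        "requires_python": info.get("requires_python"),
        "url": info.get("package_url") or f"https://pypi.org/project/{name}/",
        "repository": repository,
        "py_versions": py_versions,
        "dependencies": deps,
    }


def fetch_pypi(name: str, timeout: float = 30.0) -> dict:
    """Network call: download the JSON document and summarize its ``info`` block."""
    with urllib.request.urlopen(pypi_json_url(name), timeout=timeout) as resp:
        return summarize_pypi(json.load(resp)["info"])
```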

### Fetch & Download Tools

- `web_fetch`
  - Input:
    - `url: str` (`http`/`https` only)
    - `fetch_div: str = ""` — optional CSS selector (e.g. `article`, `.post-body`)
    - `save_file: str = ""` — optional absolute file path with extension
    - `use_session: bool = False` — reuse cookies from previous requests
  - Output:
    - `status`, `http_status`, `url`, `truncated`, `title?`, `content_type?`, `body` (or `saved_to`/`bytes_written` when `save_file` is used)

- `web_fetch_browser`
  - Input:
    - `url: str` (`http`/`https` only)
    - `fetch_div: str = ""` — optional CSS selector
    - `save_file: str = ""` — optional absolute file path with extension
    - `headless: bool = True` — run the browser hidden (`True`) or show its window (`False`)
    - `wait_seconds: int = 0` — extra seconds to wait after the page loads
    - `use_session: bool = False` — reuse browser cookies/context
  - Output: same fields as `web_fetch`, with `title` always included
  - Use for JS-heavy pages or sites that block plain HTTP clients

- `web_download`
  - Input:
    - `url: str` (`http`/`https` only)
    - `save_file: str` — required absolute file path with extension
    - `use_session: bool = False` — reuse cookies from previous requests
  - Output:
    - `status`, `url`, `saved_to`, `bytes`, `content_type`
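`web_fetch` accepts only `http`/`https` URLs, refuses binary bodies and sets `truncated` when the body exceeds `WEB_MAX_FETCH_CHARS`. A minimal standard-library sketch of those three checks follows; the set of "textual" MIME types, the `"ok"`/`"error"` status strings and all function names are assumptions for illustration.

```python
from urllib.parse import urlparse

MAX_FETCH_CHARS = 200_000  # mirrors the WEB_MAX_FETCH_CHARS default

# Assumption: which non-text/* MIME types count as readable text.
TEXT_MIME = {"application/json", "application/xml", "application/xhtml+xml", "application/javascript"}


def check_url(url: str) -> str:
    """Reject anything that is not an absolute http(s) URL."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"only http/https URLs are supported: {url!r}")
    return url


def is_text_content(content_type: str) -> bool:
    """True for text/*, known textual application types and +json/+xml suffixes."""
    mime = content_type.split(";", 1)[0].strip().lower()
    return (
        mime.startswith("text/")
        or mime in TEXT_MIME
        or mime.endswith("+json")
        or mime.endswith("+xml")
    )


def truncate_body(body: str, limit: int = MAX_FETCH_CHARS) -> tuple:
    """Return (body, truncated) with body cut to at most ``limit`` characters."""
    if len(body) <= limit:
        return body, False
    return body[:limit], True


def build_fetch_result(url: str, http_status: int, content_type: str, text: str,
                       limit: int = MAX_FETCH_CHARS) -> dict:
    check_url(url)
    if not is_text_content(content_type):
        return {"status": "error", "url": url, "error": "binary content; use web_download"}
    body, truncated = truncate_body(text, limit)
    return {"status": "ok", "http_status": http_status, "url": url,
            "truncated": truncated, "content_type": content_type, "body": body}
```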

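## Quick Verification

Run these from the root of a local source checkout (Option 3); each command imports a tool function directly and prints a couple of result fields: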

```bash
# Search the web
uv run python -c "from www_search_mcp.server import web_search; r=web_search('python mcp protocol', max_results=3); print(r['status'], r['result_count'])"

# Fetch a page
uv run python -c "import asyncio; from www_search_mcp.server import web_fetch; r=asyncio.run(web_fetch('https://example.com')); print(r['status'], r['http_status'])"

# Search GitHub
uv run python -c "from www_search_mcp.server import web_search_github; r=web_search_github('fastapi', max_results=3); print(r['status'], r['result_count'])"

# Search PyPI
uv run python -c "from www_search_mcp.server import web_search_pypi; r=web_search_pypi('httpx', max_results=3); print(r['status'], r['result_count'])"
```

## Troubleshooting

### `uv` not found
Install `uv` and reopen your terminal. See [System Requirements](#system-requirements).

### Dependencies missing
```bash
uv sync
```

### Playwright browser not found
```bash
uv run python -m playwright install chromium
```

### GitHub API rate limit exceeded
GitHub's search API allows about 10 requests per minute without authentication. To raise the limit, set a GitHub personal access token:
```bash
export WEB_GITHUB_TOKEN=ghp_your_token_here
```
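For reference, GitHub's repository search endpoint is `GET https://api.github.com/search/repositories`, and a token is sent as an `Authorization: Bearer` header. The sketch below builds such a request with `urllib` and maps one result item to the `web_search_github` fields; it assumes the server reads `WEB_GITHUB_TOKEN` this way, and `github_search_request` and `map_repo` are hypothetical names. The request is only constructed here, not sent.

```python
import os
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.github.com/search/repositories"


def github_search_request(query: str, max_results: int = 5) -> urllib.request.Request:
    """Build (but do not send) a repository search request."""
    params = urllib.parse.urlencode({"q": query, "per_page": max(1, min(10, max_results))})
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
        "User-Agent": "www-search-mcp-example",  # GitHub requires a User-Agent
    }
    token = os.environ.get("WEB_GITHUB_TOKEN")
    if token:
        # Authenticated search gets a higher per-minute quota.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(f"{SEARCH_URL}?{params}", headers=headers)


def map_repo(item: dict) -> dict:
    """Translate one ``items[]`` entry of the API response into result fields."""
    return {
        "title": item.get("full_name"),
        "url": item.get("html_url"),
        "description": item.get("description"),
        "stars": item.get("stargazers_count", 0),
        "forks": item.get("forks_count", 0),
        "language": item.get("language"),
    }
```

Because MCP clients usually launch the server as a child process, the token must be present in that process's environment; exporting it in a shell only helps if the client was started from that shell, and many clients instead accept an `env` object in the server entry.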

### Binary content error in `web_fetch`
`web_fetch` rejects binary content (images, PDFs, etc.). Use `web_download` instead to save binary files to disk.
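`web_download` writes to an absolute `save_file` path and caps each file at `WEB_MAX_DOWNLOAD_BYTES`. One way to enforce both, sketched here under assumptions (the `.part` temporary file, creating missing parent directories, and the function names are all hypothetical), is to stream chunks into a temporary file and abort once the running total passes the limit:

```python
import os
from typing import Iterable

MAX_DOWNLOAD_BYTES = 50_000_000  # mirrors the WEB_MAX_DOWNLOAD_BYTES default


def validate_save_path(save_file: str) -> str:
    """Require an absolute path that ends in a file extension."""
    if not os.path.isabs(save_file):
        raise ValueError("save_file must be an absolute path")
    if not os.path.splitext(save_file)[1]:
        raise ValueError("save_file must include a file extension")
    return save_file


def write_capped(chunks: Iterable[bytes], save_file: str,
                 max_bytes: int = MAX_DOWNLOAD_BYTES) -> int:
    """Stream chunks to disk, aborting (and cleaning up) if max_bytes is exceeded."""
    validate_save_path(save_file)
    os.makedirs(os.path.dirname(save_file), exist_ok=True)
    tmp = save_file + ".part"
    written = 0
    try:
        with open(tmp, "wb") as fh:
            for chunk in chunks:
                written += len(chunk)
                if written > max_bytes:
                    raise ValueError(f"download exceeds {max_bytes} bytes")
                fh.write(chunk)
        os.replace(tmp, save_file)  # only a complete file appears at save_file
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    return written
```

With httpx, the chunks could come from `response.iter_bytes()` inside a `client.stream("GET", url)` block, so a large body never has to fit in memory.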

### MCP tools not appearing in client
1. Check that the MCP client config JSON is valid.
2. If you run from local source (Option 3), ensure the `--project` path is absolute and correct.
3. Reload the MCP client after config changes.
4. Set `WEB_DEBUG=true` to enable detailed logs.

### Wrong project path in config
The `--project` argument must point to the **root directory** of `www-search-mcp` (where `pyproject.toml` is located), not to the `src/` subdirectory.
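The checklist above and the `pyproject.toml` rule can be partly automated. Below is a hedged sketch of a config checker; `check_mcp_config` is hypothetical and assumes the `mcpServers` layout used in this README.

```python
import json
import os


def check_mcp_config(text: str, name: str = "www-search-mcp") -> list:
    """Return a list of problems in an MCP client config; an empty list means none found."""
    try:
        cfg = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    servers = cfg.get("mcpServers") if isinstance(cfg, dict) else None
    server = servers.get(name) if isinstance(servers, dict) else None
    if not isinstance(server, dict):
        return [f"no mcpServers entry named {name!r}"]
    problems = []
    if not server.get("command"):
        problems.append("missing 'command'")
    args = server.get("args", [])
    if isinstance(args, list) and "--project" in args:
        i = args.index("--project")
        project = args[i + 1] if i + 1 < len(args) else ""
        if not os.path.isabs(project):
            problems.append(f"--project path is not absolute: {project!r}")
        elif not os.path.isfile(os.path.join(project, "pyproject.toml")):
            # Typically points at src/ or another folder instead of the repository root.
            problems.append(f"no pyproject.toml in {project!r}; use the repository root")
    return problems
```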

## License

MIT
