Metadata-Version: 2.4
Name: mcpcn-web-search-mcp
Version: 0.1.0
Summary: Multi-source web search MCP server with automatic failover
Author-email: Your Name <you@example.com>
License: MIT
Keywords: duckduckgo,mcp,serpapi,web-search,wikipedia
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Requires-Dist: beautifulsoup4>=4.12.0
Requires-Dist: mcp>=1.0.0
Requires-Dist: requests>=2.28.0
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

# Web Search MCP Server

An MCP server that searches the web through multiple providers and fails over automatically when one of them is unavailable.

## Features

- **Multiple Search Providers**: SerpAPI, DuckDuckGo Lite, Google News RSS, Wikipedia, Exa MCP
- **Automatic Failover**: If a provider fails, the server automatically tries the next one in the priority order
- **Language Detection**: Automatically adjusts search parameters for CJK (Chinese/Japanese/Korean) queries
- **Flexible Output**: Text (human-readable) or JSON (structured) formats
- **Webpage Fetching**: Extract readable content from URLs
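
To make the language-detection feature more concrete, here is a minimal sketch of how a CJK query might be recognised. The function name `guess_cjk_language`, the Unicode ranges, and the rule that Han characters without kana are treated as Chinese are all assumptions made for illustration; the server's actual detection logic may differ.

```python
from typing import Optional

# Unicode ranges used by this sketch (an assumption, not the server's actual table).
_HAN = (0x4E00, 0x9FFF)      # CJK Unified Ideographs
_KANA = (0x3040, 0x30FF)     # Hiragana and Katakana
_HANGUL = (0xAC00, 0xD7AF)   # Hangul Syllables


def _in_range(ch: str, bounds: tuple[int, int]) -> bool:
    """Return True if the code point of ch lies within bounds (inclusive)."""
    return bounds[0] <= ord(ch) <= bounds[1]


def guess_cjk_language(query: str) -> Optional[str]:
    """Return "ja", "ko", "zh", or None for a non-CJK query (heuristic)."""
    if any(_in_range(ch, _KANA) for ch in query):
        return "ja"  # kana is a strong signal for Japanese
    if any(_in_range(ch, _HANGUL) for ch in query):
        return "ko"
    if any(_in_range(ch, _HAN) for ch in query):
        return "zh"  # Han characters alone: assume Chinese
    return None
```

A caller could use the returned code to pick a region or engine parameter for the provider; how the real server maps languages to search parameters is not specified here.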

## Installation

```bash
# From source
pip install -e .

# Or with uv
uv pip install -e .
```

## Configuration

### Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| `WEB_SEARCH_SOURCES` | Comma-separated provider priority (e.g., `serpapi,ddg_lite,wikipedia`) | No |
| `WEB_SEARCH_PROVIDER` | Single provider override | No |
| `SERPAPI_API_KEY` | API key for [SerpAPI](https://serpapi.com/) | For SerpAPI |
| `EXA_API_KEY` | API key for [Exa](https://exa.ai/) | For Exa |
| `EXA_MCP_URL` | Exa MCP endpoint (default: `https://mcp.exa.ai/mcp`) | No |

### Available Sources

| Source | Description | API Key Required |
|--------|-------------|------------------|
| `serpapi` | Multi-engine search (Google, Bing, Baidu, etc.) | Yes (`SERPAPI_API_KEY`) |
| `serpapi_google` | SerpAPI with Google engine | Yes |
| `serpapi_bing` | SerpAPI with Bing engine | Yes |
| `serpapi_baidu` | SerpAPI with Baidu engine | Yes |
| `ddg_lite` | DuckDuckGo Lite (free) | No |
| `google_news_rss` | Google News RSS feed | No |
| `wikipedia` | Wikipedia OpenSearch | No |
| `exa_mcp` | Exa AI-powered search | Yes (`EXA_API_KEY`) |

### Default Priority

When no source list is given explicitly, the order depends on which API keys are set:

- If `SERPAPI_API_KEY` is set: `serpapi → ddg_lite → google_news_rss`
- If `EXA_API_KEY` is set: `exa_mcp → ddg_lite → google_news_rss → wikipedia`
- Otherwise: `ddg_lite → google_news_rss → wikipedia`
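
The rules above can be sketched as a small resolver. The function name `resolve_sources` is hypothetical, and two points are assumptions rather than documented behaviour: that `WEB_SEARCH_PROVIDER` takes precedence over `WEB_SEARCH_SOURCES`, and that `SERPAPI_API_KEY` is checked before `EXA_API_KEY` when both are set.

```python
from typing import Mapping


def resolve_sources(env: Mapping[str, str]) -> list[str]:
    """Return the provider order for the given environment mapping."""
    # Assumption: a single-provider override wins over everything else.
    provider = env.get("WEB_SEARCH_PROVIDER", "").strip()
    if provider:
        return [provider]

    # Assumption: an explicit list comes next; blank entries are ignored.
    raw = env.get("WEB_SEARCH_SOURCES", "")
    explicit = [name.strip() for name in raw.split(",") if name.strip()]
    if explicit:
        return explicit

    # Documented defaults.
    # Assumption: SERPAPI_API_KEY is checked first when both keys are set.
    if env.get("SERPAPI_API_KEY"):
        return ["serpapi", "ddg_lite", "google_news_rss"]
    if env.get("EXA_API_KEY"):
        return ["exa_mcp", "ddg_lite", "google_news_rss", "wikipedia"]
    return ["ddg_lite", "google_news_rss", "wikipedia"]
```

In a real process the mapping would typically be `os.environ`; passing a plain dict keeps the sketch easy to test.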

## Usage

### Running the Server

```bash
# Direct execution
web-search-mcp

# Or via Python
python -m web_search_mcp.server
```

### Claude Desktop Configuration

Add to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "web-search": {
      "command": "web-search-mcp",
      "env": {
        "SERPAPI_API_KEY": "your-serpapi-key",
        "WEB_SEARCH_SOURCES": "serpapi,ddg_lite,wikipedia"
      }
    }
  }
}
```

Or with `uv`:

```json
{
  "mcpServers": {
    "web-search": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/web-search-mcp",
        "run",
        "web-search-mcp"
      ],
      "env": {
        "WEB_SEARCH_SOURCES": "ddg_lite,google_news_rss,wikipedia"
      }
    }
  }
}
```

## Tools

### web_search

Search the internet using multiple providers with automatic failover.

**Parameters:**

| Name | Type | Default | Description |
|------|------|---------|-------------|
| `query` | string | required | The search query |
| `max_results` | int | 10 | Maximum number of results to return (1-20) |
| `format` | string | "text" | Output format: "text" or "json" |
| `sources` | string | "" | Comma-separated source list |
| `engine` | string | "" | SerpAPI engine (google, bing, baidu, etc.) |

**Examples:**

```python
# Text format (default)
web_search(query="Python tutorials", max_results=5)

# JSON format
web_search(query="latest AI news", format="json")

# Specific sources
web_search(query="machine learning", sources="wikipedia,ddg_lite")

# Baidu search via SerpAPI
web_search(query="人工智能", sources="serpapi", engine="baidu")
```

### fetch_webpage

Fetch and extract readable content from a URL.

**Parameters:**

| Name | Type | Default | Description |
|------|------|---------|-------------|
| `url` | string | required | The URL to fetch |
| `max_length` | int | 8000 | Maximum content length |

**Example:**

```python
fetch_webpage(url="https://example.com/article")
```
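
As a rough picture of what "extract readable content" involves, the sketch below strips markup, drops `<script>` and `<style>` content, and truncates the result to `max_length` characters. It deliberately uses only the standard library's `html.parser` so that it runs without dependencies; the server itself is described as using `requests` and BeautifulSoup, so this is a stand-in technique, and the names `_TextExtractor` and `extract_text` are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    _SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text that is outside script/style elements.
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str, max_length: int = 8000) -> str:
    """Return the visible text of an HTML document, cut to max_length characters."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)[:max_length]
```

Downloading the page is left out on purpose; in the sketch, the HTML string would come from whatever HTTP client the caller uses.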

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                    MCP Server (FastMCP)                      │
├─────────────────────────────────────────────────────────────┤
│  web_search()                    fetch_webpage()             │
│      │                                 │                     │
│      ▼                                 ▼                     │
│  MultiSourceSearcher              requests + BeautifulSoup   │
│      │                                                       │
│      ├──► SerpAPIProvider (paid, multi-engine)              │
│      ├──► ExaMCPProvider (paid, AI-powered)                 │
│      ├──► DuckDuckGoLiteProvider (free)                     │
│      ├──► GoogleNewsRSSProvider (free, news-focused)        │
│      └──► WikipediaProvider (free, encyclopedia)            │
└─────────────────────────────────────────────────────────────┘
```
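
The failover path in the diagram can be illustrated with a short loop. This is a sketch under stated assumptions, not the code of `MultiSourceSearcher`: providers are modelled as plain callables, the name `search_with_failover` is hypothetical, and treating an empty result list as a failure is an assumption.

```python
from typing import Callable

# A provider is modelled as a callable: (query, max_results) -> list of result dicts.
Provider = Callable[[str, int], list[dict]]


def search_with_failover(
    query: str,
    providers: list[tuple[str, Provider]],
    max_results: int = 10,
) -> tuple[str, list[dict]]:
    """Try providers in order; return (provider_name, results) from the first success."""
    errors: list[str] = []
    for name, provider in providers:
        try:
            results = provider(query, max_results)
        except Exception as exc:  # any provider error triggers failover
            errors.append(f"{name}: {exc}")
            continue
        if results:  # Assumption: an empty result list also counts as a failure
            return name, results[:max_results]
        errors.append(f"{name}: no results")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Collecting the per-provider errors and raising only after every provider has been tried keeps the final error message useful for diagnosing which sources were unavailable.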

## Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check .
```

## License

MIT
