Metadata-Version: 2.4
Name: searchlight-mcp
Version: 4.1.0
Summary: Free, open-source web search MCP server for AI coding tools
Author: Searchlight Contributors
License-Expression: MIT
Keywords: ai,coding-tools,free,mcp,web-search
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.10
Requires-Dist: aiosqlite>=0.19
Requires-Dist: beautifulsoup4>=4.12
Requires-Dist: certifi>=2024.0.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: lxml>=4.9
Requires-Dist: markdownify>=0.14
Requires-Dist: mcp[cli]>=1.0
Requires-Dist: readability-lxml>=0.8
Requires-Dist: trafilatura>=1.6
Provides-Extra: browser
Requires-Dist: playwright>=1.40; extra == 'browser'
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
Requires-Dist: pytest-httpx>=0.21; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1; extra == 'dev'
Description-Content-Type: text/markdown

# Searchlight

Free, open-source web search MCP server for AI coding tools.

Works with **Claude Code**, **Cursor**, **Windsurf**, **VS Code Copilot**, and any MCP-compatible AI tool. No API keys are required: install the package, add a short server entry to your MCP config, and start searching.

## Features

- **Zero cost** — Free search via native HTTP scraping (Bing, Baidu, Yandex, Brave, DuckDuckGo)
- **Zero config** — Works out of the box with `auto` backend and language-aware routing
- **7 search backends** (including `auto`; see Search Backends) with automatic failover and reachability probing
- **4 MCP tools** — `web_search`, `web_read`, `web_search_and_read`, `search_config`
- **Quality Site Library** — Auto-enhances queries with authoritative sources (Anthropic, OpenAI, MCP docs, LangChain, etc.)
- **Smart content extraction** — trafilatura → readability → BeautifulSoup fallback chain with quality scoring
- **JS page rendering** — Automatic Jina AI proxy fallback for JavaScript-rendered pages
- **Smart caching** — Async SQLite with dynamic TTL (time-sensitive queries cache shorter)
- **Auto-learning** — Automatically discovers and adds high-quality websites from your reading patterns
- **Security** — Automatic API key/secret detection and redaction in queries
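
The "Smart caching" item above says time-sensitive queries are cached for a shorter period. A minimal sketch of that idea follows; the hint words, the one-hour short TTL, and the function name `cache_ttl_hours` are assumptions for illustration, not Searchlight's actual rules.

```python
import re

# Hypothetical hint words (assumption): the real heuristics are not documented here.
TIME_SENSITIVE = re.compile(
    r"\b(today|latest|news|price|release|20\d{2})\b", re.IGNORECASE
)


def cache_ttl_hours(query: str, default_hours: float = 24.0) -> float:
    """Return a shorter TTL when the query looks time-sensitive."""
    if TIME_SENSITIVE.search(query):
        # Assumed short TTL of one hour, never longer than the default.
        return min(default_hours, 1.0)
    return default_hours
```

With these assumed rules, a query such as "latest python release" would be cached for one hour, while "binary search algorithm" would keep the default TTL.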

## Installation

### Option 1: PyPI (Recommended)

```bash
pip install searchlight-mcp
```

Or with uv:

```bash
uv pip install searchlight-mcp
```

### Option 2: Install from GitHub

```bash
pip install git+https://github.com/McKenzieIT/smart-web-search.git
```

## Quick Start — One-Click MCP Setup

### Claude Code

Add to `~/.claude.json` or project `.mcp.json`:

```json
{
  "mcpServers": {
    "searchlight": {
      "command": "python",
      "args": ["-m", "searchlight"]
    }
  }
}
```

Or use the CLI one-liner:

```bash
claude mcp add searchlight -- python -m searchlight
```

### Cursor

Add to `.cursor/mcp.json` in your project root:

```json
{
  "mcpServers": {
    "searchlight": {
      "command": "python",
      "args": ["-m", "searchlight"]
    }
  }
}
```

Or global: `~/.cursor/mcp.json`

### VS Code Copilot

Add to `.vscode/mcp.json`:

```json
{
  "servers": {
    "searchlight": {
      "command": "python",
      "args": ["-m", "searchlight"]
    }
  }
}
```

### Windsurf

Add to `.windsurf/mcp.json`:

```json
{
  "mcpServers": {
    "searchlight": {
      "command": "python",
      "args": ["-m", "searchlight"]
    }
  }
}
```

### Claude Desktop

Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):

```json
{
  "mcpServers": {
    "searchlight": {
      "command": "python",
      "args": ["-m", "searchlight"]
    }
  }
}
```

### Generic MCP Client

Any MCP-compatible tool can use this configuration:

```json
{
  "mcpServers": {
    "searchlight": {
      "command": "python",
      "args": ["-m", "searchlight"]
    }
  }
}
```

Restart your AI tool after adding the config. Searchlight's 4 tools are available as soon as it restarts.

## Agent Prompts — Teach Your AI to Search Well

After installing Searchlight, paste the appropriate prompt into your agent's system prompt or custom instructions so it knows **when and how to search**.

### Claude Code

Add to your `CLAUDE.md` or project `.claude/instructions.md`:

```markdown
## Web Search Guidelines
- Use `web_search` to find documentation, error solutions, current events, and comparisons.
- Use `web_read` to extract detailed content from a specific URL.
- Use `web_search_and_read` for deep research that requires reading multiple pages.
- For quick lookups, `web_search` alone is sufficient — no need to read every result.
- Use `time_range="month"` or `"week"` for current events or recent changes.
- Use `mode="preview"` to check if a page is relevant before reading the full content.
- The Quality Site Library automatically prioritizes authoritative sources for AI/developer topics.
```

### Cursor

Add to `.cursorrules` or Cursor's custom instructions:

```markdown
## Web Search
When you need current information, documentation, or solutions not in your training data:
- Use `web_search(query)` to find relevant results quickly.
- Use `web_read(url)` to read a specific page's content.
- Use `web_search_and_read(query)` for comprehensive research.
- Prefer `web_search` for quick answers; use `web_search_and_read` for in-depth analysis.
- Use `time_range="month"` for recent information.
```

### VS Code Copilot

Add to `.github/copilot-instructions.md`:

```markdown
## Web Search
- `web_search(query)`: Find information on the web. Returns titles, URLs, snippets.
- `web_read(url)`: Read a web page's content as Markdown.
- `web_search_and_read(query)`: Search and read top results in one call.
- Use web_search as the first choice. Use web_search_and_read for research tasks.
- Use time_range parameter for current events.
```

### Windsurf

Add to `.windsurfrules`:

```markdown
## Web Search
- Use web_search to find docs, error solutions, and current info.
- Use web_read to extract content from specific URLs.
- Use web_search_and_read for deep research.
- The searchlight MCP server auto-boosts results from authoritative AI/developer sources.
```

## MCP Tools

### web_search

Search the web using multiple engines. Returns Markdown-formatted results.

```python
web_search(query="Python asyncio tutorial", max_results=10, time_range="month")
```

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| query | string | required | Search terms (max 500 chars) |
| max_results | int | 10 | Number of results (1-20) |
| language | string | null | Language code (zh, en, ja, etc.) |
| time_range | string | null | "day", "week", "month", "year" |
| backend | string | null | Override backend for this search |
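
The limits in the table above (query up to 500 characters, `max_results` between 1 and 20) can be sketched as a small validation step. This is only an illustration: `normalize_search_args` is a hypothetical helper, and whether the server rejects or clamps out-of-range values is an assumption.

```python
def normalize_search_args(query: str, max_results: int = 10) -> tuple[str, int]:
    """Apply the documented limits: query <= 500 chars, max_results in 1..20."""
    query = query.strip()
    if not query:
        raise ValueError("query is required")
    if len(query) > 500:
        # Assumption: over-long queries are rejected rather than truncated.
        raise ValueError("query exceeds 500 characters")
    # Assumption: out-of-range max_results is clamped into the 1-20 window.
    return query, max(1, min(max_results, 20))
```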

### web_read

Read and extract clean Markdown content from a web page.

```python
web_read(url="https://docs.python.org/3/library/asyncio.html", max_length=10000, mode="full")
```

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| url | string | required | URL to read |
| max_length | int | 10000 | Maximum characters to return |
| mode | string | "full" | "preview" (headings only) or "full" |
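
To make the `"preview"` mode concrete, here is a rough sketch of pulling only the headings out of extracted Markdown. The helper name `preview_headings` and the handling of fenced code blocks are assumptions; the actual preview output may differ.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+\S")
FENCE = "`" * 3  # Markdown code-fence marker


def preview_headings(markdown: str) -> list[str]:
    """Return ATX heading lines, ignoring '#' lines inside fenced code."""
    headings = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggle on every fence line
            continue
        if not in_fence and HEADING.match(line):
            headings.append(line.rstrip())
    return headings
```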

### web_search_and_read

Search + read top results in one call. Best for deep research.

```python
web_search_and_read(query="FastAPI vs Flask comparison", max_read=2)
```

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| query | string | required | Search query |
| max_results | int | 5 | Max search results |
| max_read | int | 2 | Pages to read (auto-extends on failure) |
| max_length | int | 10000 | Max chars per page |
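
The "auto-extends on failure" note for `max_read` can be read as: keep walking down the result list until enough pages have been read successfully. A minimal sketch under that assumption, where `read_top_results` and the `read` callable are hypothetical stand-ins for the real reader:

```python
from typing import Callable, Optional


def read_top_results(
    urls: list[str],
    read: Callable[[str], Optional[str]],
    max_read: int = 2,
) -> dict[str, str]:
    """Read pages in rank order until max_read succeed, skipping failures."""
    pages: dict[str, str] = {}
    for url in urls:
        if len(pages) >= max_read:
            break
        content = read(url)  # hypothetical reader: returns None on failure
        if content:
            pages[url] = content
    return pages
```

If the first result fails to load, the loop simply moves on to the third result so that two pages are still returned when possible.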

### search_config

View and manage Searchlight configuration.

```python
search_config(action="status")                          # Show config, cache, QSL stats
search_config(action="health_check")                    # Test engine connectivity
search_config(action="clear_cache")                     # Clear all cached results
search_config(action="set_backend", backend="bing")     # Switch default backend
```

## Search Backends

All backends use direct HTTP scraping — **no API keys needed**.

| Backend | Description | Language |
|---------|-------------|----------|
| auto | Best available (default) | Auto-detect |
| bing | Bing International | English |
| bing_cn | Bing China | Chinese |
| baidu | Baidu Search | Chinese |
| yandex | Yandex Search | English |
| brave | Brave Search | English |
| duckduckgo | DuckDuckGo | English |

The `auto` backend detects Chinese characters in the query and routes Chinese queries to Baidu or Bing CN; English queries go to Brave, DuckDuckGo, or Bing. Engines are probed for reachability, and unreachable engines are skipped.
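
The routing described above could look roughly like the following sketch. The Unicode range used to detect Chinese text, the candidate order, and the names `candidate_backends` and `pick_backend` are assumptions; only the backend identifiers come from the table.

```python
import re
from typing import Optional

# Basic CJK Unified Ideographs block (assumption about what counts as Chinese).
CJK = re.compile(r"[\u4e00-\u9fff]")


def candidate_backends(query: str) -> list[str]:
    """Pick a language-appropriate backend list for the query."""
    if CJK.search(query):
        return ["baidu", "bing_cn"]
    return ["brave", "duckduckgo", "bing"]


def pick_backend(query: str, reachable: set[str]) -> Optional[str]:
    """Return the first candidate that passed the reachability probe."""
    for name in candidate_backends(query):
        if name in reachable:
            return name
    return None
```

Here `reachable` stands for whatever set of engines the reachability probe last reported as working.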

## Quality Site Library

Searchlight includes a built-in Quality Site Library that enhances search results for AI/developer topics:

- **Query Enhancement** — Automatically adds authoritative keywords when searching for LLM, MCP, agent, Python topics
- **Result Boosting** — Moves results from quality domains (official docs, research papers) higher in rankings
- **Auto-Learning** — Tracks websites you read and automatically adds high-quality ones to the library

Built-in categories with curated sources:

| Category | Quality Sources |
|----------|----------------|
| LLM | OpenAI Platform, Anthropic Docs, Google AI, Hugging Face |
| MCP | modelcontextprotocol.io, Anthropic MCP Docs |
| Agents | LangChain, CrewAI, LlamaIndex |
| Anthropic | anthropic.com, docs.anthropic.com |
| Google AI | ai.google.dev, Google Cloud |
| Python | docs.python.org, PyPI, uv/Real Python |

The library is stored at `~/.searchlight/sites.json` and can be manually edited.
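
As an illustration of the Result Boosting idea, the sketch below moves results from quality domains to the front while keeping the original order otherwise. The result shape (a dict with a `url` key) and the `boost` function are assumptions, and the domain set is a small sample taken from the table above rather than the contents of `sites.json`, whose schema is not shown here.

```python
from urllib.parse import urlparse

# Sample of domains from the table above; the real library is larger.
QUALITY_DOMAINS = {"docs.python.org", "modelcontextprotocol.io", "docs.anthropic.com"}


def boost(results: list[dict]) -> list[dict]:
    """Move results from quality domains ahead of everything else."""

    def is_quality(result: dict) -> bool:
        host = urlparse(result["url"]).hostname or ""
        return any(host == d or host.endswith("." + d) for d in QUALITY_DOMAINS)

    # sorted() is stable, so the original ranking is kept within each group.
    return sorted(results, key=lambda r: not is_quality(r))
```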

## Content Extraction Pipeline

```
URL → HTTP Fetch (SSL progressive degradation)
    → trafilatura (Markdown mode)
    → readability + markdownify (fallback)
    → BeautifulSoup cleanup (fallback)
    → Quality Report (5-dimension assessment)
    → Section-aware truncation
    → Cached in SQLite
```

The quality report scores five dimensions: text density, structure quality, absence of noise, completeness, and HTML cleanliness. JavaScript-rendered pages automatically fall back to the Jina AI proxy for rendering.
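
The fallback chain in the pipeline above follows a common pattern: try each extractor in order and accept the first usable result. The sketch below shows that pattern with plain callables; `extract_with_fallback` is hypothetical, and the `min_chars` threshold is an assumed stand-in for the real quality scoring.

```python
from typing import Callable, Optional

Extractor = Callable[[str], Optional[str]]


def extract_with_fallback(
    html: str,
    extractors: list[tuple[str, Extractor]],
    min_chars: int = 200,
) -> Optional[tuple[str, str]]:
    """Try each extractor in order; return (name, text) for the first usable one."""
    for name, extract in extractors:
        try:
            text = extract(html)
        except Exception:
            continue  # a failing extractor hands over to the next one
        if text and len(text.strip()) >= min_chars:
            return name, text
    return None
```

In practice the list would hold wrappers around trafilatura, readability plus markdownify, and a BeautifulSoup cleanup, in that order.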

## Configuration

Set via MCP config `env` field or shell environment:

```bash
SEARCHLIGHT_BACKEND=auto           # Default backend
SEARCHLIGHT_CACHE_TTL=24           # Cache TTL in hours
SEARCHLIGHT_CACHE_MAX_SIZE=100     # Max cache size in MB
SEARCHLIGHT_MAX_CONTENT=10000      # Max content length in chars
SEARCHLIGHT_TIMEOUT=15             # HTTP timeout in seconds
SEARCHLIGHT_VERBOSE=true           # Enable debug logging
```

Example with custom backend:

```json
{
  "mcpServers": {
    "searchlight": {
      "command": "python",
      "args": ["-m", "searchlight"],
      "env": {
        "SEARCHLIGHT_BACKEND": "bing"
      }
    }
  }
}
```
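
For readers wondering how such variables are typically consumed, here is a minimal sketch of environment-based settings. The `Settings` class and `load_settings` function are hypothetical, and the fallback values simply mirror the example values listed above; they are not confirmed defaults of `config.py`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    backend: str
    cache_ttl_hours: int
    timeout_seconds: int
    verbose: bool


def load_settings(env=None) -> Settings:
    """Build settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    return Settings(
        backend=env.get("SEARCHLIGHT_BACKEND", "auto"),
        cache_ttl_hours=int(env.get("SEARCHLIGHT_CACHE_TTL", "24")),
        timeout_seconds=int(env.get("SEARCHLIGHT_TIMEOUT", "15")),
        verbose=env.get("SEARCHLIGHT_VERBOSE", "false").lower() == "true",
    )
```

Passing a plain dict instead of relying on `os.environ` keeps the function easy to test.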

## Architecture

```
searchlight/
├── server.py              # MCP server + 4 tools
├── sites_library.py       # Quality Site Library (QSL)
├── config.py              # Environment-based config
├── search/
│   ├── base.py            # SearchBackend ABC + SearchResult
│   └── native.py          # Native HTTP search (7 engines)
├── reader/
│   ├── fetcher.py         # HTTP fetching + SSL fallback + Jina proxy
│   └── extractor.py       # Content extraction + QualityReport
├── processing/
│   ├── filter.py          # Dedup + spam filtering
│   └── truncator.py       # Section-aware Markdown truncation
├── cache/
│   └── sqlite.py          # Async SQLite with smart TTL
├── security/
│   └── sanitizer.py       # Secret detection in queries
└── utils/
    ├── logger.py          # Logging setup
    └── health.py          # Backend health checks
```

## Publishing

```bash
pip install build twine
python -m build
twine upload dist/*
```

## Development

```bash
pip install -e ".[dev]"
pytest tests/ -v
```

Run with verbose logging:

```bash
SEARCHLIGHT_VERBOSE=true python -m searchlight
```

## License

MIT
