Metadata-Version: 2.4
Name: iflow-mcp_pietz-mcp-web-tools
Version: 0.9.1
Summary: A powerful MCP server to equip LLMs with web access, search, and content extraction capabilities
Project-URL: Homepage, https://github.com/pietz/mcp-web-tools
Project-URL: Issues, https://github.com/pietz/mcp-web-tools/issues
Author-email: Paul-Louis Pröve <mail@plpp.de>
License: MIT
Keywords: extraction,llm,mcp,pdf,search,web
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Markup
Requires-Python: >=3.11
Requires-Dist: duckduckgo-search>=8.1.1
Requires-Dist: googlesearch-python>=1.3.0
Requires-Dist: httpx>=0.28.1
Requires-Dist: mcp>=1.13.1
Requires-Dist: perplexityai>=0.11.0
Requires-Dist: pillow>=11.2.1
Requires-Dist: psutil>=7.0.0
Requires-Dist: pymupdf4llm>=0.0.27
Requires-Dist: trafilatura>=2.0.0
Requires-Dist: zendriver>=0.14.0
Description-Content-Type: text/markdown

# MCP Web Tools

This package provides a powerful MCP server to equip LLMs with web access, going beyond naive methods of searching, fetching and extracting content.

## Introduction

I created this package out of frustration that most MCP servers giving LLMs web access didn't perform as well as I had hoped. Some of the shortcomings I wanted to fix include:

- [x] Good search results without requiring an API key
- [x] Sophisticated fetching for more complex JavaScript sites
- [x] Extracting content in nicely formatted Markdown
- [x] Support for extracting content from PDFs
- [x] Support for loading and displaying images
- [x] Rendered webpage screenshots for visual context
- [x] Usage options for advanced cases like loading raw HTML

## Installation

### Claude Desktop

```json
{
  "mcpServers": {
    "web-tools": {
      "command": "uvx",
      "args": ["mcp-web-tools"]
    }
  }
}
```

### Claude Code

```bash
claude mcp add web-tools uvx mcp-web-tools
```

To also set a Brave Search API key:

```bash
claude mcp add web-tools uvx mcp-web-tools -e BRAVE_SEARCH_API_KEY=<key>
```

Provide a [Perplexity Search API](https://www.perplexity.ai/api-platform) key to prioritize their fresh, citation-rich index:

```bash
claude mcp add web-tools uvx mcp-web-tools -e PERPLEXITY_API_KEY=<key>
```

You can set both environment variables; the server then falls back from Perplexity to Brave when needed.

## Internals

The package is written in Python and relies on the libraries and services described below to improve results.

### Searching

We use the [Perplexity Search API](https://www.perplexity.ai/hub/blog/introducing-the-perplexity-search-api) when a `PERPLEXITY_API_KEY` is configured. It delivers ranked snippets with citations from Perplexity's continuously refreshed index. If no Perplexity key is available, we fall back to the [Brave Search API](https://brave.com/search/api) (via `BRAVE_SEARCH_API_KEY`), then a lightweight Google workaround, and finally DuckDuckGo. While we recommend adding at least one API key, the chained fallbacks continue working for most workloads.
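
The fallback order described above can be sketched roughly as follows. This is an illustrative sketch, not the package's actual code: `build_chain`, `search_with_fallback`, and the `backends` mapping are hypothetical names, and each backend is assumed to be a callable that takes a query string and returns a list of result dictionaries.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a search backend takes a query and returns result dicts.
SearchFn = Callable[[str], List[dict]]


def build_chain(env: Dict[str, str], backends: Dict[str, SearchFn]) -> List[Tuple[str, SearchFn]]:
    """Order the providers as the text describes: Perplexity, Brave, Google, DuckDuckGo.

    Key-based providers are only included when their environment variable is set.
    """
    chain = []
    if env.get("PERPLEXITY_API_KEY"):
        chain.append(("perplexity", backends["perplexity"]))
    if env.get("BRAVE_SEARCH_API_KEY"):
        chain.append(("brave", backends["brave"]))
    chain.append(("google", backends["google"]))
    chain.append(("duckduckgo", backends["duckduckgo"]))
    return chain


def search_with_fallback(query: str, chain: List[Tuple[str, SearchFn]]) -> Tuple[str, List[dict]]:
    """Try each provider in order and return the first non-empty result set."""
    errors = []
    for name, search in chain:
        try:
            results = search(query)
        except Exception as exc:  # any provider failure moves on to the next one
            errors.append(f"{name}: {exc}")
            continue
        if results:
            return name, results
        errors.append(f"{name}: no results")
    raise RuntimeError("all search providers failed: " + "; ".join(errors))
```

In this sketch, a provider that raises or returns nothing is skipped, so a missing API key or a rate limit on one backend does not prevent a later backend from answering.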

### Fetching

Fetching web content is based on [Zendriver](https://github.com/stephanlensky/zendriver), a fork of [nodriver](https://github.com/ultrafunkamsterdam/nodriver/) built for next-level web scraping and performance. It should stay undetected by most anti-bot solutions and fetch content even from complex JS-based sites.

### Extracting

For web extraction, we use [Trafilatura](https://trafilatura.readthedocs.io/en/latest/index.html), which consistently outperforms alternatives at extracting content from HTML pages. For PDFs, we use [PyMuPDF4LLM](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/), which similarly extracts content in a format that is easy for LLMs to read, with advanced layout support.
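
One way to decide between the two extraction paths is to look at the response's content type and fall back to the URL's file extension. The sketch below only illustrates that routing decision; `pick_extractor` is a hypothetical helper, and how the package itself makes this choice is an assumption.

```python
from typing import Optional
from urllib.parse import urlparse


def pick_extractor(url: str, content_type: Optional[str] = None) -> str:
    """Return "pdf" or "html" to indicate which extraction path to use.

    Assumption: "pdf" would be handled by PyMuPDF4LLM and "html" by Trafilatura.
    """
    if content_type:
        # Drop parameters such as "; charset=utf-8" before comparing.
        mime = content_type.split(";", 1)[0].strip().lower()
        if mime == "application/pdf":
            return "pdf"
        if mime in ("text/html", "application/xhtml+xml"):
            return "html"
    # No usable content type: fall back to the path's file extension.
    path = urlparse(url).path.lower()
    return "pdf" if path.endswith(".pdf") else "html"
```

Preferring the content type over the extension avoids misrouting URLs that end in `.pdf` but actually serve an HTML viewer page.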

### Screenshots

Rendered page previews are powered by [Zendriver](https://github.com/stephanlensky/zendriver). The `view_website` tool navigates to a URL in a headless Chromium session and returns the resulting page as a PNG screenshot. By default only the current viewport is captured, but callers can request a full-page image by setting the `full_page` argument to `true`.
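
For illustration, a client could request a full-page screenshot with an MCP `tools/call` request like the one built below. The tool name `view_website` and the `full_page` argument come from the description above; the `url` argument name and the helper `build_view_website_call` are assumptions made for this sketch.

```python
import json


def build_view_website_call(url: str, full_page: bool = False, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for the `view_website` tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "view_website",
            # "url" is an assumed argument name; "full_page" is documented above.
            "arguments": {"url": url, "full_page": full_page},
        },
    }


if __name__ == "__main__":
    request = build_view_website_call("https://example.com", full_page=True)
    print(json.dumps(request, indent=2))
```

Most MCP clients build this request for you; the sketch only shows where the `full_page` flag ends up on the wire.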

## Contributing

While it's impossible to support all pages and layouts, we strive to make this package better over time. For unsupported sites, problems, or feature requests, please open an issue.

## CI, Releases, and Publishing

This repo includes a GitHub Actions workflow that:

- Runs tests via `uv` on PRs and pushes to `main`.
- On push to `main`, if `project.version` in `pyproject.toml` changed, it:
  - Builds distributions with `uv build`.
  - Creates a GitHub Release tagged `v<version>` with autogenerated notes.
  - Publishes the package to PyPI using `uv publish`.

To trigger a release, merge a PR that bumps `project.version` in `pyproject.toml`.

Rollback:

- If a release was created erroneously, delete the GitHub Release and tag `v<version>`.
- Yank the version on PyPI if needed.
