Metadata-Version: 2.4
Name: huoshui-fetch
Version: 0.1.1
Summary: A dedicated web content fetching and conversion service based on the MCP philosophy.
Author-email: huoshui ai <service@huoshuiai.com>
License: MIT
License-File: LICENSE
Requires-Python: >=3.11
Requires-Dist: beautifulsoup4>=4.12.0
Requires-Dist: fastmcp>=0.1.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: lxml>=5.0.0
Requires-Dist: markdownify>=0.11.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: readability-lxml>=0.8.1
Description-Content-Type: text/markdown

# huoshui-fetch

A dedicated web content fetching and conversion MCP (Model Context Protocol) server that provides tools for fetching, converting, and extracting data from web pages.

## Features

### Fetching Tools
- **fetch_url**: Fetch content from URLs with customizable timeout, redirect handling, and user-agent
- **fetch_with_headers**: Fetch URLs with custom headers for authenticated requests

### Conversion Tools
- **html_to_markdown_tool**: Convert HTML to clean Markdown format
- **html_to_text_tool**: Extract plain text from HTML
- **clean_html_tool**: Remove scripts/styles and sanitize HTML
- **json_to_markdown_tool**: Convert JSON data to readable Markdown
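
As a rough illustration of what a JSON-to-Markdown conversion can look like, the sketch below renders parsed JSON as a nested bullet list. It is a minimal, standard-library-only example: the function names `json_to_markdown` and `_render` and the output layout are assumptions made for illustration, not the behavior of `json_to_markdown_tool` itself.

```python
import json


def _scalar(value):
    """Format a JSON scalar: strings as-is, everything else in JSON notation."""
    return value if isinstance(value, str) else json.dumps(value)


def _render(data, level=0):
    """Return a list of Markdown lines for a parsed JSON value (hypothetical helper)."""
    indent = "  " * level
    lines = []
    if isinstance(data, dict):
        for key, value in data.items():
            if isinstance(value, (dict, list)):
                # Nested containers go on their own indented lines.
                lines.append(f"{indent}- **{key}**:")
                lines.extend(_render(value, level + 1))
            else:
                lines.append(f"{indent}- **{key}**: {_scalar(value)}")
    elif isinstance(data, list):
        for item in data:
            if isinstance(item, (dict, list)):
                lines.append(f"{indent}-")
                lines.extend(_render(item, level + 1))
            else:
                lines.append(f"{indent}- {_scalar(item)}")
    else:
        lines.append(f"{indent}{_scalar(data)}")
    return lines


def json_to_markdown(text):
    """Parse a JSON string and return it as a Markdown bullet list."""
    return "\n".join(_render(json.loads(text)))


print(json_to_markdown('{"name": "huoshui-fetch", "tags": ["mcp", "fetch"]}'))
```

A nested list keeps the hierarchy of the JSON visible without needing tables, which only fit uniform arrays of objects.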

### Extraction Tools
- **extract_article_tool**: Extract main article content using readability
- **extract_links_tool**: Extract all links with filtering options
- **extract_metadata_tool**: Extract page metadata (title, description, OG tags)
- **extract_images_tool**: Extract images with size filtering
- **extract_structured_data_tool**: Extract JSON-LD and microdata
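
To make the idea of link extraction with filtering concrete, here is a minimal sketch that collects `<a href>` targets and optionally keeps only same-domain links. It deliberately uses the standard library's `html.parser` to stay dependency-free, whereas the project lists `beautifulsoup4` and `lxml` as dependencies. The names `LinkCollector`, `extract_links`, and the `same_domain_only` flag are hypothetical and do not describe the actual parameters of `extract_links_tool`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags (illustrative helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url, same_domain_only=False):
    """Return links found in `html`; optionally drop links to other hosts."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    links = collector.links
    if same_domain_only:
        host = urlparse(base_url).netloc
        links = [link for link in links if urlparse(link).netloc == host]
    return links


page = '<a href="/docs">Docs</a> <a href="https://other.example/x">Elsewhere</a>'
print(extract_links(page, "https://example.com/page", same_domain_only=True))
```

Resolving every link against the page URL first means the domain filter can compare hosts directly, regardless of whether the original `href` was relative or absolute.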

## Installation

```bash
# Using uv (recommended)
uv sync

# Or install from GitHub
pip install git+https://github.com/yourusername/huoshui-fetch.git
```

## Usage

### Run with uvx (recommended for one-time use)

```bash
# From the repository
uvx --from . huoshui-fetch

# From GitHub (once published)
uvx --from git+https://github.com/yourusername/huoshui-fetch.git huoshui-fetch
```

### Run directly

```bash
# Using uv
uv run python -m huoshui_fetch

# Or if installed
python -m huoshui_fetch
```

The server communicates over standard input/output (stdio), so Claude Desktop and other MCP-compatible clients can launch it as a subprocess and talk to it directly.
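
For readers curious what travels over stdin/stdout, the sketch below builds a single MCP tool-call request. MCP messages are JSON-RPC 2.0 objects, and the stdio transport sends one JSON message per line. The helper names `build_tool_call` and `parse_message` are hypothetical, the `url` argument name is an assumption about `fetch_url`, and the `initialize` handshake that a real session performs first is omitted.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Serialize a JSON-RPC 2.0 `tools/call` request as one newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so the message stays on one line.
    return json.dumps(message, separators=(",", ":")) + "\n"


def parse_message(line):
    """Decode one line received from the peer back into a dict."""
    return json.loads(line)


# The "url" argument name is assumed for illustration.
request = build_tool_call(1, "fetch_url", {"url": "https://example.com"})
print(request, end="")
```

In practice an MCP client library handles this framing, so the sketch is mainly useful for debugging or for understanding what a client sends on your behalf.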

## Configuration for Claude Desktop

Add to your Claude Desktop configuration:

```json
{
  "mcpServers": {
    "huoshui-fetch": {
      "command": "uvx",
      "args": ["--no-cache", "--from", ".", "huoshui-fetch"],
      "cwd": "/path/to/huoshui-fetch"
    }
  }
}
```

Or, to run it straight from GitHub:

```json
{
  "mcpServers": {
    "huoshui-fetch": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/yourusername/huoshui-fetch.git", "huoshui-fetch"]
    }
  }
}
```

## Example Usage

Once configured, the tools are available in Claude Desktop. The calls below are illustrative pseudocode showing what each tool is asked to do:

```
// Fetch a webpage
fetch_url("https://example.com")

// Convert HTML to Markdown
html_to_markdown_tool("<h1>Hello</h1><p>World</p>")

// Extract article content
extract_article_tool(html_content, "https://example.com/article")
```

## Requirements

- Python 3.11+
- Dependencies listed in pyproject.toml

## DXT Extension

This project supports the DXT (Desktop Extensions) format for easy distribution and installation.

To build the DXT extension:

```bash
python build_dxt.py
```

This creates a `huoshui-fetch-{version}.dxt` file that can be installed in compatible AI desktop applications.

## License

MIT