Metadata-Version: 2.4
Name: huoshui-fetch
Version: 0.1.2
Summary: A dedicated web content fetching and conversion service based on the MCP philosophy.
Author-email: huoshui ai <service@huoshuiai.com>
License: MIT
License-File: LICENSE
Requires-Python: >=3.11
Requires-Dist: beautifulsoup4>=4.12.0
Requires-Dist: fastmcp>=0.1.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: lxml>=5.0.0
Requires-Dist: markdownify>=0.11.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: readability-lxml>=0.8.1
Description-Content-Type: text/markdown

# huoshui-fetch

A dedicated web content fetching and conversion MCP (Model Context Protocol) server that provides tools for fetching, converting, and extracting data from web pages.

## Features

### Fetching Tools

- **fetch_url**: Fetch content from URLs with customizable timeout, redirect handling, and user-agent
- **fetch_with_headers**: Fetch URLs with custom headers for authenticated requests

### Conversion Tools

- **html_to_markdown_tool**: Convert HTML to clean Markdown format
- **html_to_text_tool**: Extract plain text from HTML
- **clean_html_tool**: Remove scripts/styles and sanitize HTML
- **json_to_markdown_tool**: Convert JSON data to readable Markdown

### Extraction Tools

- **extract_article_tool**: Extract main article content using readability
- **extract_links_tool**: Extract all links with filtering options
- **extract_metadata_tool**: Extract page metadata (title, description, OG tags)
- **extract_images_tool**: Extract images with size filtering
- **extract_structured_data_tool**: Extract JSON-LD and microdata

## Installation

### From MCP Registry (Recommended)

This server is available in the Model Context Protocol Registry. Install it using your MCP client.

mcp-name: io.github.huoshuiai42/huoshui-fetch

### From Source

```bash
# Using uv (recommended): install dependencies inside a local clone of the repository
uv sync

# Or install from GitHub
pip install git+https://github.com/yourusername/huoshui-fetch.git
```

## Usage

### Run with uvx (recommended for one-time use)

```bash
# From the repository
uvx --from . huoshui-fetch

# From GitHub (once published)
uvx --from git+https://github.com/yourusername/huoshui-fetch.git huoshui-fetch
```

### Run directly

```bash
# Using uv
uv run python -m huoshui_fetch

# Or if installed
python -m huoshui_fetch
```

The server communicates over standard input/output (stdio), so Claude Desktop and other MCP-compatible clients can launch it directly as a subprocess.

## Configuration for Claude Desktop

Add the following to your Claude Desktop configuration file (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "huoshui-fetch": {
      "command": "uvx",
      "args": ["--no-cache", "--from", "/path/to/huoshui-fetch", "huoshui-fetch"]
    }
  }
}
```

Or, to run it directly from GitHub:

```json
{
  "mcpServers": {
    "huoshui-fetch": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/yourusername/huoshui-fetch.git",
        "huoshui-fetch"
      ]
    }
  }
}
```
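If `claude_desktop_config.json` already lists other servers, add the entry instead of overwriting the file. A minimal sketch, assuming the file is plain JSON with a top-level `mcpServers` object as in the examples above; the helper name `add_server` is hypothetical.

```python
import json


def add_server(config_text: str, name: str, command: str, args: list[str]) -> str:
    """Return updated config JSON with one server entry added or replaced."""
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    # Other servers are left untouched; an entry with the same name is overwritten.
    servers[name] = {"command": command, "args": args}
    return json.dumps(config, indent=2)


existing = '{"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}'
updated = add_server(
    existing,
    "huoshui-fetch",
    "uvx",
    ["--from", "git+https://github.com/yourusername/huoshui-fetch.git", "huoshui-fetch"],
)
print(updated)
```

Write the result back to the configuration file, then restart Claude Desktop so it reloads the server list.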

## Example Usage

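Once configured, Claude Desktop can call these tools on your behalf. The calls below are illustrative; the actual argument names come from each tool's input schema: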

```
// Fetch a webpage
fetch_url("https://example.com")

// Convert HTML to Markdown
html_to_markdown_tool("<h1>Hello</h1><p>World</p>")

// Extract article content from previously fetched HTML
extract_article_tool(html_content, "https://example.com/article")
```

## Requirements

- Python 3.11+
- Dependencies listed in pyproject.toml

## Development & Publishing

This project includes scripts that automate versioning, building, testing, and publishing to PyPI.

### Automated Publishing Workflow

```bash
# Complete automated workflow (TestPyPI + PyPI)
uv run python scripts/publish.py --include-pypi

# TestPyPI only (recommended for testing)
uv run python scripts/publish.py

# Bump version and publish
uv run python scripts/publish.py --version-bump patch --include-pypi
```

### Individual Commands

```bash
# Version management
uv run python scripts/version_manager.py --check
uv run python scripts/version_manager.py --bump patch

# Build package
uv run python scripts/build.py

# Run comprehensive tests
uv run python scripts/test.py

# Upload to PyPI
uv run python scripts/upload.py
```
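`--bump patch` presumably follows the semantic-versioning convention for `MAJOR.MINOR.PATCH` strings such as this package's `0.1.2`. The sketch below shows what that convention means. It is not the code in `scripts/version_manager.py`, and the function name `bump` is hypothetical.

```python
def bump(version: str, part: str) -> str:
    """Bump a MAJOR.MINOR.PATCH version; lower components reset to zero."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")
```

Under this convention, a patch bump takes `0.1.2` to `0.1.3`, while a minor bump gives `0.2.0` because the patch number resets.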

### Automation Features

- ✅ **Version Management**: Version number kept in sync automatically across project files
- ✅ **Quality Checks**: Ruff linting and MyPy type checking
- ✅ **Build Automation**: Clean builds with validation
- ✅ **Testing Suite**: Comprehensive package and functionality tests
- ✅ **Publishing Workflow**: TestPyPI → PyPI with validation
- ✅ **Error Recovery**: Built-in error handling and recovery options

See [PUBLISHING.md](PUBLISHING.md) for detailed documentation.

## DXT Extension

This project supports the DXT (Desktop Extensions) format for distribution and installation.

To build the DXT extension:

```bash
python build_dxt.py
```

This will create a `huoshui-fetch-{version}.dxt` file that can be installed in compatible AI desktop applications.

## License

MIT
