Metadata-Version: 2.4
Name: mineru-selfhosted-mcp
Version: 0.1.15
Summary: MCP bridge for a self-hosted MinerU API
Project-URL: Homepage, https://github.com/opendatalab/MinerU
Project-URL: Repository, https://github.com/opendatalab/MinerU
Author: OpenAI Codex for root123
License: MIT
Requires-Python: >=3.10
Requires-Dist: fastmcp>=2.3.0
Requires-Dist: httpx>=0.27.0
Description-Content-Type: text/markdown

# mineru-selfhosted-mcp

`mineru-selfhosted-mcp` exposes a self-hosted MinerU service as an MCP server.

It is designed for setups where:

- MinerU is deployed on a remote GPU server
- Claude Desktop, Cursor, or other MCP clients run on a different machine
- The client machine runs only a lightweight MCP bridge

## Environment variables

- `MINERU_BASE_URL`:
  Optional. Base URL of your self-hosted MinerU API. Default: the public FRP endpoint `http://42.51.34.112:8191`.
- `MINERU_API_KEY`:
  Optional. Token sent to the MinerU API as `Authorization: Bearer <token>`.
- `MINERU_TIMEOUT`:
  Optional. Request timeout in seconds. Default: `1800`.
- `MINERU_TRUST_ENV`:
  Optional. Set to `true` only if the bridge should inherit local proxy environment variables. Default: disabled.
- `MINERU_LOG_DIR`:
  Optional. Directory cleaned by the `clean_logs` tool. Default: `~/.mineru-selfhosted-mcp/logs`.
- `MINERU_DOWNLOAD_PUBLISH_DIR`:
  Optional. Directory where download artifacts are published. Default: `/data/mineru_output/_published`.
- `MINERU_DOWNLOAD_BASE_URL`:
  Optional. Public base URL for published artifacts. Default: `<MINERU_BASE_URL>/downloads`.
- `MINERU_MCP_TRANSPORT`:
  Optional. `stdio` for local clients, or `http` to run a remote MCP service on the Linux host. Default: `stdio`.
- `MINERU_MCP_HOST`:
  Optional. Bind host in HTTP mode. Default: `0.0.0.0`.
- `MINERU_MCP_PORT`:
  Optional. Bind port in HTTP mode. Default: `8001`.
- `MINERU_UPLOAD_ROOT`:
  Optional. Directory for staged uploaded source files. Default: `/data/mineru_uploads`.

When the bridge runs on the Linux host itself, set `MINERU_BASE_URL` to the local proxy at `http://127.0.0.1:8192`.
Remote MCP clients should use the public FRP endpoint on port `8191`.

## Connection modes

### 1. Windows local MCP via `stdio`

Use this when Claude Code, Cursor, or Claude Desktop runs on Windows and only the MinerU API lives on Linux.

- MCP runs locally on Windows
- MinerU API runs remotely on Linux
- `output_dir`, `artifact_paths`, and `zip_path` refer to paths on the Windows machine
- `*_download_url` fields are a convenience only; the local paths are the primary output

### 2. Linux remote MCP via HTTP

Use this when you want one shared MCP service for many clients and durable public artifact URLs.

- MCP runs on Linux
- MinerU API runs on Linux
- `/downloads` is served from the same Linux filesystem
- Windows clients connect with `type: "http"`
- `zip_download_url` and other `*_download_url` fields become first-class outputs
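
Because `/downloads` is served from the same filesystem, a file written under `MINERU_DOWNLOAD_PUBLISH_DIR` should be reachable under `MINERU_DOWNLOAD_BASE_URL` at the same relative path. The sketch below assumes that one-to-one mapping; `published_url` is a hypothetical helper, not part of the package.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def published_url(artifact: str, publish_dir: str, download_base_url: str) -> str:
    """Hypothetical helper: map a file under publish_dir to its /downloads URL."""
    # relative_to raises ValueError if the artifact lies outside publish_dir.
    rel = PurePosixPath(artifact).relative_to(PurePosixPath(publish_dir))
    # PurePosixPath does not resolve "..", so reject it explicitly.
    if ".." in rel.parts:
        raise ValueError(f"refusing path containing '..': {artifact}")
    # Percent-encode each segment so spaces and non-ASCII names stay URL-safe.
    encoded = "/".join(quote(part) for part in rel.parts)
    return f"{download_base_url.rstrip('/')}/{encoded}"
```

For example, `/data/mineru_output/_published/job1/report final.zip` maps to `http://42.51.34.112:8191/downloads/job1/report%20final.zip`.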

## Exposed tools

- `mineru_health`: check the remote MinerU API
- `parse_document`: parse a single local file through the remote MinerU API
- `parse_documents`: parse one or more local files through the remote MinerU API
- `parse_directory`: parse all matching files in a directory through the remote MinerU API
- `create_upload_session`: create a staged upload session for a client-side file
- `append_upload_chunk`: append a base64-encoded chunk to an upload session
- `finalize_upload`: validate and finalize an uploaded staged file
- `parse_staged_file`: parse a finalized staged file
- `cleanup_staged_file`: remove a staged upload session
- `get_ocr_languages`: list common OCR language codes supported by MinerU
- `clean_logs`: remove local MCP log files older than a chosen number of days

## Parsing result metadata

In addition to the parsed output, parsing tools return:

- `elapsed_seconds`
- `file_count`
- `completed_count`
- `failed_count`
- `progress`

`progress` is a final completion summary for the current call, not a live streaming progress feed.
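
The summary fields can only be filled in once every file in the call has finished, which is why `progress` is not a stream. The sketch below assumes a per-file success flag and a `"<completed>/<total> completed, <failed> failed"` string format; `FileResult` and `summarize` are hypothetical, and the real tool may shape `progress` differently.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FileResult:
    path: str
    ok: bool  # True if the file parsed successfully


def summarize(results: list[FileResult], started: float, finished: float) -> dict[str, Any]:
    """Hypothetical sketch: build the documented summary fields for one parsing call."""
    completed = sum(1 for r in results if r.ok)
    failed = len(results) - completed
    return {
        "elapsed_seconds": round(finished - started, 3),
        "file_count": len(results),
        "completed_count": completed,
        "failed_count": failed,
        # Assumed format; computed once, after all files are done.
        "progress": f"{completed}/{len(results)} completed, {failed} failed",
    }
```

In practice `started` and `finished` would come from `time.monotonic()` around the whole call.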

## Notes

- To get only `middle_json`, set `middle_json_only=true` in the parsing tools.
- `middle_json_only=true` also automatically disables Markdown output in the request the MCP wrapper builds.
- For Windows-to-Linux file transfer through MCP, call `create_upload_session`, `append_upload_chunk` (once per chunk), `finalize_upload`, then `parse_staged_file`.
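
The staged-upload sequence can be driven from any MCP client. This sketch takes a generic synchronous `call_tool(name, arguments)` callable so it stays independent of a particular client library. The argument and result keys (`filename`, `size`, `session_id`, `index`, `data`, `sha256`) are hypothetical placeholders; check the tool schemas the server actually exposes before relying on them.

```python
import base64
import hashlib
from typing import Any, Callable, Iterator

# Generic tool caller: (tool_name, arguments) -> result dict.
CallTool = Callable[[str, dict[str, Any]], dict[str, Any]]


def iter_b64_chunks(data: bytes, chunk_size: int = 768 * 1024) -> Iterator[tuple[int, str]]:
    """Yield (index, base64 text) pairs, each encoding at most chunk_size raw bytes."""
    # A multiple of 3 keeps '=' padding out of every chunk except the last.
    if chunk_size <= 0 or chunk_size % 3:
        raise ValueError("chunk_size must be a positive multiple of 3")
    for index, start in enumerate(range(0, len(data), chunk_size)):
        yield index, base64.b64encode(data[start:start + chunk_size]).decode("ascii")


def upload_and_parse(
    call_tool: CallTool, filename: str, data: bytes, chunk_size: int = 768 * 1024
) -> dict[str, Any]:
    """Hypothetical driver: stage data in chunks, parse it, then clean up."""
    session = call_tool("create_upload_session", {"filename": filename, "size": len(data)})
    session_id = session["session_id"]  # assumed result key
    try:
        for index, chunk in iter_b64_chunks(data, chunk_size):
            call_tool("append_upload_chunk", {"session_id": session_id, "index": index, "data": chunk})
        # Assumed parameter: a checksum lets finalize_upload verify the reassembled file.
        call_tool("finalize_upload", {"session_id": session_id, "sha256": hashlib.sha256(data).hexdigest()})
        return call_tool("parse_staged_file", {"session_id": session_id})
    finally:
        # Remove the staged file whether parsing succeeded or failed.
        call_tool("cleanup_staged_file", {"session_id": session_id})
```

With an async client such as `fastmcp.Client`, the same sequence applies with `await` on each tool call. Calling `cleanup_staged_file` in `finally` is a design choice that keeps `MINERU_UPLOAD_ROOT` from filling up with abandoned sessions.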

## Example MCP config

```json
{
  "mcpServers": {
    "mineru-selfhosted": {
      "command": "uvx",
      "args": ["mineru-selfhosted-mcp"],
      "env": {
        "MINERU_API_KEY": "your_token",
        "MINERU_BASE_URL": "http://42.51.34.112:8191",
        "MINERU_DOWNLOAD_BASE_URL": "http://42.51.34.112:8191/downloads"
      }
    }
  }
}
```

## Example remote HTTP MCP server

Run this on the Linux host:

```bash
MINERU_API_KEY=<TOKEN> \
MINERU_BASE_URL=http://127.0.0.1:8192 \
MINERU_MCP_TRANSPORT=http \
MINERU_MCP_HOST=0.0.0.0 \
MINERU_MCP_PORT=8001 \
MINERU_DOWNLOAD_PUBLISH_DIR=/data/mineru_output/_published \
MINERU_DOWNLOAD_BASE_URL=http://42.51.34.112:8191/downloads \
uvx mineru-selfhosted-mcp
```

Then expose the MCP endpoint through nginx or another reverse proxy, for example:

- MCP endpoint: `http://42.51.34.112:8191/mcp/`
- downloads base: `http://42.51.34.112:8191/downloads/`

## Example Windows Claude Code config for remote HTTP MCP

```json
{
  "mcpServers": {
    "mineru-selfhosted": {
      "type": "http",
      "url": "http://42.51.34.112:8191/mcp/"
    }
  }
}
```
