Metadata-Version: 2.4
Name: doc-translator-mcp
Version: 1.0.0
Summary: MCP server that translates documents (PPTX, PDF, DOCX, XLSX) preserving layout, with optional Gemini image translation
Author-email: Camus Ma <camushoilingma@gmail.com>
License: MIT
Keywords: ai,docx,gemini,llm,mcp,pdf,pptx,translation,xlsx
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.10
Requires-Dist: google-genai>=1.0.0
Requires-Dist: mcp[cli]>=1.2.0
Requires-Dist: openpyxl>=3.1.0
Requires-Dist: pymupdf>=1.25.0
Requires-Dist: python-docx>=1.1.0
Requires-Dist: python-pptx>=1.0.0
Description-Content-Type: text/markdown

# doc-translator-mcp

An MCP server that translates documents (PPTX, PDF, DOCX, XLSX) while preserving the original layout, including text embedded in images (PPTX only). It works with any LLM client that supports MCP.

## Features

- **Any language pair** — Chinese ↔ English, French ↔ Japanese, or any combination the host LLM supports
- **No API key required** for text translation — uses whatever LLM is running in your client
- **Optional image translation** — set `GEMINI_API_KEY` to automatically translate text in screenshots, diagrams, and charts via Gemini
- **Format preservation** — fonts, colors, sizes, positioning all maintained; font size auto-scales for longer translations
- **Smart image filtering** — skips icons, logos, and duplicates to minimize API calls
- **Direct file access** — runs locally, reads and writes files directly on your machine
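
One plausible way to auto-scale font size is to shrink it in proportion to how much longer the translation is, with a floor so the text stays readable. The sketch below is an assumption for illustration only: `scaled_font_size`, the character-count heuristic, and the 60% floor are hypothetical and are not taken from this project's code.

```python
def scaled_font_size(original_text: str, translated_text: str,
                     original_size_pt: float, min_ratio: float = 0.6) -> float:
    """Return a font size that helps a longer translation fit the same box.

    Hypothetical heuristic: character count stands in for rendered width,
    which is crude (CJK glyphs are wider than Latin ones).
    """
    if not original_text or not translated_text:
        return original_size_pt
    ratio = len(original_text) / len(translated_text)
    # Never enlarge the text, and never shrink below min_ratio of the original.
    ratio = max(min_ratio, min(1.0, ratio))
    return round(original_size_pt * ratio, 1)


print(scaled_font_size("季度报告", "Quarterly Report", 24.0))
```

A real implementation would more likely measure rendered text width against the shape's bounding box; the clamp simply keeps one long string from becoming unreadably small.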

## Supported Formats

| Format | Extension | Text | Images | Rebuild | Summary |
|--------|-----------|------|--------|---------|---------|
| PowerPoint | `.pptx` | ✅ | ✅ | ✅ | ✅ (new slide) |
| PDF | `.pdf` | ✅ | — | ✅ | ✅ (new page) |
| Word | `.docx` | ✅ | — | ✅ | ✅ (prepended) |
| Excel | `.xlsx` | ✅ | — | ✅ | ✅ (new sheet) |

## Quick Start

### Option 1: One-line install (recommended)

Add to your MCP client config (Cursor, CodeBuddy, Claude Desktop, etc.):

```json
{
  "mcpServers": {
    "doc-translator": {
      "command": "uvx",
      "args": ["doc-translator-mcp"],
      "env": {
        "GEMINI_API_KEY": "<your-google-ai-api-key>"
      }
    }
  }
}
```

> Requires [uv](https://docs.astral.sh/uv/getting-started/installation/) to be installed. `uvx` downloads and runs the server automatically.

### Option 2: Install from PyPI

```bash
pip install doc-translator-mcp
```

Then add to your MCP client config:

```json
{
  "mcpServers": {
    "doc-translator": {
      "command": "doc-translator-mcp",
      "env": {
        "GEMINI_API_KEY": "<your-google-ai-api-key>"
      }
    }
  }
}
```

### Option 3: Install from source

```bash
git clone https://github.com/camushlm/doc-translator-mcp.git
cd doc-translator-mcp
uv venv --python 3.12
source .venv/bin/activate
uv pip install -e .
```

Then use the same config as Option 2.

> **`GEMINI_API_KEY` is optional.** Without it, text translation works fully; image translation falls back to LLM-powered annotations. Get a free key at https://aistudio.google.com/apikey
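
Because the key is optional, the `env` block can be left out of the client config entirely. The sketch below generates the Option 1 entry with or without a key. `build_server_entry` is a hypothetical helper written for this example; where the config file lives depends on your MCP client and is not covered here.

```python
import json
from typing import Optional


def build_server_entry(gemini_api_key: Optional[str] = None) -> dict:
    """Build the `doc-translator` entry for an MCP client config (uvx variant)."""
    entry = {"command": "uvx", "args": ["doc-translator-mcp"]}
    if gemini_api_key:
        # Only include the env block when a key is actually provided.
        entry["env"] = {"GEMINI_API_KEY": gemini_api_key}
    return entry


# Text-only setup: no key, so no "env" block is emitted.
config = {"mcpServers": {"doc-translator": build_server_entry()}}
print(json.dumps(config, indent=2))
```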

## How It Works

```
User: "Translate this PPT to English"

  1. extract_document(file_path)
     → Returns text blocks with IDs and context

  2. LLM translates each text block

  3. extract_images(file_path)             [PPTX only]
     → Filters out icons, logos, duplicates
     → If GEMINI_API_KEY is set: translate_images() for automatic translation
     → If no key: LLM adds text annotations beside images

  4. rebuild_document(translations, image_replacements)
     → Produces translated file preserving original layout
```
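
The four steps above can be sketched as a small orchestration function. This is a minimal sketch rather than the project's own code: `translate_file`, `ToolCaller`, and `fake_call_tool` are hypothetical names, the canned responses only mirror the fields this README documents, and the shape of the `rebuild_document` response is assumed. The annotate branch (no key) is omitted for brevity.

```python
import json
from typing import Callable, List

# (tool name, arguments) -> JSON text returned by the tool
ToolCaller = Callable[[str, dict], str]


def translate_file(call_tool: ToolCaller,
                   translate_text: Callable[[str], str],
                   file_path: str,
                   has_gemini_key: bool) -> str:
    """Run extract -> translate -> images -> rebuild; return the rebuild response."""
    # Step 1: extract translatable text blocks.
    extracted = json.loads(call_tool("extract_document", {"file_path": file_path}))

    # Step 2: translate each block, keyed by its ID.
    translations = {block["id"]: translate_text(block["text"])
                    for block in extracted["text_blocks"]}
    rebuild_args = {
        "source_file_path": file_path,
        "translations": json.dumps(translations, ensure_ascii=False),
    }

    # Step 3: images are handled for PPTX only; Gemini translation needs a key.
    if file_path.lower().endswith(".pptx"):
        call_tool("extract_images", {"file_path": file_path})
        if has_gemini_key:
            result = json.loads(call_tool("translate_images", {"file_path": file_path}))
            rebuild_args["image_replacements"] = json.dumps(result["image_replacements"])

    # Step 4: rebuild the document with the translated content.
    return call_tool("rebuild_document", rebuild_args)


# Demonstration with canned responses instead of a live server.
calls: List[str] = []


def fake_call_tool(name: str, arguments: dict) -> str:
    calls.append(name)
    canned = {
        "extract_document": {"text_blocks": [{"id": "b1", "text": "你好", "context": "title"}]},
        "extract_images": {"images": [], "mode": "full", "guidance": ""},
        "translate_images": {"image_replacements": {}},
        "rebuild_document": {"status": "ok"},  # hypothetical response shape
    }
    return json.dumps(canned[name])


translate_file(fake_call_tool, lambda text: "Hello", "/tmp/deck.pptx", has_gemini_key=True)
print(calls)
```

In a real client, `call_tool` would wrap the MCP session's tool-call method and `translate_text` would be the host LLM itself; injecting both keeps the control flow testable without a running server.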

### Translation Modes

| Mode | Requirements | What it does |
|------|-------------|--------------|
| **Text only** | Any LLM, no API key | `extract → translate → rebuild` |
| **Full** (text + images) | `GEMINI_API_KEY` | Text + Gemini regenerates images with translated text |
| **Annotate** (text + captions) | Multimodal LLM, no API key | Text + LLM describes image text → caption boxes added |
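
The table reduces to a short decision rule. The sketch below assumes the mode depends only on whether `GEMINI_API_KEY` is set and whether the client LLM is multimodal. `pick_mode` and the `"text_only"` label are hypothetical; `"full"` and `"annotate"` match the `mode` values reported by `extract_images`.

```python
import os
from typing import Mapping, Optional


def pick_mode(client_is_multimodal: bool,
              env: Optional[Mapping[str, str]] = None) -> str:
    """Choose a translation mode following the table above."""
    env = os.environ if env is None else env
    if env.get("GEMINI_API_KEY"):
        return "full"        # text + Gemini-regenerated images
    if client_is_multimodal:
        return "annotate"    # text + caption boxes written by the LLM
    return "text_only"       # extract -> translate -> rebuild


print(pick_mode(client_is_multimodal=True, env={}))
```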

## Tools

### `extract_document`

Extract all translatable text blocks from a document.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `file_path` | string | ✅ | Absolute path to the document |

Returns JSON with `text_blocks` — each has `id`, `text`, and `context`.
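
Before rebuilding, it can help to confirm that every extracted block received a translation. The sketch below relies only on the documented `text_blocks`, `id`, and `text` fields. `untranslated_ids` is a hypothetical helper, and the sample IDs and context strings are invented, so real values may look different.

```python
import json
from typing import Dict, List


def untranslated_ids(extract_response: str, translations: Dict[str, str]) -> List[str]:
    """Return IDs of extracted blocks that have no non-empty translation yet."""
    blocks = json.loads(extract_response)["text_blocks"]
    return [b["id"] for b in blocks if not translations.get(b["id"], "").strip()]


# Hypothetical response; real IDs and context strings may look different.
sample = json.dumps({"text_blocks": [
    {"id": "block_1", "text": "季度报告", "context": "slide 1, title"},
    {"id": "block_2", "text": "收入增长", "context": "slide 1, body"},
]})

print(untranslated_ids(sample, {"block_1": "Quarterly Report"}))
```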

### `extract_images`

Extract images from a document for inspection or translation (PPTX only).

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `file_path` | string | ✅ | Path to the document |
| `output_dir` | string | — | Directory for saved images |

Returns JSON with `images`, `mode` (full/annotate), and `guidance`.
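
This README does not say how icons, logos, and duplicates are detected. As an illustration only, the sketch below drops images whose shorter side falls under a pixel threshold and removes byte-identical duplicates by hash. `filter_images`, the tuple layout, and the 100 px threshold are all assumptions.

```python
import hashlib
from typing import Iterable, List, Tuple

# (image_id, width_px, height_px, raw image bytes)
ImageRecord = Tuple[str, int, int, bytes]


def filter_images(images: Iterable[ImageRecord], min_side_px: int = 100) -> List[str]:
    """Return IDs of images worth translating, in their original order."""
    seen_digests = set()
    kept: List[str] = []
    for image_id, width, height, data in images:
        if min(width, height) < min_side_px:
            continue  # small images are likely icons or logos
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen_digests:
            continue  # byte-identical duplicate of an image already kept
        seen_digests.add(digest)
        kept.append(image_id)
    return kept


print(filter_images([("logo", 48, 48, b"a"), ("chart", 800, 600, b"b")]))
```

Filtering before calling Gemini matters because each remaining image costs one API call.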

### `translate_images`

Translate text within images using Gemini. Requires `GEMINI_API_KEY`.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `file_path` | string | ✅ | Path to the document |
| `target_language` | string | — | Target language (default: English) |
| `source_language` | string | — | Source language (auto-detect if empty) |
| `output_dir` | string | — | Directory for images |
| `gemini_api_key` | string | — | API key (falls back to env var) |
| `gemini_model` | string | — | Model (default: gemini-3.1-flash-image-preview) |

Returns JSON with `image_replacements` mapping for `rebuild_document`.

### `rebuild_document`

Rebuild a document with translated text and optionally replaced/annotated images.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `source_file_path` | string | ✅ | Path to the original document |
| `translations` | string | ✅ | JSON `{block_id: translated_text}` |
| `summary` | string | — | Summary text for the first page |
| `output_file_path` | string | — | Custom output path |
| `image_replacements` | string | — | JSON `{image_id: "/path/to/translated.png"}` |
| `image_annotations` | string | — | JSON `{image_id: "caption text"}` |

Returns JSON with path to the translated file.
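
Per the table, `translations`, `image_replacements`, and `image_annotations` are strings containing JSON, not nested objects. The sketch below assembles the arguments accordingly and omits optional parameters that are not set. `rebuild_arguments` is a hypothetical helper; only the parameter names come from the table above.

```python
import json
from typing import Dict, Optional


def rebuild_arguments(source_file_path: str,
                      translations: Dict[str, str],
                      summary: Optional[str] = None,
                      output_file_path: Optional[str] = None,
                      image_replacements: Optional[Dict[str, str]] = None,
                      image_annotations: Optional[Dict[str, str]] = None) -> dict:
    """Build the argument dict for `rebuild_document`."""
    args = {
        "source_file_path": source_file_path,
        # Mappings are passed as JSON-encoded strings.
        "translations": json.dumps(translations, ensure_ascii=False),
    }
    if summary:
        args["summary"] = summary
    if output_file_path:
        args["output_file_path"] = output_file_path
    if image_replacements:
        args["image_replacements"] = json.dumps(image_replacements, ensure_ascii=False)
    if image_annotations:
        args["image_annotations"] = json.dumps(image_annotations, ensure_ascii=False)
    return args


print(rebuild_arguments("/path/to/deck.pptx", {"block_1": "Quarterly Report"}))
```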

### `list_supported_formats`

Returns supported formats, current image translation mode, and recommended workflow.

## Example

```
You: Please translate this PPT to English: /path/to/报告.pptx

AI: [calls extract_document] → 281 text blocks extracted
AI: [translates all blocks Chinese → English]
AI: [calls extract_images] → 16 unique images found
AI: [calls translate_images] → 11 images with text translated via Gemini
AI: [calls rebuild_document with translations + image_replacements]
AI: Done! Saved to /path/to/报告_translated.pptx
    281 text blocks and 11 images translated.
```
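
In the example, the output lands next to the source with a `_translated` suffix. Assuming that pattern is the default when `output_file_path` is omitted (inferred from the example, not confirmed elsewhere in this README), the path can be derived as follows. `default_output_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def default_output_path(source_file_path: str) -> str:
    """Insert `_translated` between the file stem and its extension."""
    path = PurePosixPath(source_file_path)
    return str(path.with_name(f"{path.stem}_translated{path.suffix}"))


print(default_output_path("/path/to/报告.pptx"))
```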

## Architecture

```
┌────────────────────────────────────────────┐
│  LLM Client (Cursor / CodeBuddy / Claude)  │
│  └─ uses its own LLM for text translation  │
└──────────────┬─────────────────────────────┘
               │ MCP protocol (STDIO)
               ▼
┌─────────────────────────────────┐
│  doc-translator-mcp             │
│  ├─ server.py    (MCP tools)    │
│  ├─ pptx_handler (text+images)  │
│  ├─ pdf_handler  (text overlay) │
│  ├─ docx_handler (text replace) │
│  └─ xlsx_handler (cell replace) │
└──────────────┬──────────────────┘
               │ optional
               ▼
┌─────────────────────────────────┐
│  Gemini API (image translation) │
│  gemini-3.1-flash-image-preview │
└─────────────────────────────────┘
```
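
The diagram suggests that the server routes each file to a format-specific handler. One way to express that routing is a lookup on the file extension. The handler names below come from the diagram, but `HANDLERS`, `handler_for`, and the dispatch mechanism itself are assumptions, not the project's actual code.

```python
from pathlib import PurePath

# Extension -> handler module name, as labelled in the diagram above.
HANDLERS = {
    ".pptx": "pptx_handler",
    ".pdf": "pdf_handler",
    ".docx": "docx_handler",
    ".xlsx": "xlsx_handler",
}


def handler_for(file_path: str) -> str:
    """Pick a handler by file extension, ignoring case."""
    suffix = PurePath(file_path).suffix.lower()
    try:
        return HANDLERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported format: {suffix or '(none)'}") from None


print(handler_for("slides/Deck.PPTX"))
```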

## Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `GEMINI_API_KEY` | Optional | Google AI API key for image translation. Get a free key at https://aistudio.google.com/apikey |

## License

MIT
