Metadata-Version: 2.4
Name: doc-translator-mcp
Version: 1.3.0
Summary: MCP server that translates documents (PPTX, PDF, DOCX, XLSX) preserving layout, with optional Gemini image translation
Author-email: Camus Ma <camushoilingma@gmail.com>
License: MIT
Keywords: ai,docx,gemini,llm,mcp,pdf,pptx,translation,xlsx
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.10
Requires-Dist: google-genai>=1.0.0
Requires-Dist: mcp[cli]>=1.2.0
Requires-Dist: openpyxl>=3.1.0
Requires-Dist: pymupdf>=1.25.0
Requires-Dist: python-docx>=1.1.0
Requires-Dist: python-pptx>=1.0.0
Description-Content-Type: text/markdown

# doc-translator-mcp

[![PyPI](https://img.shields.io/pypi/v/doc-translator-mcp)](https://pypi.org/project/doc-translator-mcp/)
[![Python](https://img.shields.io/pypi/pyversions/doc-translator-mcp)](https://pypi.org/project/doc-translator-mcp/)
[![License](https://img.shields.io/pypi/l/doc-translator-mcp)](https://pypi.org/project/doc-translator-mcp/)

An MCP server that translates documents (PPTX, PDF, DOCX, XLSX) while preserving the original layout, including text embedded in images (PPTX only). It works with any LLM client that supports MCP.

## Features

- **Any language pair**: Chinese ↔ English, French ↔ Japanese, or any other combination the host LLM supports
- **No API key required** for text translation: the LLM already running in your client does the work
- **Optional image translation** (PPTX only): set `GEMINI_API_KEY` to translate text in screenshots, diagrams, and charts automatically via Gemini
- **Format preservation**: fonts, colors, sizes, and positioning are preserved; font size scales automatically to fit longer translations
- **Smart image filtering**: icons, logos, and duplicate images are skipped to minimize API calls
- **Best with Claude**: tested and tuned for Claude; other LLMs may work, but results vary
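The README does not say how font auto-scaling is computed. Below is a minimal sketch of one common approach. It assumes that the text wraps inside a fixed-size box, so the box's capacity falls with the square of the font size. Under that assumption, the size is shrunk by the square root of the width ratio. Width is estimated with `unicodedata.east_asian_width`, so that one CJK character counts as roughly two Latin ones. `visual_width`, `scale_font_size`, and the 8 pt floor are hypothetical names and values, not part of doc-translator-mcp.

```python
import math
import unicodedata


def visual_width(text: str) -> int:
    """Rough rendered width: wide/fullwidth characters (e.g. CJK) count as 2, all others as 1."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text)


def scale_font_size(size_pt: float, original: str, translated: str, min_size_pt: float = 8.0) -> float:
    """Shrink size_pt so a longer translation still fits the box the original occupied.

    Assumes text wraps inside a fixed-size box, so capacity falls with the square of the
    font size; the scale factor is therefore the square root of the width ratio.
    The font is never enlarged, and never shrunk below min_size_pt (a hypothetical floor).
    """
    orig_w = visual_width(original)
    new_w = visual_width(translated)
    if new_w <= orig_w:
        return size_pt  # translation is no wider: keep the original size
    scaled = size_pt * math.sqrt(orig_w / new_w)
    floor = min(min_size_pt, size_pt)  # never enlarge a font that already starts below the floor
    return max(floor, round(scaled, 1))


if __name__ == "__main__":
    print(scale_font_size(20, "市场分析", "Market analysis"))  # → 14.6
```

Single-line labels that do not wrap would call for a linear ratio instead of the square root. Real renderers also need actual glyph metrics, so treat this as an approximation.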

## Supported Formats

| Format | Extension | Text | Images | Rebuild | Summary (placement) |
|--------|-----------|------|--------|---------|---------------------|
| PowerPoint | `.pptx` | ✅ | ✅ | ✅ | ✅ (new slide) |
| PDF | `.pdf` | ✅ | — | ✅ | ✅ (new page) |
| Word | `.docx` | ✅ | — | ✅ | ✅ (prepended) |
| Excel | `.xlsx` | ✅ | — | ✅ | ✅ (new sheet) |
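The Supported Formats table can be read as a lookup keyed on file extension. The sketch below encodes it as a Python dictionary. It assumes the format is detected by a case-insensitive extension match. `CAPABILITIES` and `capabilities_for` are hypothetical names, and the server's real detection logic may differ.

```python
from pathlib import Path

# Transcribed from the Supported Formats table. "summary" records where the summary is placed.
# Illustrative only; this is not the package's internal data structure.
CAPABILITIES = {
    ".pptx": {"text": True, "images": True, "summary": "new slide"},
    ".pdf": {"text": True, "images": False, "summary": "new page"},
    ".docx": {"text": True, "images": False, "summary": "prepended"},
    ".xlsx": {"text": True, "images": False, "summary": "new sheet"},
}


def capabilities_for(file_path: str) -> dict:
    """Return what can be translated for a file, based on its extension (case-insensitive)."""
    ext = Path(file_path).suffix.lower()
    try:
        return CAPABILITIES[ext]
    except KeyError:
        raise ValueError(f"Unsupported format: {ext or '(no extension)'}") from None


if __name__ == "__main__":
    print(capabilities_for("/path/to/Report.PPTX"))
```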

## Install

Add the server to your MCP client configuration (Cursor, CodeBuddy, Claude Desktop, etc.):

```json
{
  "mcpServers": {
    "doc-translator": {
      "command": "uvx",
      "args": ["doc-translator-mcp"],
      "env": {
        "GEMINI_API_KEY": "<optional-google-ai-api-key>"
      }
    }
  }
}
```

That's it: `uvx` (part of [uv](https://docs.astral.sh/uv/getting-started/installation/)) downloads and runs the server automatically, so uv must be installed.

> **`GEMINI_API_KEY` is optional.** Without it, text translation works fully; only image translation (PPTX) requires it. If you don't have a key, remove the `env` block instead of leaving the placeholder in place. Get a free key at https://aistudio.google.com/apikey
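If you generate client configurations from a script, a small helper can keep the key optional. The hypothetical `build_mcp_config` below emits the `env` block only when a key is supplied. It also covers the pip-installed variant, where the command is the `doc-translator-mcp` script itself. This is a convenience sketch, not something the package ships.

```python
import json
from typing import Optional


def build_mcp_config(gemini_api_key: Optional[str] = None, use_uvx: bool = True) -> dict:
    """Build an mcpServers entry for doc-translator-mcp.

    use_uvx=True  -> launch via `uvx doc-translator-mcp`
    use_uvx=False -> launch a pip-installed `doc-translator-mcp` script from PATH
    The env block is added only when a key is given, so no placeholder value is passed on.
    """
    server = (
        {"command": "uvx", "args": ["doc-translator-mcp"]}
        if use_uvx
        else {"command": "doc-translator-mcp", "args": []}
    )
    if gemini_api_key:
        server["env"] = {"GEMINI_API_KEY": gemini_api_key}
    return {"mcpServers": {"doc-translator": server}}


if __name__ == "__main__":
    # Text-only setup: no Gemini key, so the output has no env block.
    print(json.dumps(build_mcp_config(), indent=2))
```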

### Alternative: pip install

```bash
pip install doc-translator-mcp
```

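Then set `"command": "doc-translator-mcp"` in the config above instead of `uvx`, and set `"args"` to `[]`. The `doc-translator-mcp` script must be on the `PATH` that your client uses.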

## How It Works

```
User: "Translate this PPT to English"

  1. extract_document(file_path)
     → Returns text blocks with IDs and context

  2. The LLM translates each text block

  3. extract_images(file_path)             [PPTX only]
     → Filters out icons, logos, and duplicates
     → If GEMINI_API_KEY is set: translate_images() translates image text automatically
     → If no key: the LLM adds text annotations beside the images

  4. rebuild_document(translations, image_replacements)
     → Produces the translated file with the original layout preserved
```
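The text-only path of this flow can be driven from any MCP client. The sketch below assumes a `call_tool(name, arguments)` coroutine that returns already-parsed JSON. With the MCP Python SDK you would wrap `ClientSession.call_tool`, whose `CallToolResult` still needs parsing. Several details are assumptions rather than documented behavior: the `{"blocks": [{"id", "text"}]}` result shape, the `{id: translation}` mapping, and passing only `translations` and `image_replacements` to `rebuild_document`. `run_text_workflow` is a hypothetical helper.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# call_tool(name, arguments) -> parsed result (assumed shape; see the lead-in).
ToolCaller = Callable[[str, Dict[str, Any]], Awaitable[Any]]


async def run_text_workflow(
    call_tool: ToolCaller,
    file_path: str,
    translate: Callable[[str], str],
) -> Any:
    """Text-only mode: extract -> translate each block -> rebuild."""
    # Step 1: assumed result shape {"blocks": [{"id": ..., "text": ...}, ...]}.
    extracted = await call_tool("extract_document", {"file_path": file_path})
    # Step 2: normally the host LLM does this; `translate` stands in for it.
    translations = {block["id"]: translate(block["text"]) for block in extracted["blocks"]}
    # Step 4: text-only mode has no image replacements.
    return await call_tool(
        "rebuild_document",
        {"translations": translations, "image_replacements": {}},
    )


async def _demo() -> None:
    calls = []

    async def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Any:
        # Records each call and returns canned data instead of contacting a real server.
        calls.append(name)
        if name == "extract_document":
            return {"blocks": [{"id": "s1-t1", "text": "你好"}]}
        return {"received": arguments}

    result = await run_text_workflow(fake_call_tool, "/tmp/demo.pptx", lambda s: "Hello")
    print(calls, result)


if __name__ == "__main__":
    asyncio.run(_demo())
```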

### Translation Modes

| Mode | Requirements | What it does |
|------|-------------|--------------|
| **Text only** | Any LLM, no API key | `extract → translate → rebuild` |
| **Full** (text + images) | `GEMINI_API_KEY`; images are handled for PPTX only | Text translation, plus Gemini regenerates images with translated text |
| **Annotate** (text + captions) | Multimodal LLM, no API key; images are handled for PPTX only | Text translation, plus the LLM describes the text in each image and caption boxes are added |
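The three modes reduce to a small decision. The sketch below assumes the choice depends only on three things: the file type, whether a Gemini key is present, and whether the host LLM can read images. `choose_mode` and its mode strings are hypothetical labels, not values the server exposes.

```python
from pathlib import Path


def choose_mode(file_path: str, has_gemini_key: bool, host_is_multimodal: bool) -> str:
    """Pick a translation mode following the Translation Modes table.

    Images are handled only for PPTX, so every other format gets "text-only".
    """
    if Path(file_path).suffix.lower() != ".pptx":
        return "text-only"
    if has_gemini_key:
        return "full"  # Gemini regenerates images with translated text
    if host_is_multimodal:
        return "annotate"  # host LLM describes image text; caption boxes are added
    return "text-only"


if __name__ == "__main__":
    print(choose_mode("/path/to/deck.pptx", has_gemini_key=False, host_is_multimodal=True))
```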

## Tools

| Tool | Description |
|------|-------------|
| `extract_document` | Extract translatable text blocks from a document |
| `extract_images` | Extract images from PPTX for inspection or translation |
| `translate_images` | Translate text in images via Gemini (requires API key) |
| `rebuild_document` | Rebuild document with translated text and images |
| `list_supported_formats` | List supported formats and current workflow |
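Under the hood, an MCP client invokes each of these tools with a JSON-RPC 2.0 `tools/call` request, sent after the usual `initialize` handshake. The request carries the tool name and its arguments. The sketch below builds that message with the standard library. The `file_path` argument name comes from the workflow diagram. The other tools' argument names are not documented here, so check them against the server's `tools/list` response. `tool_call_request` is a hypothetical helper.

```python
import json
from typing import Any, Dict


def tool_call_request(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request (JSON-RPC 2.0) for one of the tools above."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # ensure_ascii=False keeps non-ASCII paths such as 报告.pptx readable.
    return json.dumps(message, ensure_ascii=False)


if __name__ == "__main__":
    print(tool_call_request(1, "extract_document", {"file_path": "/path/to/报告.pptx"}))
```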

## Example

```
You: Please translate this PPT into English: /path/to/报告.pptx

AI: [calls extract_document] → 281 text blocks extracted
AI: [translates all blocks Chinese → English]
AI: [calls extract_images] → 16 unique images found
AI: [calls translate_images] → 11 images translated via Gemini
AI: [calls rebuild_document]
AI: Done! Saved to /path/to/报告_translated.pptx
```
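The "unique images" in this run come from the filtering step, whose heuristics the package does not document. The sketch below assumes two simple rules. First, an image whose shorter side is under a size threshold is treated as an icon or logo. Second, byte-identical images (compared by SHA-256 digest) are kept only once. Large or re-encoded logos would slip through these rules. `ImageInfo`, `filter_images`, and the 100 px threshold are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ImageInfo:
    name: str
    data: bytes  # raw image bytes as stored in the file
    width_px: int
    height_px: int


def filter_images(images: Iterable[ImageInfo], min_side_px: int = 100) -> List[ImageInfo]:
    """Keep images worth translating: drop small ones (likely icons/logos) and exact duplicates."""
    seen = set()
    kept = []
    for img in images:
        if min(img.width_px, img.height_px) < min_side_px:
            continue  # too small: probably an icon or logo
        digest = hashlib.sha256(img.data).hexdigest()
        if digest in seen:
            continue  # same picture reused elsewhere in the deck
        seen.add(digest)
        kept.append(img)
    return kept
```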

## Architecture

```
┌──────────────────────────────────────────┐
│  LLM Client (Cursor / CodeBuddy / Claude)│
│  └─ uses its own LLM for text translation│
└──────────────────┬───────────────────────┘
                   │ MCP (STDIO)
                   ▼
┌──────────────────────────────────────────┐
│  doc-translator-mcp                      │
│  ├─ pptx_handler  (text + images)        │
│  ├─ pdf_handler   (text overlay)         │
│  ├─ docx_handler  (text replace)         │
│  └─ xlsx_handler  (cell replace)         │
└──────────────────┬───────────────────────┘
                   │ optional
                   ▼
┌──────────────────────────────────────────┐
│  Gemini API (image translation)          │
└──────────────────────────────────────────┘
```
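The "MCP (STDIO)" link in the diagram means the client starts the server as a subprocess. The two sides then exchange JSON-RPC messages over the server's stdin and stdout, one message per line with no embedded newlines. The sketch below shows that framing rule using only the standard library. It is not code from doc-translator-mcp, whose dependency list includes the `mcp` SDK that handles this. `write_message` and `read_messages` are hypothetical names.

```python
import io
import json
from typing import Any, Dict, Iterator, TextIO


def write_message(stream: TextIO, message: Dict[str, Any]) -> None:
    """Write one JSON-RPC message as a single line (MCP stdio framing)."""
    # Compact json.dumps escapes newlines inside strings, so the line has no raw "\n".
    stream.write(json.dumps(message) + "\n")
    stream.flush()


def read_messages(stream: TextIO) -> Iterator[Dict[str, Any]]:
    """Yield one parsed message per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


if __name__ == "__main__":
    buf = io.StringIO()
    write_message(buf, {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
    buf.seek(0)
    print(list(read_messages(buf)))
```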

## Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `GEMINI_API_KEY` | Optional | Google AI API key used for image translation (PPTX only). Free keys are available at https://aistudio.google.com/apikey |
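A wrapper script can check whether image translation will be available before it starts a job. The sketch below assumes that an empty value, or the unreplaced `<...>` placeholder from the sample config, should count as "no key". Whether the server itself applies the same check is not documented. `gemini_key` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def gemini_key(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return GEMINI_API_KEY if it looks set, else None (image translation unavailable).

    Empty values and unreplaced "<...>" placeholders are treated as missing.
    """
    value = environ.get("GEMINI_API_KEY", "").strip()
    if not value or (value.startswith("<") and value.endswith(">")):
        return None
    return value


if __name__ == "__main__":
    print("image translation enabled" if gemini_key() else "text-only (no Gemini key)")
```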

## License

MIT
