Metadata-Version: 2.4
Name: mcp-docgen
Version: 0.2.0
Summary: Markdown-driven MCP server to create, read and edit Word (.docx), Excel (.xlsx), PowerPoint (.pptx) and PDF documents — by the Touka project.
Keywords: mcp,model-context-protocol,docx,xlsx,pptx,word,excel,powerpoint,pdf,document-generation,document-reading,document-editing,markdown
Author: Otoha (Touka Project)
Author-email: Otoha (Touka Project) <whitekinglight@gmail.com>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Office/Business :: Office Suites
Classifier: Topic :: Text Processing :: Markup
Requires-Dist: python-docx>=1.1
Requires-Dist: openpyxl>=3.1
Requires-Dist: xlsxwriter>=3.2
Requires-Dist: python-pptx>=1.0
Requires-Dist: markdown-it-py>=3.0
Requires-Dist: mcp>=1.26
Requires-Dist: reportlab>=4.5.1
Requires-Dist: pypdf>=6.13.1
Maintainer: Touka Project
Maintainer-email: Touka Project <whitekinglight@gmail.com>
Requires-Python: >=3.10
Description-Content-Type: text/markdown

# mcp-docgen

A Markdown-driven [Model Context Protocol](https://modelcontextprotocol.io) (MCP) server
to **create, read and edit** **Word (`.docx`)**, **Excel (`.xlsx`)**,
**PowerPoint (`.pptx`)** and **PDF** documents.

Built entirely on mature, permissively licensed Python libraries
([`python-docx`](https://github.com/python-openxml/python-docx),
[`python-pptx`](https://github.com/scanny/python-pptx),
[`openpyxl`](https://foss.heptapod.net/openpyxl/openpyxl),
[`XlsxWriter`](https://github.com/jmcnamara/XlsxWriter),
[`reportlab`](https://pypi.org/project/reportlab/),
[`pypdf`](https://github.com/py-pdf/pypdf),
[`markdown-it-py`](https://github.com/executablebooks/markdown-it-py)) — no proprietary
dependencies. **MIT licensed.**

> Part of the **Touka** project: giving AI agents the ability to produce, read and edit
> real Office documents using only open-source building blocks.

## Why

LLMs are great at producing Markdown. `mcp-docgen` converts Markdown to polished Office
documents — and reads them back to Markdown — so an MCP-capable assistant (Claude Desktop,
Touka, …) can run a full **read → edit → write** loop on real `.docx` / `.xlsx` / `.pptx`
/ `.pdf` files.

## Install & run

```bash
uvx mcp-docgen          # once published to PyPI
# or, from a local checkout:
uv sync && uv run mcp-docgen
```

The server speaks MCP over **stdio**.

## MCP client configuration

```jsonc
{
  "mcpServers": {
    "docgen": {
      "command": "uvx",
      "args": ["mcp-docgen"],
      "env": { "MCP_DOCGEN_OUTPUT_DIR": "/absolute/path/to/workdir" }
    }
  }
}
```

From a local checkout, swap the command for:

```jsonc
{ "command": "uv", "args": ["run", "--directory", "/path/to/mcp-docgen", "mcp-docgen"] }
```

## Tools

### Create (Markdown / structured data → file)

| Tool | Input → Output |
| --- | --- |
| `create_docx(markdown, output_path, title?)` | Markdown → Word |
| `create_pptx(markdown, output_path, title?)` | Markdown → PowerPoint |
| `create_pdf(markdown, output_path, title?)` | Markdown → PDF |
| `create_xlsx(sheets, output_path)` | structured rows → Excel |

Markdown features: headings, **bold** / *italic* / `inline code`, bullet & numbered lists
(nested), tables, block quotes, fenced code blocks, horizontal rules.

**PowerPoint slide convention:** `# Heading` starts a new slide and becomes its title;
content below it becomes bullet points; `---` forces a slide break; the optional `title`
argument adds a leading title slide.
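
The slide convention can be pictured with a small splitter. This is only a sketch of the rules stated above, not the server's actual parser: the function name `split_slides` and the returned dict shape are hypothetical, untitled slides after `---` are an assumption, and nested lists, tables and fenced code are ignored.

```python
def split_slides(markdown: str) -> list[dict]:
    """Group Markdown lines into slides using the '# Heading' / '---' rules."""
    slides: list[dict] = []
    current = None  # the slide currently collecting content, if any

    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("# "):
            # A level-1 heading opens a new slide and becomes its title.
            current = {"title": line[2:].strip(), "bullets": []}
            slides.append(current)
        elif line == "---":
            # A horizontal rule forces a break: later content goes to a new slide.
            current = None
        elif line:
            if current is None:
                # Content right after a forced break starts an untitled slide
                # (an assumption made for this sketch).
                current = {"title": "", "bullets": []}
                slides.append(current)
            # Drop a leading list marker so each line reads as one bullet.
            text = line[2:].strip() if line[:2] in ("- ", "* ") else line
            current["bullets"].append(text)
    return slides


deck = split_slides("# Intro\n- one\n- two\n---\nloose text\n# Next\n* three")
for slide in deck:
    print(slide["title"] or "(untitled)", slide["bullets"])
```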

**Excel `sheets`:** `[{ "name": str, "rows": [[cell, …], …], "header"?: bool }]`. Cells may
be strings / numbers / booleans / `null`; the first row is a bold, frozen header unless
`"header": false`.
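
As an illustration, a `sheets` argument for `create_xlsx` might be assembled like this. The sheet names and cell values are made up, and `validate_sheets` is a hypothetical client-side check written for this sketch, not part of the server.

```python
import json

# Cell values the description above allows (None serializes to JSON null).
CELL_TYPES = (str, int, float, bool, type(None))


def validate_sheets(sheets: list[dict]) -> None:
    """Raise ValueError if the payload does not match the documented shape."""
    for sheet in sheets:
        if not isinstance(sheet.get("name"), str):
            raise ValueError("each sheet needs a string 'name'")
        if not isinstance(sheet.get("header", True), bool):
            raise ValueError("'header' must be a boolean when present")
        for row in sheet.get("rows", []):
            for cell in row:
                if not isinstance(cell, CELL_TYPES):
                    raise ValueError(f"unsupported cell value: {cell!r}")


sheets = [
    {
        # No "header" key: the first row is treated as the header.
        "name": "Sales",
        "rows": [
            ["Region", "Units", "Shipped"],
            ["North", 120, True],
            ["South", 95.5, None],
        ],
    },
    {
        # "header": False opts out of the header row.
        "name": "Notes",
        "header": False,
        "rows": [["Plain first row"]],
    },
]

validate_sheets(sheets)
print(json.dumps(sheets, indent=2))
```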

### Read (file → Markdown / structured data)

| Tool | Returns |
| --- | --- |
| `read_docx(input_path)` | `{ "markdown": … }` |
| `read_pptx(input_path)` | `{ "markdown": … }` |
| `read_xlsx(input_path)` | `{ "sheets": [{ "name", "rows" }] }` (round-trips with `create_xlsx`) |
| `read_pdf(input_path)` | `{ "num_pages", "pages": […], "text" }` |

Reading docx/pptx back to Markdown makes editing possible **without** a dedicated in-place
tool: read the file, edit the Markdown, then regenerate it with the matching `create_*` tool.

### Edit (in-place, preserving the rest)

| Tool | Effect |
| --- | --- |
| `edit_xlsx(input_path, output_path, edits)` | set cells / append rows / add sheets, keeping other sheets, formulas & formatting |
| `append_docx(input_path, output_path, markdown)` | append Markdown content to the end |
| `append_pptx(input_path, output_path, markdown)` | append Markdown-derived slides to the end |

`edits` = `{ "set_cells": [{"sheet","cell","value"}], "append_rows": [{"sheet","rows"}],
"add_sheet": [{"name","rows"}] }`.

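A possible `edit_xlsx` argument set, following the `edits` shape above, is sketched below. The file names, sheet names and values are invented for illustration, and the A1-style cell reference (`"B2"`) is an assumption about what `cell` accepts.

```python
import json

# Hypothetical example values; only the key names come from the shape above.
edits = {
    "set_cells": [
        {"sheet": "Sales", "cell": "B2", "value": 130},  # A1-style ref is assumed
    ],
    "append_rows": [
        {"sheet": "Sales", "rows": [["East", 88, True]]},
    ],
    "add_sheet": [
        {"name": "Summary", "rows": [["Total units"], [313.5]]},
    ],
}

# Arguments as they could be passed to the edit_xlsx tool by an MCP client.
arguments = {
    "input_path": "sales.xlsx",
    "output_path": "sales-edited.xlsx",
    "edits": edits,
}
print(json.dumps(arguments, indent=2))
```

Writing to a different `output_path` keeps the original workbook untouched, which is convenient while experimenting.
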
### PDF page operations

| Tool | Effect |
| --- | --- |
| `pdf_merge(input_paths, output_path)` | concatenate PDFs in order |
| `pdf_split(input_path, output_dir?)` | one file per page |
| `pdf_extract(input_path, pages, output_path)` | extract a page subset (e.g. `"1-3,5"`) |
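
A page spec such as `"1-3,5"` could be expanded as in the sketch below. The helper `parse_pages` is hypothetical, and it assumes 1-based page numbers and inclusive ranges; the server's own parsing rules may differ.

```python
def parse_pages(spec: str, num_pages: int) -> list[int]:
    """Expand a spec like "1-3,5" into 1-based page numbers, in the order given."""
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        start_text, sep, end_text = part.partition("-")
        start = int(start_text)
        end = int(end_text) if sep else start  # a single number is a one-page range
        if start < 1 or end > num_pages or start > end:
            raise ValueError(f"page range {part!r} is outside 1..{num_pages}")
        pages.extend(range(start, end + 1))
    return pages


print(parse_pages("1-3,5", num_pages=6))  # → [1, 2, 3, 5]
```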

> **Note on PDF "editing":** editing here is limited to **page operations**
> (merge / split / extract). Reflowing or replacing body text is **not** supported, because
> PDFs are not designed for in-place text editing. To revise PDF *content*, regenerate with
> `create_pdf`.

Create/edit tools return `{"path": <absolute path>}`; `pdf_split` returns `{"paths": […]}`.

## Directories & safety

- **Output** files are written inside `MCP_DOCGEN_OUTPUT_DIR` (default `./out`).
- **Input** files (read / edit) are read from `MCP_DOCGEN_INPUT_DIR` (default = the output
  dir), so a read → edit → write loop shares one working directory.
- Every path is interpreted **relative to its base**; any path that escapes it (via `..` or
  an absolute path) is rejected. Missing inputs and wrong file suffixes also raise errors.
- The server makes **no network calls** and spawns **no subprocesses**.
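
The path rules above can be approximated with `pathlib`. This is a minimal sketch, not the server's code: `resolve_inside` and its parameters are hypothetical names, and the exact exception types are assumptions.

```python
from pathlib import Path


def resolve_inside(base: Path, user_path: str, suffix: str,
                   must_exist: bool = False) -> Path:
    """Resolve user_path against base and reject anything that escapes it."""
    base = base.resolve()
    # Joining with an absolute user_path discards base, so the containment
    # check below also catches absolute paths that point elsewhere.
    resolved = (base / user_path).resolve()
    if not resolved.is_relative_to(base):
        raise ValueError(f"{user_path!r} escapes {base}")
    if resolved.suffix.lower() != suffix:
        raise ValueError(f"expected a {suffix} file, got {resolved.name!r}")
    if must_exist and not resolved.is_file():
        raise FileNotFoundError(resolved)
    return resolved


out_dir = Path("out")
print(resolve_inside(out_dir, "reports/q1.docx", ".docx"))

for bad in ("../secret.docx", "/etc/passwd", "notes.txt"):
    try:
        resolve_inside(out_dir, bad, ".docx")
    except ValueError as exc:
        print("rejected:", exc)
```

Resolving before the containment check matters: comparing unresolved strings would let `reports/../../secret.docx` slip through.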

## Examples

```bash
uv run python examples/generate_samples.py   # create report.docx / review.pptx / sales.xlsx
uv run python examples/roundtrip_demo.py      # create → read → edit → PDF round-trip
```

## Development

```bash
uv sync
uv run pytest
uv run ruff check .
```

## License

MIT © 2026 Touka Project — see [LICENSE](LICENSE).

Powered by python-docx, python-pptx, openpyxl, XlsxWriter, reportlab and pypdf; Markdown
parsing by markdown-it-py. All MIT/BSD licensed.
