Metadata-Version: 2.4
Name: docmcp
Version: 1.0.0
Summary: Document processing MCP server (PDF read, merge, split, compress, report generation)
Project-URL: Homepage, https://github.com/Medalcode/DocMCP
Project-URL: Repository, https://github.com/Medalcode/DocMCP
Project-URL: Bug Tracker, https://github.com/Medalcode/DocMCP/issues
Author-email: Jonatthan Medalla <152304407+Medalcode@users.noreply.github.com>
License: MIT
License-File: LICENSE
Keywords: document-processing,mcp,model-context-protocol,pdf,pymupdf,reportlab
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Office/Business
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: mcp
Requires-Dist: pymupdf
Requires-Dist: pypdf
Requires-Dist: reportlab
Description-Content-Type: text/markdown

# DocMCP — Document Processing MCP Server

[![CI](https://github.com/Medalcode/DocMCP/actions/workflows/ci.yml/badge.svg)](https://github.com/Medalcode/DocMCP/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/docmcp.svg)](https://pypi.org/project/docmcp/)
[![Python](https://img.shields.io/badge/python-3.11%2B-blue.svg)]()
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)

MCP server for manipulating PDF documents. It reads, merges, splits, and compresses PDFs, extracts pages and images, and generates PDFs from text, tables, or structured reports.

## Features

| Tool | Description |
|---|---|
| `read` | Reads a PDF and returns its text, metadata, and page count |
| `info` | Returns metadata, file size, page count, and image count |
| `extract_images` | Extracts all images from a PDF into a directory |
| `to_markdown` | Converts the PDF content to Markdown |
| `merge` | Merges multiple PDFs into a single file |
| `split` | Splits a PDF into individual pages |
| `extract_pages` | Extracts specific pages (e.g. `1,3,5-10`) into a new PDF |
| `compress` | Compresses a PDF to reduce its file size |
| `generate_report` | Generates a structured PDF from JSON content |
| `generate_table` | Creates a PDF containing a styled table |
| `generate_text` | Converts plain text or Markdown to PDF |
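
The page specification accepted by `extract_pages` (for example `1,3,5-10`) mixes single pages and inclusive ranges. The sketch below shows one way such a string could be expanded into page numbers. The function name `parse_page_spec` and its validation rules are hypothetical and are not taken from the DocMCP source.

```python
def parse_page_spec(spec: str) -> list[int]:
    """Expand a spec such as "1,3,5-10" into 1-based page numbers.

    Hypothetical helper: DocMCP's own parser may differ in naming,
    ordering, and error handling.
    """
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.extend(range(start, end + 1))  # ranges are inclusive
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.append(page)
    return pages


print(parse_page_spec("1,3,5-10"))  # [1, 3, 5, 6, 7, 8, 9, 10]
```

Non-numeric input raises `ValueError` from `int()`, which a tool wrapper could catch and turn into a readable error message.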

## Tech Stack

- **Python**: `>=3.11`
- **Framework**: `mcp` (FastMCP) via stdio JSON-RPC
- **PDF reading**: `PyMuPDF` (fitz)
- **PDF manipulation**: `pypdf`
- **PDF generation**: `reportlab`

## Quick Start

```bash
# Install dependencies
pip install mcp pymupdf reportlab pypdf

# Set the working directory (optional); export it before starting the server
export DOCMCP_WORKDIR=/home/user/documents

# Run the server (stdio transport)
python server.py
```

### MCP Client Usage

```python
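import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Start server.py as a subprocess and talk to it over stdio
server = StdioServerParameters(command="python", args=["server.py"])

async def main() -> None:
    async with stdio_client(server) as (read, write), ClientSession(read, write) as session:
        await session.initialize()  # MCP handshake, required before calling tools
        result = await session.call_tool("read", {"path": "document.pdf"})
        result = await session.call_tool("merge", {"paths": "a.pdf,b.pdf", "output": "merged.pdf"})
        # "content" is passed as a JSON string
        result = await session.call_tool("generate_report", {
            "title": "Report",
            "content": '{"section": ["item1", "item2"]}',
            "output": "report.pdf"
        })

asyncio.run(main())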
```

## Security

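Path traversal protection is enforced through `DOCMCP_WORKDIR`: all file operations are restricted to the working directory. The containment check compares resolved paths against the directory prefix with a trailing slash, so a sibling directory such as `/home/user/documents-evil` is not treated as being inside `/home/user/documents`.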
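
The trailing-slash prefix check can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's `_resolve()` implementation: the function name `resolve_in_workdir`, its signature, and the choice of `ValueError` are all hypothetical.

```python
import os


def resolve_in_workdir(path: str, workdir: str) -> str:
    """Resolve `path` against `workdir` and reject anything outside it.

    Hypothetical sketch; DocMCP's real `_resolve()` may behave differently.
    """
    base = os.path.realpath(workdir)
    # Relative paths are joined to the work directory; absolute paths
    # replace it in os.path.join and are then checked like any other.
    candidate = os.path.realpath(os.path.join(base, path))
    # Appending the separator stops "/data/docs-evil" from passing
    # a plain startswith("/data/docs") test.
    prefix = base.rstrip(os.sep) + os.sep
    if candidate != base and not candidate.startswith(prefix):
        raise ValueError(f"path escapes work directory: {path!r}")
    return candidate
```

Using `os.path.realpath` also resolves `..` segments and symlinks before the comparison, so a path like `../secret.pdf` is rejected rather than silently normalized.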

## 🔧 Recent Improvements

- **Path Traversal Hardened** — `_resolve()` now normalizes relative paths against workdir and uses trailing-slash prefix check
- **`to_markdown()` Error Handling** — Output path validation errors are now caught and returned gracefully
- **`generate_table()` JSON Parsing** — Malformed `rows` JSON returns a friendly error instead of crashing
- **`merge()` Newline Support** — Accepts newline-separated paths in addition to comma-separated
- **`MAX_PAGES` Configurable** — `DOCMCP_MAX_PAGES` env var (default 100) controls PDF page limit
- **`import fitz` Moved** — Orphaned import at end of `generator.py` moved to top of file
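
Two of the items above, the comma- or newline-separated `merge()` paths and the `DOCMCP_MAX_PAGES` limit, can be illustrated with a short sketch. The helper names `split_paths` and `max_pages` are hypothetical, and falling back to the default on an invalid value is an assumption rather than documented behavior.

```python
import os
import re


def split_paths(paths: str) -> list[str]:
    """Split a comma- or newline-separated string into clean path entries."""
    return [p.strip() for p in re.split(r"[,\n]", paths) if p.strip()]


def max_pages(default: int = 100) -> int:
    """Read DOCMCP_MAX_PAGES, falling back to `default`.

    Assumption: unset, non-numeric, or non-positive values use the default.
    """
    raw = os.environ.get("DOCMCP_MAX_PAGES", "")
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


print(split_paths("a.pdf,b.pdf\nc.pdf"))  # ['a.pdf', 'b.pdf', 'c.pdf']
```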

## Project Structure

```
docmcp/
├── server.py              # MCP server entry point (tools)
├── docmcp/
│   ├── reader.py          # PDF reading & extraction
│   ├── manipulator.py     # Merge, split, compress, extract pages
│   ├── generator.py       # PDF generation (reports, tables, text)
│   └── __init__.py
├── client.py              # Test client
└── pyproject.toml
```
