Metadata-Version: 2.4
Name: dokumen-pintar
Version: 1.0.0
Summary: Universal MCP server for cross-format document CRUD (text, JSON, YAML, CSV, XML, DOCX, XLSX, PPTX, PDF) with versioning, a sandboxed multi-root workspace, search, batch operations, and optional semantic search.
Author-email: firdausmntp <firdausmntp@users.noreply.github.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/firdausmntp/Dokumen-Pintar
Project-URL: Repository, https://github.com/firdausmntp/Dokumen-Pintar.git
Project-URL: Issues, https://github.com/firdausmntp/Dokumen-Pintar/issues
Project-URL: Documentation, https://github.com/firdausmntp/Dokumen-Pintar/tree/main/docs
Keywords: mcp,documents,crud,office,pdf,search
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mcp>=1.2.0
Requires-Dist: anyio>=4.4
Requires-Dist: pydantic>=2.7
Requires-Dist: platformdirs>=4.2
Requires-Dist: filelock>=3.15
Requires-Dist: charset-normalizer>=3.3
Requires-Dist: ruamel.yaml>=0.18
Requires-Dist: jsonpath-ng>=1.6
Requires-Dist: lxml>=5.2
Requires-Dist: pandas>=2.2
Requires-Dist: openpyxl>=3.1
Requires-Dist: python-docx>=1.1
Requires-Dist: python-pptx>=0.6.23
Requires-Dist: pypdf>=4.3
Requires-Dist: pdfplumber>=0.11
Requires-Dist: pikepdf>=9.0
Requires-Dist: reportlab>=4.2
Requires-Dist: rapidfuzz>=3.9
Requires-Dist: watchfiles>=0.22
Requires-Dist: starlette>=0.37
Requires-Dist: uvicorn>=0.30
Requires-Dist: sse-starlette>=2.1
Provides-Extra: semantic
Requires-Dist: sentence-transformers>=3.0; extra == "semantic"
Requires-Dist: numpy>=1.26; extra == "semantic"
Requires-Dist: scikit-learn>=1.5; extra == "semantic"
Provides-Extra: dev
Requires-Dist: pytest>=8.2; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Requires-Dist: pytest-cov>=5.0; extra == "dev"
Requires-Dist: ruff>=0.5; extra == "dev"
Requires-Dist: mypy>=1.10; extra == "dev"
Dynamic: license-file

# Dokumen-Pintar

**Universal MCP server for cross-format document CRUD**

Read, write, search, and manage text, Office, and PDF files
from any AI agent that supports the [Model Context Protocol](https://modelcontextprotocol.io/).

[![PyPI](https://img.shields.io/pypi/v/dokumen-pintar)](https://pypi.org/project/dokumen-pintar/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-3776AB?logo=python&logoColor=white)](https://python.org)
[![License: MIT](https://img.shields.io/badge/license-MIT-1e3a5f)](LICENSE)
[![Tests: 730 passed](https://img.shields.io/badge/tests-730%20passed-10b981?logo=pytest&logoColor=white)](tests/)
[![Coverage: 100%](https://img.shields.io/badge/coverage-100%25-10b981)](htmlcov/)

---

## Features

- **Multi-root Sandbox** — Define multiple workspace roots with per-root `writable` control. All paths outside the sandbox are rejected.
- **10 Formats** — Plain text, Markdown, JSON, YAML, CSV/TSV, XML/SVG, DOCX, XLSX, PPTX, PDF.
- **30 MCP Tools** — File & content CRUD, structured access, batch operations, search, versioning — all exposed as callable tools for AI agents.
- **Automatic Versioning** — Copy-on-write snapshots on every write operation. Undo, diff, restore, and purge anytime.
- **Structured Access** — JSONPath for JSON/YAML, XPath for XML, cell/range/sheet for XLSX, paragraph/table for DOCX, slide for PPTX, page for PDF.
- **Batch Operations** — Mass rename, find-and-replace, and delete with dry-run by default.
- **Semantic Search** *(optional)* — Vector search powered by sentence-transformers; enable via config.
- **Audit Trail** — Every mutation logged to JSONL with timestamp and operation details.
- **2 Transports** — stdio (Claude Desktop, Cursor, VS Code, Windsurf) and HTTP/SSE.

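The multi-root sandbox rule can be illustrated with a short sketch. This is not the project's implementation: the `Root` dataclass and the `resolve_in_sandbox` helper are hypothetical names. The sketch resolves a path, checks that it falls under one of the configured roots, and refuses writes to a root whose `writable` flag is false.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Root:
    """Hypothetical in-memory form of one entry in the config's "roots" list."""
    name: str
    path: Path
    writable: bool


def resolve_in_sandbox(roots: list[Root], candidate: str, *, for_write: bool = False) -> Path:
    """Return the resolved path if it lies inside a root, else raise PermissionError."""
    # resolve() collapses ".." segments and follows symlinks, so traversal
    # attempts are judged by where the path really points.
    resolved = Path(candidate).expanduser().resolve()
    for root in roots:
        base = root.path.expanduser().resolve()
        if resolved == base or base in resolved.parents:
            if for_write and not root.writable:
                raise PermissionError(f"root {root.name!r} is read-only")
            return resolved
    raise PermissionError(f"{candidate!r} is outside every configured root")
```
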
---

## Supported Formats

| Format | Read | Write | Structured Query | Search |
|:-------|:----:|:-----:|:-----------------|:------:|
| Plain text / Markdown | Y | Y | — | Y |
| JSON | Y | Y | JSONPath `$.key` | Y |
| YAML | Y | Y | JSONPath `$.key` | Y |
| CSV / TSV | Y | Y | `row:N` `col:N` `cell:R,C` | Y |
| XML / SVG | Y | Y | XPath `//node` | Y |
| DOCX | Y | Y | `paragraph:N` `table:N` | Y |
| XLSX | Y | Y | `cell:Sheet!A1` `range:` `sheet:` | Y |
| PPTX | Y | Y | `slide:N` `slide_title:N` | Y |
| PDF | Y | — | `page:N` `outline` `metadata` | Y |

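To make the query column concrete, here is a rough sketch of what the `$.key` and `//node` selectors pick out of a document. It uses only the Python standard library, whose `xml.etree.ElementTree` supports a subset of XPath, so it approximates the selectors and does not show how the server evaluates them. The helper names `json_key` and `xml_nodes` are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def json_key(document: str, key: str):
    """Approximate the JSONPath `$.key`: the value of a top-level key."""
    return json.loads(document)[key]


def xml_nodes(document: str, tag: str) -> list[str]:
    """Approximate the XPath `//tag`: text of every <tag> below the root element."""
    root = ET.fromstring(document)
    # ".//tag" matches descendants of the root at any depth, in document order.
    return [element.text or "" for element in root.findall(f".//{tag}")]


print(json_key('{"key": "value", "other": 1}', "key"))
# value
print(xml_nodes("<doc><node>a</node><group><node>b</node></group></doc>", "node"))
# ['a', 'b']
```
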
---

## Quick Start

### 1. Install

```bash
pip install dokumen-pintar
```

With semantic search:

```bash
pip install "dokumen-pintar[semantic]"
```

### 2. Create a Config

```bash
dokumen-pintar-init
```

Or create one manually:

```json
{
  "roots": [
    { "name": "documents", "path": "~/Documents", "writable": true },
    { "name": "projects",  "path": "~/Projects",  "writable": true }
  ]
}
```
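
If you write the config by hand, a quick check before starting the server can catch typos. The sketch below is an assumption-laden example, not the validation the server performs: `load_roots` is a hypothetical helper, and both the duplicate-name check and the read-only default for a missing `writable` field are choices made for this example.

```python
import json
from pathlib import Path


def load_roots(config_text: str) -> list[dict]:
    """Parse the config JSON and return its roots with "~" expanded."""
    config = json.loads(config_text)
    roots = config.get("roots")
    if not isinstance(roots, list) or not roots:
        raise ValueError('config needs a non-empty "roots" list')

    seen: set[str] = set()
    normalised = []
    for entry in roots:
        name, path = entry["name"], entry["path"]
        # Assumption: root names must be unique so they can act as identifiers.
        if name in seen:
            raise ValueError(f"duplicate root name: {name!r}")
        seen.add(name)
        normalised.append({
            "name": name,
            "path": Path(path).expanduser(),
            # Assumption: a missing "writable" is treated as read-only here.
            "writable": bool(entry.get("writable", False)),
        })
    return normalised
```

Calling `load_roots(Path("dokumen-pintar.config.json").read_text())` on the example above would return two entries, `documents` and `projects`, each with an expanded `path`.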

### 3. Run

```bash
dokumen-pintar --config dokumen-pintar.config.json
```

### 4. Connect to an AI Client

**Claude Desktop** — Add to `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "dokumen-pintar": {
      "command": "dokumen-pintar",
      "args": ["--config", "/path/to/dokumen-pintar.config.json"]
    }
  }
}
```
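
If `claude_desktop_config.json` already lists other servers, the entry has to be merged in, not pasted over the file. A minimal sketch, assuming the file content is passed in as text; `add_server_entry` is a hypothetical helper and the location of the file on disk is left to the caller:

```python
import json


def add_server_entry(existing_text: str, config_path: str) -> str:
    """Return claude_desktop_config.json content with a dokumen-pintar entry merged in."""
    # An empty or missing file is treated as an empty JSON object.
    config = json.loads(existing_text) if existing_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    # Other servers already present under "mcpServers" are left untouched.
    servers["dokumen-pintar"] = {
        "command": "dokumen-pintar",
        "args": ["--config", config_path],
    }
    return json.dumps(config, indent=2)
```

Write the returned string back to the client's config file and restart the client so it picks up the new server.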

**Cursor / VS Code / Windsurf** — Use the same stdio transport. Point your IDE's MCP settings to the `dokumen-pintar` command and config path.

---

## Tools Overview

**30 MCP tools** organized by category:

| Category | Tools |
|:---------|:------|
| Workspace | `workspace_list_roots` `workspace_stat` `workspace_tree` |
| File CRUD | `file_create` `file_delete` `file_rename` `file_copy` `file_move` |
| Content | `content_read` `content_write` `content_append` `content_insert` `content_replace` `content_patch` |
| Structured | `structured_get` `structured_set` `structured_delete` `structured_meta` |
| Batch | `batch_rename` `batch_replace_content` `batch_delete` |
| Search | `search_filename` `search_content` `search_in_format` |
| Versioning | `version_list` `version_diff` `version_restore` `version_undo` `version_purge` |
| Semantic\* | `semantic_index` `semantic_search` |

\*Only available when `semantic_search.enabled = true` and `[semantic]` extras are installed.

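On the wire, an MCP client invokes any of these tools with a JSON-RPC `tools/call` request, sent after the usual `initialize` handshake. The sketch below only builds that message; `tool_call_request` is a hypothetical helper, and passing empty arguments to `workspace_list_roots` is an assumption (check TOOLS.md for each tool's actual parameters).

```python
import json


def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Build the JSON-RPC 2.0 message an MCP client sends to call a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Assumption: workspace_list_roots needs no arguments.
print(tool_call_request(1, "workspace_list_roots", {}))
```
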
---

## Documentation

Full docs on GitHub: [github.com/firdausmntp/Dokumen-Pintar](https://github.com/firdausmntp/Dokumen-Pintar)

- **USAGE.md** — Workspace URIs, tool examples, practical recipes
- **CONFIG.md** — All config fields with types, defaults, and notes
- **TOOLS.md** — Full reference for all 30 tools
- **ARCHITECTURE.md** — Module map, request flow, versioning, safety

---

## License

[MIT](https://github.com/firdausmntp/Dokumen-Pintar/blob/main/LICENSE) — 2026 [firdausmntp](https://github.com/firdausmntp)
