Metadata-Version: 2.4
Name: docira-mcp
Version: 0.1.0
Summary: MCP server for Docira — multi-model OCR / document-to-Markdown/JSON. Thin client over the hosted Docira API; needs only an API key.
Author: Docira
License: MIT
Project-URL: Homepage, https://docira.io
Project-URL: Documentation, https://docira.io/docs
Keywords: mcp,ocr,document-parsing,docira,claude
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: mcp[cli]>=1.2.0
Requires-Dist: httpx>=0.27

# docira-mcp

MCP server for [Docira](https://docira.io) — multi-model OCR / document-to-Markdown/JSON.

A **thin client** over the hosted Docira API: it needs only your API key and no
local model stack. It works with any MCP client (Claude Desktop, agents).

## Tools

| Tool | What it does |
|------|--------------|
| `parse_document(file_url, operation_mode, max_pages)` | Parse a document at a URL → Markdown |
| `extract_structured(file_url, json_schema, max_pages)` | Schema-driven structured JSON (invoices, receipts, forms) |
| `parse_local_file(file_path, operation_mode)` | Parse a local file (multipart upload) → Markdown |
| `get_result(result_id)` | Retrieve an async/batch result by id |
| `health()` | API + provider status |

## Claude Desktop

Add the following to `claude_desktop_config.json` (create an API key in the Docira dashboard → Keys):

```json
{
  "mcpServers": {
    "docira": {
      "command": "uvx",
      "args": ["docira-mcp"],
      "env": { "DOCIRA_API_KEY": "pw_live_…" }
    }
  }
}
```

To run it straight from the repo (for example, before the package is published to PyPI), point `uvx` at the local path with `--from`:

```json
{
  "mcpServers": {
    "docira": {
      "command": "uvx",
      "args": ["--from", "/abs/path/to/ParseWave/clients/docira-mcp", "docira-mcp"],
      "env": { "DOCIRA_API_KEY": "pw_live_…" }
    }
  }
}
```

## Config (env)

| Var | Default | Notes |
|-----|---------|-------|
| `DOCIRA_API_KEY` | — (required) | Your `pw_live_…` / `pw_test_…` key |
| `DOCIRA_API_URL` | `https://parsewave-api.fly.dev` | API base |
| `DOCIRA_TIMEOUT` | `300` | Per-request timeout, in seconds |
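
The table above can be read as the following resolution logic. This is a minimal sketch of how such settings could be loaded, not the server's actual code: `DociraSettings` and `load_settings` are hypothetical names, and stripping a trailing slash from the base URL is an added assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping

DEFAULT_API_URL = "https://parsewave-api.fly.dev"
DEFAULT_TIMEOUT = "300"  # seconds, kept as a string like an env value


@dataclass(frozen=True)
class DociraSettings:
    api_key: str
    api_url: str
    timeout: float


def load_settings(env: Mapping[str, str] = os.environ) -> DociraSettings:
    """Resolve settings from the environment, applying the documented defaults."""
    api_key = env.get("DOCIRA_API_KEY", "").strip()
    if not api_key:
        # The key has no default, so fail early with a clear message.
        raise RuntimeError("DOCIRA_API_KEY is required")
    return DociraSettings(
        api_key=api_key,
        # Assumption: a trailing slash on the base URL is not significant.
        api_url=env.get("DOCIRA_API_URL", DEFAULT_API_URL).rstrip("/"),
        timeout=float(env.get("DOCIRA_TIMEOUT", DEFAULT_TIMEOUT)),
    )
```

Taking the environment as a parameter makes the defaults easy to exercise with a plain dictionary instead of the real process environment.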

## Notes

- `extract_structured` is the schema-driven path (returns `content_json`); the
  request is routed server-side to a model capable of structured extraction.
- Responses are model-agnostic — `provider` is reported as `"docira"`.
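
To make the schema-driven path more concrete, here is a sketch of arguments a client might send to `extract_structured`. The invoice fields and the example URL are made up for illustration, and whether `json_schema` should be passed as an object or as a JSON string is not stated in this README, so both forms are shown.

```python
import json

# Hypothetical JSON Schema for an invoice; the field names are illustrative only.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "issue_date": {"type": "string", "description": "ISO 8601 date"},
        "total": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "amount"],
            },
        },
    },
    "required": ["invoice_number", "total"],
}

# Tool arguments with the schema passed as an object (assumption).
arguments = {
    "file_url": "https://example.com/invoice.pdf",  # placeholder URL
    "json_schema": invoice_schema,
}

# The same arguments with the schema serialized, in case a string is expected.
arguments_with_string_schema = {**arguments, "json_schema": json.dumps(invoice_schema)}
```

Listing only the fields you need under `required` keeps the schema small and leaves the rest optional.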
