Metadata-Version: 2.4
Name: agent-markitdown
Version: 0.1.1
Summary: Safe local document-to-markdown preprocessing for OpenClaw, Claude Code, Codex, Hermes, and other agents.
Project-URL: Homepage, https://github.com/TGambit65/agent-markitdown
Project-URL: Repository, https://github.com/TGambit65/agent-markitdown
Project-URL: Issues, https://github.com/TGambit65/agent-markitdown/issues
Project-URL: Changelog, https://github.com/TGambit65/agent-markitdown/blob/main/CHANGELOG.md
Author: Threshold
License: MIT License
        
        Copyright (c) 2026 Threshold
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: agents,claude-code,codex,docx,markdown,openclaw,pdf
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Text Processing :: Markup
Requires-Python: >=3.10
Requires-Dist: markitdown[docx,pdf,pptx,xlsx]>=0.1.5
Provides-Extra: dev
Requires-Dist: pytest>=8.3; extra == 'dev'
Requires-Dist: python-docx>=1.1.2; extra == 'dev'
Requires-Dist: reportlab>=4.4.1; extra == 'dev'
Description-Content-Type: text/markdown

# agent-markitdown

[![CI](https://github.com/TGambit65/agent-markitdown/actions/workflows/ci.yml/badge.svg)](https://github.com/TGambit65/agent-markitdown/actions/workflows/ci.yml)
[![Release](https://github.com/TGambit65/agent-markitdown/actions/workflows/release.yml/badge.svg)](https://github.com/TGambit65/agent-markitdown/actions/workflows/release.yml)

Safe local document-to-markdown preprocessing for agents.

Built for OpenClaw first, but intentionally usable from Claude Code, Codex, Hermes Agent, and anything else that can run a local CLI or Python package.

## What it is

`agent-markitdown` wraps Microsoft's excellent [`markitdown`](https://github.com/microsoft/markitdown) with an agent-oriented safety and workflow layer:

- local files only
- `convert_local()` only
- plugins off by default
- extension allowlist
- size guardrail
- deterministic JSON output
- extraction warnings when markdown may be incomplete
- review-pack generation for LLM handoff
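
The guardrails listed above amount to a pre-flight check that runs before any conversion. The following is a minimal sketch of that idea, not the package's actual implementation: `ALLOWED_EXTENSIONS`, `MAX_BYTES`, and `check_local_file` are hypothetical names, the allowlist is abbreviated, and the 20 MB cap is an arbitrary illustrative value.

```python
from pathlib import Path

# Hypothetical values for illustration; the real allowlist and cap live in the package.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".pptx", ".xlsx", ".txt", ".md"}
MAX_BYTES = 20 * 1024 * 1024  # assumed 20 MB cap


def check_local_file(path_str: str, max_bytes: int = MAX_BYTES) -> Path:
    """Return the resolved path if it passes the guardrails, else raise ValueError."""
    # Local files only: reject anything that looks like a URL.
    if "://" in path_str:
        raise ValueError(f"remote sources are not allowed: {path_str}")
    path = Path(path_str).resolve()
    if not path.is_file():
        raise ValueError(f"not a regular file: {path}")
    # Extension allowlist, compared case-insensitively.
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"extension not in allowlist: {path.suffix}")
    # Size guardrail.
    if path.stat().st_size > max_bytes:
        raise ValueError(f"file exceeds size cap of {max_bytes} bytes")
    return path
```

Only a path that passes checks like these would be handed on to `markitdown`'s `convert_local()`.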

## Why this exists

Raw file uploads are awkward for agent workflows.

For supported document types, agents usually work better when they receive clean markdown instead of a binary attachment or a heavyweight vision/PDF pass.

That means:

- lower context overhead
- easier quoting and summarization
- better portability across agent runtimes
- safer, narrower preprocessing than raw `markitdown convert()`

## What it is not

This package does **not** magically patch every agent runtime on earth.

It gives you a safe preprocessing layer plus integration assets. Each host agent still needs a tiny adapter or instruction layer telling it to run `agent-markitdown` before review.

OpenClaw gets a ready-made skill. Other agents get drop-in snippets.

## Status

- GitHub repo: live
- CI/release workflows: included
- PyPI publish path: ready once a token or trusted publisher is configured

## Installation

From a local checkout of the repository, using `uv`:

```bash
uv venv .venv
uv pip install --python .venv/bin/python .
# or with test/dev dependencies
uv pip install --python .venv/bin/python '.[dev]'
```

Or, once the package has been published to PyPI:

```bash
pip install agent-markitdown
```

## CLI

### Convert one file to stdout

```bash
agent-markitdown convert ./report.pdf
```

### Convert and emit JSON

```bash
agent-markitdown convert ./report.docx --json
```

JSON output includes a `warnings` array. It is empty when text extraction looks normal. It is populated when the markdown should not be treated as complete, for example when very little text was extracted or when the input is an image that may need OCR/vision review.
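
A host-side wrapper might use the `warnings` array to decide whether the markdown can be trusted as-is. The sketch below is one possible shape, under an assumption the README does not confirm: that `--json` prints a single JSON object with a top-level `warnings` key. `run_convert_json` and `extraction_warnings` are hypothetical helper names.

```python
import json
import subprocess


def run_convert_json(path: str) -> str:
    """Run the CLI with --json and return its stdout (raises if the command fails)."""
    proc = subprocess.run(
        ["agent-markitdown", "convert", path, "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout


def extraction_warnings(json_text: str) -> list:
    """Return the `warnings` array from the CLI's JSON output.

    Assumption: the output is one JSON object with a top-level `warnings`
    key; adjust this if the real shape differs.
    """
    payload = json.loads(json_text)
    return list(payload.get("warnings", []))
```

A caller could then branch on `extraction_warnings(run_convert_json("./report.pdf"))` and route the file to an OCR or vision pass when the list is non-empty.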

### Write sidecar markdown files

```bash
agent-markitdown convert ./report.pdf ./notes.docx --sidecar
```

### Build one review bundle for an agent

```bash
agent-markitdown review-pack ./report.pdf ./notes.docx -o review-pack.md
```

### Health check

```bash
agent-markitdown doctor
```

## Supported extensions

- `.pdf`
- `.docx`
- `.pptx`
- `.xlsx`
- `.xls`
- `.html`, `.htm`
- `.csv`, `.tsv`
- `.json`, `.xml`
- `.txt`, `.md`, `.rtf`
- `.epub`
- `.jpg`, `.jpeg`, `.png`, `.gif`, `.bmp`, `.tif`, `.tiff`, `.webp`

## OpenClaw

See [`integrations/openclaw/SKILL.md`](integrations/openclaw/SKILL.md).

That skill tells OpenClaw to preprocess supported uploaded documents into markdown **before deeper review/summarization work**.

Install the OpenClaw skill into a workspace:

```bash
./scripts/install-openclaw-skill.sh
```

## Other agents

- Claude Code: [`integrations/claude-code/AGENTS.md`](integrations/claude-code/AGENTS.md)
- Codex: [`integrations/codex/AGENTS.md`](integrations/codex/AGENTS.md)
- Hermes Agent: [`integrations/hermes-agent/SKILL.md`](integrations/hermes-agent/SKILL.md)

For copyable host-side patterns, see:

- [`examples/review-pack-consumers/`](examples/review-pack-consumers/) for a generic review-pack handoff
- [`examples/auto-preprocess-adapters/`](examples/auto-preprocess-adapters/) for profile-specific prompt adapters that can sit in front of agent CLIs

## Security stance

This package intentionally avoids the broadest `markitdown` surfaces.

- no remote URLs
- no `convert()`
- no plugins unless explicitly enabled
- no ZIP traversal support
- explicit extension allowlist
- configurable size cap
- warnings for low-text extraction and image inputs that may need OCR/vision

If you're handling untrusted uploads in a server context, keep validating paths and storing uploads in a controlled temp area. This package narrows the blast radius; it does not replace sane host hygiene.
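
For the host-side path validation mentioned above, one common approach is to resolve every upload path and confirm it stays inside the controlled upload directory. This is a minimal sketch of that check using only the standard library; `resolve_inside` and its parameters are hypothetical names and are not part of `agent-markitdown`.

```python
from pathlib import Path


def resolve_inside(upload_root: str, candidate: str) -> Path:
    """Resolve `candidate` under `upload_root`; raise ValueError if it escapes it."""
    root = Path(upload_root).resolve()
    # Resolving collapses `..` segments and follows symlinks before the check.
    target = (root / candidate).resolve()
    if not target.is_relative_to(root):
        raise ValueError(f"path escapes upload root: {candidate}")
    return target
```

Because an absolute `candidate` replaces the root when joined, it fails the same containment check as a `../` traversal.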

## Release flow

- CI runs on push/PR
- release workflow runs on `v*` tags
- tagged releases build wheel + sdist and attach them to a GitHub release
- PyPI publish is attempted automatically when either:
  - `PYPI_API_TOKEN` repo secret exists, or
  - `PYPI_TRUSTED_PUBLISHING=true` repo variable is set and PyPI trusted publishing is configured

See [`docs/publishing.md`](docs/publishing.md) and [`docs/release-checklist.md`](docs/release-checklist.md).

## Attribution

This project depends on and is inspired by Microsoft's `markitdown`, which is MIT licensed.
