Metadata-Version: 2.4
Name: reducto-cli
Version: 0.1.4
Summary: CLI for Reducto document processing
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: async-typer>=0.1.10
Requires-Dist: reductoai>=0.13.0
Requires-Dist: typer>=0.20.0

# Reducto CLI

[![PyPI version](https://img.shields.io/pypi/v/reducto-cli.svg)](https://pypi.org/project/reducto-cli/)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![License](https://img.shields.io/pypi/l/reducto-cli.svg)](https://pypi.org/project/reducto-cli/)

A command-line tool for document parsing, structured data extraction, and document editing — powered by [Reducto](https://reducto.ai)'s document intelligence API.

Parse PDFs, images, spreadsheets, and Office documents into clean Markdown. Extract structured JSON using schemas. Edit documents with natural language instructions. Process single files or entire directories.

**[Documentation](https://docs.reducto.ai)** | **[Reducto Studio](https://studio.reducto.ai)** | **[API Quickstart](https://docs.reducto.ai/quickstart)** | **[Python SDK](https://github.com/reductoai/reducto-python-sdk)** | **[Claude Code Plugin](https://github.com/reductoai/claude-plugins)**

---

## Table of Contents

- [Installation](#installation)
- [Authentication](#authentication)
- [Quick Start](#quick-start)
- [Commands](#commands)
  - [parse](#parse-command)
  - [extract](#extract-command)
  - [edit](#edit-command)
- [Supported File Types](#supported-file-types)
- [Use Cases](#use-cases)
- [How It Works](#how-it-works)
- [Configuration](#configuration)
- [Related Projects](#related-projects)
- [Resources](#resources)

---

## Installation

```bash
pip install reducto-cli
```

Requires Python 3.11 or later.

## Authentication

Authenticate using the built-in device code flow, which opens a browser to [Reducto Studio](https://studio.reducto.ai):

```bash
reducto login
```

This saves your API key to `~/.reducto/config.yaml`.

Alternatively, set the `REDUCTO_API_KEY` environment variable directly:

```bash
export REDUCTO_API_KEY="your_api_key_here"
```

Get an API key by signing up at [studio.reducto.ai](https://studio.reducto.ai).

## Quick Start

```bash
# Parse a PDF into Markdown
reducto parse invoice.pdf

# Parse an entire folder of documents
reducto parse ./contracts/

# Extract structured data using a JSON Schema
reducto extract invoice.pdf -s schema.json

# Edit a document with natural language
reducto edit form.pdf -i "Fill in the client name as 'Acme Corp'"
```

---

## Commands

### Parse Command

Converts documents into structured Markdown, preserving layout, tables, and figures. Uses [Reducto's Parse API](https://docs.reducto.ai) with agentic OCR and vision-language models.

```bash
reducto parse <path> [options]
```

Output is written to `<filename>.parse.md` with YAML front matter containing the job ID and processing duration.

#### Options

| Flag | Description |
|------|-------------|
| `--agentic` | Enables agentic processing for tables, text, and figures. Higher accuracy, higher latency. Use for complex layouts or low-quality scans. |
| `--change-tracking` | Returns `<s>`, `<u>`, and `<change>` tags for strikethrough, underlined, and revised text. Useful for contracts and legal redlines. |
| `--highlights` | Include highlighted text in output. |
| `--hyperlinks` | Include embedded hyperlinks in output. |
| `--comments` | Include document comments in output. |
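
The tags produced by `--change-tracking` can be post-processed with ordinary text tools. The sketch below is a hypothetical helper, not part of the CLI, that lists tracked spans in parsed Markdown. It assumes each tag arrives as a matched open/close pair such as `<s>text</s>` and that tags are not nested; the sample sentence is made up for illustration.

```python
import re

# Assumption: tags come as matched pairs (<s>...</s>, <u>...</u>,
# <change>...</change>) and are not nested inside one another.
TAG_PATTERN = re.compile(r"<(s|u|change)>(.*?)</\1>", re.DOTALL)


def find_tracked_spans(markdown: str) -> list[tuple[str, str]]:
    """Return (tag, text) pairs for every tracked span, in document order."""
    return [(m.group(1), m.group(2)) for m in TAG_PATTERN.finditer(markdown)]


# Made-up sample text, not real CLI output.
sample = "Payment is due in <s>30</s> <change>45</change> days. <u>Net terms apply.</u>"
for tag, text in find_tracked_spans(sample):
    print(f"{tag}: {text}")
```

A list like this can feed a redline summary without re-reading the whole contract.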

#### Examples

```bash
# Basic parse
reducto parse document.pdf

# High-accuracy parse for complex layouts
reducto parse scanned_report.pdf --agentic

# Parse a contract with revision tracking
reducto parse contract.pdf --change-tracking

# Parse with all metadata preserved
reducto parse document.pdf --hyperlinks --comments --highlights

# Combine flags
reducto parse legal_doc.pdf --agentic --change-tracking --comments
```

---

### Extract Command

Pulls structured data from documents according to a [JSON Schema](https://json-schema.org/) you provide. Maps unstructured content — invoices, receipts, forms, contracts, financial statements — into machine-readable JSON.

```bash
reducto extract <path> --schema <schema>
```

The schema (`--schema`, or `-s` for short) can be a path to a `.json` file or an inline JSON string. Output is saved as `<filename>.extract.json`.

The CLI automatically reuses existing parse results: if a `.parse.md` file already exists for a document, the job ID recorded in that file is passed as a `jobid://` reference, so the document is not parsed again.

#### Schema Requirements

- Must be a valid JSON Schema document.
- The top-level type **must** be `object` — arrays and primitives are not permitted at the top level.
- Schemas can be provided as file paths or inline JSON strings.
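
These requirements can be checked before a document is uploaded. The following is a minimal sketch, not the CLI's own validation: `load_schema` is a hypothetical helper that treats any value starting with `{` or `[` as inline JSON and anything else as a file path, then enforces the top-level `object` rule. It does not validate the rest of the schema.

```python
import json
from pathlib import Path


def load_schema(value: str) -> dict:
    """Load a schema from a file path or an inline JSON string."""
    text = value.strip()
    # Assumption: inline JSON is recognized by its first character.
    if not text.startswith(("{", "[")):
        text = Path(value).read_text(encoding="utf-8")
    schema = json.loads(text)
    if not isinstance(schema, dict) or schema.get("type") != "object":
        raise ValueError("top-level schema type must be 'object'")
    return schema


schema = load_schema('{"type": "object", "properties": {"total": {"type": "number"}}}')
print(sorted(schema["properties"]))
```

Rejecting a bad schema locally gives a faster, clearer error than waiting for the API to refuse it.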

#### Example Schema

```json
{
  "type": "object",
  "properties": {
    "vendor_name": { "type": "string" },
    "invoice_number": { "type": "string" },
    "date": { "type": "string" },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "quantity": { "type": "number" },
          "unit_price": { "type": "number" },
          "total": { "type": "number" }
        },
        "required": ["description", "quantity", "unit_price", "total"]
      }
    },
    "total_amount": { "type": "number" }
  },
  "required": ["vendor_name", "invoice_number", "line_items", "total_amount"]
}
```

#### Examples

```bash
# Extract using a schema file
reducto extract invoice.pdf -s schemas/invoice.json

# Extract from a folder of invoices
reducto extract ./invoices/ -s schemas/invoice.json

# Extract with inline JSON schema
reducto extract receipt.pdf -s '{"type":"object","properties":{"total":{"type":"number"},"date":{"type":"string"}},"required":["total","date"]}'
```

---

### Edit Command

Modifies documents using natural language instructions. Uploads the document, applies edits via the [Reducto Edit API](https://docs.reducto.ai), and downloads the result.

```bash
reducto edit <path> --instructions "<instructions>"
```

Edited files are saved as `<filename>.edited.<extension>` (e.g., `form.pdf` becomes `form.edited.pdf`).
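
Scripts that need to locate the edited file afterwards can derive its name with the same rule. The helper below is a hypothetical sketch of that naming convention, not code taken from the CLI.

```python
from pathlib import Path


def edited_path(source: Path) -> Path:
    """Map form.pdf to form.edited.pdf, keeping the original directory."""
    return source.with_name(f"{source.stem}.edited{source.suffix}")


print(edited_path(Path("form.pdf")))  # form.edited.pdf
```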

| Parameter | Required | Description |
|-----------|----------|-------------|
| `path` | Yes | Path to a file or directory. |
| `--instructions`, `-i` | Yes | Natural language instructions for the edits. |

#### Examples

```bash
# Fill out a PDF form
reducto edit application.pdf -i "Fill in: Name: Jane Smith, Date: 2025-03-15, check 'Agree to terms'"

# Update a contract
reducto edit contract.pdf -i "Fill in the client name as 'Acme Corporation' and set the effective date to January 15, 2025"

# Batch edit a folder of forms
reducto edit ./forms/ -i "Set the company name to 'Globex Inc' in all header fields"
```

#### Tips for Effective Instructions

- Be specific about which elements to modify (headers, tables, specific fields).
- Reference content by name or position when possible.
- Describe the desired outcome, not the process.
- For batch operations, write instructions that apply uniformly across all files.

---

## Supported File Types

| Category | Extensions |
|----------|------------|
| PDF | `.pdf` |
| Images | `.png`, `.jpg`, `.jpeg` |
| Office Documents | `.doc`, `.docx`, `.ppt`, `.pptx` |
| Spreadsheets | `.xls`, `.xlsx`, `.numbers` |

All commands accept a single file or a directory. Directories are scanned recursively and only supported file types are processed. Generated output files (`.parse.md`, `.extract.json`) are automatically excluded from processing.
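
The scanning rule can be sketched in a few lines. This is an illustration under stated assumptions, not the CLI's implementation: `collect_inputs` is a hypothetical name, extension matching is assumed to be case-insensitive, and results are sorted only to make the output deterministic.

```python
import tempfile
from pathlib import Path

SUPPORTED = {
    ".pdf", ".png", ".jpg", ".jpeg",
    ".doc", ".docx", ".ppt", ".pptx",
    ".xls", ".xlsx", ".numbers",
}
GENERATED_SUFFIXES = (".parse.md", ".extract.json")


def collect_inputs(path: Path) -> list[Path]:
    """Return supported files under path, skipping generated outputs."""
    candidates = [path] if path.is_file() else sorted(path.rglob("*"))
    return [
        p for p in candidates
        if p.is_file()
        and p.suffix.lower() in SUPPORTED  # assumption: case-insensitive match
        and not p.name.lower().endswith(GENERATED_SUFFIXES)
    ]


# Build a throwaway directory tree to demonstrate the filter.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for name in ("a.pdf", "a.parse.md", "sub/b.DOCX", "notes.txt"):
        (root / name).touch()
    found = [p.relative_to(root).as_posix() for p in collect_inputs(root)]

print(found)
```

The explicit check against generated suffixes is redundant with the extension list shown here, but it keeps the exclusion rule visible if the supported set ever grows.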

---

## Use Cases

### Invoice and Receipt Processing

Parse invoices from any vendor format, then extract line items, totals, and payment details into structured JSON for your accounting pipeline.

```bash
reducto parse ./invoices/
reducto extract ./invoices/ -s schemas/invoice.json
```

### Contract and Legal Document Review

Parse contracts with change tracking to surface redlines and revisions. Extract key clauses, dates, and party names for contract management systems.

```bash
reducto parse contract.pdf --agentic --change-tracking --comments
reducto extract contract.pdf -s schemas/contract_terms.json
```

### Form Processing and Auto-Fill

Edit PDF and DOCX forms programmatically — fill fields, check boxes, and populate tables without manual data entry.

```bash
reducto edit onboarding_form.pdf -i "Fill in employee name: Alex Chen, start date: 2025-04-01, department: Engineering, select 'Full-time' for employment type"
```

### Financial Statement Analysis

Extract tables and figures from bank statements, earnings reports, and tax documents into structured data for financial modeling.

```bash
reducto extract quarterly_report.pdf -s schemas/financial_statement.json
```

### Medical and Insurance Document Processing

Parse lab reports, claims forms, and patient intake documents. Reducto is [HIPAA compliant](https://docs.reducto.ai/security/policies) for healthcare workflows.

```bash
reducto parse lab_results.pdf --agentic
reducto extract claim_form.pdf -s schemas/insurance_claim.json
```

### Batch Document Digitization

Convert entire folders of scanned documents, presentations, and spreadsheets into searchable Markdown for knowledge bases or RAG pipelines.

```bash
reducto parse ./legacy_docs/ --agentic
```

### Feeding Data to LLM Pipelines

Parse documents into clean Markdown optimized for LLM consumption, then use the structured output as context for retrieval-augmented generation (RAG) systems.

```bash
# Parse into LLM-ready Markdown
reducto parse ./knowledge_base/

# Or extract specific fields for structured RAG
reducto extract ./knowledge_base/ -s schemas/document_metadata.json
```

---

## How It Works

1. **Upload** — The CLI uploads your document to Reducto's API.
2. **Process** — Reducto applies agentic OCR, layout detection, and vision-language models to understand document structure.
3. **Return** — Parsed Markdown, extracted JSON, or edited documents are downloaded to your local filesystem.

Files within a directory are processed concurrently. Parse results are cached locally (`.parse.md` files with job IDs), so subsequent extract commands skip re-parsing.
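
Concurrent processing of a directory could look roughly like the `asyncio` sketch below. Everything in it is illustrative: `process_all`, the stand-in `fake_worker`, and the concurrency limit of 4 are assumptions, and the CLI's actual scheduling may differ.

```python
import asyncio


async def process_all(paths, worker, limit: int = 4) -> dict:
    """Run worker(path) for every path, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)  # assumption: a bounded fan-out

    async def run(path):
        async with semaphore:
            return path, await worker(path)

    return dict(await asyncio.gather(*(run(p) for p in paths)))


async def fake_worker(path: str) -> str:
    """Stand-in for the upload, process, and download round trip."""
    await asyncio.sleep(0.01)
    return f"processed {path}"


results = asyncio.run(process_all(["a.pdf", "b.pdf", "c.pdf"], fake_worker))
print(results)
```

Bounding the fan-out keeps a large directory from opening hundreds of simultaneous uploads.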

---

## Configuration

| Method | Details |
|--------|---------|
| Device code login | `reducto login` — opens browser, saves key to `~/.reducto/config.yaml` |
| Environment variable | `export REDUCTO_API_KEY="your_key"` — takes precedence over saved config |
| Manual entry | The CLI prompts for manual key entry as a fallback |

The config file is stored at `~/.reducto/config.yaml` with `0600` permissions.

---

## Related Projects

| Project | Description |
|---------|-------------|
| [Reducto Python SDK](https://github.com/reductoai/reducto-python-sdk) | Full Python client for the Reducto API (`pip install reductoai`) |
| [Reducto Node.js SDK](https://github.com/reductoai/reducto-node-sdk) | Node.js client for the Reducto API (`npm install reductoai`) |
| [Reducto Go SDK](https://github.com/reductoai/reducto-go-sdk) | Go client for the Reducto API |
| [Reducto Claude Code Plugins](https://github.com/reductoai/claude-plugins) | Official Reducto plugins for Claude Code |
| [Reducto Studio](https://studio.reducto.ai) | No-code web interface for document processing |

---

## Resources

- [Reducto Documentation](https://docs.reducto.ai) — API reference, guides, and tutorials
- [API Quickstart](https://docs.reducto.ai/quickstart) — Get started with the Reducto API
- [Security & Compliance](https://docs.reducto.ai/security/policies) — SOC 2 Type II, HIPAA, and data handling policies
- [Reducto Website](https://reducto.ai) — Product overview and company information
- [PyPI Package](https://pypi.org/project/reducto-cli/) — Package registry listing
