Metadata-Version: 2.4
Name: uspto-oa-cli
Version: 0.1.9
Summary: CLI for analyzing USPTO patent prosecution: document download, XML parsing, and Markdown generation
Project-URL: Homepage, https://github.com/noaa/odp_oa_cli
License: MIT
Keywords: cli,office-action,patent,prosecution,uspto
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Legal Industry
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Requires-Python: >=3.12
Requires-Dist: click>=8.0
Requires-Dist: ocrmypdf>=16.0
Requires-Dist: pypdf>=6.11.0
Requires-Dist: requests>=2.34.2
Requires-Dist: rich>=15.0.0
Description-Content-Type: text/markdown

# uspto-oa-cli

A CLI tool that downloads USPTO patent prosecution documents via the ODP (Open Data Portal) API, parses the XML, and converts them into structured Markdown.

It supports a workflow in which the generated Markdown file is passed to an AI agent (Claude Code, Gemini CLI, etc.) for prosecution strategy analysis.

## Requirements

- Python 3.12+
- [uv](https://docs.astral.sh/uv/) (optional when installing with `pip`; used for local development and the release steps)
- USPTO API key (issued via the [ODP portal](https://developer.uspto.gov/))

## Installation

```bash
# pip
pip install uspto-oa-cli

# uv (global install)
uv tool install uspto-oa-cli

# uv (add as project dependency)
uv add uspto-oa-cli

# local development
uv sync
```

## API Key Setup

```bash
# Interactive setup (recommended) — saved to ~/.oa-cli.toml
# Prompts for: API key, HTTPS/HTTP proxy URL, CA bundle path
uspto-oa configure

# Show current configuration
uspto-oa configure --show
```

Or set via environment variable:

```bash
export USPTO_API_KEY=your_api_key_here
```

### Proxy & SSL (Corporate Networks)

If you're behind a corporate proxy or need a custom CA bundle, set these during `configure` or edit `~/.oa-cli.toml` directly:

```toml
[auth]
api_key = "YOUR_KEY"

[proxy]
https = "http://proxy.example.com:8080"
http  = "http://proxy.example.com:8080"

[ssl]
ca_bundle = "/path/to/corporate-ca.pem"
```

These settings are applied to all HTTP requests. If omitted, the standard `requests` library environment-variable fallback (`HTTPS_PROXY`, `HTTP_PROXY`, `REQUESTS_CA_BUNDLE`) applies.

## Usage

```bash
# 0. Check document list before downloading
uspto-oa list 16330077

# 1. Download documents (saved to file/{app_num}/)
uspto-oa download 16330077

# 2. Parse XML → generate prosecution.md
uspto-oa extract 16330077
# Output: file/16330077/16330077_prosecution.md

# Extract in JSON format
uspto-oa extract 16330077 --format json

# Filter by date range and sort newest-first
uspto-oa extract 16330077 --from 2022-01-01 --to 2022-12-31 --sort desc

# 3. (Optional) OCR image-based PDFs → searchable PDFs
uspto-oa ocr 16330077

# 4. (Optional) Embed OCR text into prosecution.md for AI analysis
#    Run after step 3. Selectively include high-value doc codes to
#    avoid filling up the AI context window.
uspto-oa extract 16330077 --with-ocr --ocr-codes CTNF,CTFR,REM,EXIN,CTAV

# Download specific document codes only
uspto-oa download 16330077 --doc-codes CTNF,CTFR,NOA

# Force re-download (overwrite existing files)
uspto-oa download 16330077 --force

# Verbose logging
uspto-oa -v download 16330077

# One-time API key override
uspto-oa download 16330077 --api-key YOUR_KEY
```

### Command Options

**`uspto-oa list <application>`**

| Option | Description |
|--------|-------------|
| `--all` | Show all documents, not only prosecution-related ones |
| `--format [table\|json]` | Output format (default: `table`) |
| `--api-key TEXT` | API key (one-time override of the configured key) |

**`uspto-oa download <application>`**

| Option | Description |
|--------|-------------|
| `--doc-codes CODES` | Comma-separated document codes (e.g. `CTNF,CTFR,NOA`). If omitted, all prosecution documents are downloaded |
| `--output-dir DIR` | Output directory (default: `file/{app_num}/`) |
| `--force` | Re-download even if file already exists |
| `--api-key TEXT` | API key (overrides config file and environment variable) |

**`uspto-oa extract <application>`**

| Option | Description |
|--------|-------------|
| `--format [md\|json]` | Output format (default: `md`) |
| `--output-dir DIR` | File directory (default: `file/{app_num}/`) |
| `--with-ocr` | Embed OCR text from `*_ocr.pdf` files into prosecution.md (run `ocr` first) |
| `--ocr-codes CODES` | Comma-separated doc codes to embed (default: `CTNF,CTFR,NOA,NACT,EXIN,REM,CTAV` + `A*`) |
| `--from YYYY-MM-DD` | Include only documents on or after this date |
| `--to YYYY-MM-DD` | Include only documents on or before this date |
| `--sort [asc\|desc]` | Sort timeline by date (default: `asc`) |

**Doc code guide for `--ocr-codes`** — choosing the right codes prevents AI context overflow:

| Code | Description | OCR value | Default included |
|------|-------------|:---------:|:----------------:|
| `CTNF` | Non-Final Office Action | High — core rejection grounds | ✓ |
| `CTFR` | Final Office Action | High — core rejection grounds | ✓ |
| `NOA` / `NACT` | Notice of Allowance | Medium — allowance reasons | ✓ |
| `EXIN` | Examiner Interview Summary | High — often PDF-only in modern apps | ✓ |
| `REM` | Remarks (applicant arguments) | High — often PDF-only | ✓ |
| `CTAV` | Advisory Action | Medium — examiner's response to after-final amendment | ✓ |
| `A*` | All Amendment variants | High — when XML parsing fails | ✓ |
| `ABN` | Abandonment | Low | — |
| `RCE` / `RCEX` | Request for Continued Examination | Low | — |
| `SRNT` / `SRFW` | Search Report | Low — very long, little analysis value | — |
| `892` / `1449` / `IDS` | Prior Art / IDS | Low — very long, reference lists | — |
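Selecting documents by `--ocr-codes` amounts to matching each document code against a list of exact codes and prefix patterns such as `A*`. The sketch below is illustrative only: it assumes `A*` behaves like a shell-style glob (via `fnmatch`), and `parse_codes` and `is_selected` are hypothetical names. A bare `A*` glob would also match `ABN`, which the table lists as not included by default, so the sketch takes an explicit exclusion set; how the CLI itself resolves this is not documented here.

```python
from fnmatch import fnmatchcase

# Default selection as listed in the table; "A*" treated as a glob (assumption).
DEFAULT_OCR_CODES = "CTNF,CTFR,NOA,NACT,EXIN,REM,CTAV,A*"
# ABN starts with "A" but is listed as not included by default.
DEFAULT_EXCLUDE = frozenset({"ABN"})


def parse_codes(arg: str) -> list[str]:
    """Split a comma-separated --ocr-codes value, dropping blanks and normalising case."""
    return [c.strip().upper() for c in arg.split(",") if c.strip()]


def is_selected(
    doc_code: str,
    patterns: list[str],
    exclude: frozenset[str] = DEFAULT_EXCLUDE,
) -> bool:
    """True if doc_code matches any exact code or glob pattern and is not excluded."""
    code = doc_code.strip().upper()
    if code in exclude:
        return False
    return any(fnmatchcase(code, p) for p in patterns)


patterns = parse_codes(DEFAULT_OCR_CODES)
# "A.NE" stands in for an example "A"-prefixed amendment code.
print([c for c in ("CTNF", "A.NE", "ABN", "892", "NOA") if is_selected(c, patterns)])
# ['CTNF', 'A.NE', 'NOA']
```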

**`uspto-oa ocr <application>`**

USPTO PDF documents are typically full-page image scans, so standard text extraction returns little or no text. This command runs OCR on every PDF in the application directory and, by default, writes searchable copies alongside the originals.

> **Note:** `ocrmypdf` is bundled as a default dependency, but requires system packages to function: **Tesseract OCR** and **Ghostscript**. Install them once with your OS package manager before running this command.
>
> **macOS**
> ```bash
> brew install tesseract ghostscript
> ```
>
> **Windows** (choose one)
> ```powershell
> # winget (built-in, Windows 10/11)
> winget install -e --id UB-Mannheim.TesseractOCR
> winget install -e --id ArtifexSoftware.Ghostscript
>
> # Chocolatey
> choco install tesseract ghostscript
> ```
>
> **Linux (Debian/Ubuntu)**
> ```bash
> sudo apt install tesseract-ocr ghostscript
> ```
>
> After installation, reopen your terminal so the new commands are on `PATH`.

| Option | Description |
|--------|-------------|
| `--force` | Re-OCR even if output already exists |
| `--in-place` | Overwrite original PDFs instead of creating `*_ocr.pdf` copies |
| `--no-deskew` | Skip deskew correction (faster) |
| `--output-dir DIR` | File directory (default: `file/{app_num}/`) |

```bash
# Run OCR (creates {original}_ocr.pdf next to each PDF)
uspto-oa ocr 16330077

# Overwrite originals in place
uspto-oa ocr 16330077 --in-place
```
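The output naming and skip rules described above can be sketched in Python on top of the `ocrmypdf.ocr()` API. This is a hedged sketch, not the command's source: `ocr_target`, `pending_pdfs`, and `run_ocr` are hypothetical names, passing `skip_text=True` (leave pages that already contain text untouched) is an assumption about how mixed PDFs should be handled, and writing to a temporary file before replacing the target is a design choice made here so that a failed run cannot corrupt an original PDF.

```python
import tempfile
from pathlib import Path


def ocr_target(pdf: Path, in_place: bool) -> Path:
    """Output path: the original file (--in-place) or {stem}_ocr.pdf beside it."""
    return pdf if in_place else pdf.with_name(f"{pdf.stem}_ocr.pdf")


def pending_pdfs(app_dir: Path, force: bool = False, in_place: bool = False) -> list[Path]:
    """Source PDFs still needing OCR; previously generated *_ocr.pdf copies are skipped."""
    sources = sorted(p for p in app_dir.glob("*.pdf") if not p.stem.endswith("_ocr"))
    if force or in_place:
        # In-place runs cannot tell from file names alone what was already processed.
        return sources
    return [p for p in sources if not ocr_target(p, in_place=False).exists()]


def run_ocr(pdf: Path, in_place: bool = False, deskew: bool = True) -> Path:
    """OCR one PDF; requires Tesseract and Ghostscript on PATH."""
    import ocrmypdf  # imported lazily so the path helpers work without it

    target = ocr_target(pdf, in_place)
    # Temp dir in the same folder so the final rename stays on one filesystem.
    with tempfile.TemporaryDirectory(dir=pdf.parent) as tmp:
        out = Path(tmp) / "out.pdf"
        ocrmypdf.ocr(pdf, out, deskew=deskew, skip_text=True)
        out.replace(target)
    return target
```

Under these assumptions, `--no-deskew` corresponds to `deskew=False`, and `--force` simply disables the existence check in `pending_pdfs`.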

## Workflow

```
uspto-oa list {app_num}               # Check document list before downloading
    └─ Browse prosecution document codes and formats

uspto-oa download {app_num}
    └─ Save XML / PDF to file/{app_num}/

uspto-oa extract {app_num}            # XML-only (fast, default)
    └─ Generate file/{app_num}/{app_num}_prosecution.md
         └─ AI agent (Claude Code / Gemini CLI)
              └─ Prosecution strategy analysis, summaries, Q&A

# ── Optional: include PDF-only documents in prosecution.md ──────────────────

uspto-oa ocr {app_num}                # Step A: OCR all PDFs → *_ocr.pdf
    └─ Generate {original}_ocr.pdf next to each PDF

# Step B: embed selected OCR text
uspto-oa extract {app_num} \
    --with-ocr \
    --ocr-codes CTNF,CTFR,REM,EXIN,CTAV
    └─ prosecution.md now includes full text of selected PDF documents
         └─ AI agent reads one file, gets complete prosecution history
```
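To run the workflow unattended (for example, across several applications), the commands can be chained from a small Python driver. This sketch uses only the commands and flags shown in this README; `build_steps` and `run_workflow` are hypothetical names, and the `list` step is left out because it is meant for interactive browsing.

```python
import subprocess

WORKFLOW_OCR_CODES = "CTNF,CTFR,REM,EXIN,CTAV"


def build_steps(
    app_num: str, with_ocr: bool = False, ocr_codes: str = WORKFLOW_OCR_CODES
) -> list[list[str]]:
    """Return the uspto-oa invocations for one application, in order."""
    steps = [["uspto-oa", "download", app_num]]
    if with_ocr:
        steps.append(["uspto-oa", "ocr", app_num])
        steps.append(
            ["uspto-oa", "extract", app_num, "--with-ocr", "--ocr-codes", ocr_codes]
        )
    else:
        steps.append(["uspto-oa", "extract", app_num])
    return steps


def run_workflow(app_num: str, **kwargs) -> None:
    """Run each step in order, stopping at the first non-zero exit status."""
    for cmd in build_steps(app_num, **kwargs):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

For example, `run_workflow("16330077", with_ocr=True)` performs download, OCR, and the OCR-enriched extract in sequence.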

## Collected Document Codes

| Code | Description |
|------|-------------|
| `CTNF` | Non-Final Office Action |
| `CTFR` | Final Office Action |
| `NOA` / `NACT` | Notice of Allowance |
| `REM` | Remarks |
| `ABN` | Abandonment |
| `SRNT` / `SRFW` | Search Report |
| `EXIN` | Examiner Interview |
| `RCE` / `RCEX` | Request for Continued Examination |
| `CTAV` | Advisory Action |
| `892` / `1449` / `IDS` | Prior Art / IDS |
| `A*` | All Amendment variants |

## Generated File Structure

`file/{app_num}/{app_num}_prosecution.md`:

| Section | Content |
|---------|---------|
| Timeline | All documents sorted by date, with each document's format (XML or PDF) |
| Office Action Details | Full rejection grounds from CTNF/CTFR |
| Amendment Details | Amended claims (CLM) + Remarks (REM) |
| Examiner Interview Details | Full EXIN text |
| Notice of Allowance Details | Allowed claims + Examiner's Statement |
| PDF-only Documents | List of image-only PDFs, to hand directly to an AI agent |

## PyPI Release

```bash
uv build
uv run twine upload dist/*
```
