Metadata-Version: 2.4
Name: epo-oa-cli
Version: 0.2.3
Summary: EPO patent prosecution history CLI — download, parse, and analyze European patent documents for AI-assisted analysis
Project-URL: Homepage, https://github.com/noaa/epo-oa-cli
Project-URL: Repository, https://github.com/noaa/epo-oa-cli
License: MIT
Keywords: ai,cli,epo,european-patent,office-action,patent,prosecution
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Legal Industry
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.12
Requires-Dist: beautifulsoup4>=4.14.3
Requires-Dist: click>=8.0
Requires-Dist: pypdf>=6.0
Requires-Dist: requests>=2.34.2
Requires-Dist: rich>=15.0.0
Provides-Extra: ocr
Requires-Dist: ocrmypdf>=16.0; extra == 'ocr'
Description-Content-Type: text/markdown

# epo-oa-cli

**EPO patent prosecution history CLI** — Download, parse, and analyze European Patent Office (EPO) prosecution documents for AI-assisted patent analysis.

```bash
pip install epo-oa-cli
epo-oa run EP21841218
```

---

## Overview

`epo-oa` fetches the complete prosecution history of any EP patent from the [EPO Register](https://register.epo.org/), extracts PDF text (with optional OCR), and generates a structured `prosecution.md` file ready for AI analysis (Claude, GPT-4, etc.).

```
epo-oa run EP21841218
  → Downloads 40 documents as ZIP
  → Extracts & parses toc.xml
  → Generates file/EP21841218/EP21841218_prosecution.md
```

---

## Installation

```bash
pip install epo-oa-cli

# With OCR support (for image-based PDFs)
pip install "epo-oa-cli[ocr]"
```

Requires Python 3.12+.

---

## Quick Start

```bash
# 1. List all documents
epo-oa list EP21841218

# 2. Download as ZIP + extract
epo-oa download EP21841218

# 3. Parse PDFs → prosecution.md
epo-oa extract EP21841218

# 4. All-in-one
epo-oa run EP21841218
```

### With OCR (for image-based PDFs)

Many EPO PDFs are full-page image scans with no text layer. Run OCR first so that the recognized text can be embedded in the analysis file:

```bash
# OCR key documents only
epo-oa ocr EP21841218 --codes 1703,1224,ABEX

# OCR all documents
epo-oa ocr EP21841218

# Extract with OCR text embedded
epo-oa extract EP21841218 --with-ocr
```
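The same sequence can be driven from Python when several steps need to be scripted. The sketch below only assembles and runs the subcommands and flags documented in this README; `build_ocr_pipeline` and `run_pipeline` are hypothetical helper names, not part of `epo-oa`, and the code assumes the `epo-oa` executable is on `PATH`.

```python
import subprocess


def build_ocr_pipeline(ep_number, codes=None):
    """Return argv lists for download -> ocr -> extract --with-ocr.

    Hypothetical helper: it uses only subcommands and flags that
    appear in this README.
    """
    ocr_cmd = ["epo-oa", "ocr", ep_number]
    if codes:
        # --codes takes a comma-separated list, e.g. 1703,1224,ABEX
        ocr_cmd += ["--codes", ",".join(codes)]
    return [
        ["epo-oa", "download", ep_number],
        ocr_cmd,
        ["epo-oa", "extract", ep_number, "--with-ocr"],
    ]


def run_pipeline(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


# Example (not executed here):
# run_pipeline(build_ocr_pipeline("EP21841218", codes=["1703", "1224", "ABEX"]))
```

Keeping command construction separate from execution makes it easy to print or log the commands before anything touches the EPO server.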

---

## Commands

| Command | Description |
|---------|-------------|
| `epo-oa list <EP>` | List prosecution documents from EPO Register |
| `epo-oa download <EP>` | Download all documents as ZIP archive |
| `epo-oa extract <EP>` | Parse PDFs → `prosecution.md` / `prosecution.json` |
| `epo-oa ocr <EP>` | OCR image-based PDFs → searchable `*_ocr.pdf` |
| `epo-oa run <EP>` | Download + extract in one step |
| `epo-oa configure` | Set proxy / CA-cert options |

### Options

```bash
epo-oa list EP21841218 --format json          # JSON output
epo-oa download EP21841218 --force            # Re-download
epo-oa extract EP21841218 --format json       # JSON output
epo-oa extract EP21841218 --with-ocr          # Embed OCR text
epo-oa ocr EP21841218 --codes 1703,ABEX       # Selective OCR
epo-oa ocr EP21841218 --in-place              # Overwrite originals
```

---

## Proxy & SSL Configuration

For corporate networks or environments that require a proxy or custom CA certificate:

```bash
epo-oa configure
```

This interactively prompts for:
- **HTTPS proxy URL** — e.g. `http://proxy.corp.example.com:8080`
- **HTTP proxy URL** — e.g. `http://proxy.corp.example.com:8080`
- **CA bundle file path** — path to a custom `.pem` / `.crt` file

Settings are saved to `~/.epo-oa.toml`:

```toml
[proxy]
https = "http://proxy.corp.example.com:8080"
http  = "http://proxy.corp.example.com:8080"

[ssl]
ca_bundle = "/etc/ssl/certs/corp-ca.pem"
```

If no config file exists, `requests` falls back to the standard environment variables (`HTTPS_PROXY`, `HTTP_PROXY`, `REQUESTS_CA_BUNDLE`).

---

## Output: `prosecution.md`

The generated markdown file is structured for AI agents:

````markdown
# EPO Prosecution Analysis — EP21841218

## Summary
| Item | Count |
|------|-------|
| Total documents | 40 |
| 🔴 Office Actions | 2 |
| 🔵 Amendments | 13 |
| ✅ Grant / Decision | 8 |

## Timeline
| Date | Cat | Document | File |
|------|-----|----------|------|
| 2023-10-30 | 🔍 | European Search Opinion (1703) 🖼️ | ... |
| 2024-02-15 | 🔵 | Amended Claims (CLMSABEX) 🖼️ | ... |
| 2026-02-05 | ✅ | Decision to Grant (2006A) 🖼️ | ... |

## 🔴 Office Action Documents
### European Search Opinion — 2023-10-30
**OCR Text:**
```text
D1 WO 2020/138918 A1 (SAMSUNG ELECTRONICS CO LTD)
1.1 D1 discloses an electronic device with the following features...
```
````

---

## Politeness & Rate Limiting

This tool accesses a **public EPO server**. It enforces:
- Random delays (1.5–3.0s) between requests
- Browser-like headers
- ZIP archive download (minimises HTTP requests)

Please do not run this tool in tight loops or CI pipelines without appropriate throttling.
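For scripts that call the tool or the EPO Register repeatedly, a similar pause can be added on the caller's side. The following is a minimal sketch of randomized throttling using the 1.5–3.0 s range quoted above; `polite_pause` and `fetch_all` are hypothetical names, and this is not necessarily how `epo-oa` implements its own delays.

```python
import random
import time


def polite_pause(min_s=1.5, max_s=3.0, sleep=time.sleep):
    """Sleep for a random interval and return the delay that was used."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay


def fetch_all(items, fetch, pause=polite_pause):
    """Call fetch(item) for each item, pausing between consecutive calls."""
    results = []
    for index, item in enumerate(items):
        if index > 0:
            pause()  # no pause is needed before the first request
        results.append(fetch(item))
    return results
```

The `sleep` and `pause` parameters are injectable so the logic can be exercised in tests without actually waiting.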

---

## Document Categories

| Icon | Category | Description |
|------|----------|-------------|
| 🔴 | Office Action | Examination notices, search opinions (1224, 1703, 2003–2006, etc.) |
| 🔵 | Amendment | Amendments, observations, responses (CLMSABEX, DESCABEX, ABEX, etc.) |
| ✅ | Grant | Grant decisions, certificates (2006A, 2066, 2047, etc.) |
| 🔍 | Search | Search reports (1503, 1503SS, ISR, IPRP, etc.) |
| 💬 | Interview | Interview summaries (INTERV, EXIN) |
| ⚪ | Other | Receipts, administrative notices, miscellaneous |
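The table can be read as a lookup from document code to category. The sketch below encodes only the codes listed explicitly above and sends everything else to "Other". Two assumptions are worth flagging: "2003–2006" is taken to mean the four codes 2003, 2004, 2005, and 2006, and matching is exact, so `2006A` (Grant) stays distinct from `2006`. `categorize` is a hypothetical function, and the tool's real classification rules may differ.

```python
# Codes copied from the table above. The "etc." entries are not enumerated
# there, so any code missing from this mapping falls through to "Other".
CATEGORY_CODES = {
    "Office Action": {"1224", "1703", "2003", "2004", "2005", "2006"},
    "Amendment": {"CLMSABEX", "DESCABEX", "ABEX"},
    "Grant": {"2006A", "2066", "2047"},
    "Search": {"1503", "1503SS", "ISR", "IPRP"},
    "Interview": {"INTERV", "EXIN"},
}

ICONS = {
    "Office Action": "🔴",
    "Amendment": "🔵",
    "Grant": "✅",
    "Search": "🔍",
    "Interview": "💬",
    "Other": "⚪",
}


def categorize(code):
    """Return the category name for a document code (exact match)."""
    normalized = code.strip().upper()
    for category, codes in CATEGORY_CODES.items():
        if normalized in codes:
            return category
    return "Other"


def icon_for(code):
    """Return the icon used for the category of a document code."""
    return ICONS[categorize(code)]
```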

---

## Notes for AI Agents

- Image-only PDFs are marked `🖼️`; pass the file at `path` directly to a vision-capable model
- Run `epo-oa ocr <EP>`, then `epo-oa extract <EP> --with-ocr`, to embed OCR text for text-only language models
- JSON output (`--format json`) includes full `path` and `text` fields for programmatic access
- `prosecution.md` is designed to fit within typical LLM context windows for smaller dockets
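An agent consuming the JSON output might route documents by whether text is available. The sketch below assumes the JSON is an array of objects that each carry `path` and `text`, and that an empty `text` marks an image-only PDF. Only the two field names come from this README; the overall shape, the sample paths, and the `split_by_text` helper are assumptions for illustration.

```python
import json


def split_by_text(documents):
    """Separate documents with extracted text from image-only ones."""
    with_text, image_only = [], []
    for doc in documents:
        if (doc.get("text") or "").strip():
            with_text.append(doc)
        else:
            image_only.append(doc["path"])
    return with_text, image_only


# Assumed shape: a JSON array of objects carrying `path` and `text`.
# The file names here are placeholders, not real tool output.
sample = json.loads("""
[
  {"path": "a.pdf", "text": "1.1 D1 discloses ..."},
  {"path": "b.pdf", "text": ""}
]
""")
with_text, image_only = split_by_text(sample)
```

Paths in `image_only` can be handed to a vision-capable model or processed with `epo-oa ocr` before re-extracting.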

---

## License

MIT
