Metadata-Version: 2.4
Name: awb-extractor
Version: 0.1.3
Summary: Extract recipient address from AWB/shipping label PDF using Claude AI
License: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: anthropic>=0.40.0
Requires-Dist: httpx>=0.27.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-mock>=3.14; extra == "dev"

# AWB Extractor

Python SDK for extracting recipient and shipment information from air waybill
(AWB) and shipping label PDF files using Claude AI.

## Features

- Extract from PDF bytes, local PDF files, or PDF URLs
- Batch extraction from multiple URLs
- Optional default HTTP headers for protected AWB URLs
- Typed `AWBResult` dataclass output
- Custom exceptions for API key, PDF download, and JSON parsing failures

## Requirements

- Python 3.9+
- Anthropic API key

## Installation

Install from PyPI:

```bash
pip install awb-extractor
```

For local development:

```bash
pip install -e ".[dev]"
```

## Usage

```python
from awb_extractor import AWBExtractor

extractor = AWBExtractor(api_key="sk-ant-...")
result = extractor.from_file("label.pdf")

print(result.recipient_name)
print(result.to_dict())
```

Example result:

```python
{
    "tracking_number": "NHSVC972103440",
    "recipient_name": "Nguyen Van A",
    "recipient_phone": "(+84)03******37",
    "recipient_address": "237 Nguyen Trai",
    "recipient_ward": "Phuong Ben Thanh",
    "recipient_district": "Quan 1",
    "recipient_province": "TP. Ho Chi Minh",
    "sender_name": "Onflow",
    "sender_address": "TP. Ho Chi Minh",
    "cod": "0",
    "weight": "0.700 KG",
    "order_id": "584425059595159079",
}
```
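
The snippets in this README pass the API key as a literal for brevity. A minimal
sketch of reading it from the environment instead might look like the following.
The helper name `load_api_key` is hypothetical and not part of this package.
`ANTHROPIC_API_KEY` is the variable the `anthropic` SDK reads by default, but
whether `AWBExtractor` falls back to it when `api_key` is omitted is not stated
here, so the sketch returns the key for you to pass explicitly.

```python
import os
from typing import Mapping, Optional


def load_api_key(
    env: Optional[Mapping[str, str]] = None,
    var: str = "ANTHROPIC_API_KEY",
) -> str:
    """Return the API key stored in `var`, or raise if it is missing or blank.

    Hypothetical helper for illustration; not part of awb_extractor.
    """
    # Default to the process environment; a mapping can be injected for tests.
    source = os.environ if env is None else env
    key = source.get(var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var} environment variable before running.")
    return key
```

The key can then be passed as `AWBExtractor(api_key=load_api_key())`, which keeps
secrets out of source control.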

## Supported Inputs

### PDF bytes

```python
from awb_extractor import AWBExtractor

extractor = AWBExtractor(api_key="sk-ant-...")

with open("label.pdf", "rb") as file:
    result = extractor.from_bytes(file.read())
```

### Local PDF file

```python
from awb_extractor import AWBExtractor

extractor = AWBExtractor(api_key="sk-ant-...")
result = extractor.from_file("label.pdf")
```

### PDF URL

```python
from awb_extractor import AWBExtractor

extractor = AWBExtractor(
    api_key="sk-ant-...",
    http_headers={"Authorization": "Bearer token"},
)

result = extractor.from_url("https://example.com/awb.pdf")
```

You can pass request-specific headers with `extra_headers`:

```python
result = extractor.from_url(
    "https://example.com/awb.pdf",
    extra_headers={"X-Request-ID": "request-123"},
)
```

### Multiple URLs

`from_urls()` returns a list of dictionaries, each with the keys `url`, `data`,
and `error`. A URL that fails does not stop the rest of the batch.

```python
from awb_extractor import AWBExtractor

extractor = AWBExtractor(api_key="sk-ant-...")
results = extractor.from_urls([
    "https://example.com/good.pdf",
    "https://example.com/bad.pdf",
])
```
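
Because failures are reported per entry rather than raised, callers usually need
to separate good entries from bad ones. The sketch below shows one way to do
that. It assumes `error` is `None` or empty when extraction succeeded, which this
README does not spell out, and it uses made-up stand-in data so it runs without
network access. The helper name `split_batch` is hypothetical.

```python
from typing import Any, Dict, List, Tuple

BatchEntry = Dict[str, Any]


def split_batch(results: List[BatchEntry]) -> Tuple[List[BatchEntry], List[BatchEntry]]:
    """Partition from_urls() output into (succeeded, failed) entries.

    Assumption: `error` is None or empty when extraction succeeded.
    """
    succeeded = [entry for entry in results if not entry.get("error")]
    failed = [entry for entry in results if entry.get("error")]
    return succeeded, failed


# Stand-in data shaped like the documented return value (values are made up).
sample = [
    {
        "url": "https://example.com/good.pdf",
        "data": {"tracking_number": "NHSVC972103440"},
        "error": None,
    },
    {"url": "https://example.com/bad.pdf", "data": None, "error": "download failed"},
]

succeeded, failed = split_batch(sample)
for entry in failed:
    # Report each failed URL alongside its error message.
    print(f"{entry['url']}: {entry['error']}")
```

With real output, replace `sample` with the list returned by
`extractor.from_urls([...])`.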

## Result Fields

`AWBResult` includes:

- `tracking_number`
- `recipient_name`
- `recipient_phone`
- `recipient_address`
- `recipient_ward`
- `recipient_district`
- `recipient_province`
- `sender_name`
- `sender_address`
- `cod`
- `weight`
- `order_id`

Use `to_dict()` or `to_json()` to serialize the result.

## Exceptions

- `APIKeyError`: missing API key
- `PDFDownloadError`: PDF URL download failed
- `ExtractionError`: Claude response could not be parsed as JSON
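
For single-input calls, these exceptions can be caught at the call site. The
wrapper below is a hedged sketch: `try_extract` is a hypothetical helper, and it
is written against any callable so it does not depend on how `AWBExtractor` is
implemented.

```python
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def try_extract(
    extract: Callable[[str], T],
    source: str,
    handled: Tuple[Type[BaseException], ...],
) -> Tuple[Optional[T], Optional[BaseException]]:
    """Call extract(source) and return (result, None) on success.

    If one of the `handled` exception types is raised, return (None, exc)
    instead of propagating it. Any other exception still propagates.
    """
    try:
        return extract(source), None
    except handled as exc:
        return None, exc
```

With the package installed, a call could look like
`try_extract(extractor.from_url, url, (PDFDownloadError, ExtractionError))`,
assuming both classes are importable from `awb_extractor.exceptions` as the
package structure suggests.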

## Package Structure

- `awb_extractor/extractor.py`: public `AWBExtractor` class
- `awb_extractor/models.py`: `AWBResult` dataclass
- `awb_extractor/exceptions.py`: package exceptions

## Publishing

GitHub Actions builds and publishes the package to PyPI on every push to `main`.

The repository must define this GitHub secret:

```text
PYPI_API_TOKEN
```

PyPI does not allow replacing an existing version. If a commit on `main` does not
bump `project.version` in `pyproject.toml`, the publish step skips the
already-published distribution instead of failing, so no new release is created.
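
Before pushing to `main`, it can help to check whether the version in
`pyproject.toml` is already on PyPI. The sketch below is an illustration under
stated assumptions, not part of the repository's workflow: the function names are
hypothetical, and it relies on PyPI's JSON API
(`https://pypi.org/pypi/<project>/json`), whose response lists uploaded versions
under the `releases` key. The comparison is an exact string match, so the version
must be written the same way PyPI reports it.

```python
import json
import urllib.request
from typing import Any, Dict


def fetch_project_metadata(project: str) -> Dict[str, Any]:
    """Download the PyPI JSON metadata for `project` (requires network access)."""
    url = f"https://pypi.org/pypi/{project}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def is_published(version: str, metadata: Dict[str, Any]) -> bool:
    """Return True if `version` appears among the project's uploaded releases."""
    return version in metadata.get("releases", {})


# Example call (commented out because it needs network access):
# metadata = fetch_project_metadata("awb-extractor")
# if is_published("0.1.3", metadata):
#     print("Version already on PyPI; bump project.version before pushing.")
```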
