Metadata-Version: 2.4
Name: sunny-address-normalization
Version: 1.0.0
Summary: US address parsing and normalization library using libpostal
Author: Ayush Tomar
License: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: postal>=1.1.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: build>=0.10.0; extra == "dev"
Requires-Dist: wheel>=0.40.0; extra == "dev"
Dynamic: requires-python

# USPS Address Normalizer

A Python library that parses US addresses with libpostal and normalizes them to the USPS standard format.

## Features

- ✅ **Intelligent Parsing** - Uses libpostal's ML model to identify address components
- ✅ **USPS Compliant** - Applies the USPS Publication 28 standardization rules
- ✅ **Complete Abbreviations** - Street suffixes, directionals, secondary units, states
- ✅ **Separate Components** - Returns city, state, and ZIP as individual fields
- ✅ **Proper Line Formatting** - Separates primary (Line 1) and secondary (Line 2) addressing
- ✅ **Clean Output** - ALL CAPS, no periods, normalized spacing

## Installation

### Prerequisites

Install the libpostal C library. On macOS with Homebrew:
```bash
brew install libpostal
```

### Install the Package

```bash
# From wheel file
pip install usps_address_normalizer-1.0.0-py3-none-any.whl

# Or from source
pip install .
```

## Usage

### Basic Usage

```python
from usps_address_normalizer import normalize_address

# Normalize any address format
address = normalize_address("128 king street floor 3 suite 301 san francisco california 94107")

# Access individual components
print(address.line1)           # 128 KING ST
print(address.line2)           # FL 3 STE 301
print(address.city)            # SAN FRANCISCO
print(address.state)           # CA
print(address.zip_code)        # 94107
```

### Computed Properties

```python
# Get combined city, state, ZIP
print(address.city_state_zip)
# Output: SAN FRANCISCO, CA 94107

# Get full formatted address
print(address.full_address)
# Output:
# 128 KING ST
# FL 3 STE 301
# SAN FRANCISCO, CA 94107
```

### Dictionary Export

```python
# Convert to dictionary
data = address.as_dict()
print(data)
# Output:
# {
#     'line1': '128 KING ST',
#     'line2': 'FL 3 STE 301',
#     'city': 'SAN FRANCISCO',
#     'state': 'CA',
#     'zip_code': '94107',
#     'city_state_zip': 'SAN FRANCISCO, CA 94107',
#     'full_address': '128 KING ST\nFL 3 STE 301\nSAN FRANCISCO, CA 94107'
# }
```

## Examples

### Simple Address (No Unit)

```python
address = normalize_address("100 Market Street San Francisco CA 94105")

print(address.line1)    # 100 MARKET ST
print(address.line2)    # (empty string)
print(address.city)     # SAN FRANCISCO
print(address.state)    # CA
print(address.zip_code) # 94105
```

### Complex Address with Multiple Units

```python
address = normalize_address("128 N King St, Floor 3, Suite 301, San Francisco, CA 94107")

print(address.line1)    # 128 N KING ST
print(address.line2)    # FL 3 STE 301
print(address.city)     # SAN FRANCISCO
print(address.state)    # CA
print(address.zip_code) # 94107
```

### With Full State Name

```python
address = normalize_address("456 Main Avenue Apartment 2B New York New York 10001")

print(address.line1)    # 456 MAIN AVE
print(address.line2)    # APT 2B
print(address.city)     # NEW YORK
print(address.state)    # NY (automatically converted)
print(address.zip_code) # 10001
```

### PO Box

```python
address = normalize_address("PO Box 1234 San Francisco CA 94107")

print(address.line1)    # PO BOX 1234
print(address.line2)    # (empty string)
print(address.city)     # SAN FRANCISCO
print(address.state)    # CA
print(address.zip_code) # 94107
```

## USPS Standardization Rules

The library applies all official USPS abbreviations:

### Street Suffixes
- STREET → ST
- AVENUE → AVE
- BOULEVARD → BLVD
- DRIVE → DR
- And 60+ more...

### Directionals
- NORTH → N
- SOUTH → S
- NORTHEAST → NE
- And all 8 directionals...

### Secondary Units
- SUITE → STE
- APARTMENT → APT
- FLOOR → FL
- BUILDING → BLDG
- And 20+ more...

### States
- CALIFORNIA → CA
- NEW YORK → NY
- All 50 states + territories
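
As a rough illustration of the state conversion, here is a minimal sketch of a whole-field lookup. The `STATE_CODES` table and the `to_state_code` helper are hypothetical names used only for this example; they are not the library's internals, and the table holds just three entries.

```python
# Illustrative subset only; a real table would cover all states and territories.
STATE_CODES = {
    "CALIFORNIA": "CA",
    "NEW YORK": "NY",
    "FLORIDA": "FL",
}


def to_state_code(state: str) -> str:
    """Return the two-letter code for a state name (hypothetical helper)."""
    # Uppercase, drop periods, and collapse whitespace before the lookup.
    key = " ".join(state.upper().replace(".", "").split())
    if key in STATE_CODES.values():
        return key  # already a two-letter code
    # Fall back to the cleaned input when the name is not in the table.
    return STATE_CODES.get(key, key)


print(to_state_code("New York"))    # NY
print(to_state_code("california"))  # CA
print(to_state_code("ca"))          # CA
```

Matching on the whole field rather than on individual words is what lets a multi-word name such as NEW YORK resolve to a single code.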

### Formatting Rules
- **ALL CAPS** for all address fields
- **Remove periods** (P.O. → PO)
- **Remove extra spaces**
- **Standardize abbreviations**
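
The formatting rules above can be sketched in a few lines. This is a simplified illustration, not the library's implementation: `clean`, `abbreviate`, and the small `ABBREVIATIONS` table are hypothetical, and the word-by-word replacement is deliberately naive (it would also shorten a street that is literally named "North").

```python
import re

# Illustrative subset of the suffix, directional, and unit tables listed above.
ABBREVIATIONS = {
    "STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD", "DRIVE": "DR",
    "NORTH": "N", "SOUTH": "S", "NORTHEAST": "NE",
    "SUITE": "STE", "APARTMENT": "APT", "FLOOR": "FL", "BUILDING": "BLDG",
}


def clean(text: str) -> str:
    """Uppercase, remove periods, and collapse runs of whitespace."""
    text = text.upper().replace(".", "")
    return re.sub(r"\s+", " ", text).strip()


def abbreviate(text: str) -> str:
    """Replace each known word with its abbreviation (naive, word by word)."""
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in clean(text).split())


print(clean("  P.O.  Box 1234 "))           # PO BOX 1234
print(abbreviate("128 North King Street"))  # 128 N KING ST
print(abbreviate("floor 3  suite 301"))     # FL 3 STE 301
```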

## Address Line Format

**Line 1 (Primary Address):**
- Format: `[Number] [PreDir] [Street Name] [Suffix] [PostDir]`
- Example: `128 N KING ST`

**Line 2 (Secondary Address):**
- Format: `[Unit Type] [Unit Number]`
- Example: `FL 3 STE 301`
- Empty string if no secondary addressing

**City, State, ZIP:**
- Individual fields: `city`, `state`, `zip_code`
- Combined: `city_state_zip` → `SAN FRANCISCO, CA 94107`
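
To make the line formats concrete, the sketch below assembles Line 1 and Line 2 from already-normalized components. The helper names (`join_parts`, `build_line1`, `build_line2`) are assumptions made for this example and are not part of the documented API.

```python
from typing import Iterable, Tuple


def join_parts(*parts: str) -> str:
    """Join non-empty parts with single spaces."""
    return " ".join(p.strip() for p in parts if p and p.strip())


def build_line1(number: str, street: str, suffix: str,
                pre_dir: str = "", post_dir: str = "") -> str:
    """[Number] [PreDir] [Street Name] [Suffix] [PostDir]"""
    return join_parts(number, pre_dir, street, suffix, post_dir)


def build_line2(units: Iterable[Tuple[str, str]]) -> str:
    """[Unit Type] [Unit Number], repeated for each unit; empty if none."""
    return join_parts(*(f"{kind} {value}" for kind, value in units))


print(build_line1("128", "KING", "ST", pre_dir="N"))  # 128 N KING ST
print(build_line2([("FL", "3"), ("STE", "301")]))     # FL 3 STE 301
print(repr(build_line2([])))                          # ''
```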

## API Reference

### `normalize_address(address: str) -> USPSAddress`

Main function to normalize an address.

**Parameters:**
- `address` (str): Raw address string in any format

**Returns:**
- `USPSAddress`: Normalized address object

**Raises:**
- `ImportError`: If libpostal is not installed
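
Because a missing libpostal surfaces as an `ImportError`, callers may want to guard against it. The wrapper below is one possible pattern; `try_normalize` is a hypothetical helper, not part of the package, and it covers both the import and the call because the README does not say at which point the error is raised.

```python
def try_normalize(raw: str):
    """Return a normalized address, or None when libpostal is unavailable."""
    try:
        # Import inside the function so a missing dependency is caught here.
        from usps_address_normalizer import normalize_address
        return normalize_address(raw)
    except ImportError as exc:
        print(f"Address normalization unavailable: {exc}")
        return None


result = try_normalize("100 Market Street San Francisco CA 94105")
if result is not None:
    print(result.line1)
```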

### `USPSAddress` Class

**Attributes:**
- `line1` (str): Primary delivery address
- `line2` (str): Secondary address (or empty string)
- `city` (str): City name in ALL CAPS
- `state` (str): Two-letter state code
- `zip_code` (str): ZIP code

**Properties:**
- `city_state_zip` (str): Combined city, state, ZIP
- `full_address` (str): All lines joined with newlines

**Methods:**
- `as_dict()`: Returns dictionary with all components
- `__str__()`: Returns full_address
- `__repr__()`: Developer-friendly representation
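
For readers who want a mental model of the class, here is a minimal dataclass with the same attribute and property names. It is a sketch, not the actual `USPSAddress` source: the name `AddressSketch` is hypothetical, and skipping an empty `line2` in `full_address` is an assumption about how blank lines are handled.

```python
from dataclasses import dataclass


@dataclass
class AddressSketch:
    line1: str
    line2: str
    city: str
    state: str
    zip_code: str

    @property
    def city_state_zip(self) -> str:
        return f"{self.city}, {self.state} {self.zip_code}"

    @property
    def full_address(self) -> str:
        # Assumption: an empty line2 is omitted rather than left as a blank line.
        lines = [self.line1, self.line2, self.city_state_zip]
        return "\n".join(line for line in lines if line)

    def as_dict(self) -> dict:
        return {
            "line1": self.line1,
            "line2": self.line2,
            "city": self.city,
            "state": self.state,
            "zip_code": self.zip_code,
            "city_state_zip": self.city_state_zip,
            "full_address": self.full_address,
        }

    def __str__(self) -> str:
        return self.full_address


addr = AddressSketch("128 KING ST", "FL 3 STE 301", "SAN FRANCISCO", "CA", "94107")
print(addr)
```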

## Why Use This Library?

### The Problem with Manual Parsing

Without this library, you'd need:
- ❌ Complex regex patterns for each component
- ❌ Manual lookup tables for all abbreviations
- ❌ Logic to distinguish "FL" (Floor) from "FL" (Florida)
- ❌ Handling multi-word street names
- ❌ Context-aware parsing

### The Solution

- ✅ **libpostal provides:** Intelligent ML-based parsing (knows "FL 3" is a floor, not Florida)
- ✅ **This library provides:** Complete USPS normalization rules and formatting
- ✅ **You get:** Clean, standardized addresses ready for databases, mailings, or APIs
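
To show how the two layers fit together, the sketch below groups labelled components of the kind libpostal produces. `postal.parser.parse_address` returns a list of `(value, label)` tuples; the `SAMPLE` list here is hand-written in that shape for illustration and is not captured libpostal output, and `group_by_label` is a hypothetical helper.

```python
from collections import defaultdict

# Hand-written sample in the (value, label) shape used by
# postal.parser.parse_address; not real parser output.
SAMPLE = [
    ("128", "house_number"),
    ("king street", "road"),
    ("floor 3", "level"),
    ("suite 301", "unit"),
    ("san francisco", "city"),
    ("california", "state"),
    ("94107", "postcode"),
]


def group_by_label(parsed):
    """Collect values under their label, joining repeated labels with spaces."""
    grouped = defaultdict(list)
    for value, label in parsed:
        grouped[label].append(value)
    return {label: " ".join(values) for label, values in grouped.items()}


components = group_by_label(SAMPLE)
print(components["road"])   # king street
print(components["level"])  # floor 3
print(components["state"])  # california
```

Once the components are separated by label, the normalization step can apply the floor abbreviation to the `level` value and the state lookup to the `state` value without confusing the two.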

## Requirements

- Python >= 3.8
- libpostal C library (`brew install libpostal` on macOS)
- postal Python package (installed automatically)

## Development

### Building from Source

```bash
# Install development dependencies
pip install -e ".[dev]"

# Build wheel
python -m build

# Output: dist/usps_address_normalizer-1.0.0-py3-none-any.whl
```

### Running Tests

```bash
cd tests
python test_normalizer.py
```

## License

MIT License

## Credits

- Built on [libpostal](https://github.com/openvenues/libpostal) for intelligent address parsing
- USPS abbreviations from [USPS Publication 28](https://pe.usps.com/text/pub28/welcome.htm)

## Version

1.0.0
