Metadata-Version: 2.4
Name: us-water-quality-data
Version: 2026.4.26
Summary: U.S. water quality and home safety data by ZIP code — violations, lead/copper, radon, PFAS, flood risk, home values, remediation costs from 50+ federal sources
Author-email: Artem Akulov <artem@liraltd.com>
License-Expression: CC-BY-4.0
Project-URL: Homepage, https://zipcheckup.com
Project-URL: Repository, https://github.com/artakulov/us-water-quality-data
Project-URL: Issues, https://github.com/artakulov/us-water-quality-data/issues
Project-URL: Dataset, https://zipcheckup.com/data/
Project-URL: API, https://api.zipcheckup.com/v1/
Keywords: water-quality,epa,sdwis,zip-code,lead,copper,radon,violations,drinking-water,environmental-data,public-health,home-safety,usa,open-data
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Typing :: Typed
Requires-Python: >=3.9
Description-Content-Type: text/markdown

# us-water-quality-data

U.S. water quality data by ZIP code, packaged for Python. Includes violation history, lead/copper levels, radon zone classification, and Home Safety Scores for 3,500+ ZIP codes, sourced from the EPA Safe Drinking Water Information System (SDWIS).

[![PyPI](https://img.shields.io/pypi/v/us-water-quality-data.svg)](https://pypi.org/project/us-water-quality-data/)
[![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)

## Install

```bash
pip install us-water-quality-data
```

## Quick Start

```python
import us_water_quality_data as water

# Look up a specific ZIP code
record = water.lookup("10001")
print(record)
# {'zip': '10001', 'city': 'New York', 'state': 'NY',
#  'home_safety_score': 36, 'home_safety_grade': 'F',
#  'total_violations': 7, 'lead_level_mg_l': 0.01, ...}

# All ZIP codes in California
ca = water.get_state("CA")
print(f"{len(ca)} ZIP codes in CA")

# 10 worst scores in the country
worst = water.get_worst(10)
for z in worst:
    print(f"{z['zip']} {z['city']}, {z['state']}: {z['home_safety_score']}")

# 10 best scores
best = water.get_best(10)

# All states in the dataset
print(water.states())  # ['AK', 'AL', 'AR', ...]

# Total ZIP codes
print(water.count())  # number of ZIP codes in the installed release

# Search by city
chicago = water.search_city("chicago")

# Dataset metadata
print(water.meta["updated"])        # '2026-03-17'
print(water.meta["total_zips"])     # number of ZIPs in this release
print(water.meta["states_covered"]) # 51
```

## API Reference

### `lookup(zip_code: str) -> dict | None`

Look up water quality data for a specific ZIP code. Short codes are zero-padded to five digits automatically; returns `None` if the ZIP code is not in the dataset.

```python
water.lookup("10001")   # dict
water.lookup("00000")   # None
water.lookup("6001")    # same as "06001"
```
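The zero-padding behavior can be reproduced outside the package when cleaning ZIP codes that were stored as integers or lost their leading zeros in a spreadsheet. The helper below is a minimal sketch: `normalize_zip` is a hypothetical name, not part of this package's API, and the package's own normalization may differ in detail.

```python
def normalize_zip(zip_code):
    """Return a 5-character ZIP string, or None if the input is not 1-5 digits.

    Hypothetical helper for illustration; not the package's implementation.
    """
    text = str(zip_code).strip()
    # Accept plain ASCII digits only, and at most five of them.
    if not (text.isascii() and text.isdigit()) or len(text) > 5:
        return None
    return text.zfill(5)


print(normalize_zip("6001"))   # 06001
print(normalize_zip(10001))    # 10001
print(normalize_zip("1000A"))  # None
```

Normalizing before calling `lookup()` keeps keys consistent when the results are later joined against other ZIP-keyed tables.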

### `get_state(state: str) -> list[dict]`

Get all ZIP records for a given state. Case-insensitive.

```python
water.get_state("CA")   # all California ZIPs
water.get_state("ny")   # works too
```

### `get_worst(n: int = 10) -> list[dict]`

Get the ZIP codes with the worst (lowest) Home Safety Scores, sorted ascending.

### `get_best(n: int = 10) -> list[dict]`

Get the ZIP codes with the best (highest) Home Safety Scores, sorted descending.
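
Because `home_safety_score` can be `None` (see Data Fields), custom rankings over a list of records need to decide how to treat unscored ZIPs. The sketch below drops them before sorting. It is an illustration over placeholder records; `rank_by_score` is a hypothetical helper, and whether `get_worst()` and `get_best()` handle missing scores the same way is an assumption.

```python
# Placeholder records; real ones come from e.g. water.get_state("CA").
sample = [
    {"zip": "00001", "home_safety_score": 36},
    {"zip": "00002", "home_safety_score": None},
    {"zip": "00003", "home_safety_score": 91},
    {"zip": "00004", "home_safety_score": 58},
]


def rank_by_score(records, n=10, best=False):
    """Return up to n records ordered by score, skipping unscored ones."""
    scored = [r for r in records if r.get("home_safety_score") is not None]
    # Ascending = worst first; reverse=True = best first.
    scored.sort(key=lambda r: r["home_safety_score"], reverse=best)
    return scored[:n]


print([r["zip"] for r in rank_by_score(sample, 2)])             # ['00001', '00004']
print([r["zip"] for r in rank_by_score(sample, 2, best=True)])  # ['00003', '00004']
```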

### `states() -> list[str]`

Get a sorted list of all unique 2-letter state abbreviations in the dataset.

### `count() -> int`

Get the total number of ZIP codes in the dataset.

### `zips() -> list[str]`

Get all ZIP codes in the dataset as a list of strings.

### `search_city(city: str) -> list[dict]`

Search ZIP codes by city name (case-insensitive partial match).

```python
water.search_city("chicago")     # all Chicago ZIPs
water.search_city("san fran")    # partial match works
```
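
The matching rule described above (case-insensitive substring match on the city name) can be sketched as follows. This is an assumption about the behavior, not the package's source: `match_city` is a hypothetical helper and the records are illustrative.

```python
def match_city(records, query):
    """Return records whose city contains the query, ignoring case."""
    needle = query.strip().casefold()
    if not needle:
        return []  # an empty query would otherwise match every record
    return [r for r in records if needle in (r.get("city") or "").casefold()]


# Illustrative records using the zip/city/state fields only.
sample = [
    {"zip": "60601", "city": "Chicago", "state": "IL"},
    {"zip": "94102", "city": "San Francisco", "state": "CA"},
    {"zip": "78205", "city": "San Antonio", "state": "TX"},
]

print([r["zip"] for r in match_city(sample, "san fran")])  # ['94102']
print([r["zip"] for r in match_city(sample, "SAN")])       # ['94102', '78205']
```

A partial match can return cities from several states, so filtering the result on `state` is often a useful second step.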

### `meta`

Dataset metadata as a dict-like object.

```python
water.meta["name"]            # 'ZipCheckup U.S. Water Quality Dataset'
water.meta["license"]         # 'CC-BY-4.0'
water.meta["source"]          # 'U.S. EPA Safe Drinking Water Information System (SDWIS)'
water.meta["updated"]         # '2026-03-17'
water.meta["total_zips"]      # number of ZIPs
water.meta["states_covered"]  # number of states
water.meta["fields"]          # dict of field name -> description
```

## Data Fields

| Field | Type | Description |
|-------|------|-------------|
| `zip` | str | 5-digit U.S. ZIP code |
| `city` | str | City name |
| `state` | str | 2-letter state abbreviation |
| `home_safety_score` | int\|None | Composite score 0-100 |
| `home_safety_grade` | str | Letter grade: A / B / C / D / F |
| `total_violations` | int | Total violations in past 5 years |
| `health_violations` | int | Health-based violations in past 5 years |
| `unresolved_violations` | int | Currently unresolved violations |
| `contaminant_count` | int | Distinct health-based contaminants |
| `health_contaminant_names` | str | Semicolon-separated contaminant names |
| `lead_level_mg_l` | float\|None | 90th percentile lead level (mg/L) |
| `copper_level_mg_l` | float\|None | 90th percentile copper level (mg/L) |
| `radon_zone` | int\|None | EPA radon zone: 1 (highest) to 3 (lowest) |
| `water_source` | str | `SW` = Surface Water, `GW` = Groundwater |
| `system_name` | str | Primary water system name |
| `pwsid` | str | EPA Public Water System ID |
| `population` | int\|None | Population served |
| `latitude` | float | ZIP centroid latitude |
| `longitude` | float | ZIP centroid longitude |

## Coverage

- **ZIP codes:** 3,500+ (growing with each release)
- **States:** All 50 U.S. states + D.C.
- **Violation window:** Rolling 5 years
- **Update frequency:** New versions published with each dataset refresh

## Data Source

Violation and water system data are derived from the [EPA Safe Drinking Water Information System (SDWIS)](https://www.epa.gov/enviro/sdwis-overview). Lead and copper levels come from EPA Lead and Copper Rule (LCR) sampling. Radon zones are county-level EPA classifications and are not part of SDWIS.

**Home Safety Score** is a composite 0-100 score that penalizes health-based violations, unresolved violations, lead exceedances, and contaminant count. Methodology: [zipcheckup.com/about/home-safety-score/](https://zipcheckup.com/about/home-safety-score/)

## Also Available

- **npm package:** [`us-water-quality-data`](https://www.npmjs.com/package/us-water-quality-data) (Node.js / TypeScript)
- **Live site:** [zipcheckup.com](https://zipcheckup.com) -- free water quality reports by ZIP
- **Open dataset (CSV/JSON):** [zipcheckup.com/data/](https://zipcheckup.com/data/)
- **API:** [api.zipcheckup.com/v1/](https://api.zipcheckup.com/v1/)

## License

Data: [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/). Code: MIT.

> Data by [ZipCheckup.com](https://zipcheckup.com) -- sourced from EPA SDWIS.
