Metadata-Version: 2.4
Name: pycen
Version: 0.1.0a3
Summary: Lightweight Python package for intuitively exploring and acquiring U.S. Census data with spatial integration
Author: pycen contributors
License-Expression: MIT
Keywords: census,acs,demographics,gis,spatial
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: GIS
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas<3.0,>=1.5
Requires-Dist: requests<3.0,>=2.28
Requires-Dist: us<4.0,>=2.0
Requires-Dist: tqdm<5.0,>=4.64
Requires-Dist: geopandas<1.0,>=0.14
Requires-Dist: pygris<0.3,>=0.1.6
Requires-Dist: shapely<3.0,>=2.0
Requires-Dist: matplotlib<4.0,>=3.5
Requires-Dist: mapclassify<3.0,>=2.4
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Dynamic: license-file

# pycen

Lightweight Python package for exploring and acquiring U.S. Census data with intuitive spatial integration. <br><br>

```mermaid
flowchart TD
    A[Need Census data?]

    A --> B & C

    subgraph PYCEN["<i>pycen</i>"]
        direction TB
        B[<b>`explore`</b><br/>Intuitive metadata<br/>keyword search]
        C[<b>`acquire`</b><br/>Data + boundaries<br/>in one call]

        C --> D
        C --> E

        D[<b>`quick_check`</b><br/>Quality validation]
        E[<b>`quick_viz`</b><br/>Instant maps]
    end

    B --> F
    D & E --> F[Domain analysis]

    style A fill:#94a3b8,stroke:#334155,stroke-width:2px,color:#000
    style B fill:#3b82f6,stroke:#1e40af,stroke-width:2px,color:#fff
    style C fill:#3b82f6,stroke:#1e40af,stroke-width:2px,color:#fff
    style D fill:#22c55e,stroke:#15803d,stroke-width:2px,color:#fff
    style E fill:#22c55e,stroke:#15803d,stroke-width:2px,color:#fff
    style F fill:#94a3b8,stroke:#334155,stroke-width:2px,color:#000
    style PYCEN fill:#1e293b,stroke:#64748b,stroke-width:2px,color:#fff
```
## overview

`pycen` makes exploring and acquiring U.S. Census data accessible and intuitive for spatial workflows. The `explore` module presents Census API metadata as browsable, topic-organized, interactive nested tables, with customizable themes that highlight curated variable recipes. It also supports natural-language keyword search for efficient variable discovery. The `acquire` module streamlines data processing: one function call returns both data and boundaries as a GeoDataFrame, with built-in quality checks and rapid visualizations; tabular-only and boundaries-only downloads are available as separate calls. `pycen` pulls live data products and caches them locally to keep iterations fast and reproducible. Multi-year fetching supports longitudinal comparisons that track change over time.

## sample use

### basic workflow

```python
import pycen
from pycen import explore, acquire

# 1. Explore variables
# `browse` and `search` return interactive tables
# `lookup` returns details
explore.browse(year=2023, dataset="acs5").show()
explore.search("vehicle", year=2023, dataset="acs5").show()
explore.lookup("B08201_002E", year=2021, dataset="acs5")

# 2. Acquire data
## continental US income gini map
gdf = acquire.get_censhp(
    variables={"B19083_001E":"gini_index"},
    geography="place",               # if no state/county, gets nationwide
    dataset="acs5",
    year=2023,
)
acquire.quick_check(gdf)             # returns N/A summary
acquire.quick_viz(gdf, "gini_index") # returns map + distribution histogram
acquire.quick_viz(gdf, "gini_index", palette="viridis") # optional customizable palette
acquire.quick_viz(gdf, "gini_index", save_path="gini_index.png") # optional save

## finer scale
## Cook County income gini at tract level
gdf = acquire.get_censhp(
    variables={"B19083_001E":"gini_index"},
    geography="tract",
    county="Cook County",
    state="IL",
    dataset="acs5",
    year=2023,
)
acquire.quick_viz(gdf, "gini_index")

## neighborhood analyses
## Chicago super commuters
gdf = acquire.get_censhp(
    variables={"B08303_012E":"commute_over_60min", "B08303_001E":"total_commuters"},
    geography="block group",
    place="Chicago city",
    #county="Cook County",  # optional, add for clarity
    state="IL",
    dataset="acs5",
    year=2023
)
gdf["pct_super_commuters"] = gdf["commute_over_60min"] / gdf["total_commuters"] * 100
acquire.quick_viz(gdf, "pct_super_commuters")

## decennial data supports block-scale (finest)
## Cook County housing vacancy rates at block level
select_var = {
    "H001003": "vacant_hh",
    "H001001": "total_hh"
}
gdf = acquire.get_censhp(
    variables=select_var,
    geography="block",
    county="Cook County",
    state="IL",
    dataset="dec_pl",
    year=2010,
)
gdf['vacancy_rate'] = gdf['vacant_hh'] / gdf['total_hh'] * 100
acquire.quick_viz(gdf, "vacancy_rate")
```
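
The ratio columns above divide one Census count by another. At fine geographies such as blocks, the denominator can be zero, which yields `inf` or `NaN` in pandas. The following is a minimal sketch of a guard for that case, using plain pandas on a toy table with invented values. `safe_pct` is a hypothetical helper, not part of `pycen`.

```python
import pandas as pd


def safe_pct(numerator: pd.Series, denominator: pd.Series) -> pd.Series:
    """Return numerator / denominator * 100, with NaN where the denominator is 0 or missing."""
    # Series.where keeps values that satisfy the condition and sets the rest to NaN
    denom = denominator.where(denominator > 0)
    return numerator / denom * 100


# Toy stand-in for a block-level table (values are invented)
blocks = pd.DataFrame({"vacant_hh": [2, 0, 5], "total_hh": [10, 0, 20]})
blocks["vacancy_rate"] = safe_pct(blocks["vacant_hh"], blocks["total_hh"])
print(blocks)
```

Rows with an empty denominator come back as `NaN`, so they can be counted or dropped before mapping.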

### tabular data workflow

```python
# 3. Tabular data only
df = acquire.get_census(
    variables=["B25032_022E"],  # renter-occupied, mobile home
    geography="tract",
    state="CA",
    year=2021,
)

# 4. Single-year, multivariable tabular data for comparative analysis
import pandas as pd
import matplotlib.pyplot as plt
from pycen import acquire

vars_race = {
    'B03002_001E': 'total',
    'B03002_003E': 'nh_white',
    'B03002_004E': 'nh_black',
    'B03002_006E': 'nh_asian',
    'B03002_005E': 'nh_aian',
    'B03002_007E': 'nh_nhpi',
    'B03002_008E': 'nh_other',
    'B03002_009E': 'nh_two_or_more',
    'B03002_012E': 'hispanic',
}

df_race = acquire.get_census(
    variables=vars_race,
    geography='county',
    state='CA',
    county='Alameda',
    dataset='acs5',
    year=2023,
)

row = df_race.iloc[0]
other = row['nh_aian'] + row['nh_nhpi'] + row['nh_other'] + row['nh_two_or_more']
vals = {
    'White (NH)': row['nh_white'],
    'Black (NH)': row['nh_black'],
    'Asian (NH)': row['nh_asian'],
    'Other (NH)': other,
    'Hispanic (any race)': row['hispanic'],
}

pct = {k: v / row['total'] * 100 for k, v in vals.items()}

plt.figure(figsize=(7, 4))
plt.bar(pct.keys(), pct.values(), color=['#4c78a8', '#f58518', '#54a24b', '#b279a2', '#e45756'])
plt.ylabel('Population %')
plt.title('Alameda County, CA: Race/Ethnicity (ACS 2023)')
plt.xticks(rotation=25, ha='right')
plt.tight_layout()
plt.show()

# 5. Multi-year tabular data for trend analysis
# comparative tracking of remote work surge (2019–2023)
from pycen import acquire
import matplotlib.pyplot as plt

# explore.search("work from home", year=2023, dataset="acs5").show()
# B08101_049E = worked from home
df_long = acquire.get_census(
    variables={'B08101_049E': 'wfh_workers', 'B08101_001E': 'total_workers'},
    geography='county',
    state='CA',
    years=[2019, 2020, 2021, 2022, 2023],
    merge='long'
)

df_long['wfh_pct'] = (df_long['wfh_workers'] / df_long['total_workers']) * 100
bay_area = df_long[df_long['NAME'].str.contains('San Francisco|Alameda|Santa Clara|Contra Costa|San Mateo')]

for county in bay_area['NAME'].unique():
    county_data = bay_area[bay_area['NAME'] == county]
    plt.plot(county_data['year'], county_data['wfh_pct'], marker='o', label=county)

plt.title('Bay Area WFH 2019-2023')
plt.ylabel('Work From Home (%)')
plt.xlabel('Year')
plt.xticks(sorted(bay_area['year'].unique()))
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()
```
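
A long-format table like the one in step 5 can also be reshaped to compare the first and last year directly. The sketch below uses plain pandas on an invented toy table. It assumes the same column names as the example above (`NAME`, `year`, `wfh_workers`, `total_workers`), and the county names and counts are made up.

```python
import pandas as pd

# Toy long-format table mimicking the multi-year output (numbers are invented)
toy_long = pd.DataFrame({
    "NAME": ["County A", "County A", "County B", "County B"],
    "year": [2019, 2023, 2019, 2023],
    "wfh_workers": [50, 150, 30, 60],
    "total_workers": [1000, 1000, 600, 600],
})
toy_long["wfh_pct"] = toy_long["wfh_workers"] / toy_long["total_workers"] * 100

# One row per county, one column per year
wide = toy_long.pivot(index="NAME", columns="year", values="wfh_pct")

# Percentage-point change between the first and last year
wide["change_pp"] = wide[2023] - wide[2019]
print(wide.sort_values("change_pp", ascending=False))
```

The wide layout makes it easy to rank geographies by change, while the long layout remains the better fit for line plots.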

## core functions

Explore
- `explore.search(query, year, dataset)` - supports exact term match and fuzzy keyword search
- `explore.browse(year, dataset)` - view all variables via interactive tree table with theme variable highlights
- `explore.lookup(code, year, dataset)` - inspect variable details

Acquire
- `acquire.get_censhp(...)` - data + boundaries --> GeoDataFrame
- `acquire.get_census(...)` - data only --> DataFrame
- `acquire.get_boundaries(...)` - boundaries only --> shp/gpkg
- `acquire.quick_check(gdf)` - N/A values summary
- `acquire.quick_viz(gdf, column, palette, save_path)` - exploratory map + distribution histogram for select variable
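
To illustrate what an N/A summary involves, here is a rough pandas sketch; it is not the implementation of `quick_check`. ACS tables mark unavailable estimates with large negative codes such as `-666666666`. Whether `pycen` converts these is not stated in this README, so the sketch treats them as missing itself. `na_summary` and `ACS_SENTINELS` are hypothetical names, and the sentinel list is an assumption that may be incomplete.

```python
import numpy as np
import pandas as pd

# Large negative codes the ACS uses in place of an estimate (assumed, not exhaustive)
ACS_SENTINELS = [-666666666, -999999999, -888888888, -222222222, -333333333, -555555555]


def na_summary(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Count missing values per column after treating sentinel codes as NaN."""
    cleaned = df[columns].replace(ACS_SENTINELS, np.nan)
    n_missing = cleaned.isna().sum()
    return pd.DataFrame({
        "n_missing": n_missing,
        "pct_missing": (n_missing / len(df) * 100).round(1),
    })


# Toy column with one sentinel code and one true NaN (values are invented)
toy = pd.DataFrame({"gini_index": [0.41, -666666666, np.nan, 0.38]})
print(na_summary(toy, ["gini_index"]))
```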

Info
- `pycen.get_product()` - list datasets and years
- `pycen.get_geography()` - list geography levels by dataset

Geo Helpers
```python
from pycen import geography

geography.search('Oakland', state='CA')  # broadest lookup; returns all related info

# state and county lookup
geography.state('CA')  # also accepts 'California' or the FIPS code '06'
geography.county('Alameda', state='CA')

# list geographies
geography.list_places('CA', query='Oakland')  # minimal search
geography.list_cbsa(query='new york', year=2023, limit=5)  # specify year and a result limit for multiple matches
geography.list_csa(query='detroit', year=2023, limit=5)  # look up CSA name
geography.list_counties('CA')
```
```

Themes
- `pycen.set_theme(name_or_dict)` - set active theme name or register a custom theme (dict)
- `pycen.get_theme_settings()` - get active theme name (defaults to a general curation of useful variables)
- `pycen.explore.get_theme(name=None)` - get theme details (dict); defaults to active theme
- `pycen.list_themes()` - list available theme names (includes session custom themes)

## notes

- Datasets: `acs5`, `acs1`, `dec_pl`, `dec_sf1`
- Spatial features require: `geopandas`, `pygris`
- Geographies are resolved per dataset/year from Census geography metadata (live/cache/static)
- Optional: `rich` enables prettier terminal tables for `explore.search().show()`
- `geography.search()` uses a bundled 2020 snapshot by default; if a different vintage is requested, it attempts a live code-list fetch and falls back to 2020 if unavailable
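
The vintage fallback described in the last note can be pictured as a short resolution chain. The sketch below is a generic illustration and does not reflect `pycen`'s internals: `resolve_with_fallback`, `fetch_live`, and the toy snapshot are all hypothetical, and the live fetch is simulated with stub functions.

```python
from typing import Callable, Optional


def resolve_with_fallback(
    year: int,
    fetch_live: Callable[[int], Optional[dict]],
    snapshot: dict,
    snapshot_year: int = 2020,
) -> tuple[dict, int]:
    """Return (code_list, vintage_used), preferring live data for non-snapshot years."""
    if year == snapshot_year:
        return snapshot, snapshot_year
    try:
        live = fetch_live(year)
    except Exception:  # broad on purpose in this sketch: any failure triggers the fallback
        live = None
    if live:
        return live, year
    return snapshot, snapshot_year


# Stub fetchers standing in for a network call (hypothetical data)
def working_fetch(year: int) -> dict:
    return {"Oakland city": "0653000"}


def failing_fetch(year: int) -> dict:
    raise ConnectionError("service unavailable")


bundled = {"Oakland city": "0653000"}
print(resolve_with_fallback(2023, working_fetch, bundled)[1])
print(resolve_with_fallback(2023, failing_fetch, bundled)[1])
```

Returning the vintage actually used alongside the data lets the caller see when a fallback happened.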

API key for higher rate limits:
```python
pycen.set_api_key("YOUR_KEY")  # get key at api.census.gov/data/key_signup.html
```
