Metadata-Version: 2.4
Name: livelike
Version: 1.5.1
Summary: A Population Synthesizer for High Demographic Resolution Analysis.
Author-email: "Joseph V. Tuccillo" <tuccillojv@ornl.gov>, "James D. Gaboardi" <gaboardijd@ornl.gov>
Maintainer: Joseph V. Tuccillo, James D. Gaboardi
Project-URL: Home, https://github.com/likeness-pop
Project-URL: Repository, https://github.com/likeness-pop/livelike
Keywords: population-synthesis,high-demographic-resolution
Classifier: Development Status :: 1 - Planning
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: GIS
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
Requires-Dist: awkward>=2.6
Requires-Dist: certifi>=2025.10.5
Requires-Dist: deprecation
Requires-Dist: dill
Requires-Dist: geopandas>=1.1
Requires-Dist: libpysal>=4.12
Requires-Dist: multiprocess>=0.70
Requires-Dist: networkx>=3.2
Requires-Dist: numpy>=1.26
Requires-Dist: pandas>=2.2
Requires-Dist: pyarrow
Requires-Dist: pygris>=0.1.6
Requires-Dist: scipy>=1.12
Requires-Dist: shapely>=2.0
Requires-Dist: likeness_vitals
Provides-Extra: jax-cpu
Requires-Dist: jax>=0.5.3; extra == "jax-cpu"
Requires-Dist: jaxlib>=0.5.3; extra == "jax-cpu"
Requires-Dist: jaxopt>=0.8.3; extra == "jax-cpu"
Provides-Extra: jax-gpu
Requires-Dist: jax[cuda12]<=0.4.31; extra == "jax-gpu"
Requires-Dist: jaxlib[cuda12]<=0.4.31; extra == "jax-gpu"
Requires-Dist: jaxopt>=0.8.3; extra == "jax-gpu"
Requires-Dist: cuda-nvcc; extra == "jax-gpu"
Requires-Dist: cudatoolkit; extra == "jax-gpu"
Provides-Extra: tests
Requires-Dist: pre-commit; extra == "tests"
Requires-Dist: pytest; extra == "tests"
Requires-Dist: pytest-cov; extra == "tests"
Requires-Dist: pytest-xdist; extra == "tests"
Requires-Dist: ruff; extra == "tests"
Requires-Dist: setuptools_scm; extra == "tests"
Requires-Dist: watermark; extra == "tests"
Provides-Extra: notebooks
Requires-Dist: ipywidgets; extra == "notebooks"
Requires-Dist: jupyterlab; extra == "notebooks"
Requires-Dist: pymedm; extra == "notebooks"
Provides-Extra: all-cpu
Requires-Dist: livelike[jax_cpu,notebooks,tests]; extra == "all-cpu"
Provides-Extra: all-gpu
Requires-Dist: livelike[jax_gpu,notebooks,tests]; extra == "all-gpu"
Dynamic: license-file

# Livelike: Vivid Synthetic Populations

![tag](https://img.shields.io/github/v/release/likeness-pop/livelike?include_prereleases&sort=semver)
[![Continuous Integration](https://github.com/likeness-pop/livelike/actions/workflows/continuous_integration.yml/badge.svg)](https://github.com/likeness-pop/livelike/actions/workflows/continuous_integration.yml)
[![codecov](https://codecov.io/gh/likeness-pop/livelike/branch/develop/graph/badge.svg?token=KTFJ10C1S3)](https://codecov.io/gh/likeness-pop/livelike)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)

This package provides a high-level wrapper for generating synthetic populations via Census APIs based on the American Community Survey (ACS) 5-Year Estimates. Synthetic populations are virtual representations of people and households produced for small census areas (block groups, tracts) and can be attributed by a variety of demographic, economic, social, worker, student, mobility, housing, health, and communication characteristics found in the ACS. 

## Specifying a P-MEDM Problem

Synthetic populations are generated by allocating records from the ACS Public Use Microdata Sample (PUMS) from their native spatial resolution of Public Use Microdata Areas (PUMAs; 100,000+ people) to small census areas (typically <8,000 people), such that the aggregate characteristics of people and households align closely with the population profiles of the small census areas available in the ACS Summary File (SF). This is accomplished using [Penalized Maximum-Entropy Dasymetric Modeling (P-MEDM)](https://www.tandfonline.com/doi/abs/10.1080/00045608.2013.843439), which seeks to recreate the error variances on each small-area variable estimate in the ACS SF. LiveLike makes it simple to design and solve P-MEDM problems by fetching all of the necessary P-MEDM inputs for a given PUMA via Census APIs.

The bulk of P-MEDM setup is handled automatically by the `acs` module via the [Census Microdata API](https://census.gov/data/developers/data-sets/census-microdata-api.html).

In a basic use-case, inputs are simply:

1. The 2010 or 2020 PUMA ID (`<State FIPS> + <PUMA FIPS>`, as shown [here](https://usa.ipums.org/usa/volii/2010PUMAS.shtml)).
2. A [Census API key](https://api.census.gov/data/key_signup.html) (optional).

Examples are provided in the `notebooks` directory. 
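
As a minimal sketch of the PUMA ID format described above, the helper below concatenates a 2-digit state FIPS code with a 5-digit PUMA code. The function name `make_puma_id` is hypothetical and is not part of LiveLike; it only illustrates the `<State FIPS> + <PUMA FIPS>` pattern.

```python
def make_puma_id(state_fips, puma_code):
    """Concatenate a state FIPS code and a PUMA code into a 7-character ID.

    Hypothetical helper for illustration; not part of the livelike API.
    """
    state = str(state_fips).zfill(2)  # state FIPS codes are 2 digits
    puma = str(puma_code).zfill(5)  # PUMA codes are 5 digits
    if not (state.isdigit() and len(state) == 2):
        raise ValueError(f"invalid state FIPS code: {state_fips!r}")
    if not (puma.isdigit() and len(puma) == 5):
        raise ValueError(f"invalid PUMA code: {puma_code!r}")
    return state + puma


puma_id = make_puma_id(47, 1603)
print(puma_id)  # 4701603
```

Zero-padding matters here: codes read from spreadsheets or typed as integers often lose their leading zeros, which would yield an ID of the wrong length.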

### Supported Geographies

P-MEDM requires a target geography and an aggregate geography to account for error variances. The selected target geography determines the aggregate geography:

| Level | Code | Population (approx.) | Aggregate |
|-------|------------|-------------|-------|
| **Block group** | **`bg`** | **600 - 3000** | **Tract** |
| Tract | `trt` | 1200 - 8000 | Supertract  |

LiveLike handles tracts, which have no sub-county aggregation level, using a regionalization approach to generate custom "supertracts" (see `notebooks/tract_supertract_2019.ipynb` for an example).

### Supported ACS Years

The ACS 5-Year Estimates are a rolling 5% sample of the United States population weighted to be representative of the release year (vintage), with additional adjustments for factors like income. LiveLike uses the ACS 2019 5-Year Estimates as its default vintage.  

| Year | Vintage | Available |
|------|---------|-----------|
| 2016 | ACS 2012 - 2016 5-Year Estimates | :white_check_mark: |
| 2017 | ACS 2013 - 2017 5-Year Estimates | :white_check_mark: |
| 2018 | ACS 2014 - 2018 5-Year Estimates | :white_check_mark: |
| **2019** | **ACS 2015 - 2019 5-Year Estimates** | :white_check_mark: |
| 2020 | ACS 2016 - 2020 5-Year Estimates | :x: |
| 2021 | ACS 2017 - 2021 5-Year Estimates | :x: |
| 2022 | ACS 2018 - 2022 5-Year Estimates | :x: |
| 2023 | ACS 2019 - 2023 5-Year Estimates | :white_check_mark: |

Currently, the 2016 through 2019 vintages and the 2023 vintage are supported. The 2020 - 2022 gap is due to mixed-geography problems that P-MEDM cannot directly handle: the 2020 and 2021 vintages pair 2010 PUMAs with 2020 small areas, and the 2022 vintage pairs a mixture of 2010 and 2020 PUMAs with 2020 small areas.
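
The availability table can be expressed as a small validation step. This is a sketch only: `SUPPORTED_YEARS`, `check_year`, and `vintage_label` are hypothetical names chosen for illustration, and LiveLike's own year handling may differ.

```python
# Years marked as available in the table above (hypothetical constant name).
SUPPORTED_YEARS = frozenset({2016, 2017, 2018, 2019, 2023})
DEFAULT_YEAR = 2019  # the default vintage stated in this README


def check_year(year=DEFAULT_YEAR):
    """Return the year if its vintage is supported, otherwise raise."""
    if year not in SUPPORTED_YEARS:
        supported = ", ".join(str(y) for y in sorted(SUPPORTED_YEARS))
        raise ValueError(f"ACS year {year} is not supported; choose from {supported}")
    return year


def vintage_label(year):
    """Build the 5-Year vintage label; the window ends in the release year."""
    return f"ACS {year - 4} - {year} 5-Year Estimates"


print(vintage_label(check_year()))  # ACS 2015 - 2019 5-Year Estimates
```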

### P-MEDM Constraints

P-MEDM constraints are sets of residential and population characteristics common between the ACS SF and PUMS that can be used to design a P-MEDM model and attribute the synthetic population. LiveLike provides several configurations of prebuilt constraints:

- **Base (default)**: Baseline modeling constraints representing population totals, routine daily activities (workers, students), and mobility characteristics, available in `config.up_base_constraints_selection`. 
- **Expanded**: Baseline modeling constraints with a selection of demographic, social, economic, and housing characteristics, available in `config.up_expanded_constraints_selection`. The Base constraints can be overwritten by the Expanded ones using:

    ```{python}
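    # import paths assume the `acs` and `config` modules live in the `livelike` package
    from livelike import acs
    from livelike.config import up_expanded_constraints_selection

    acs.puma(..., constraints_selection=up_expanded_constraints_selection)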
    ```

Several additional constraint themes (health, communications) are available outside the prebuilt configurations and can be added onto a custom constraints selection.

| Theme       | Description                                                                                                                                       | Base | Expanded | Notes                                             |
|-------------|---------------------------------------------------------------------------------------------------------------------------------------------------|------|----------|---------------------------------------------------|
| universe    | Sampling universe totals (population, civilian noninstitutionalized population, group quarters population, housing units, occupied housing units). |   x  |     x    |                                                   |
| worker      | Worker characteristics (employment, class of worker, industry, occupation, hours worked per week).                                                |   x  |     x    |                                                   |
| student     | Student characteristics (grade level attending, public/private school).                                                                           |   x  |     x    |                                                   |
| mobility    | Mobility characteristics (commute time/mode, vehicles available).                                                                                 |   x  |     x    |                                                   |
| demographic | Basic demographics (sex, age) and living arrangement characteristics.                                                                             |      |     x    | Expanded: Sex by age and household type only                |
| social      | Social characteristics (race/ethnicity, language, place of birth, veteran status).                                                                                |      |     x    | Expanded: Race/ethnicity only                               |
| economic    | Economic characteristics (household income, poverty, educational attainment).                                                                                             |      |     x    | Expanded: Household income and income to poverty ratio only |
| housing     | Housing characteristics (tenure, dwelling type, year built, number of rooms, house heating fuel). |      |     x    | Expanded: Dwelling type and year built only |
| health      | Health insurance coverage type. |      |          |      |
| communications | Household internet access. |      |          |      |

#### Custom Constraint Selection

Constraint selections are passed to `acs.puma(constraints_selection=...)` as a `dict` whose keys are ACS variable themes and whose values identify specific subjects (tables). If the value is a `bool`, `True` includes the variables for all subjects in the theme, while `False` bypasses the theme (the same as omitting it from the selection). If the value is a `list`, only the listed subjects are included.

**Example:**

```{python}
custom_constraints_selection = {
    "universe" : True,
    "worker" : True,
    "student" : True,
    "mobility" : True,
    "demographic" : [
        "sex_age",
        "hhtype",
    ],
    "economic" : [
        "hhinc",
        "ipr",
    ],
    "health" : True,
    "communications" : True,
}
```

- Use all variables listed under the `universe`, `worker`, `student`, `mobility`, `health`, and `communications` themes.
- Use only sex by age (`sex_age`) and household type (`hhtype`) from the `demographic` theme.
- Use only household income (`hhinc`) and income to poverty ratio (`ipr`) from the `economic` theme.
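
The `bool`/`list` semantics described in this section can be sketched as a small resolver. This is not LiveLike's implementation: `resolve_selection` and the `AVAILABLE` catalog are hypothetical, and the subject names `edu` and `hicov` are placeholders invented for the example (only `sex_age`, `hhtype`, `hhinc`, and `ipr` appear in this README).

```python
# Hypothetical theme -> subjects catalog. In LiveLike the real catalog is
# derived from the constraints file; "edu" and "hicov" are made-up names.
AVAILABLE = {
    "demographic": ["sex_age", "hhtype"],
    "economic": ["hhinc", "ipr", "edu"],
    "health": ["hicov"],
}


def resolve_selection(selection, available):
    """Expand a constraints selection into explicit subject lists per theme."""
    resolved = {}
    for theme, value in selection.items():
        if theme not in available:
            raise KeyError(f"unknown theme: {theme!r}")
        if value is True:
            # True: include every subject in the theme
            resolved[theme] = list(available[theme])
        elif value is False:
            # False: bypass the theme, same as omitting it
            continue
        elif isinstance(value, list):
            # list: include only the listed subjects
            unknown = [s for s in value if s not in available[theme]]
            if unknown:
                raise ValueError(f"unknown subjects for {theme!r}: {unknown}")
            resolved[theme] = list(value)
        else:
            raise TypeError(f"{theme!r} must map to a bool or a list")
    return resolved


selection = {"demographic": ["sex_age"], "economic": False, "health": True}
print(resolve_selection(selection, AVAILABLE))
# {'demographic': ['sex_age'], 'health': ['hicov']}
```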

#### The Constraints File

The constraints file (`livelike/data/constraints.csv`) underlies the constraint selection process, describing relationships between available PUMS variables, P-MEDM constraints, and ACS Summary File (SF) variables, as well as year of availability for constraints. It is used to generate individual-level representations of ACS SF tables/variables based on PUMS data.  

- `level`: PUMS file level (`person` or `household`).
- `geo_base_level`: Baseline geography for which the constraint is available (`bg`: block group; `trt`: tract).
- `theme`: Constraint topics/themes. Each theme points to a PUMS/SF crosswalking function in `livelike.pums`.
- `subject`: The subject of the ACS SF table to be represented at the individual level using PUMS data. This column references the function in the `pums` module used to produce a P-MEDM constraint.
- `constraint`: P-MEDM constraining variable name.
- `pums[1...n]`: Multiple columns listing the PUMS variables associated with each P-MEDM constraint table. These are parsed using a regex search for any columns in the file beginning with `pums`.
- `code`: ACS SF variable codes matching each P-MEDM constraint. 
- `desc`: P-MEDM constraining variable longform description.
- `begin_year`: The initial year in which the constraint was available.
- `end_year`: The final year in which the constraint was available.
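
To make the column layout concrete, the sketch below reads a tiny in-memory CSV with the same columns, collects every column whose name begins with `pums`, and keeps the rows available in a given year. The two sample rows, including the constraint names and the `X0001`/`X0002` codes, are fabricated placeholders rather than contents of `constraints.csv`, and `load_constraints` is a hypothetical helper. It also assumes `begin_year` and `end_year` are always filled with integers.

```python
import csv
import io
import re

# Illustrative rows only; not taken from livelike/data/constraints.csv.
SAMPLE = """level,geo_base_level,theme,subject,constraint,pums1,pums2,code,desc,begin_year,end_year
person,bg,demographic,sex_age,example_a,SEX,AGEP,X0001,Illustrative row A,2016,2023
household,bg,economic,hhinc,example_b,HINCP,,X0002,Illustrative row B,2016,2019
"""


def load_constraints(text, year):
    """Return rows available in `year`, each with its PUMS variables listed."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # Regex search for any column name beginning with "pums"
    pums_cols = [c for c in rows[0] if re.match(r"pums", c)] if rows else []
    active = []
    for row in rows:
        if int(row["begin_year"]) <= year <= int(row["end_year"]):
            record = dict(row)
            # Drop empty cells so each row lists only the variables it uses
            record["pums_vars"] = [row[c] for c in pums_cols if row[c]]
            active.append(record)
    return active


for record in load_constraints(SAMPLE, 2023):
    print(record["constraint"], record["pums_vars"])
```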


### Census API Key

Using a Census API Key is optional but is recommended to avoid hitting request limits. 

1. Register for a [Census API Key](https://api.census.gov/data/key_signup.html).
2. Activate your key via the confirmation email link you receive. 
3. In the top directory of `livelike`, run:

```
echo YOUR_CENSUS_API_KEY > censusapikey.txt
```

The file that is created, `censusapikey.txt`, is not tracked by `git`. This ensures that your personal API key is never exposed on a remote branch.
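
Because the key is optional, code that consumes it should tolerate a missing file. The sketch below shows one way to read the key; `read_census_api_key` is a hypothetical helper, and how LiveLike itself locates and loads the file is not specified here.

```python
from pathlib import Path


def read_census_api_key(path="censusapikey.txt"):
    """Return the API key stored at `path`, or None if it is missing or empty."""
    key_file = Path(path)
    if not key_file.is_file():
        return None  # the key is optional, so a missing file is not an error
    # `echo KEY > file` appends a newline, so strip surrounding whitespace
    key = key_file.read_text().strip()
    return key or None


api_key = read_census_api_key()
print("key found" if api_key else "no key; requests may be rate limited")
```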

## Population Synthesis

Utilities for population synthesis can be found in the `homesim` module. Our current approach is to sample from the P-MEDM allocation matrix ($i...n$ PUMS records by $j...m$ areas) for a given area based on family status/household size, group quarters, and vacant housing, such that the area's total population is approximately preserved.
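
The core idea of sampling from one column of the allocation matrix can be illustrated with a deliberately simplified sketch. It is not the `homesim` algorithm: it ignores family status, group quarters, and vacant housing, and simply draws household records with probability proportional to their allocation weight until the area's population target is reached. All names and numbers are made up for the example.

```python
import random


def sample_households(weights, sizes, target_population, seed=0):
    """Draw records (with replacement) in proportion to their allocation
    weight until the target population is met or slightly exceeded."""
    rng = random.Random(seed)
    record_ids = list(range(len(weights)))
    drawn, population = [], 0
    while population < target_population:
        # A record with zero weight is never selected
        i = rng.choices(record_ids, weights=weights, k=1)[0]
        drawn.append(i)
        population += sizes[i]
    return drawn, population


# One column of a toy allocation matrix: 4 PUMS household records for one area
weights = [0.0, 2.5, 1.0, 0.5]  # allocation of each record to the area
sizes = [3, 1, 2, 4]  # persons in each household record
drawn, population = sample_households(weights, sizes, target_population=20)
print(len(drawn), population)
```

The stopping rule overshoots the target by less than one household, which is the sense in which the total is only approximately preserved in this sketch.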

## Batch operations

The `multi` module provides utilities for population synthesis across multiple PUMAs, including: 

- Making PUMA instances across multiple geographies or replicates (alternative PUMS weights)
- Population synthesis
- Querying and extracting PUMS descriptors from the Census Microdata API


## Testing

### Rebuilding Test Data

The scripts to rebuild test data are stored in the `utilities` directory. Execute them from the main directory, for example:

```
python utilities/prep_test_build_puma.py
python utilities/prep_test_notebook_solutions.py
```

### Running Testing Suite Locally

To run the testing suite locally, enter:

```
bash run_tests.sh
```

## Rough edges

#### Constraint order matters

The default P-MEDM solver, `pymedm`, gives different solutions when constraint order varies. This appears to stem from floating-point underflow in `jax`, a core dependency of `pymedm`, triggered by differing positions of the model input variables. LiveLike guards against this for both prebuilt and custom constraints by implementing a method in the `puma` constructor that consistently sorts constraints by `theme` and `code`.
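
The effect of sorting by `theme` and `code` can be sketched as follows: any input ordering collapses to the same canonical ordering before the solver sees it. The function `sort_constraints` and the placeholder codes are hypothetical; the actual method in the `puma` constructor may be organized differently.

```python
def sort_constraints(constraints):
    """Return constraints in a canonical order, keyed on (theme, code)."""
    return sorted(constraints, key=lambda c: (c["theme"], c["code"]))


# Placeholder records; the codes are invented for illustration
a = {"theme": "worker", "code": "X0003"}
b = {"theme": "demographic", "code": "X0002"}
c = {"theme": "demographic", "code": "X0001"}

order_1 = sort_constraints([a, b, c])
order_2 = sort_constraints([c, a, b])
print(order_1 == order_2)  # True
print([r["code"] for r in order_1])  # ['X0001', 'X0002', 'X0003']
```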

#### Negative replicate weights

In rare cases, the values of PUMS replicate household weights can be negative. For compatibility with P-MEDM, we zero out these negative values. See [this thread](https://forum.ipums.org/t/what-do-negative-replicate-weights-mean/2519) for further details. 

The P-MEDM `population` constraint is approximated as a sum of the ratio of each household member's person weight (`PWGTP`) to the head of household's weight (which itself roughly matches the household weight). When the head of household's replicate person weight is less than one, we use a placeholder value of 1 so that each additional household member still contributes to the `population` constraint for the household. We welcome community contributions for more robust improvements to this approach.
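
The two adjustments described in this section can be sketched in a few lines. Both function names are hypothetical, and the example assumes the head of household is the first person listed; LiveLike's own implementation may differ in these details.

```python
def zero_negative_weights(weights):
    """Replace negative replicate weights with zero."""
    return [max(w, 0) for w in weights]


def household_population(person_weights, head_index=0):
    """Approximate a household's contribution to the `population` constraint
    as the sum of each member's person weight over the head's weight."""
    head_weight = person_weights[head_index]
    # Use a placeholder of 1 when the head's weight is below one, so the
    # remaining members still contribute
    denominator = head_weight if head_weight >= 1 else 1
    return sum(w / denominator for w in person_weights)


print(zero_negative_weights([12, -3, 0, 7]))  # [12, 0, 0, 7]
print(household_population([10, 10, 5]))  # 2.5
print(household_population([0, 4, 2]))  # 6.0
```

In the last call the head's weight is 0, so the placeholder denominator of 1 applies and the two other members contribute their full weights.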
