Tutorial 2: ACS 5-Year Aggregate Data
This tutorial covers the most common use case: fetching aggregate statistics from the ACS 5-Year estimates.
Goal: Get poverty statistics for all places (cities, towns, CDPs) in a state.
Setup
import os
from cendat import CenDatHelper
from dotenv import load_dotenv
load_dotenv()
cdh = CenDatHelper(key=os.getenv("CENSUS_API_KEY"))
Step 1: Find the ACS 5-Year Product
# The trailing \) requires a closing paren immediately after "acs5",
# which filters out sub-products like acs/acs5/profile, acs/acs5/subject, etc.
cdh.list_products(years=[2023], patterns=r"acs/acs5\)")
cdh.set_products()
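To see why the escaped parenthesis matters, the pattern can be tried on a few strings with Python's re module. This is only a sketch: the product labels below are hypothetical stand-ins (the real output of list_products may be formatted differently), and it assumes the pattern is applied as a regex search.

```python
import re

# Hypothetical product labels; the real list_products output may differ.
labels = [
    "ACS 5-Year Detailed Tables (2023/acs/acs5)",
    "ACS 5-Year Data Profiles (2023/acs/acs5/profile)",
    "ACS 5-Year Subject Tables (2023/acs/acs5/subject)",
]

pattern = re.compile(r"acs/acs5\)")

# Keep only labels where "acs5" is immediately followed by a closing paren.
matches = [label for label in labels if pattern.search(label)]
print(matches)
```

Without the `\)`, all three labels would match, because each contains the substring acs/acs5.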
Step 2: Explore Variable Groups
The ACS has thousands of variables, so searching by group (a table such as B17001) is the practical way to narrow them down:
# Search for poverty-related groups
cdh.list_groups(patterns="poverty")
# Let's use B17001 (Poverty Status in the Past 12 Months by Sex by Age)
cdh.set_groups("B17001")
# See what variables are in this group
cdh.describe_groups()
Step 3: Select Variables and Geography
# B17001_001E = Total: population for whom poverty status is determined
# B17001_002E = Population with income below the poverty level
cdh.set_variables(["B17001_001E", "B17001_002E"])
# 160 = Places (cities, towns, CDPs)
cdh.set_geos(["160"])
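These two variables are the denominator and numerator of a poverty rate. A minimal sketch of that calculation follows; the helper name poverty_rate and the counts are hypothetical and are not part of cendat or real ACS figures.

```python
def poverty_rate(total, below):
    """Return below/total as a percentage, or None when the universe is empty."""
    if not total:
        return None
    return 100.0 * below / total


# Hypothetical counts for one place (B17001_001E and B17001_002E).
print(poverty_rate(total=12_500, below=1_875))
```

Guarding against an empty universe matters for places, since very small CDPs can report a total of zero.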
Step 4: Get Data with Names
response = cdh.get_data(
    include_names=True,       # Include place names
    include_attributes=True,  # Include margins of error
)
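Margins of error are worth requesting because a rate derived from two estimates needs its own margin. The sketch below uses the Census Bureau's published approximation for the margin of error of a derived proportion. The function name proportion_moe and the input values are hypothetical, and it assumes the margins arrive alongside the estimates (in the ACS API these are the variables ending in M, such as B17001_002M).

```python
import math


def proportion_moe(num, num_moe, den, den_moe):
    """Approximate MOE of num/den with the derived-proportion formula."""
    p = num / den
    radicand = num_moe**2 - (p**2) * (den_moe**2)
    if radicand < 0:
        # Fall back to the ratio formula when the radicand is negative.
        radicand = num_moe**2 + (p**2) * (den_moe**2)
    return math.sqrt(radicand) / den


# Hypothetical estimates and 90% margins of error for one place.
rate = 1_875 / 12_500
moe = proportion_moe(num=1_875, num_moe=300, den=12_500, den_moe=400)
print(f"{rate:.1%} +/- {moe:.1%}")
```

For small places the margin can be as large as the rate itself, which is one reason to filter on the size of the universe before comparing places.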
Step 5: Analyze
# Convert to DataFrame
df = response.to_polars(concat=True, destring=True)
df.glimpse()
# Quick tabulation: how many places per state have a poverty universe above 10,000?
response.tabulate("state", where="B17001_001E > 10_000")
# Same tabulation, weighted by population (B17001_001E)
response.tabulate(
    "state",
    weight_var="B17001_001E",
    where="B17001_001E > 10_000",
)
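The difference between the two tabulations is whether each qualifying place counts once or counts in proportion to its population. The plain-Python sketch below illustrates that distinction on hypothetical rows. It is not how tabulate is implemented, and the output format of tabulate itself may differ.

```python
from collections import Counter

# Hypothetical rows: (state FIPS, place name, B17001_001E). Not real ACS figures.
rows = [
    ("08", "Place A", 25_000),
    ("08", "Place B", 8_000),
    ("08", "Place C", 15_000),
    ("56", "Place D", 60_000),
]

counts = Counter()    # number of qualifying places per state
weighted = Counter()  # sum of B17001_001E over qualifying places per state

for state, _name, universe in rows:
    if universe > 10_000:  # same filter as where="B17001_001E > 10_000"
        counts[state] += 1
        weighted[state] += universe

print(dict(counts))
print(dict(weighted))
```

In this toy data, state 08 has more qualifying places, while state 56 has more people living in qualifying places, so the two views can rank states differently.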