govdata.ai // powered by substrate
21 datasets live · 14 federal agencies · open kernel

US public data,
accessible to your agent.

Give Claude Code research-grade access to federal data — Census, BLS, FHFA, BEA and more — with full provenance, pinned vintages, and reproducible queries. No scraping. No hallucinations.

$ curl -s https://govdata.ai/install | bash

21 datasets live on R2
14 federal agencies covered
~61 MB total bundle size
100% of queries with provenance
How it works

Three lines, one bundle, every agency.

Pull a pre-materialized cross-agency bundle, expose it to Claude over MCP, ask questions in plain English.

01

Install

Open kernel. MIT licensed. No account required for local use.

# add to your project
$ pip install substrate
02

Pull a dataset

Versioned, sha256-verified DuckDB file. ~2 MB for state_panel.

$ substrate pull state_panel
# → state_panel.duckdb
# 5 tables · 419 rows · signed manifest
03

Serve to Claude

MCP over stdio. Add one entry to your Claude config.

$ substrate serve state_panel.duckdb
# MCP tools exposed:
# list_tables, describe_table,
# query_table, aggregate, run_sql
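For clients that are configured through a file rather than a CLI, the "one entry" could look like the sketch below. It assumes the common `mcpServers` JSON layout used by Claude Desktop's config file and borrows the server name `govdata` from the self-host instructions on this page. The helper name `mcp_entry` is hypothetical, and the location of the config file depends on your client.

```python
import json


def mcp_entry(bundle_path: str, name: str = "govdata") -> dict:
    """Build an MCP server entry that launches `substrate serve` over stdio.

    Assumption: the client reads an `mcpServers` mapping of
    name -> {command, args}. Check your client's docs for the file path.
    """
    return {
        "mcpServers": {
            name: {
                "command": "substrate",
                "args": ["serve", bundle_path],
            }
        }
    }


if __name__ == "__main__":
    # Print the block so it can be merged into an existing config by hand.
    print(json.dumps(mcp_entry("state_panel.duckdb"), indent=2))
```

Merge the printed block into an existing config rather than overwriting the file, since other servers may already be registered there.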
In Claude Code
you →
Which 5 states have the highest median household income, and how does that line up with employment-to-population ratios?
claude →
[calls govdata.run_sql]
The top 5 by median household income are DC ($108k), Massachusetts ($99.9k), New Jersey ($99.8k), Maryland ($98.7k), and Hawaii ($95.3k). DC also leads employment-to-population at 77.4%, driven by commuters from MD/VA. Massachusetts is the only state in the top 5 on both income and employment ratio. [6 rows · provenance: state_panel/lifecycle-test]
Real rows from state_panel
v: lifecycle-test
State           Population    Median income   HPI
California      38,965,193    $96,334         880.4
Texas           30,503,301    $76,958         505.7
Florida         22,610,726    $71,711         773.6
New York        19,571,216    $84,578         977.3
Pennsylvania    12,961,683    $76,081         585.8
Massachusetts    7,001,399    $99,858         1149.3
Census ACS 1-year 2023 · BLS QCEW · FHFA HPI · BEA · single SQL join
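A ranking question like the one in the chat above reduces to an ordered, limited SELECT. The sketch below is illustrative only: it uses Python's built-in `sqlite3` as a stand-in for the DuckDB bundle so it runs without extra dependencies, loads only the six rows displayed above, and assumes hypothetical table and column names (`state_panel`, `median_income`, `hpi`). The real bundle's schema may differ.

```python
import sqlite3

# The six rows displayed above: state, population, median income (USD), HPI.
ROWS = [
    ("California", 38_965_193, 96_334, 880.4),
    ("Texas", 30_503_301, 76_958, 505.7),
    ("Florida", 22_610_726, 71_711, 773.6),
    ("New York", 19_571_216, 84_578, 977.3),
    ("Pennsylvania", 12_961_683, 76_081, 585.8),
    ("Massachusetts", 7_001_399, 99_858, 1149.3),
]


def top_by_income(limit: int) -> list:
    """Return (state, median_income) pairs, highest income first."""
    con = sqlite3.connect(":memory:")  # stand-in for state_panel.duckdb
    con.execute(
        "CREATE TABLE state_panel ("
        "state TEXT, population INTEGER, median_income INTEGER, hpi REAL)"
    )
    con.executemany("INSERT INTO state_panel VALUES (?, ?, ?, ?)", ROWS)
    rows = con.execute(
        "SELECT state, median_income FROM state_panel "
        "ORDER BY median_income DESC LIMIT ?",
        (limit,),
    ).fetchall()
    con.close()
    return rows


print(top_by_income(3))
# [('Massachusetts', 99858), ('California', 96334), ('New York', 84578)]
```

The result covers only the six sample rows, so it is not the national top 3.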
Datasets

21 datasets live. ~61 MB. Pull, query, join.

Curated cross-agency bundles for instant insight, plus raw single-source bundles so you can build your own. Every bundle uses canonical join keys — state_fips, year, naics — so they ATTACH and JOIN without translation logic.

live index.json →
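To make the ATTACH-and-JOIN idea concrete, here is a minimal sketch of two separately stored tables joined on the canonical keys `state_fips` and `year`. It is an assumption-laden illustration: Python's built-in `sqlite3` stands in for DuckDB (both support `ATTACH`), the database aliases and table names (`census.acs`, `fhfa.hpi`) are hypothetical, and the values are the California and Texas figures shown in the sample table, assumed here to be 2023 vintages.

```python
import sqlite3


def join_on_canonical_keys() -> list:
    """Join two attached databases on (state_fips, year)."""
    con = sqlite3.connect(":memory:")
    # Two separate in-memory databases stand in for two pulled bundles.
    con.execute("ATTACH DATABASE ':memory:' AS census")
    con.execute("ATTACH DATABASE ':memory:' AS fhfa")
    con.execute(
        "CREATE TABLE census.acs ("
        "state_fips TEXT, year INTEGER, population INTEGER, median_income INTEGER)"
    )
    con.execute("CREATE TABLE fhfa.hpi (state_fips TEXT, year INTEGER, hpi REAL)")
    con.executemany(
        "INSERT INTO census.acs VALUES (?, ?, ?, ?)",
        [("06", 2023, 38_965_193, 96_334), ("48", 2023, 30_503_301, 76_958)],
    )
    con.executemany(
        "INSERT INTO fhfa.hpi VALUES (?, ?, ?)",
        [("06", 2023, 880.4), ("48", 2023, 505.7)],
    )
    # Same key names and types on both sides, so no translation logic is needed.
    rows = con.execute(
        "SELECT c.state_fips, c.median_income, h.hpi "
        "FROM census.acs AS c "
        "JOIN fhfa.hpi AS h "
        "ON c.state_fips = h.state_fips AND c.year = h.year "
        "ORDER BY c.state_fips"
    ).fetchall()
    con.close()
    return rows


print(join_on_canonical_keys())
# [('06', 96334, 880.4), ('48', 76958, 505.7)]
```

Storing `state_fips` as zero-padded text ("06", not 6) is what keeps the join from silently dropping rows when one source uses integers and another uses strings.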
Curated
cross-agency, opinionated joins, ready to query
LIVE 2 MB

state_panel

Census + BLS + FHFA + BEA at state grain. Population, income, employment, home prices, per-cap PI.

52 rows 4 agencies 2023
$ substrate pull state_panel
LIVE 7 MB

county_profile

Census ACS + CDC PLACES (6 health measures) + Opportunity Insights mobility + USDA food environment, joined at county FIPS.

3,222 rows 4 agencies 21 fields
$ substrate pull county_profile
LIVE 2 MB

metro_affordability

FHFA HPI + Census ACS income + HUD Fair Market Rents + IRS migration at CBSA grain. 410 metros, with computed price-to-income and rent-burden ratios.

410 rows 4 agencies 2023
$ substrate pull metro_affordability
LIVE 5 MB

county_health

CDC PLACES — 13 chronic-disease prevalence measures pivoted into one wide table at county FIPS. Obesity, diabetes, heart disease, mental health, COPD, depression and more.

3,143 counties CDC · 13 measures age-adjusted
$ substrate pull county_health
LIVE <1 MB

population_panel

Census ACS 1-Year, state × year (2019, 2021–2023). Population, median household income, and poverty rate. Useful for tracking pandemic-era state trajectories.

208 rows Census 2019–2023
$ substrate pull population_panel
LIVE 2 MB

energy_burden

DOE LEAD Tool — energy burden (% of household income spent on energy) by county and income bracket across CA, TX, NY, FL, MA. Useful for finding communities where energy costs are a material share of income.

1,364 rows DOE 5 states × 3 brackets
$ substrate pull energy_burden
LIVE 1 MB

hmda_lending

Mortgage lending activity aggregated by state and year from HMDA. Counts and dollar volumes for originated, approved, and denied applications. Activity year 2022.

204 rows HMDA 2022
$ substrate pull hmda_lending
LIVE 2 MB

rent_trends

Zillow Observed Rent Index (ZORI) by metro × year, paired with the Wharton Residential Land Use Regulation Index (WRLURI). Lets you see rent growth alongside the regulatory friction that may help explain it.

5,065 rows Zillow + Wharton metro × year
$ substrate pull rent_trends
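Several curated bundles carry ratio fields: price-to-income and rent burden in metro_affordability, and energy burden in energy_burden. This page does not give the exact formulas, so the sketch below uses the conventional definitions as an assumption. The function names and sample inputs are hypothetical and are not values from any bundle.

```python
def _require_positive(income: float) -> None:
    """Ratios against income are undefined for zero or negative income."""
    if income <= 0:
        raise ValueError("income must be positive")


def price_to_income(median_home_price: float, median_household_income: float) -> float:
    """Median home price expressed in years of median household income."""
    _require_positive(median_household_income)
    return median_home_price / median_household_income


def rent_burden(monthly_rent: float, annual_income: float) -> float:
    """Share of annual income spent on rent (0.30 means 30%)."""
    _require_positive(annual_income)
    return 12 * monthly_rent / annual_income


def energy_burden(annual_energy_cost: float, annual_income: float) -> float:
    """Share of annual household income spent on energy."""
    _require_positive(annual_income)
    return annual_energy_cost / annual_income


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
print(price_to_income(400_000, 80_000))  # 5.0
print(rent_burden(2_000, 80_000))        # 0.3
print(energy_burden(2_400, 80_000))      # 0.03
```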
Raw
faithful to source, with canonical join columns added
LIVE 1 MB

bls_qcew_state

Raw BLS QCEW: state employment, wages, establishments. Native columns preserved + canonical state_fips/year/naics.

51 rows BLS canonical ✓
$ substrate pull bls_qcew_state
LIVE 1 MB

census_acs1_state

Raw Census ACS 1-year: population, median income, home value, rent, poverty, labor force. 8 variables, all states.

52 rows Census canonical ✓
$ substrate pull census_acs1_state
LIVE 1 MB

fhfa_hpi_state

Raw FHFA House Price Index, state × quarter for 2023. All-transactions index from FHFA.

204 rows FHFA canonical ✓
$ substrate pull fhfa_hpi_state
LIVE 1 MB

bea_personal_income_state

Raw BEA Regional CAINC1: state-level personal income for 2023. 51 state rows, native + canonical columns.

51 rows BEA canonical ✓
$ substrate pull bea_personal_income_state
LIVE 1 MB

eia_retail_electricity_state

Raw EIA retail electricity prices, state × period × sector. Time series back to 2001. Useful for energy-cost analysis.

4,124 rows EIA canonical ✓
$ substrate pull eia_retail_electricity_state
LIVE 14 MB

fred_macro_indicators

Raw FRED: 24 headline US macro series back to 2000 — UNRATE, CPI, GDP, mortgage rates, jobless claims, MSPUS, fed funds, treasuries.

31k rows FRED · 24 series 2000–now
$ substrate pull fred_macro_indicators
LIVE 2 MB

eia_generation_state

Raw EIA electricity generation, state × source × sector × period back to 2001. Useful for tracking generation-mix shifts.

EIA 2001–now canonical ✓
$ substrate pull eia_generation_state
LIVE 1 MB

cdc_places_diabetes

Raw CDC PLACES, age-adjusted diabetes prevalence at county grain. Standalone single-measure bundle for à-la-carte querying.

3,143 counties CDC canonical ✓
$ substrate pull cdc_places_diabetes
LIVE 3 MB

fema_disasters_state

Raw FEMA federal disaster declarations across 50 states + DC. One row per (disaster, state, county) with incident type, dates, and which assistance programs were activated.

37k rows FEMA canonical ✓
$ substrate pull fema_disasters_state
LIVE 9 MB

irs_county_migration

Raw IRS SOI county-to-county migration flows for tax year 2022. One row per (origin_county, dest_county, direction). Inflow/outflow taxpayers, exemptions, and AGI.

69k rows IRS canonical ✓
$ substrate pull irs_county_migration
LIVE 1 MB

usaspending_state

Raw USAspending.gov federal awards by state for FY2024. Total dollars awarded, population, and per-capita awards. Useful for analyzing federal dollar flows.

57 rows USAspending canonical ✓
$ substrate pull usaspending_state
LIVE 1 MB

usda_food_environment_county

Raw USDA Food Environment Atlas: county-level food access, food insecurity, grocery vs fast-food density, SNAP participation, obesity + diabetes.

~3,100 counties USDA canonical ✓
$ substrate pull usda_food_environment_county
LIVE 1 MB

cms_medicare_enrollment_state

Raw CMS Medicare enrollment, state × month panel for 2023. Total beneficiaries split between original Medicare and Medicare Advantage.

~750 rows CMS canonical ✓
$ substrate pull cms_medicare_enrollment_state
Every dataset shipped here passed substrate doctor. Vote on what's next →
Provenance

Every row traces back to its source.

Other "stock-market-data-for-AI" services hand you a number. govdata hands you a number plus the connector that pulled it, the timestamp, the params, the SQL hash that joined it, and the cost of any LLM extraction. Reproducible by construction.

Every published bundle ships with a manifest recording per-source vintages and the assertion results that gated the publish. Pin a version, re-run a query in 2030, get the same number.

Per-row provenance columns on every sync, derive, and llm_extract
Manifest pins per-source vintages and full sync params
sha256-verified bundles — corruption fails the pull
Quality gates (asserts) block publish on regression
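The sha256 check in the list above can be sketched with the standard library alone. This is a minimal illustration, not substrate's actual implementation: the function name `verify_bundle` is hypothetical, and it assumes the expected digest and size come from the manifest's `sha256` and `byte_size` fields.

```python
import hashlib
import os
import tempfile


def verify_bundle(path: str, expected_sha256: str, expected_size: int) -> bool:
    """Return True only if the file's size and SHA-256 digest both match."""
    if os.path.getsize(path) != expected_size:
        return False  # cheap check first: truncated or padded download
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Hash in 1 MiB chunks so large bundles never load fully into memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256.lower()


# Demo on a throwaway file standing in for a downloaded bundle.
payload = b"demo bundle bytes"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    demo_path = tmp.name

good_digest = hashlib.sha256(payload).hexdigest()
print(verify_bundle(demo_path, good_digest, len(payload)))  # True
print(verify_bundle(demo_path, "0" * 64, len(payload)))     # False
os.remove(demo_path)
```

A caller would typically delete the file and raise an error when this returns False, which matches the "corruption fails the pull" behavior described above.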
Manifest excerpt
{
  "name": "state_panel",
  "version": "lifecycle-test",
  "sha256": "a578954e62…",
  "byte_size": 2109440,
  "total_cost_usd": 0.0,
  "vintages": {
    "state_census_acs1": {
      "connector": "census",
      "synced_at": "2026-04-25T21:46:39",
      "params": {
        "geography": "state",
        "year": 2023,
        "variables": ["B01003_001E", "B19013_001E"]
      }
    },
    "state_bls_qcew":    { /* … */ },
    "state_fhfa_hpi":    { /* … */ },
    "state_bea_income":  { /* … */ }
  },
  "asserts": [
    { "name": "state_panel_row_count",    "passed": true },
    { "name": "state_fips_well_formed",   "passed": true },
    { "name": "population_present",       "passed": true },
    { "name": "median_income_plausible",  "passed": false,
      "severity": "warn", "failing_count": 1 }
  ]
}
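The excerpt shows one failed assert marked `"severity": "warn"` in a bundle that was still published, which suggests that warnings are recorded without blocking. The sketch below encodes that reading as an assumption: a failed assert blocks publication unless its severity is `warn`, and a missing severity is treated as blocking. The `gate` function is hypothetical. The assert entries are copied from the excerpt.

```python
import json

# The "asserts" array from the manifest excerpt above.
ASSERTS_JSON = """
[
  {"name": "state_panel_row_count",   "passed": true},
  {"name": "state_fips_well_formed",  "passed": true},
  {"name": "population_present",      "passed": true},
  {"name": "median_income_plausible", "passed": false,
   "severity": "warn", "failing_count": 1}
]
"""


def gate(asserts: list) -> tuple:
    """Return (publishable, blocking_names, warning_names).

    Assumption: only failed asserts with severity "warn" are non-blocking.
    """
    failed = [a for a in asserts if not a["passed"]]
    warnings = [a["name"] for a in failed if a.get("severity") == "warn"]
    blocking = [a["name"] for a in failed if a.get("severity") != "warn"]
    return (not blocking, blocking, warnings)


ok, blocking, warnings = gate(json.loads(ASSERTS_JSON))
print(ok, blocking, warnings)
# True [] ['median_income_plausible']
```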
Get started

Connect Claude Code in 60 seconds.

Three commands, one config block. No account required for the open kernel.

paste this in your terminal
$ curl -s https://govdata.ai/install | bash

One command. Installs uv + substrate, drops the state_panel bundle into ~/.govdata/, and registers an MCP server with Claude Code (or Claude Desktop).
Restart your client. Done.

Self-host (advanced) — run substrate yourself
1 — install kernel
$ pip install substrate
2 — pull a dataset bundle
$ substrate pull state_panel
  → ./state_panel.duckdb
  5 tables · 419 rows · sha256 ✓
3 — register with Claude Code
$ claude mcp add govdata --scope user \
    -- substrate serve ./state_panel.duckdb
Want hosted refresh, scheduled rebuilds, or premium datasets? Join the waitlist →