Metadata-Version: 2.4
Name: cdfidata
Version: 0.1.1
Summary: ETL pipeline for US Treasury CDFI Fund public datasets — TLR, CLR, ILR, NMTC, and Awards data
License: MIT
Project-URL: Homepage, https://github.com/Jaypatel1511/cdfi-data
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: pandas>=1.4.0
Requires-Dist: numpy>=1.21.0
Requires-Dist: requests>=2.27.0

# cdfi-data 🏦

**ETL pipeline for US Treasury CDFI Fund public datasets.**

Download, clean, and analyze Transaction Level Report (TLR), Consumer Loan Report (CLR),
and Awards data from the US Department of Treasury's CDFI Fund — in one line of Python.

---

## Why cdfi-data?

The CDFI Fund releases massive public datasets covering millions of loans and investments
in low-income communities. But the raw files are messy, inconsistently formatted, and
require significant cleaning before analysis. cdfi-data standardizes the entire pipeline.

---

## Installation

    pip install cdfidata

---

## Quickstart

    from cdfidata import TLRLoader, CLRLoader, AwardsLoader

    # Load TLR transaction data (downloads & caches automatically)
    tlr = TLRLoader()
    df = tlr.load(year=2022)

    # Filter to Illinois
    il = tlr.filter_state("IL")

    # Filter by loan type and amount
    small_biz = tlr.filter_loan_type("Business")
    large = tlr.filter_amount(min_amount=500_000)

    # Summary stats
    tlr.summary()

    # Export
    tlr.to_csv("cdfi_transactions.csv")
    tlr.to_sqlite("cdfi.db", table="tlr")

---

## Sample Data (No Download Required)

    from cdfidata import TLRLoader, CLRLoader, AwardsLoader

    tlr = TLRLoader()
    df = tlr.load_sample(n=1000)

    clr = CLRLoader()
    df = clr.load_sample(n=1000)

    awards = AwardsLoader()
    df = awards.load_sample(n=500)

---

## Datasets Supported

- TLR (Transaction Level Report) — 1M+ individual CDFI loans, 61 variables
- CLR (Consumer Loan Report) — 3.2M consumer loans aggregated to census tract
- Awards Database — all CDFI Fund program awardees across all years

Coming soon: ILR (Institution Level Report), NMTC Allocatee data

---

## Data Source

All data comes directly from the US Department of Treasury CDFI Fund:
https://www.cdfifund.gov/research-data

Data is released annually under open government data principles.

---

## Running Tests

    PYTHONPATH=. pytest tests/ -v

30 tests across all modules.

---

## Who This Is For

- Impact investors analyzing CDFI loan portfolios
- Academic researchers studying community development finance
- Policy analysts evaluating CDFI Fund program outcomes
- CDFIs benchmarking their own performance against peers
- Anyone who needs clean, analysis-ready CDFI Fund data

---

## License

MIT 2026 Jaypatel1511
