Metadata-Version: 2.4
Name: LSMS_Library
Version: 0.7.2
Summary: Abstraction layer for Living Standards Measurement Survey data
License: CC-BY-4.0
License-File: LICENSE.txt
Author: Ethan Ligon
Author-email: ligon@berkeley.edu
Requires-Python: >=3.11, <4.0
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Dist: CFEDemands (>=0.8.2)
Requires-Dist: PyYAML (>=6.0,<7.0)
Requires-Dist: dvc[s3] (>=3.67.0,<4.0.0)
Requires-Dist: fooddatacentral (>=1.0.10,<2.0.0)
Requires-Dist: ligonlibrary (>=0.2.0)
Requires-Dist: openai (>=1.99.5,<2.0.0)
Requires-Dist: pandas (>=2.2)
Requires-Dist: pint (>=0.25.2,<0.26.0)
Requires-Dist: platformdirs (>=4.0.0,<5.0.0)
Requires-Dist: pyarrow (>=21.0.0)
Requires-Dist: pyreadstat (>=1.3.0,<2.0.0)
Requires-Dist: python-Levenshtein (>=0.27.3,<0.28.0)
Requires-Dist: python-gnupg (>=0.5,<1.0)
Requires-Dist: scikit-learn (>=1.7.0,<2.0.0)
Requires-Dist: typer (>=0.12.0,<1.0.0)
Project-URL: Repository, https://github.com/ligon/LSMS_Library
Description-Content-Type: text/plain

#+TITLE: LSMS_Library
#+AUTHOR: Ethan Ligon
#+OPTIONS: toc:nil

[[https://doi.org/10.5281/zenodo.17258079][https://zenodo.org/badge/796958546.svg]]

A Python library providing a uniform interface to Living Standards Measurement Study (LSMS) household surveys from multiple countries and years, without the data loss typical of traditional harmonization approaches.

* The Problem

LSMS datasets are invaluable for studying poverty, consumption, and household welfare across developing countries. However, each country's survey uses different:
- Variable names and encodings
- Food classification systems
- Questionnaire structures
- File formats and organization

Researchers typically spend weeks learning each new dataset's idiosyncrasies or use pre-harmonized datasets that sacrifice detail and comparability. Cross-country or longitudinal analyses become prohibitively time-consuming.

* The Solution

LSMS_Library provides an *abstraction layer* that gives you a consistent interface to work with any supported LSMS dataset. Instead of harmonizing the data itself (which loses information), we harmonize the /way you access/ the data.

* Installation

From PyPI (when available):

#+begin_src bash
pip install LSMS_Library
#+end_src

From github (current release):

#+begin_src bash
pip install git+https://github.com/ligon/LSMS_Library.git@v0.7.0
#+end_src

From a source checkout (for contributors):

#+begin_src bash
git clone https://github.com/ligon/LSMS_Library.git
cd LSMS_Library
poetry install
#+end_src

* Quick Start

#+begin_src python
import lsms_library as ll

# Single-country access
uga = ll.Country('Uganda')
uga.waves          # ['2005-06', '2009-10', ..., '2019-20']
uga.data_scheme    # ['food_acquired', 'household_roster', ...]
food = uga.food_expenditures()   # Standardized DataFrame, all waves

# Cross-country analysis
roster = ll.Feature('household_roster')
roster.countries   # ['Burkina_Faso', 'Ethiopia', 'Mali', 'Uganda', ...]
df = roster()      # Harmonized DataFrame across all countries
#+end_src

* Data Access

This library abstracts over Living Standards Measurement Study (LSMS) survey data.  The underlying microdata belongs to the respective national statistics offices and the World Bank; users must accept the [[https://microdata.worldbank.org/][World Bank Microdata Library]]'s terms of use before accessing it.

** Authentication: the WB Microdata API key

1. Register at https://microdata.worldbank.org/ (free).
2. Accept the terms of use for the LSMS collections you want to access.
3. Get your API key from your account dashboard.
4. Create =~/.config/lsms_library/config.yml= with:

   #+begin_src yaml
   microdata_api_key: your_key_here
   # data_dir: /path/to/override   # same as LSMS_DATA_DIR env var
   #+end_src

   or set =MICRODATA_API_KEY= as an environment variable.
5. On =import lsms_library=, the library validates your key against the WB catalog and automatically unlocks access to the S3 read cache.  No further setup is required.

** The S3 cache

Once your WB API key is validated, the library unlocks a read-only S3 cache that mirrors the WB Microdata downloads.  This is a convenience -- the S3 cache is faster than the WB NADA API and reduces load on the WB service -- but it is not a separate access layer.  The WB terms of use are the authoritative gate; the S3 cache just provides the same data faster.

Decrypted plaintext credentials are written to =~/.config/lsms_library/s3_creds= (or the path in the =LSMS_S3_CREDS= environment variable), not into the package tree -- so the library is safe to install from a wheel into a read-only site-packages directory.

** Non-interactive environments

In CI, Docker builds, or other non-interactive contexts, set =LSMS_SKIP_AUTH=1= to suppress the import-time authentication flow.  In that mode you are responsible for ensuring =~/.config/lsms_library/s3_creds= exists (e.g. via a CI secret mount) before the first data access.

** Data cache location

Parquet caches materialize under the platform-appropriate user data directory:

- Linux: =~/.local/share/lsms_library/= by default
- Override with =LSMS_DATA_DIR= env var or =data_dir= in =config.yml=

See the [[https://ligon.github.io/LSMS_Library/guide/caching/][caching guide]] for details on =assume_cache_fresh=, the =Country= and =Feature= classes, and per-country build methods.

* Documentation

Full documentation is available at [[https://ligon.github.io/LSMS_Library][ligon.github.io/LSMS_Library]], including:

- **[[https://ligon.github.io/LSMS_Library/getting-started/][Getting Started]]** -- installation and first steps
- **[[https://ligon.github.io/LSMS_Library/guide/country/][Country Guide]]** -- single-country workflows, harmonization pipeline, derived tables
- **[[https://ligon.github.io/LSMS_Library/guide/feature/][Feature Guide]]** -- cross-country analysis with =ll.Feature=
- **[[https://ligon.github.io/LSMS_Library/guide/caching/][Caching]]** -- performance tuning, build backends, cache management
- **[[https://ligon.github.io/LSMS_Library/guide/panel-data/][Panel Data]]** -- longitudinal analysis and ID harmonization
- **[[https://ligon.github.io/LSMS_Library/api/country/][API Reference]]** -- complete class documentation (auto-generated from source)

* Contributing

See [[file:CONTRIBUTING.org][CONTRIBUTING.org]] for detailed guidelines on adding new datasets using DVC.

* Citation

If you use LSMS_Library in your research, please cite:

#+begin_src bibtex
@software{ligon25:lsms_library,
  author =    {Ethan Ligon},
  title =     {{\tt LSMS_Library}: Abstraction layer for working with Living Standards Measurement Surveys},
  year =      2025,
  doi = {10.5281/zenodo.17258079},
  url = {https://pypi.org/project/lsms_library/}
}
#+end_src

* License

See the [[file:LICENSE][LICENSE]] file in the repository for details.

* Acknowledgments

This project builds on data collection efforts by:
- The World Bank's Living Standards Measurement Study (LSMS) team
- National statistical offices in participating countries
- The LSMS-ISA initiative

