Metadata-Version: 2.4
Name: DGP_Protocol
Version: 0.1.0a0
Summary: A Protocol for data-generating processes; minimal interface for analog-estimation toolkits.
License-Expression: BSD-3-Clause
License-File: LICENSE
Author: Ethan Ligon
Author-email: ligon@berkeley.edu
Requires-Python: >=3.11,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Dist: cloudpickle (>=3.0)
Requires-Dist: numpy (>=1.26)
Project-URL: Homepage, https://github.com/ligon/DGP_Protocol
Project-URL: Issues, https://github.com/ligon/DGP_Protocol/issues
Project-URL: Repository, https://github.com/ligon/DGP_Protocol
Description-Content-Type: text/markdown

A minimal Python Protocol for data-generating processes (DGPs).

# What this is

A **Protocol** (`DataGeneratingProcess`) with two members – `data` (a
frozen property returning the observed realization) and
`draw(size=..., *, rng=...)` (a method returning a fresh realization) –
plus a small set of composition primitives (`TwoStageDGP`, `with_data`)
and thin convenience wrappers (`EmpiricalDGP`, `ParametricDGP`) for
working with DGPs as first-class objects.
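Conformance to a `typing.Protocol` is structural: any class that exposes the required members qualifies without inheriting from anything. The sketch below illustrates this with a locally defined, hypothetical stand-in protocol (`DGPLike`) whose members follow the description above, plus a toy `StandardNormalDGP` that never subclasses it. `DGPLike` is marked `runtime_checkable` only so `isinstance` can demonstrate the match. Whether the package's own `DataGeneratingProcess` is runtime-checkable is an assumption, and this is not its actual definition.

```python
from typing import Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class DGPLike(Protocol):
    """Hypothetical stand-in mirroring the two members described above."""

    @property
    def data(self) -> np.ndarray: ...

    def draw(self, size=None, *, rng=None) -> np.ndarray: ...


class StandardNormalDGP:
    """Toy DGP with i.i.d. standard-normal rows; does not subclass DGPLike."""

    def __init__(self, n: int, k: int, seed: int = 0) -> None:
        self._rng = np.random.default_rng(seed)
        self._data = self._rng.standard_normal(size=(n, k))
        self._data.flags.writeable = False  # treat the realization as frozen

    @property
    def data(self) -> np.ndarray:
        return self._data

    def draw(self, size=None, *, rng=None) -> np.ndarray:
        # Fall back to the owned Generator and the observed sample size.
        rng = self._rng if rng is None else rng
        n = self._data.shape[0] if size is None else size
        return rng.standard_normal(size=(n, self._data.shape[1]))


dgp = StandardNormalDGP(n=100, k=3)
print(isinstance(dgp, DGPLike))  # True: structural match on `data` and `draw`
print(dgp.draw(size=10).shape)   # (10, 3)
```

The same structural rule applies to static type checking. A function annotated to accept the Protocol type will accept any class with matching members.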

The package is **not** a library of working DGPs. Concrete DGPs live in
**consumer** packages; for example,
[ManifoldGMM](https://github.com/ligon/ManifoldGMM) ships its own
moment-side DGPs. The role of `DGP_Protocol` is to define the contract
that lets such consumers interoperate.

# Conceptual lineage

The Protocol promotes the **stand-in distribution** from Manski's analog
estimation framework (Manski 1988, *Analog Estimation Methods in
Econometrics*) to a first-class Python object. In that framework, an
estimator is defined by a population functional plus a sample-based
stand-in for the population; `DataGeneratingProcess` is that stand-in.
Different stand-ins yield different analog estimators:

- The empirical distribution → nonparametric plug-in estimators.
- A parametric family fitted to the data → MLE-style estimators.
- A bootstrap distribution → bootstrap inference.
- A null-imposed restriction → constrained estimators.
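The sketch below makes the list concrete for a single functional, the mean. It uses plain NumPy rather than the package's classes, and the names (`functional`, `observed`, and so on) are illustrative. The same functional is evaluated against three stand-ins: the empirical distribution, bootstrap resamples of it, and a fitted normal family. For the mean, the normal MLE coincides with the plug-in estimate, so the parametric stand-in differs only in how it generates replicate samples.

```python
import numpy as np

rng = np.random.default_rng(0)
observed = rng.normal(loc=2.0, scale=1.5, size=200)  # the one realization we have


def functional(sample):
    """The population functional of interest, here the mean."""
    return float(np.mean(sample))


# Empirical distribution as stand-in -> nonparametric plug-in estimate.
plug_in = functional(observed)

# Bootstrap distribution as stand-in -> sampling variability of the estimate.
boot = np.array([
    functional(rng.choice(observed, size=observed.size, replace=True))
    for _ in range(500)
])
boot_se = boot.std(ddof=1)

# Fitted normal family as stand-in -> parametric (MLE-style) analogue.
mu_hat, sigma_hat = observed.mean(), observed.std()  # normal MLEs (ddof=0)
param = np.array([
    functional(rng.normal(mu_hat, sigma_hat, size=observed.size))
    for _ in range(500)
])

print(f"plug-in mean: {plug_in:.3f}, parametric mean: {mu_hat:.3f}")
print(f"bootstrap SE: {boot_se:.3f}, parametric SE: {param.std(ddof=1):.3f}")
```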

# Installation

``` bash
pip install DGP_Protocol
```

The import name is lowercase, following PEP 8:

``` python
from dgp_protocol import DataGeneratingProcess, EmpiricalDGP, TwoStageDGP
```

# Minimal example

``` python
import numpy as np
from dgp_protocol import EmpiricalDGP

data = np.random.default_rng(0).standard_normal(size=(100, 3))

# The DGP owns its own RNG.  Pass `seed` for reproducibility;
# `draw()` itself takes no `rng` argument.
dgp = EmpiricalDGP(observation=data, seed=1)
print(dgp.data.shape)                  # (100, 3) -- the frozen realization
print(dgp.draw().shape)                # (100, 3) -- a fresh bootstrap resample

# Rebind to a different realization while keeping the distributional
# structure.  The child gets an independent (spawned) Generator.
fresh = dgp.with_data(np.random.default_rng(2).standard_normal(size=(50, 3)))
print(fresh.data.shape)                # (50, 3)
```

For more substantial examples – parametric DGPs, two-stage composition
(hierarchical sampling), cluster-block bootstrap – see the test suite
under [tests/](tests/).

# Design

The design is intentionally minimal: `data` + `draw` are the only
required members. Composition primitives (`TwoStageDGP`, `with_data`)
take DGPs and return DGPs without expanding the Protocol.

The design note that motivated this package lives in the sibling
[ManifoldGMM](https://github.com/ligon/ManifoldGMM) repo at
`docs/design/dgp.org`; `DGP_Protocol` was extracted from that
design conversation. See also [AGENTS.md](AGENTS.md) for the package's
scope discipline and the list of intentionally deferred features.

# How to cite

If you use `DGP_Protocol` in academic work, please cite it. The
repository's `CITATION.cff` is recognised by GitHub and provides
one-click citation export in APA, BibTeX, and other formats from the
repo's main page.

A BibTeX entry suitable for paper drafts:

``` bibtex
@software{ligon_dgp_protocol_2026,
  author    = {Ligon, Ethan},
  title     = {DGP\_Protocol: A Protocol for data-generating processes},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/ligon/DGP_Protocol},
  version   = {0.1.0a0},
  license   = {BSD-3-Clause},
}
```

# License

BSD 3-Clause (`BSD-3-Clause`). See the `LICENSE` file at the root of
this repository. In short: use (including commercial use), modification,
and redistribution are permitted, provided redistributions preserve the
copyright notice and license text; the author's name may not be used to
endorse or promote derived products without prior written permission.

# Author

Ethan Ligon, UC Berkeley.

