Metadata-Version: 2.4
Name: dagsampler
Version: 0.1.0
Summary: Configurable causal DAG simulator for synthetic mixed-type data and CI test benchmarks
Author: Pavel Averin
License-Expression: MIT
Project-URL: Homepage, https://github.com/averinpa/dagsampler
Project-URL: Documentation, https://averinpa.github.io/dagsampler/
Project-URL: Issues, https://github.com/averinpa/dagsampler/issues
Project-URL: Source, https://github.com/averinpa/dagsampler
Project-URL: Changelog, https://github.com/averinpa/dagsampler/blob/main/CHANGELOG.md
Keywords: causal-inference,causal-discovery,conditional-independence,dag,simulation,synthetic-data,benchmarks
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.22
Requires-Dist: pandas>=1.3
Requires-Dist: networkx>=2.6
Requires-Dist: scipy>=1.7
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: ipykernel>=6.29.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx<8,>=7.2; extra == "docs"
Requires-Dist: furo>=2024.8.6; extra == "docs"
Provides-Extra: examples
Requires-Dist: jupyter>=1.0; extra == "examples"
Requires-Dist: matplotlib>=3.7; extra == "examples"
Dynamic: license-file

# dagsampler

[![PyPI version](https://img.shields.io/pypi/v/dagsampler.svg)](https://pypi.org/project/dagsampler/)
[![Python versions](https://img.shields.io/pypi/pyversions/dagsampler.svg)](https://pypi.org/project/dagsampler/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Documentation](https://img.shields.io/badge/docs-averinpa.github.io-blue.svg)](https://averinpa.github.io/dagsampler/)

Configurable causal DAG simulator for synthetic mixed-type data and conditional independence (CI) test benchmarks.

[Documentation](https://averinpa.github.io/dagsampler/) · [Changelog](CHANGELOG.md)

## What it provides

- `CausalDataGenerator` class for configurable simulation
- Support for `custom` and `random` DAGs
- Mixed continuous/binary/categorical nodes (configurable categorical cardinality)
- Structural forms: `linear`, `polynomial`, `interaction`, `sigmoid`, `cos`, `sin`, `stratum_means`
- Optional element-wise `post_transform` (`tanh`, `sin`, `cos`, `exp_neg_abs`, `sqrt_abs`, `relu`, `sign`)
- Cross-type mechanisms:
  - continuous -> categorical (`categorical_model.name = "threshold"`)
  - categorical -> continuous (`functional_form.name = "stratum_means"`, including mixed-parent cases with `metric_weights`)
- Noise models:
  - additive (`gaussian`, `student_t`, `gamma`, `exponential`, `laplace`, `cauchy`, `uniform`)
  - multiplicative (`gaussian`, `student_t`, `gamma`, `exponential`)
  - heteroskedastic (`abs_first_parent`, `abs_parent_plus_const`, `mean_abs_plus_const`)
- Random weight sampling controls (including exclusion band around zero)
- `force_uniform_marginals` for balanced exogenous binary / categorical draws
- Template helpers (`chain_config`, `fork_config`, `collider_config`, `independence_config`)
- Reproducibility via `seed_structure` and `seed_data` (or single `seed`)
- Optional d-separation CI oracle output (`store_ci_oracle=true`)

## Installation

From PyPI:

```bash
pip install dagsampler
```

Or with [uv](https://docs.astral.sh/uv/):

```bash
uv venv
source .venv/bin/activate
uv pip install dagsampler
```

From GitHub (latest `main`):

```bash
uv pip install "dagsampler @ git+https://github.com/averinpa/dagsampler.git"
```

## Random weights away from zero

To guarantee a minimum signal strength on every edge, so that randomly sampled
weights cannot effectively mute a parent, configure:

```json
{
  "simulation_params": {
    "random_weight_low": -1.5,
    "random_weight_high": 1.5,
    "random_weight_min_abs": 0.1
  }
}
```

With these settings, random structural weights are drawn from `[-1.5, -0.1] ∪ [0.1, 1.5]`.
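
To make the exclusion band concrete, here is a minimal NumPy sketch of drawing from such a two-interval range. It is an illustration only, not dagsampler's internal code: the function name `sample_weights` is hypothetical, and choosing each side with probability proportional to its length (so the draw is uniform over the union) is an assumption about the intended distribution.

```python
import numpy as np


def sample_weights(rng, size, low=-1.5, high=1.5, min_abs=0.1):
    """Draw weights from [low, -min_abs] U [min_abs, high] (hypothetical helper)."""
    if not (low < -min_abs and min_abs < high):
        raise ValueError("the exclusion band must lie strictly inside [low, high]")
    neg_len = -min_abs - low   # length of the negative interval
    pos_len = high - min_abs   # length of the positive interval
    # Assumption: pick a side proportionally to its length, so the result
    # is uniform over the union of the two intervals.
    use_neg = rng.random(size) < neg_len / (neg_len + pos_len)
    neg = rng.uniform(low, -min_abs, size)
    pos = rng.uniform(min_abs, high, size)
    return np.where(use_neg, neg, pos)


weights = sample_weights(np.random.default_rng(0), 1000)
```

Every value in `weights` has magnitude between 0.1 and 1.5, so no sampled edge weight falls inside the excluded band around zero.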

By default, categorical parents are not allowed with metric functional forms
(`linear`, `polynomial`, `interaction`). To redirect those cases to
`stratum_means` automatically, set
`"categorical_parent_metric_form_policy": "stratum_means"`.

## Quick start (Python API)

```python
from dagsampler import CausalDataGenerator

config = {
    "simulation_params": {"n_samples": 200, "seed": 42},
    "graph_params": {
        "type": "custom",
        "nodes": ["X", "Y", "Z1"],
        "edges": [["X", "Z1"], ["Y", "Z1"]],
    },
}

result = CausalDataGenerator(config).simulate()
data = result["data"]
dag = result["dag"]
params = result["parametrization"]
```
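
The graph in this config is a collider (`X -> Z1 <- Y`), a structure commonly used to exercise CI tests: `X` and `Y` are marginally independent but become dependent once `Z1` is conditioned on. The sketch below reproduces that pattern with plain NumPy, independently of dagsampler. The linear mechanism, the unit weights, the noise scale, and the `partial_corr` helper are illustrative assumptions, not the package's defaults.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Two independent causes and a common effect (assumed linear mechanism).
x = rng.normal(size=n)
y = rng.normal(size=n)
z1 = x + y + 0.5 * rng.normal(size=n)


def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing c out of both."""
    res_a = a - np.polyval(np.polyfit(c, a, 1), c)
    res_b = b - np.polyval(np.polyfit(c, b, 1), c)
    return float(np.corrcoef(res_a, res_b)[0, 1])


marginal = float(np.corrcoef(x, y)[0, 1])   # close to zero
conditional = partial_corr(x, y, z1)        # strongly negative
```

A CI test applied to data from this graph should therefore accept `X ⊥ Y` and reject `X ⊥ Y | Z1`.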

## CLI

The package exposes `dagsampler-generate`.

```bash
dagsampler-generate \
  --config config.json \
  --output dataset.csv \
  --params-out params.json \
  --edges-out edges.json
```

`config.json` must use the same configuration structure that `CausalDataGenerator` accepts.
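
One way to produce such a file is to serialize the Python config dict with the standard library. This is a minimal sketch that reuses the quick-start config; the output path `config.json` is just the name used in the command above.

```python
import json
from pathlib import Path

# Same structure as the dict passed to CausalDataGenerator in the quick start.
config = {
    "simulation_params": {"n_samples": 200, "seed": 42},
    "graph_params": {
        "type": "custom",
        "nodes": ["X", "Y", "Z1"],
        "edges": [["X", "Z1"], ["Y", "Z1"]],
    },
}

# Write the config as JSON so the CLI can read it.
config_path = Path("config.json")
config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
```

The resulting file can then be passed to `dagsampler-generate` via `--config config.json`.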

For heteroskedastic noise, set `noise_model.func` to one of:
- `abs_first_parent`
- `abs_parent_plus_const`
- `mean_abs_plus_const`

## Development

```bash
uv pip install -e ".[dev]"
pytest -q
```
