Metadata-Version: 2.3
Name: consist
Version: 0.1.4
Summary: Provenance tracking, intelligent caching, and data virtualization for scientific simulation workflows.
Keywords: provenance,caching,lineage,reproducibility,duckdb,scientific-computing
Author: Zach Needell
Author-email: Zach Needell <zaneedell@lbl.gov>
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Typing :: Typed
Requires-Dist: duckdb>=1.4.2
Requires-Dist: duckdb-engine>=0.17.0
Requires-Dist: mike>=2.2.0
Requires-Dist: pandas>=2.1.0
Requires-Dist: pyarrow>=22.0.0
Requires-Dist: pydantic>=2.12.4
Requires-Dist: rich>=13.7.1
Requires-Dist: ruff>=0.15.7
Requires-Dist: sqlalchemy>=2.0.44,<2.0.45
Requires-Dist: sqlmodel>=0.0.31
Requires-Dist: tqdm>=4.67.1
Requires-Dist: ty>=0.0.15
Requires-Dist: typer>=0.12.3
Requires-Dist: zensical>=0.0.39
Requires-Dist: pyyaml>=6.0 ; extra == 'activitysim'
Requires-Dist: cftime>=1.6.0 ; extra == 'all'
Requires-Dist: docker>=7.1.0 ; extra == 'all'
Requires-Dist: dlt>=1.21.0 ; extra == 'all'
Requires-Dist: geopandas>=0.14.0 ; extra == 'all'
Requires-Dist: h5netcdf>=1.3.0 ; extra == 'all'
Requires-Dist: h5py>=3.15.1 ; extra == 'all'
Requires-Dist: netcdf4>=1.6.0 ; extra == 'all'
Requires-Dist: openmatrix>=0.3.5.0 ; extra == 'all'
Requires-Dist: pyhocon>=0.3.60 ; extra == 'all'
Requires-Dist: pyyaml>=6.0 ; extra == 'all'
Requires-Dist: tables>=3.10.2 ; extra == 'all'
Requires-Dist: xarray>=2024.9.0 ; extra == 'all'
Requires-Dist: zarr>=2.18,<3 ; extra == 'all'
Requires-Dist: pyhocon>=0.3.60 ; extra == 'beam'
Requires-Dist: polars>=1.35.2 ; extra == 'bench'
Requires-Dist: psutil>6.0.0 ; extra == 'bench'
Requires-Dist: dlt>=1.21.0 ; extra == 'dev'
Requires-Dist: h5netcdf>=1.3.0 ; extra == 'dev'
Requires-Dist: h5py>=3.15.1 ; extra == 'dev'
Requires-Dist: openmatrix>=0.3.5.0 ; extra == 'dev'
Requires-Dist: pyhocon>=0.3.60 ; extra == 'dev'
Requires-Dist: pytest>=9.0.1 ; extra == 'dev'
Requires-Dist: pytest-cov>=7.0.0 ; extra == 'dev'
Requires-Dist: pytest-mock>=3.15.1 ; extra == 'dev'
Requires-Dist: ruff>=0.14.6 ; extra == 'dev'
Requires-Dist: tables>=3.10.2 ; extra == 'dev'
Requires-Dist: xarray>=2024.9.0 ; extra == 'dev'
Requires-Dist: zarr>=2.18,<3 ; extra == 'dev'
Requires-Dist: docker>=7.1.0 ; extra == 'docker'
Requires-Dist: mkdocs-material>=9.7.0 ; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=1.0.0 ; extra == 'docs'
Requires-Dist: ipykernel>=6.29.0 ; extra == 'examples'
Requires-Dist: jupyterlab>=4.0.0 ; extra == 'examples'
Requires-Dist: matplotlib>=3.8.0 ; extra == 'examples'
Requires-Dist: seaborn>=0.13.0 ; extra == 'examples'
Requires-Dist: ipywidgets>=8.1.8 ; extra == 'examples'
Requires-Dist: h5py>=3.15.1 ; extra == 'hdf5'
Requires-Dist: tables>=3.10.2 ; extra == 'hdf5'
Requires-Dist: dlt>=1.21.0 ; extra == 'ingest'
Requires-Dist: cftime>=1.6.0 ; extra == 'netcdf'
Requires-Dist: h5netcdf>=1.3.0 ; extra == 'netcdf'
Requires-Dist: xarray>=2024.9.0 ; extra == 'netcdf'
Requires-Dist: netcdf4>=1.6.0 ; extra == 'netcdf4'
Requires-Dist: xarray>=2024.9.0 ; extra == 'netcdf4'
Requires-Dist: openmatrix>=0.3.5.0 ; extra == 'omx'
Requires-Dist: geopandas>=0.14.0 ; extra == 'spatial'
Requires-Dist: h5netcdf>=1.3.0 ; extra == 'test'
Requires-Dist: openmatrix>=0.3.5.0 ; extra == 'test'
Requires-Dist: pyhocon>=0.3.60 ; extra == 'test'
Requires-Dist: pytest>=9.0.1 ; extra == 'test'
Requires-Dist: pytest-cov>=7.0.0 ; extra == 'test'
Requires-Dist: pytest-mock>=3.15.1 ; extra == 'test'
Requires-Dist: ruff>=0.14.6 ; extra == 'test'
Requires-Dist: xarray>=2024.9.0 ; extra == 'test'
Requires-Dist: zarr>=2.18,<3 ; extra == 'test'
Requires-Dist: pandas-stubs~=2.3.3 ; extra == 'typecheck'
Requires-Dist: ty>=0.0.7 ; extra == 'typecheck'
Requires-Dist: xarray>=2024.9.0 ; extra == 'zarr'
Requires-Dist: zarr>=2.18,<3 ; extra == 'zarr'
Requires-Python: >=3.11
Provides-Extra: activitysim
Provides-Extra: all
Provides-Extra: beam
Provides-Extra: bench
Provides-Extra: dev
Provides-Extra: docker
Provides-Extra: docs
Provides-Extra: examples
Provides-Extra: hdf5
Provides-Extra: ingest
Provides-Extra: netcdf
Provides-Extra: netcdf4
Provides-Extra: omx
Provides-Extra: spatial
Provides-Extra: test
Provides-Extra: typecheck
Provides-Extra: zarr
Description-Content-Type: text/markdown

<p align="center">
  <img src="docs/assets/logo.png" alt="Consist" width="320">
</p>

<p align="center">
  <a href="https://github.com/LBNL-UCB-STI/consist/actions/workflows/ci.yml">
    <img src="https://github.com/LBNL-UCB-STI/consist/actions/workflows/ci.yml/badge.svg" alt="CI">
  </a>
  <img src="https://img.shields.io/badge/python-3.11+-blue.svg" alt="Python 3.11+">
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-BSD--3--Clause-blue.svg" alt="License BSD 3-Clause"></a>
</p>

**Consist** is a caching and provenance layer for scientific simulation
workflows. It records the code, configuration, input data, and output artifacts
behind each run so expensive steps can be skipped safely and results remain
queryable after the fact.

Consist is useful when a workflow has:

- long-running model steps that should cache-hit when inputs are unchanged;
- scenario variants that need explicit lineage and comparison;
- file-based tools that need stable local paths but still need canonical
  provenance;
- post-run questions like "which config produced this output?"

## Installation

```bash
pip install consist
```

Optional integrations are installed as extras:

```bash
pip install "consist[ingest]"
pip install "consist[docker]"
```

> [!NOTE]
> Consist is pre-`1.0`. It is ready for real workflows, but minor releases may
> still include breaking changes while the API settles.

## Quick Example

```python
from pathlib import Path

import pandas as pd

import consist
from consist import ExecutionOptions, Tracker

tracker = Tracker(run_dir="./runs", db_path="./provenance.duckdb")


def clean_data(raw: Path, threshold: float = 0.5) -> dict[str, Path]:
    df = pd.read_parquet(raw)
    out = Path("./cleaned.parquet")
    df[df["value"] > threshold].to_parquet(out)
    return {"cleaned": out}


# First call executes clean_data and records its code, config, and input hashes.
first = tracker.run(
    fn=clean_data,
    inputs={"raw": Path("raw.parquet")},
    config={"threshold": 0.5},
    outputs=["cleaned"],
    execution_options=ExecutionOptions(input_binding="paths"),
)

# Same code, inputs, and config, so this call is a cache hit and skips execution.
second = tracker.run(
    fn=clean_data,
    inputs={"raw": Path("raw.parquet")},
    config={"threshold": 0.5},
    outputs=["cleaned"],
    execution_options=ExecutionOptions(input_binding="paths"),
)

print(first.cache_hit, second.cache_hit)  # False True
cleaned = consist.load_df(second.outputs["cleaned"])
```

In this example, `input_binding="paths"` tells Consist to pass local `Path`
objects into the callable instead of loading the input files itself. Those same
paths are still hashed and recorded for cache identity and lineage. For tools
that need inputs copied to specific local filenames, see the
[Usage Guide](docs/usage-guide.md#path-staging-example).
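As a rough mental model of what "hashed for cache identity" means, you can
picture the cache key as a digest over input file contents plus a canonical
serialization of the config. The sketch below is purely illustrative, not
Consist's actual hashing scheme, and the `cache_key` helper is hypothetical:

```python
import hashlib
import json
from pathlib import Path


def cache_key(inputs: dict[str, Path], config: dict) -> str:
    """Illustrative content-based cache key (not Consist's real implementation).

    Hashes each input file's bytes plus the serialized config, so a run with
    unchanged inputs and config reproduces the same key and can cache-hit.
    """
    h = hashlib.sha256()
    for name in sorted(inputs):  # stable ordering so dict insertion order is irrelevant
        h.update(name.encode())
        h.update(inputs[name].read_bytes())
    # Canonical JSON so logically equal configs hash identically.
    h.update(json.dumps(config, sort_keys=True).encode())
    return h.hexdigest()
```

Under a scheme like this, rerunning with the same `raw.parquet` and
`{"threshold": 0.5}` yields the same key (a cache hit), while changing either
the file contents or the threshold yields a different key and forces
re-execution.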

## Documentation

| Start here | Use it for |
|:--|:--|
| [Quickstart](docs/getting-started/quickstart.md) | First tracked run and cache hit |
| [First Workflow](docs/getting-started/first-workflow.md) | Two-step pipeline with explicit artifact links |
| [Usage Guide](docs/usage-guide.md) | Choosing between `run`, `trace`, and `scenario` |
| [Caching & Hydration](docs/concepts/caching-and-hydration.md) | Cache identity, hit behavior, and output recovery concepts |
| [Historical Recovery](docs/guides/historical-recovery.md) | Restoring archived outputs and staging inputs |
| [CLI Reference](docs/cli-reference.md) | Inspecting runs, artifacts, lineage, and schemas |
| [API Reference](docs/api/index.md) | Public Python API and generated signatures |

## Etymology

In railroad terminology, a **consist** is the lineup of locomotives and cars
that make up a train. In this library, a consist is the immutable record of the
code, config, inputs, and outputs coupled together to produce a result.
