Metadata-Version: 2.4
Name: datacube-benchmark
Version: 0.1.0
Summary: Utilities to benchmark datacubes with various formats, compressions, and chunking schemes.
Project-URL: Homepage, https://developmentseed.org/datacube-benchmark/
Project-URL: Documentation, https://developmentseed.org/datacube-benchmark/
Project-URL: Repository, https://github.com/developmentseed/datacube-benchmark
Project-URL: Issues, https://github.com/developmentseed/datacube-benchmark/issues
Author-email: Max Jones <14077947+maxrjones@users.noreply.github.com>
License-Expression: MIT
License-File: LICENSE.txt
Keywords: benchmark,datacube,geospatial,obstore,xarray,zarr
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: GIS
Classifier: Topic :: System :: Benchmark
Requires-Python: >=3.12
Requires-Dist: arro3-core>=0.5.1
Requires-Dist: dask>=2025.5.1
Requires-Dist: numcodecs>=0.16.1
Requires-Dist: numpy>=2.0
Requires-Dist: obstore>=0.6.0
Requires-Dist: pandas>=2.0
Requires-Dist: pint>=0.24.4
Requires-Dist: pyarrow>=20.0.0
Requires-Dist: xarray>=2025.6.1
Requires-Dist: zarr>=3.0.8
Description-Content-Type: text/markdown

# datacube-benchmark

[![Docs](https://img.shields.io/badge/docs-developmentseed.org%2Fdatacube--benchmark-blue)](https://developmentseed.org/datacube-benchmark/)
[![PyPI](https://img.shields.io/pypi/v/datacube-benchmark.svg)](https://pypi.org/project/datacube-benchmark/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

Utilities for benchmarking [Zarr](https://zarr.dev/) datacubes — generate
synthetic stores with different chunking schemes, compressors, and
dtypes, then measure read performance under realistic access patterns.

Companion package to the [Datacube Guide](https://developmentseed.org/datacube-guide/),
which documents common pitfalls when producing and consuming
multi-dimensional data products.

## Installation

```bash
pip install datacube-benchmark
```

Python 3.12+ is required.

## Quickstart

Create a synthetic Zarr store on local disk and time a few random-access
patterns against it:

```python
from pathlib import Path

import zarr
from obstore.store import LocalStore

import datacube_benchmark

# obstore's LocalStore expects its root directory to exist.
path = Path.cwd() / "data" / "test.zarr"
path.mkdir(parents=True, exist_ok=True)
store = LocalStore(str(path))

# Write a synthetic datacube through the obstore-backed store.
zarr_store = datacube_benchmark.create_zarr_store(store)

# Open the generated array (Zarr format 3) and time the access patterns.
arr = zarr.open_array(zarr_store, zarr_format=3, path="data")
results = datacube_benchmark.benchmark_access_patterns(arr, num_samples=10)
print(results)
```
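As a rough illustration of what timing random reads and summarizing them involves, here is a minimal stdlib sketch. `time_reads` is a hypothetical helper, not part of `datacube_benchmark`. It takes any zero-argument `read` callable (for example `lambda: arr[0]` on the array opened above) and reports plain floats in seconds, whereas the package attaches `pint` units to its results.

```python
import statistics
import time
from collections.abc import Callable


def time_reads(read: Callable[[], object], num_samples: int = 10) -> dict[str, float]:
    """Call `read` num_samples times and summarize wall-clock durations in seconds.

    Hypothetical helper for illustration; not part of datacube_benchmark.
    """
    if num_samples < 1:
        raise ValueError("num_samples must be >= 1")
    durations: list[float] = []
    for _ in range(num_samples):
        start = time.perf_counter()  # monotonic, high-resolution clock
        read()  # e.g. lambda: arr[0, :, :] on a Zarr array
        durations.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(durations),
        "median_s": statistics.median(durations),
        "min_s": min(durations),
        "max_s": max(durations),
        # stdev needs at least two samples
        "stdev_s": statistics.stdev(durations) if num_samples > 1 else 0.0,
    }


if __name__ == "__main__":
    print(time_reads(lambda: sum(range(10_000)), num_samples=5))
```

First-call effects such as opening connections or warming caches show up in `max_s`, so discarding a warm-up read before sampling is a common refinement.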

`create_zarr_store` takes target sizes and chunk shapes as strings or
[`pint`](https://pint.readthedocs.io/) quantities (e.g. `"1 GB"`,
`"10 MB"`), and writes through an [`obstore`](https://developmentseed.org/obstore/)
store — so the same call works against a local directory, S3, GCS, or
Azure by swapping the store.
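`pint` interprets decimal prefixes as powers of ten, so `"1 GB"` is 10^9 bytes while `"1 GiB"` is 2^30 bytes. The stdlib-only sketch below mirrors that convention to show how a size target translates into an element count for a given dtype. The helper names `parse_size` and `elements_for` are hypothetical and not part of this package, which uses `pint` itself.

```python
import re

# Decimal (SI) and binary (IEC) prefixes, following pint's convention:
# "GB" is 10**9 bytes, "GiB" is 2**30 bytes.
_PREFIXES = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}
_SIZE_RE = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*(Ki|Mi|Gi|Ti|k|M|G|T|)B\s*$")


def parse_size(text: str) -> int:
    """Convert a size string such as "10 MB" into a byte count (hypothetical helper)."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    value, prefix = match.groups()
    return int(float(value) * _PREFIXES[prefix])


def elements_for(size: str, itemsize: int) -> int:
    """Number of whole elements of `itemsize` bytes that fit in `size`."""
    return parse_size(size) // itemsize


if __name__ == "__main__":
    # float32 has a 4-byte itemsize, so a 10 MB chunk holds 2.5 million values.
    print(elements_for("10 MB", 4))
```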

## What's in the box

- **`create_zarr_store`**, **`create_or_open_zarr_store`**,
  **`create_or_open_zarr_array`**, **`create_empty_dataarray`** — build
  synthetic Zarr datacubes at a target size, resolution, and chunk
  shape.
- **`benchmark_zarr_array`** — time random reads against one access
  pattern (`"point"`, `"time_series"`, `"spatial_slice"`, `"full"`) and
  return summary statistics with units attached.
- **`benchmark_access_patterns`** — run all four access patterns and
  return the combined results as a `pandas.DataFrame`.
- **`benchmark_dataset_open`** — time `xarray.open_dataset` on a Zarr
  store.
- **`Config`** — a dataclass collecting the common knobs (compressor,
  target array size, sample counts, concurrency).

See the [API reference](https://developmentseed.org/datacube-benchmark/api.html)
for the full signatures and parameter docs.

## License

[MIT](https://github.com/developmentseed/datacube-benchmark/blob/main/LICENSE.txt)
