Metadata-Version: 2.4
Name: esrf-data-compressor
Version: 0.2.0
Summary: A library to compress ESRF data and reduce their footprint
Author-email: ESRF <dau-pydev@esrf.fr>
License: MIT License
        
        **Copyright (c) 2025 European Synchrotron Radiation Facility**
        
        Permission is hereby granted, free of charge, to any person obtaining a copy of
        this software and associated documentation files (the "Software"), to deal in
        the Software without restriction, including without limitation the rights to
        use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
        the Software, and to permit persons to whom the Software is furnished to do so,
        subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
        FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
        COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
        IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
        CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
        
Project-URL: Homepage, https://gitlab.esrf.fr/dau/esrf-data-compressor
Project-URL: Documentation, https://esrf-data-compressor.readthedocs.io/
Project-URL: Repository, https://gitlab.esrf.fr/dau/esrf-data-compressor
Project-URL: Issues, https://gitlab.esrf.fr/dau/esrf-data-compressor/issues
Project-URL: Changelog, https://gitlab.esrf.fr/dau/esrf-data-compressor/-/blob/main/CHANGELOG.md
Keywords: ESRF,pathlib
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: h5py
Requires-Dist: hdf5plugin
Requires-Dist: blosc2-grok
Requires-Dist: scikit-image
Requires-Dist: tqdm
Provides-Extra: test
Requires-Dist: pytest>=7.0; extra == "test"
Provides-Extra: dev
Requires-Dist: esrf-data-compressor[test]; extra == "dev"
Requires-Dist: black>=22; extra == "dev"
Requires-Dist: flake8>=4.0; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Provides-Extra: doc
Requires-Dist: sphinx>=6.0; extra == "doc"
Requires-Dist: sphinxcontrib-mermaid>=0.7; extra == "doc"
Requires-Dist: sphinx-autodoc-typehints>=1.16; extra == "doc"
Requires-Dist: pydata-sphinx-theme; extra == "doc"
Dynamic: license-file

# ESRF Data Compressor

**ESRF Data Compressor** is a command-line tool and Python library for compressing large ESRF HDF5 datasets (3D volumes) and verifying data consistency via SSIM. The default compression backend is Blosc2 + Grok (JPEG2000).

---

## Features

* **Discover raw HDF5 dataset files** under an experiment’s `RAW_DATA` directory

  * Walks the HDF5 Virtual Datasets to find the data to compress
  * Lets you filter scan by scan based on the value of a key

* **Slice-by-slice compression**

  * Uses Blosc2 + Grok (JPEG2000) on every slice of each 3D dataset (axis 0)
  * User-configurable compression ratio (e.g. `--cratio 10`)

* **Parallel execution**

  * Automatically factors CPU cores into worker processes × per-process threads
  * By default, each worker runs up to 4 Blosc2 threads (falling back to 1 thread when fewer than 4 cores are available)

* **Non-destructive workflow**

  1. `compress` writes compressed files either:
     - next to each source as `<basename>_<compression_method>.h5` (`--layout sibling`), or
     - under a mirrored `RAW_DATA_COMPRESSED` tree using the same source file names, while copying non-compressed folders/files (`--layout mirror`, default)
  2. `check` computes SSIM (first and last frames) and writes a report
  3. `overwrite` (optional) swaps out the raw frame file (irreversible)

* **Four simple CLI subcommands**

  * `compress-hdf5 list`  Show all raw HDF5 files to be processed
  * `compress-hdf5 compress` Generate compressed siblings
  * `compress-hdf5 check`  Produce a per-dataset SSIM report between raw & compressed
  * `compress-hdf5 overwrite` Atomically replace each raw frame file (irreversible)
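
The two output layouts described above can be sketched as a simple path mapping. This is an illustration, not the tool's actual code; in particular, the `blosc2_grok` suffix stands in for whatever `<compression_method>` string the tool really uses:

```python
from pathlib import Path


def compressed_path(src: Path, layout: str = "mirror") -> Path:
    """Sketch of the two output layouts (assumed suffix, not the real one).

    - "sibling": <basename>_<method>.h5 next to the source file
    - "mirror":  same file name under a RAW_DATA_COMPRESSED tree
    """
    method = "blosc2_grok"  # hypothetical suffix for the default backend
    if layout == "sibling":
        return src.with_name(f"{src.stem}_{method}.h5")
    if layout == "mirror":
        # Swap the RAW_DATA component for RAW_DATA_COMPRESSED, keep the rest
        parts = ["RAW_DATA_COMPRESSED" if p == "RAW_DATA" else p for p in src.parts]
        return Path(*parts)
    raise ValueError(f"unknown layout: {layout!r}")
```

For example, `/exp/RAW_DATA/scan/data.h5` maps to `/exp/RAW_DATA_COMPRESSED/scan/data.h5` under the default mirror layout.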

---

## Installation

### From PyPI

```bash
pip install esrf-data-compressor
```

Once installed, the `compress-hdf5` command will be available in your `PATH`.

### From Source (for development)

```bash
git clone https://gitlab.esrf.fr/dau/esrf-data-compressor.git
cd esrf-data-compressor

# (Optional) Create & activate a virtual environment
python -m venv venv
source venv/bin/activate

# Install build dependencies & the package itself
pip install .
```

---

## Documentation

Full documentation is available online:
[ESRF Data Compressor Docs](https://esrf-data-compressor.readthedocs.io/en/latest/index.html)

## Contributing & Development

* **Clone** the repository:

  ```bash
  git clone https://gitlab.esrf.fr/dau/esrf-data-compressor.git
  cd esrf-data-compressor
  ```

* **Install** dependencies (in a virtual environment):

  ```bash
  python -m venv venv
  source venv/bin/activate
  pip install -e ".[dev]"
  ```

* **Run tests** with coverage:

  ```bash
  pytest -v --cov=esrf_data_compressor --cov-report=term-missing
  ```

* **Style:**

  * `black .`
  * `flake8 .`
  * `ruff check .`

* **Build docs** (Sphinx + pydata theme):

  ```bash
  sphinx-build doc build/html
  ```

---

## License

This project is licensed under the [MIT License](LICENSE). See `LICENSE` for full text.

---

## Changelog

All noteworthy changes are recorded in [CHANGELOG.md](CHANGELOG.md). Version 0.1.0 marks the first public release with:

* Initial implementation of Blosc2 + Grok (JPEG2000) compression for 3D HDF5 datasets.
* SSIM-based integrity check (first & last slice).
* Four-command CLI (`compress-hdf5 list`, `compress-hdf5 compress`, `compress-hdf5 check`, `compress-hdf5 overwrite`).
* Parallelism with worker×thread auto-factoring.

For more details, see the full history in [CHANGELOG.md](CHANGELOG.md).
