Metadata-Version: 2.4
Name: cisl
Version: 0.2.3
Summary: Ensemble of tools for CISL data
Author-email: Yen-Ting Liu <ytliu2@illinois.edu>
License-Expression: Apache-2.0
Project-URL: Homepage, https://chemimage.illinois.edu/
Project-URL: Repository, https://github.com/chemimage/cisl-py
Project-URL: Issues, https://github.com/chemimage/cisl-py/issues
Keywords: spectroscopy,microscope,microscopy,chemical imaging
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click
Requires-Dist: coloredlogs
Requires-Dist: dask
Requires-Dist: h5py>=3.16.0
Requires-Dist: imageio
Requires-Dist: matplotlib
Requires-Dist: natsort
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: parse
Requires-Dist: pint
Requires-Dist: psutil
Requires-Dist: pybaselines
Requires-Dist: pyfftw>=0.15.1
Requires-Dist: pympler>=1.1
Requires-Dist: SciencePlots
Requires-Dist: scikit-image
Requires-Dist: scipy
Requires-Dist: spectral
Requires-Dist: tifffile
Requires-Dist: tqdm
Requires-Dist: xarray
Dynamic: license-file

# cisl

This project is managed with `uv`.

## Setup

Install the project and development dependencies:

```powershell
uv sync --dev
```

`uv` will create a local virtual environment in `.venv` and use the Python version
declared in `.python-version` (`3.12`).

## Test Data

Some tests depend on sample datasets under `tests/samples/`; those files are
tracked with Git LFS rather than stored as regular Git blobs.

After cloning the repository, fetch them with:

```powershell
git lfs install
git lfs pull
```

If files under `tests/samples/` look like small pointer text files instead of
real binary data, Git LFS content has not been downloaded yet.
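One quick way to check is to look at the first bytes of each file: LFS pointer files are tiny text files whose first line names the LFS spec. This is a minimal stdlib sketch; the `is_lfs_pointer` helper is illustrative, not part of `cisl`:

```python
from pathlib import Path


def is_lfs_pointer(path: Path) -> bool:
    """Heuristic check: Git LFS pointer files start with the LFS spec line
    instead of real binary content."""
    head = path.read_bytes()[:64]
    return head.startswith(b"version https://git-lfs.github.com/spec/v1")


# Flag any sample files that have not been pulled yet (empty if all are real).
pending = [
    p
    for p in Path("tests/samples").rglob("*")
    if p.is_file() and is_lfs_pointer(p)
]
```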

## Common Commands

Run the CLI:

```powershell
uv run repack <path-to-dataset>
```

Run a module or script with the project environment:

```powershell
uv run python -m cisl.scripts.repack --help
```

Install a Jupyter kernel for the synced environment:

```powershell
uv run python -m ipykernel install --user --name cisl-py
```

## ENVI I/O

The ENVI reader supports both eager and lazy access.

```python
from cisl.io import envi

eager = envi.open("dataset.hdr")
lazy = envi.open("dataset.hdr", lazy=True)
chunked = envi.open(
    "dataset.hdr",
    chunks={"wavenumber": 16, "y": 512, "x": 512},
)
```

Implementation details:

- `spectral` is still used for ENVI header parsing and file creation.
- The data path uses a direct `numpy.memmap` backend.
- Lazy reads wrap that memmap with `dask.array`, so slices and reductions can stay out-of-core until you call `.compute()`.
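The memmap-plus-dask pattern described above can be sketched in plain NumPy and Dask. Shapes, dtype, chunking, and the `cube.raw` file name are illustrative, not `cisl` internals:

```python
import numpy as np
import dask.array as da

# Stand-in for an ENVI data file on disk: a raw (band, y, x) cube.
shape = (32, 64, 64)
data = np.arange(np.prod(shape), dtype=np.float64).reshape(shape)
data.tofile("cube.raw")

# numpy.memmap gives on-demand access to the bytes on disk, no full read.
mm = np.memmap("cube.raw", dtype=np.float64, mode="r", shape=shape)

# Wrapping the memmap in a dask.array keeps slices and reductions
# out-of-core until .compute() is called.
lazy = da.from_array(mm, chunks=(8, 32, 32))
band_mean = lazy.mean(axis=(1, 2)).compute()  # one value per band, shape (32,)
```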

Writes are also chunk-friendly:

```python
envi.save("out.hdr", chunked)
```

If the input is dask-backed, `envi.save` writes one chunk at a time into an ENVI memmap instead of materializing the full cube first.

## Dask Memory Notes

Dask delays work, but it does not make large arrays free. The common ways to still run out of memory are:

- calling `.values`, `.to_numpy()`, or `.compute()` on the full cube
- using chunks that are too large for the operation
- triggering implicit rechunking or other operations that need big temporary arrays

Practical guidance:

- choose chunks when opening the file
- slice or reduce before computing
- avoid converting the whole array to NumPy unless you really need it
- keep chunk sizes moderate and aligned with how you plan to process the data
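The guidance above can be illustrated with a small Dask array; the cube shape and chunking here are made up for the example:

```python
import dask.array as da

# Illustrative (band, y, x) cube, chunked along every axis.
cube = da.random.random((64, 256, 256), chunks=(16, 128, 128))

# Good: reduce first, so Dask only holds chunk-sized temporaries.
mean_image = cube.mean(axis=0).compute()  # (256, 256) result

# Good: slice first, so only the chunks covering the slice are read.
spectrum = cube[:, 100, 100].compute()    # (64,) result

# Risky on real data: materializes the whole cube in RAM at once.
# full_cube = cube.compute()
```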

## Notebook Demo

See [notebooks/demos/Read core_2x2 ENVI.ipynb](notebooks/demos/Read%20core_2x2%20ENVI.ipynb) for a small example that opens the sample `core_2x2.hdr` dataset lazily and reads a spectrum plus a single-band image.
