Metadata-Version: 2.4
Name: raster2stac
Version: 2026.5.1
Summary: Create valid STAC Collections, Items and Assets given already existing raster datasets
Author-email: Michele Claus <michele.claus@eurac.edu>, Lorenzo Mercurio <lorenzo.mercurio@eurac.edu>, Rufai Omowunmi Balogun <rbalogun@eurac.edu>, Suriyah Dhinakaran <sdhinakaran@eurac.edu>
Project-URL: Homepage, https://gitlab.inf.unibz.it/earth_observation_public/raster-to-stac
Project-URL: Issues, https://gitlab.inf.unibz.it/earth_observation_public/raster-to-stac/-/issues
Keywords: STAC,Metadata,Cloud-Optimized GeoTIFFs,Kerchunk,NetCDF
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: xarray<2026.0.0,>=2024.10.0
Requires-Dist: rioxarray
Requires-Dist: pystac[validation]>=1.14
Requires-Dist: stac-validator
Requires-Dist: rio-stac
Requires-Dist: boto3
Requires-Dist: fsspec
Requires-Dist: s3fs
Requires-Dist: ujson
Requires-Dist: rasterio
Requires-Dist: dask
Requires-Dist: netcdf4==1.7.2
Requires-Dist: rio-cogeo
Requires-Dist: h5netcdf
Requires-Dist: h5py
Requires-Dist: pydantic
Requires-Dist: zarr<4,>=2.18.3
Requires-Dist: numcodecs<0.17.0,>=0.16.0
Dynamic: license-file

# Raster-to-STAC

Raster-to-STAC creates STAC Collections, Items, and Assets from existing raster datasets. It can also automatically upload the resulting files to an Amazon S3 bucket, making them publicly accessible worldwide. The goal is to make datasets easily accessible, interoperable, and shareable.

## Approaches

Depending on your requirements, different output formats are available:

### 1. Via COGs (Cloud Optimized GeoTIFFs)
This approach reads the input dataset and generates multiple Cloud Optimized GeoTIFFs on local disk. This provides high interoperability with third-party libraries for data reading and visualization.

### 2. Via ZARR
This approach converts the data to ZARR format, which is optimized for cloud storage and efficient chunked access to large datasets. It writes a STAC Collection with a single Zarr object.

### 3. Via netCDF
This approach keeps data in netCDF format while generating STAC metadata for discovery and access.
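
Each approach corresponds to one of the `Raster2STAC` methods used in the examples later in this README (`generate_cog_stac`, `generate_zarr_stac`, `generate_netcdf_stac`). The following is a minimal sketch of selecting the method from a format name. The `generate_stac` helper and the `METHOD_BY_FORMAT` table are hypothetical conveniences, not part of the package:

```python
# Method names are taken from the usage examples in this README.
METHOD_BY_FORMAT = {
    "cog": "generate_cog_stac",
    "zarr": "generate_zarr_stac",
    "netcdf": "generate_netcdf_stac",
}


def generate_stac(r2s, output_format: str, **kwargs):
    """Call the generation method of `r2s` that matches `output_format`.

    `r2s` is expected to be a configured Raster2STAC instance. Extra keyword
    arguments (for example item_id for ZARR output) are passed through.
    """
    try:
        method_name = METHOD_BY_FORMAT[output_format.lower()]
    except KeyError:
        raise ValueError(f"unsupported output format: {output_format!r}") from None
    return getattr(r2s, method_name)(**kwargs)
```

With a configured instance `r2s`, `generate_stac(r2s, "zarr", item_id="S2_L2A_sample_ZARR")` would then be equivalent to calling `r2s.generate_zarr_stac(item_id="S2_L2A_sample_ZARR")`.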

## Installation

### Prerequisites
When installing from source, first set a [PEP 440](https://peps.python.org/pep-0440/)-compliant version in `pyproject.toml` by replacing its `SEMANTIC_VERSION` placeholder. The recommended format is Calendar Versioning (CalVer), for example `2025.10.1`:

```bash
VERSION="2025.10.1"  # any PEP 440-compliant version
sed -i "s/SEMANTIC_VERSION/$VERSION/g" pyproject.toml
```
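
On platforms where `sed -i` behaves differently (BSD/macOS `sed` expects a backup suffix argument after `-i`), the same substitution can be sketched in Python. The helper names `set_version` and `update_pyproject` are illustrative assumptions, not part of raster2stac. The version check is deliberately simplified: it accepts only plain dotted numeric versions such as `2025.10.1`, which is stricter than full PEP 440.

```python
import re
from pathlib import Path

# Simplified check: dotted numeric release segments only (e.g. "2025.10.1").
# Full PEP 440 also allows pre/post/dev releases, which this sketch rejects.
_RELEASE_ONLY = re.compile(r"\d+(\.\d+)*")


def set_version(pyproject_text: str, version: str) -> str:
    """Return pyproject_text with the SEMANTIC_VERSION placeholder replaced."""
    if not _RELEASE_ONLY.fullmatch(version):
        raise ValueError(f"not a plain numeric version: {version!r}")
    if "SEMANTIC_VERSION" not in pyproject_text:
        raise ValueError("placeholder SEMANTIC_VERSION not found")
    return pyproject_text.replace("SEMANTIC_VERSION", version)


def update_pyproject(path: str = "pyproject.toml", version: str = "2025.10.1") -> None:
    """Rewrite the file in place, like the `sed -i` command above."""
    file = Path(path)
    new_text = set_version(file.read_text(encoding="utf-8"), version)
    file.write_text(new_text, encoding="utf-8")
```

Calling `update_pyproject(version="2025.10.1")` from the repository root has the same effect as the `sed` command, and fails loudly if the placeholder has already been replaced.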

Then install the package:
```bash
pip install .
```

### Quick Installation
You can also install directly using pip:

```bash
pip install raster2stac
```

## Quickstart

This section provides a quick overview of how to use raster2stac to convert your raster data into STAC-compliant assets.

### Get Sample Data

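You can download sample data to test the package. Both URLs below point to a file named `S2_L2A_sample.nc`, so fetching it from one of them is enough: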

```bash
wget https://github.com/euracresearch/raster2stac/raw/main/tests/data/S2_L2A_sample.nc
wget https://github.com/Open-EO/openeo-localprocessing-data/raw/main/sample_netcdf/S2_L2A_sample.nc
```

## Usage Examples

### Basic Usage

The main class `Raster2STAC` provides different methods to generate STAC items and collections for various data formats.

#### Generate COG STAC from netCDF file

Convert a netCDF file to Cloud Optimized GeoTIFFs (COGs) and generate STAC items:

```python
from raster2stac import Raster2STAC

rs2stac = Raster2STAC(
    data="S2_L2A_sample.nc",  # The netCDF which will be converted into COGs
    collection_id="SENTINEL2_L2A_SAMPLE",  # The Collection id we want to set
    collection_url="https://stac.eurac.edu/collections/",  # The URL where the collection will be exposed
    output_folder="SENTINEL2_L2A_SAMPLE_STAC_COG"
).generate_cog_stac()
```
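
The item paths used later in this README (for example `DATA_5D/items/DATA_5D.json`) suggest that items are written as JSON files under `<output_folder>/items/`. Assuming that layout, here is a short standard-library sketch for listing what was generated. `list_item_ids` is an illustrative helper, not a raster2stac function:

```python
import json
from pathlib import Path


def list_item_ids(output_folder: str) -> list[str]:
    """Return the `id` of every STAC Item JSON found in <output_folder>/items."""
    items_dir = Path(output_folder) / "items"  # layout assumed from this README
    ids = []
    for item_file in sorted(items_dir.glob("*.json")):
        with item_file.open(encoding="utf-8") as fh:
            item = json.load(fh)
        # STAC Items carry an "id" field; fall back to the file name if missing.
        ids.append(item.get("id", item_file.stem))
    return ids


# Example, after running the COG conversion above:
# print(list_item_ids("SENTINEL2_L2A_SAMPLE_STAC_COG"))
```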

#### Generate STAC from netCDF file (keep netCDF format)

Generate STAC items while keeping the data in netCDF format:

```python
from raster2stac import Raster2STAC

rs2stac = Raster2STAC(
    data="S2_L2A_sample.nc",  # The netCDF file to describe (kept in netCDF format)
    collection_id="SENTINEL2_L2A_SAMPLE",  # The Collection id we want to set
    collection_url="https://stac.eurac.edu/collections/",  # The URL where the collection will be exposed
    output_folder="SENTINEL2_L2A_SAMPLE_STAC_NETCDF"
).generate_netcdf_stac()
```

You can then load the STAC item using pystac_client and odc.stac:

```python
import pystac_client
import json
import odc.stac

item_path = "./SENTINEL2_L2A_SAMPLE_STAC_NETCDF/items/20220630000000.json"
stac_api = pystac_client.stac_api_io.StacApiIO()
stac_dict = json.loads(stac_api.read_text(item_path))
item = stac_api.stac_object_from_dict(stac_dict)

ds_stac = odc.stac.load([item])
print(ds_stac)

# Output:
# <xarray.Dataset> Size: 13MB
# Dimensions:      (y: 705, x: 935, time: 1)
# Coordinates:
#   * y            (y) float64 6kB 5.155e+06 5.155e+06 ... 5.148e+06 5.148e+06
#   * x            (x) float64 7kB 6.75e+05 6.75e+05 ... 6.843e+05 6.843e+05
#     spatial_ref  int32 4B 32632
#   * time         (time) datetime64[ns] 8B 2022-06-30
# Data variables:
#     B04          (time, y, x) float32 3MB 278.0 302.0 274.0 ... 306.0 236.0
#     B03          (time, y, x) float32 3MB 506.0 520.0 456.0 ... 378.0 367.0
#     B02          (time, y, x) float32 3MB 237.0 240.0 249.0 ... 246.0 212.0
#     B08          (time, y, x) float32 3MB 3.128e+03 2.958e+03 ... 1.854e+03
#     SCL          (time, y, x) float32 3MB 4.0 4.0 4.0 4.0 ... 4.0 4.0 4.0 4.0
```

#### Generate ZARR STAC from netCDF file

Convert a netCDF file to ZARR format and generate STAC items with additional metadata:

```python
from raster2stac import Raster2STAC

rs2stac = Raster2STAC(
    data="S2_L2A_sample.nc",
    collection_id="R2S_TEST_COLLECTION",
    collection_url="https://10.8.244.74:8082/collections/",
    item_prefix="R2S_TEST",
    output_folder="S2_L2A_sample_ZARR",
    description="Test Collection",
    title="Raster2STAC Test Collection",
    keywords=["test", "stac", "collection"],
    providers=[
        {
            "url": "https://www.eurac.edu",
            "name": "Eurac Research",
            "roles": ["producer"],
        }
    ],
    stac_version="1.0.0",
    s3_upload=False,
    license="CC-BY-4.0",
    sci_citation="Test citation",
).generate_zarr_stac(item_id="S2_L2A_sample_ZARR")
```

#### Generate ZARR STAC from a 5-dimensional netCDF file

1. Get sample netCDF files:
```bash
wget https://github.com/Open-EO/openeo-localprocessing-data/raw/refs/heads/main/sample_netcdf/sample_5D.nc
```

2. Call raster2stac:

```python
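import rioxarray  # noqa: F401 (registers the .rio accessor on xarray objects)
import xarray as xr
from raster2stac import Raster2STAC

# Assign EPSG:4326 as the dataset CRS
ds = xr.open_dataset("sample_5D.nc").rio.write_crs(4326, inplace=True)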

rs2stac = Raster2STAC(
    data=ds,
    collection_id="DATA_5D",
    collection_url="https://stac.eurac.edu/collections/",
    item_prefix="R2S_TEST",
    output_folder="DATA_5D",
    description="Test Collection with 5 dimensional data",
    title="Raster2STAC Test Collection 5D",
    keywords=["test", "stac", "collection"],
    providers=[
        {
            "url": "https://www.eurac.edu",
            "name": "Eurac Research",
            "roles": ["producer"],
        }
    ],
    links=[
        {
            "rel": "license",
            "href": "https://cds.climate.copernicus.eu/api/v2/terms/static/licence-to-use-copernicus-products.pdf",
            "title": "License to use Copernicus Products",
        }
    ],
    stac_version="1.0.0",
    s3_upload=False,
    license="proprietary",
    sci_doi='https://doi.org/10.24381/cds.622a565a',
    # Adjacent string literals are concatenated, so no indentation leaks into the citation
    sci_citation=(
        "Schimanke S., Ridal M., Le Moigne P., Berggren L., Undén P., Randriamampianina R., Andrea U., "
        "Bazile E., Bertelsen A., Brousseau P., Dahlgren P., Edvinsson L., El Said A., Glinton M., Hopsch S., "
        "Isaksson L., Mladek R., Olsson E., Verrelle A., Wang Z.Q., (2021): CERRA sub-daily regional reanalysis "
        "data for Europe on single levels from 1984 to present. Copernicus Climate Change Service (C3S) Climate "
        "Data Store (CDS), DOI: 10.24381/cds.622a565a (Accessed on 15-02-2024)"
    ),
).generate_zarr_stac(item_id="DATA_5D")
```

You can then load the 5D dataset with the openEO Python client:

```python
from openeo.local import LocalConnection

conn = LocalConnection("")
ds = conn.load_stac("DATA_5D/items/DATA_5D.json").execute()
print(ds)

# Output:
# <xarray.DataArray (bands: 1, time: 216, level: 2, y: 96, x: 161, number: 25)> Size: 167MB
# dask.array<stack, shape=(1, 216, 2, 96, 161, 25), dtype=int8, chunksize=(1, 54, 1, 24, 81, 13), chunktype=numpy.ndarray>
# Coordinates:
#   * level        (level) int32 8B 500 850
#   * number       (number) int64 200B 0 1 2 3 4 5 6 7 ... 17 18 19 20 21 22 23 24
#     spatial_ref  int64 8B ...
#   * time         (time) datetime64[ns] 2kB 2016-01-01 2016-01-02 ... 2016-08-03
#   * x            (x) float64 1kB 5.084 5.151 5.218 5.285 ... 15.69 15.76 15.82
#   * y            (y) float64 768B 43.62 43.69 43.75 43.82 ... 49.86 49.93 50.0
#   * bands        (bands) object 8B 'z'
# Attributes:
#     CDI:          Climate Data Interface version 2.0.4 (https://mpimet.mpg.de...
#     CDO:          Climate Data Operators version 2.0.4 (https://mpimet.mpg.de...
#     Conventions:  CF-1.6
#     history:      Tue Feb 27 09:39:09 2024: cdo remapbil,/mnt/CEPH_PROJECTS/I...
```
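
The reported size and chunking can be cross-checked with a little arithmetic. The sketch below takes the shape, chunk size, and `int8` dtype from the output above and derives the total size and the number of Dask chunks, using only the standard library:

```python
import math

shape = (1, 216, 2, 96, 161, 25)   # bands, time, level, y, x, number
chunks = (1, 54, 1, 24, 81, 13)    # chunksize reported by dask
itemsize = 1                       # bytes per int8 value

total_bytes = math.prod(shape) * itemsize
# Chunks per dimension, rounding up for partial chunks at the edges.
chunks_per_dim = [math.ceil(s / c) for s, c in zip(shape, chunks)]
n_chunks = math.prod(chunks_per_dim)

print(f"{total_bytes / 1e6:.0f} MB in {n_chunks} chunks")
# prints "167 MB in 128 chunks"
```

The 167 MB figure matches the `Size` shown by xarray, and the chunk count gives a rough idea of how many separate reads a full load of the array involves.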

#### Generate STAC from xarray Dataset (netCDF)

Use an existing xarray Dataset to generate STAC items in netCDF format:

```python
import xarray as xr
from raster2stac import Raster2STAC

ds = xr.open_dataset("S2_L2A_sample.nc")

rs2stac = Raster2STAC(
    data=ds,  # The xarray Dataset which will be converted
    collection_id="SENTINEL2_L2A_SAMPLE",  # The Collection id we want to set
    collection_url="https://stac.eurac.edu/collections/",  # The URL where the collection will be exposed
    output_folder="SENTINEL2_L2A_SAMPLE_STAC"
).generate_netcdf_stac()
```

#### Generate ZARR STAC from xarray Dataset

Use an existing xarray Dataset to generate STAC items in ZARR format:

```python
import xarray as xr
from raster2stac import Raster2STAC

ds = xr.open_dataset("S2_L2A_sample.nc")

rs2stac = Raster2STAC(
    data=ds,
    collection_id="R2S_TEST_COLLECTION",
    collection_url="https://10.8.244.74:8082/collections/",
    item_prefix="R2S_TEST",
    output_folder="S2_L2A_sample_ZARR_dataset",
    description="Test Collection",
    title="Raster2STAC Test Collection",
    keywords=["test", "stac", "collection"],
    providers=[
        {
            "url": "https://www.eurac.edu",
            "name": "Eurac Research",
            "roles": ["producer"],
        }
    ],
    stac_version="1.0.0",
    s3_upload=False,
    license="CC-BY-4.0",
    sci_citation="Test citation",
).generate_zarr_stac(item_id="S2_L2A_sample_ZARR")
```
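
Several examples above repeat the same collection metadata. One way to avoid the duplication is to keep the shared keyword arguments in a dictionary and unpack it into the constructor. The sketch below uses only arguments that appear in the examples in this README. The `collection_kwargs` helper is hypothetical, and the `Raster2STAC` call is left as a comment so the snippet runs without the package or the sample data:

```python
def collection_kwargs(collection_id: str, output_folder: str, **overrides) -> dict:
    """Build keyword arguments shared by the Raster2STAC examples above."""
    kwargs = {
        "collection_id": collection_id,
        "collection_url": "https://stac.eurac.edu/collections/",
        "output_folder": output_folder,
        "keywords": ["test", "stac", "collection"],
        "providers": [
            {
                "url": "https://www.eurac.edu",
                "name": "Eurac Research",
                "roles": ["producer"],
            }
        ],
        "stac_version": "1.0.0",
        "s3_upload": False,
        "license": "CC-BY-4.0",
    }
    kwargs.update(overrides)  # per-collection values win over the defaults
    return kwargs


zarr_kwargs = collection_kwargs(
    "R2S_TEST_COLLECTION", "S2_L2A_sample_ZARR", title="Raster2STAC Test Collection"
)
# With raster2stac installed and the sample file downloaded:
# Raster2STAC(data="S2_L2A_sample.nc", **zarr_kwargs).generate_zarr_stac(item_id="S2_L2A_sample_ZARR")
```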

## Key Features

- Convert netCDF files to COG, netCDF, or ZARR formats
- Generate STAC-compliant items and collections
- Support for both file paths and xarray Dataset objects
- Process multiple netCDF files as a list
- Flexible metadata configuration
- Multiple output format options
- Easy data loading using pystac_client and odc.stac
- Integration with STAC APIs and clients

## Common Workflow

A typical workflow with raster2stac involves:

1. Convert your raster data to STAC-compliant assets using Raster2STAC
2. Generate STAC collection and items in your desired format (COG/netCDF/ZARR)
3. Publish or serve the STAC metadata
4. Load and use the data through STAC clients using pystac_client and odc.stac
5. Integrate with STAC catalogs and APIs for discovery and access

## License

This project is distributed under the MIT License; see `LICENSE` for details.
