Metadata-Version: 2.1
Name: soilspecdata
Version: 0.0.2
Summary: Download and load soil spectral data
Home-page: https://github.com/franckalbinet/soilspecdata
Author: Franck Albinet
Author-email: franckalbinet@gmail.com
License: Apache Software License 2.0
Keywords: nbdev jupyter notebook python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev

# SoilSpecData


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

> A Python package for handling soil spectroscopy data, with a focus on
> the [Open Soil Spectral Library
> (OSSL)](https://explorer.soilspectroscopy.org/).

## Installation

``` sh
pip install soilspecdata
```

## Features

- Easy loading and handling of the OSSL dataset
- Support for both VISNIR (Visible Near-Infrared) and MIR (Mid-Infrared)
  spectral data
- Flexible wavelength range filtering
- Convenient access to soil properties and metadata
- Automatic caching of downloaded data
- Get aligned spectra and target variable(s)
- *Further datasets to come …*

## Quick Start

``` python
# Import the package
from soilspecdata.datasets.ossl import get_ossl
```

Load the OSSL dataset:

``` python
ossl = get_ossl()
```

- Get MIR spectra (600-4000 cm⁻¹):

``` python
mir_data = ossl.get_mir(require_valid=True)
```

- Get VISNIR spectra with custom wavelength range:

``` python
visnir_data = ossl.get_visnir(wmin=500, wmax=1000, require_valid=True)
```
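
As a rough illustration of what range filtering amounts to, the sketch below applies a boolean mask to a synthetic spectra matrix with NumPy. The `filter_range` helper, the array names, and the 350-2500 nm grid are assumptions made for this example; they are not part of the `soilspecdata` API, and the package may implement `wmin`/`wmax` differently.

``` python
import numpy as np

def filter_range(wavelengths, spectra, wmin, wmax):
    """Keep only the columns whose wavelength lies in [wmin, wmax] (hypothetical helper)."""
    mask = (wavelengths >= wmin) & (wavelengths <= wmax)
    return wavelengths[mask], spectra[:, mask]

# Synthetic data: 3 samples measured every 2 nm from 350 to 2500 nm (assumed grid)
wavelengths = np.arange(350, 2501, 2)
spectra = np.random.default_rng(0).random((3, wavelengths.size))

w_sub, s_sub = filter_range(wavelengths, spectra, wmin=500, wmax=1000)
print(w_sub.min(), w_sub.max(), s_sub.shape)
```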

- Get soil properties (e.g., CEC):

``` python
properties = ossl.get_properties(['cec_usda.a723_cmolc.kg'], require_complete=True)
```

For more details on the OSSL dataset and its variables, see the [OSSL
documentation](https://soilspectroscopy.github.io/ossl-manual/database-description.html).

- Get metadata (e.g., geographical coordinates):

``` python
metadata = ossl.get_properties(['longitude.point_wgs84_dd', 'latitude.point_wgs84_dd'], require_complete=False)
```

- Or get aligned spectra and target variable(s) directly:

``` python
X, y, ids = ossl.get_aligned_data(
    spectra_data=mir_data,
    target_cols='cec_usda.a723_cmolc.kg'
)

X.shape, y.shape, ids.shape
```

    ((57062, 1701), (57062, 1), (57062,))

## Data Structure

The package returns spectra data in a structured format containing:

- Wavenumbers
- Spectra measurements
- Measurement type (reflectance/absorbance)
- Sample IDs

Properties and metadata are returned as pandas DataFrames indexed by
sample ID.
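
To make the role of the sample ID index concrete, here is a minimal sketch of matching a spectra matrix to a properties DataFrame by sample ID. All data below is synthetic, and the variable names are hypothetical stand-ins rather than the package's actual return types; it only illustrates the general idea behind aligning spectra with target variables.

``` python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for spectra and their sample IDs
sample_ids = np.array(["s1", "s2", "s3", "s4"])
spectra = np.arange(12, dtype=float).reshape(4, 3)  # 4 samples x 3 wavenumbers

# Properties indexed by sample ID; "s3" has no measured CEC
properties = pd.DataFrame(
    {"cec_usda.a723_cmolc.kg": [10.5, np.nan, 7.2]},
    index=pd.Index(["s2", "s3", "s4"], name="id"),
)

# Drop samples without a target value, then keep only spectra with a match
valid = properties.dropna()
keep = np.isin(sample_ids, valid.index)

X = spectra[keep]
ids = sample_ids[keep]
y = valid.loc[ids].to_numpy()  # rows reordered to follow the spectra order

print(X.shape, y.shape, ids)
```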

## Cache Management

By default, the OSSL dataset is cached in `~/.soilspecdata/`. To force a
fresh download:

``` python
ossl = get_ossl(force_download=True)
```
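
Before forcing a fresh download, it can be useful to check how much space the cache currently takes. The sketch below uses only the standard library and the default cache location stated above; `cache_size_bytes` is a hypothetical helper, not a function provided by the package, and the layout of files inside the cache directory is not assumed.

``` python
from pathlib import Path

def cache_size_bytes(cache_dir):
    """Total size of all files under cache_dir; 0 if the directory does not exist."""
    cache_dir = Path(cache_dir).expanduser()
    if not cache_dir.exists():
        return 0
    return sum(p.stat().st_size for p in cache_dir.rglob("*") if p.is_file())

size = cache_size_bytes("~/.soilspecdata")
print(f"Cache size: {size / 1e6:.1f} MB")
```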

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

Apache Software License 2.0

## Citation

TBC
