Metadata-Version: 2.4
Name: spectoprep
Version: 1.0.0a0
Summary: A comprehensive spectral preprocessing toolkit for chemometrics
Author-email: Habeeb Babatunde <babatundehabeeb2@gmail.com>
Maintainer-email: Habeeb Babatunde <babatundehabeeb2@gmail.com>
License: MIT license
Project-URL: bugs, https://github.com/habeeb3579/spectoprep/issues
Project-URL: changelog, https://github.com/habeeb3579/spectoprep/blob/master/changelog.md
Project-URL: homepage, https://github.com/habeeb3579/spectoprep
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/x-rst
License-File: LICENSE
License-File: AUTHORS.rst
Requires-Dist: typer
Requires-Dist: bayesian-optimization
Requires-Dist: seaborn
Provides-Extra: dev
Requires-Dist: coverage; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: sphinx; extra == "dev"
Requires-Dist: myst-parser; extra == "dev"
Requires-Dist: furo; extra == "dev"
Requires-Dist: sphinx_rtd_theme; extra == "dev"
Requires-Dist: pydata-sphinx-theme; extra == "dev"
Requires-Dist: nbsphinx; extra == "dev"
Dynamic: license-file

SpectroPrep
===========

.. image:: https://img.shields.io/pypi/v/spectoprep.svg
    :target: https://pypi.python.org/pypi/spectoprep

.. image:: https://img.shields.io/travis/habeeb3579/spectoprep.svg
    :target: https://travis-ci.com/habeeb3579/spectoprep

.. image:: https://codecov.io/gh/habeeb3579/Spectoprep/graph/badge.svg?token=5EPSYE77K7 
    :target: https://codecov.io/gh/habeeb3579/Spectoprep

.. image:: https://anaconda.org/habeebest/spectoprep/badges/version.svg
    :target: https://anaconda.org/habeebest/spectoprep


.. image:: https://readthedocs.org/projects/spectoprep/badge/?version=latest
    :target: https://spectoprep.readthedocs.io/en/latest/?version=latest
    :alt: Documentation Status

Spectroscopy preprocessing using Bayesian Optimization

Overview
--------

SpectroPrep provides a toolkit for optimizing spectroscopic data preprocessing pipelines using Bayesian optimization. It automatically discovers the optimal combination of preprocessing techniques and their parameters to improve model performance for spectroscopic data analysis.

Features
--------

- **Pipeline Optimization**: Automate the discovery of optimal preprocessing pipelines using Bayesian optimization
- **Flexible Preprocessing**: Choose from multiple preprocessing techniques (MSC, SNV, Savitzky-Golay, etc.)
- **Cross-Validation Support**: Group-based cross-validation methods for robust evaluation
- **Configurable Pipeline Length**: Control maximum preprocessing steps and allowed combinations

Installation
------------

.. code-block:: bash

    pip install spectoprep

Quick Start
-----------

.. code-block:: python

    from spectroprep.pipeline.optimizer import PipelineOptimizer
    import numpy as np

    # Prepare your data
    X_train = np.array(...)  # Your spectral data matrix
    y_train = np.array(...)  # Your target values
    groups = np.array(...)   # Optional group labels for cross-validation

    # Initialize the optimizer
    optimizer = PipelineOptimizer(
        X_train=X_train,
        y_train=y_train,
        X_test=None,
        y_test=None,
        preprocessing_steps=['msc', 'savgol', 'detrend', 'scaler', 'snv',
                              'robust_scaler', 'emsc', 'meancn'],
        cv_method="group_shuffle_split",
        n_splits=3,
        random_state=21,
        groups=groups,
        max_pipeline_length=2,
        allowed_preprocess_combinations=[1, 2]
    )

    # Run Bayesian optimization to find the best pipeline
    best_params, best_pipeline = optimizer.bayesian_optimize(
        init_points=50,
        n_iter=1000
    )

    # Extract preprocessing steps without the final model
    custom_preprocessing = []
    for name, step in best_pipeline.steps[:-1]:
        custom_preprocessing.append((name, step))

    # Print optimization summary
    summary = optimizer.summarize_optimization()
    print(f"Best pipeline configuration: {summary['best_pipeline']}")
    print(f"Best RMSE: {summary['best_rmse']:.4f}")

    # Make predictions with the optimized pipeline
    predictions, rmse, r2 = optimizer.get_best_pipeline_predictions(best_pipeline)

Available Preprocessing Methods
-------------------------------

- **msc**: Multiplicative Scatter Correction
- **savgol**: Savitzky-Golay filtering
- **detrend**: Linear detrending
- **scaler**: Standard scaling
- **snv**: Standard Normal Variate
- **robust_scaler**: Robust scaling
- **emsc**: Extended Multiplicative Signal Correction
- **meancn**: Mean centering
- **pca**: Principal Component Analysis
- **select_k_best**: Feature selection

Documentation
-------------

For detailed documentation, visit `spectoprep.readthedocs.io <https://spectoprep.readthedocs.io>`_.

Contributing
------------

We welcome contributions! Please feel free to submit a Pull Request.

License
-------

This project is licensed under the MIT License - see the LICENSE file for details.

Credits
-------

This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.

.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage

.. warning::

   This package is still under heavy development.
