Metadata-Version: 2.4
Name: spyky
Version: 1.0.0
Summary: Package to remove cosmic spikes from Raman Spectra.
Author-email: Albert Lenk <alenk@duck.com>
License-Expression: MIT
Keywords: Raman Spectroscopy,cosmic spikes,spc,chemometrics
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: scikit-learn
Requires-Dist: spc-io
Requires-Dist: scipy
Dynamic: license-file

# Spyky
Spyky incorporates the removal of cosmic spikes from Raman spectra, as described by Whitaker & Hayes [[1]](#references), into a Python package compatible with sklearn pipelines and hyperparameter optimization.

## Reading .spc files
Spyky provides the ability to read several .spc files stored in one location, using the [spc-io](https://github.com/h2020charisma/spc-io) package. Currently ```spyky``` only supports ```.spc``` files with a global X and single Y array.

```read_spc()``` returns three values: the spectra as an array of shape (n_files, n_wavelengths), so that each row is one spectrum; the wavelengths (all ```.spc``` files must share the same wavelength axis); and the names of the files that were read.

```python
>>> from spyky.reader import read_spc

>>> path = r"./spectra_bin"
>>> spectra, wavelength, names = read_spc(path)

>>> print(spectra)
[[  684.   721.   776. ... 22819. 22517. 22036.]
 [  667.   724.   770. ... 22575. 22275. 21819.]
 [  676.   726.   775. ... 22618. 22346. 21851.]]

>>> print(wavelength)
[ 32.1   34.4   36.6   ...   3286.9   3287.8   3288.7 ]

>>> print(names)
['example_file_1.spc', 'example_file_2.spc', 'test_file_1.spc']
```
By specifying ```pattern``` you can filter which files to read. By default all ```.spc``` files in the specified path are read. The expressions are matched by [fnmatch](https://docs.python.org/3/library/fnmatch.html).

```python
>>> path = r"./spectra_bin"
>>> s, w, names = read_spc(path, pattern="example*.spc")
>>> print(names)
['example_file_1.spc', 'example_file_2.spc']

>>> s, w, names = read_spc(path, pattern="example*1*.spc")
>>> print(names)
['example_file_1.spc']
```
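
As a rough illustration of how such patterns behave, the sketch below applies Python's standard-library ```fnmatch``` to the file names from the example above. It only demonstrates the matching rules; how ```read_spc()``` calls ```fnmatch``` internally is not shown here.

```python
from fnmatch import fnmatch

# File names taken from the example above
names = ["example_file_1.spc", "example_file_2.spc", "test_file_1.spc"]

# "*" matches any run of characters, including an empty one
all_spc = [n for n in names if fnmatch(n, "*.spc")]
examples = [n for n in names if fnmatch(n, "example*.spc")]
first_example = [n for n in names if fnmatch(n, "example*1*.spc")]

print(examples)       # ['example_file_1.spc', 'example_file_2.spc']
print(first_example)  # ['example_file_1.spc']
```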
There is also the option to export the read files as a ```.csv``` file by specifying ```export_to```. The header of the ```.csv``` file will contain the wavelengths.

```python 
>>> s, w, names = read_spc(path, export_to=r"./spectra.csv")
>>> print(names)
['example_file_1.spc']
```

## Spike Removal

The class ```DeSpike``` is written so that it integrates seamlessly into sklearn preprocessing pipelines and is compatible with hyperparameter optimization tools such as ```GridSearchCV```. It therefore implements the ```.fit()``` and ```.transform()``` methods, each of which takes the spectra as input. First use ```.fit()``` to calculate the modified z-scores, then use ```.transform()``` to perform the correction, as explained in [[1]](#references).

```python
>>> from spyky.spikes import DeSpike

>>> spiky = DeSpike(window=5, threshold=6)
>>> spiky.fit(spectra)

>>> despiked = spiky.transform(spectra)
>>> print(despiked)
[[  802.75   721.     776.   ... 22819.   22517.   22982.  ]
 [  811.     731.     783.   ... 22947.   22662.   23119.6 ]
 [  796.5    724.     770.   ... 22575.   22275.   22719.6 ]]
```
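
For readers unfamiliar with [[1]](#references), the following is a minimal NumPy sketch of the general idea: modified z-scores are computed on the first-differenced spectrum, points above a threshold are flagged, and each flagged point is replaced by the mean of its unflagged neighbours within a window. The function names ```modified_z_scores``` and ```despike``` and the synthetic test spectrum are assumptions made for illustration; this is not spyky's implementation and may differ from it in details such as edge handling.

```python
import numpy as np

def modified_z_scores(spectrum):
    """Modified z-scores of the first-differenced spectrum."""
    diff = np.diff(spectrum)
    median = np.median(diff)
    mad = np.median(np.abs(diff - median))  # median absolute deviation
    # Note: a perfectly flat spectrum gives mad == 0 and is not handled here
    return 0.6745 * (diff - median) / mad

def despike(spectrum, window=5, threshold=6):
    """Replace flagged points by the mean of unflagged neighbours."""
    spectrum = np.asarray(spectrum, dtype=float)
    z = np.abs(modified_z_scores(spectrum))
    # np.diff shortens the array by one; pad so flags line up with the spectrum
    spikes = np.concatenate(([False], z > threshold))
    cleaned = spectrum.copy()
    for i in np.flatnonzero(spikes):
        lo, hi = max(0, i - window), min(len(spectrum), i + window + 1)
        neighbours = np.arange(lo, hi)
        good = neighbours[~spikes[neighbours]]
        if good.size:  # leave the point untouched if every neighbour is flagged
            cleaned[i] = spectrum[good].mean()
    return cleaned

# Synthetic example: a smooth baseline with one artificial spike at index 20
baseline = 100 + 10 * np.sin(2 * np.pi * np.linspace(0, 1, 50))
spectrum = baseline.copy()
spectrum[20] += 500

cleaned = despike(spectrum, window=5, threshold=6)
print(spectrum[20], cleaned[20])
```

Because the z-scores are computed on differences, a single spike produces two large jumps (up and down), so the point directly after the spike is flagged and interpolated as well.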

In a pipeline this might look like:
```python
>>> from sklearn.pipeline import make_pipeline
>>> from spyky.reader import read_spc
>>> from spyky.spikes import DeSpike

>>> s, w, n = read_spc(r"./spectra_bin")

>>> pipe = make_pipeline(DeSpike(window=5, threshold=6))
>>> pipe.fit(s)
>>> corrected = pipe.transform(s)
>>> print(corrected)
[[  802.75   721.     776.   ... 22819.   22517.   22982.  ]
 [  811.     731.     783.   ... 22947.   22662.   23119.6 ]
 [  796.5    724.     770.   ... 22575.   22275.   22719.6 ]]

```
Use ```"despike__window"``` and ```"despike__threshold"``` to test different values through ```param_grid``` in ```GridSearchCV```.
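
The ```despike__``` prefix comes from sklearn's ```make_pipeline```, which names each step after its lowercased class name. The sketch below shows this with a hypothetical stand-in class that only mimics the ```window``` and ```threshold``` arguments; it is not spyky's ```DeSpike``` and performs no correction. ```ParameterGrid``` is used here merely to list the combinations a grid search would try.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import ParameterGrid
from sklearn.pipeline import make_pipeline

class DeSpike(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for illustration only (not spyky's class)."""

    def __init__(self, window=5, threshold=6):
        self.window = window
        self.threshold = threshold

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X  # no-op: the real transformer would correct spikes here

pipe = make_pipeline(DeSpike())
step_name = pipe.steps[0][0]  # step name derived from the class name

# Keys follow the "<step name>__<parameter>" convention
param_grid = {
    "despike__window": [3, 5, 7],
    "despike__threshold": [4, 6, 8],
}
candidates = list(ParameterGrid(param_grid))

# Each candidate can be applied to the pipeline with set_params
pipe.set_params(**candidates[0])
print(step_name, len(candidates))
```

In an actual search, the same ```param_grid``` dictionary would be passed to ```GridSearchCV``` together with a pipeline that ends in an estimator or with an explicit ```scoring``` function.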

Sometimes the algorithm falsely identifies steep sections of a normal spectrum as spikes. If this happens, you can use ```ignore``` and ```ignore_ref``` to supply an array containing the wavelengths you want to be ignored. If you do not supply ```ignore_ref```, the index of the wavelength array is used instead. Please note that the values passed to ```ignore``` must match values in ```ignore_ref```; this is easiest to achieve by selecting them from the wavelength array with a mask, as in the example below.

```python
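# Boolean mask selecting the wavelengths between 500 and 1000
wcut = (w > 500) & (w < 1000)
# Pass the masked values as `ignore` and the full wavelength axis as `ignore_ref`
spiky = DeSpike(threshold=3.3, ignore=w[wcut], ignore_ref=w)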
```
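
To see why a mask guarantees matching values, the sketch below builds a made-up wavelength axis and checks that every masked value can be located again in the reference array. The axis values and the use of ```np.isin``` are assumptions for illustration; how spyky resolves ```ignore``` against ```ignore_ref``` internally may differ.

```python
import numpy as np

# Made-up wavelength axis: 0, 100, 200, ..., 2000
w = np.linspace(0, 2000, 21)

# Values selected with a mask are exact copies of entries in w
wcut = (w > 500) & (w < 1000)
ignore = w[wcut]

# Every ignored value is found in the reference axis, giving its position
idx = np.flatnonzero(np.isin(w, ignore))
print(ignore)
print(idx)
```

Typing the wavelengths by hand instead (for example ```[600.0, 700.1]```) risks values that do not occur in the reference array, which is why the mask is the safer route.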


# References
[1] D. A. Whitaker and K. Hayes, "A simple algorithm for despiking Raman spectra," Chemometrics and Intelligent Laboratory Systems, vol. 179, pp. 82-84, Aug. 2018, doi: 10.1016/j.chemolab.2018.06.009.
