Metadata-Version: 2.3
Name: prod-fs
Version: 1.0.5
Summary: ProD: A visualizable filter-feature selection method based on prodding the class probability densities for overlapping
Author: RenZhen95
Author-email: RenZhen95 <j-liaw@hotmail.com>
Requires-Dist: joblib>=1.5.3
Requires-Dist: matplotlib>=3.10.8
Requires-Dist: numpy<2.0.0
Requires-Dist: scipy>=1.16.3
Requires-Python: >=3.11
Description-Content-Type: text/markdown

<p align="center">
  <img src="https://github.com/RenZhen95/prod-fs/blob/main/docs/artwork/logo.svg" width="300">
</p>

**ProD**, a visualizable filter-feature selection method based on "prodding" the class <ins>Pro</ins>bability <ins>D</ins>ensities for overlapping.

## Install
ProD can be installed from PyPI:
<pre>
pip install prod-fs
</pre>

## Example
```python
from prodfs import ProD

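import numpy as np
import matplotlib.pyplot as plt
# scikit-learn is used only to generate the example data;
# it is not listed among prod-fs's dependencies
from sklearn.datasets import make_classification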

# Create random classification dataset
X, y = make_classification(
    n_samples=300, n_features=50, n_classes=3, n_informative=5,
    shuffle=False
)

# Initialize ProD object
prodRanker = ProD()

# Carry out feature selection
prodRanker.fit(X, y)

# Get top 10 features
top10Features = prodRanker.get_topnFeatures(10)

# Visualize the top feature's ability to segregate PDEs
fig, axs = plt.subplots(1, 2, sharey=True)

# Top ranked feature
prodRanker.plot_overlapAreas(top10Features[0], legend="intersection", _ax=axs[0])
axs[0].set_title("Most relevant feature", loc="left")

# Feature 49: with shuffle=False, the uninformative (noise) columns come last
prodRanker.plot_overlapAreas(49, legend="intersection", _ax=axs[1])
axs[1].set_title("Least relevant feature", loc="left")

axs[0].set_ylabel(r"Probability Density, $\hat{P}$")
for ax in axs:
    ax.set_xlim(-0.5, 1.5)
    ax.set_xticks(np.arange(-0.5, 2.0, 0.5))

plt.show()
```
<p align="center">
  <img src="https://github.com/RenZhen95/prod-fs/blob/main/docs/artwork/example_plot.svg" width="550">
</p>

See the provided notebooks for tutorials and examples of specific use cases.
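
For intuition about what "overlapping class probability densities" means for a single feature, here is a minimal sketch. It is not ProD's implementation or scoring rule. It assumes a Gaussian kernel density estimate per class (via `scipy.stats.gaussian_kde`) and sums the area under the pointwise minimum of each pair of class densities. The function name `pairwise_overlap_area` and the `grid_size` parameter are hypothetical and exist only for this illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde


def pairwise_overlap_area(x, y, grid_size=512):
    """Illustrative only: summed overlap area between per-class KDEs of one feature."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)

    # Evaluate every class density on one shared grid that covers the data range
    pad = 3.0 * x.std()
    grid = np.linspace(x.min() - pad, x.max() + pad, grid_size)
    dx = grid[1] - grid[0]
    densities = [gaussian_kde(x[y == c])(grid) for c in classes]

    total = 0.0
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            # Overlap of two densities = area under their pointwise minimum
            total += float(np.minimum(densities[i], densities[j]).sum() * dx)
    return total


# Two synthetic features: one separates the classes well, one does not
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 200)
separated = np.concatenate([rng.normal(0, 1, 200), rng.normal(6, 1, 200)])
mixed = rng.normal(0, 1, 400)

print("separated:", pairwise_overlap_area(separated, labels))
print("mixed:    ", pairwise_overlap_area(mixed, labels))
```

Under these assumptions, a feature whose class densities barely overlap yields an area near 0, while a feature that carries no class information yields an area near 1 per class pair.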

## Citation
For now, please cite the following abstract:
> J.C. Liaw, F. Geu Flores. A novel univariate feature selection filter-measure based on the reduction of class overlapping. 94th Annual Meeting of the International Association of Applied Mathematics and Mechanics - GAMM, Magdeburg, Germany, 18-22 March 2024, Oral Presentation S25.01-4.

Available at <a href="https://jahrestagung.gamm.org/wp-content/uploads/2024/03/BookOfAbstracts-2.pdf#page=365" target="_blank">Book of Abstracts of the 94th Annual Meeting of the International Association of Applied Mathematics and Mechanics, p. 363</a>.

The other feature selection methods that ProD was compared against in our paper are listed below:
1. LH-RELIEF: Feature weight estimation for gene selection: a local hyperlinear learning approach
DOI: https://doi.org/10.1186/1471-2105-15-70

2. I-RELIEF: Iterative RELIEF for Feature Weighting: Algorithms, Theories, and Applications
DOI: https://doi.org/10.1109/TPAMI.2007.1093

3. RELIEF-F: Estimating attributes: Analysis and extensions of RELIEF
DOI: https://doi.org/10.1007/3-540-57868-4_57

4. MultiSURF: Benchmarking relief-based feature selection methods for bioinformatics data mining
DOI: https://doi.org/10.1016/j.jbi.2018.07.015

5. Random Forests
DOI: https://doi.org/10.1023/A:1010933404324

6. ANOVA F-statistic: Statistical Methods for Research Workers

7. Mutual Information: Estimating mutual information
DOI: https://doi.org/10.1103/PhysRevE.69.066138
