Metadata-Version: 2.4
Name: sdufseval
Version: 1.2.0
Summary: Evaluation and Benchmark Tool for Feature Selection
Project-URL: Homepage, https://github.com/mrajabinasab/FSEVAL
Project-URL: Bug Tracker, https://github.com/mrajabinasab/FSEVAL/issues
Author-email: Muhammad Rajabinasab <muhammad.rajabinasab@outlook.com>
License: MIT License
Keywords: benchmark,evaluation,feature-importance,feature-selection,machine-learning
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Requires-Dist: clustpy>=0.0.2
Requires-Dist: fsdem>=1.0.4
Requires-Dist: numpy>=1.26.4
Requires-Dist: pandas>=2.3.0
Requires-Dist: pcametric>=1.0.5
Requires-Dist: scikit-learn>=1.7.0
Requires-Dist: scipy>=1.16.0
Description-Content-Type: text/markdown

# FSEval – Feature Selection Evaluation Suite

**FSEval** is a lightweight, modular Python library designed to **benchmark feature selection and feature ranking methods** across multiple datasets using both **supervised** and **unsupervised** downstream evaluation protocols.

It helps researchers and practitioners answer the question:

> "Which feature selection method actually works best for my type of data and task?"

FSEval automates:

- Repeated training & evaluation at different feature subset sizes
- Stochastic method averaging
- Result persistence & incremental updates
- Support for both classification and clustering-based evaluation

## 📦 Dependencies and Requirements

FSEval requires:

- `python>=3.11`
- `numpy`
- `pandas`
- `scikit-learn`
- `scipy`
- `fsdem`
- `clustpy` (only needed for `unsupervised_clustering_accuracy`)
- `pcametric` (only needed for `AAD`)
  
## 💡 Installation
You can install FSEval with pip, or simply download the source code and import it directly:

```bash
pip install sdufseval
```

## 🚀 Quick Example

```python
from sdufseval import FSEVAL
import numpy as np
from sklearn.neighbors import NearestNeighbors

def snn_consistency_k5(X_orig, X_sub, y):
    """
    Calculates the average proportion of shared nearest neighbors (k=5) 
    between the original space and the feature-selected subspace.
    """
    k = min(5, X_orig.shape[0] - 1)  # cap k at n_samples - 1
    
    def get_nn_indices(data, n_neighbors):
        nbrs = NearestNeighbors(n_neighbors=n_neighbors + 1, algorithm='auto').fit(data)
        _, indices = nbrs.kneighbors(data)
        return indices[:, 1:]

    nn_orig = get_nn_indices(X_orig, k)
    nn_sub = get_nn_indices(X_sub, k)
    
    intersections = [len(np.intersect1d(nn_orig[i], nn_sub[i])) for i in range(len(nn_orig))]
    return np.mean(intersections) / k

if __name__ == "__main__":

    DATASETS_TO_RUN = ['colon', 'leukemia', 'prostate_GE']

    evaluator = FSEVAL(
        output_dir="benchmark_results", 
        avg_steps=5,
        eval_type=["supervised", "unsupervised", "model_agnostic", "custom"],
        custom_metrics={"SNN_K5": snn_consistency_k5}
    )

    methods_list = [
        {
            'name': 'Random',
            'type': 'unsupervised', 
            'stochastic': True, 
            'func': evaluator.random_baseline
        },
        {
            'name': 'Variance_Baseline',
            'type': 'unsupervised', 
            'stochastic': False, 
            'func': lambda X: np.var(X, axis=0)
        }
    ]
    
    print(">>> Starting Integrated Evaluation (Global & Local metrics)...")
    evaluator.run(DATASETS_TO_RUN, methods_list)

    print("\n>>> Starting Scalability Analysis...")
    evaluator.timer(
        methods=methods_list, 
        vary_param='both', 
        time_limit=3600 
    )
```

## Data Loading

`load_dataset(dataset_name, data_dir="datasets")` supports two on-disk formats:
- a single `.mat` file with keys `X` and `Y`
- two CSV files, `{name}_X.csv` and `{name}_y.csv`
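
For instance, loading the `colon` dataset used in the Quick Example (a minimal sketch; that `load_dataset` is importable from the package top level and returns an `(X, y)` pair is an assumption):

```python
# Assumption: load_dataset is exported at the package top level and
# returns the feature matrix and label vector as an (X, y) pair.
from sdufseval import load_dataset

# Expects either datasets/colon.mat (keys 'X' and 'Y') or
# datasets/colon_X.csv plus datasets/colon_y.csv on disk.
X, y = load_dataset("colon", data_dir="datasets")
print(X.shape, y.shape)
```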

## 📚 API Reference

### 🛠️ `FSEVAL(output_dir="results", cv=5, avg_steps=10, supervised_iter=5, unsupervised_iter=10, eval_type=["supervised", "unsupervised", "model_agnostic"], metrics=None, custom_metrics={}, experiments=None, save_all=False)`

Initializes the evaluation and benchmark object.

| Parameter | Default | Description |
| :--- | :--- | :--- |
| **`output_dir`** | results | Folder where CSV result files are saved. |
| **`cv`** | 5 | Cross-validation folds (supervised only). |
| **`avg_steps`** | 10 | Number of repetitions for stochastic methods.|
| **`supervised_iter`** | 5 | Number of classifier runs with different random seeds.|
| **`unsupervised_iter`** | 10 | Number of clustering runs with different random seeds.|
| **`eval_type`** | ["supervised", "unsupervised", "model_agnostic"] | Any subset of `"supervised"`, `"unsupervised"`, and `"model_agnostic"`; add `"custom"` to include user-defined metrics. |
| **`metrics`** | ["CLSACC", "NMI", "ACC", "AUC", "AAD"] | Evaluation metrics to calculate. |
| **`custom_metrics`** | {} | Mapping from metric name to a callable with signature `f(X_orig, X_sub, y)` (see the Quick Example). |
| **`experiments`** | ["10Percent", "100Percent"] | Which feature ratio grids to evaluate. |
| **`save_all`** | False | Save the results of all runs of the stochastic methods separately. |
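
For instance, a configuration that restricts the benchmark to supervised evaluation and keeps every stochastic run might look like this (a sketch; parameter names and defaults follow the table above):

```python
from sdufseval import FSEVAL

evaluator = FSEVAL(
    output_dir="results",       # CSV result files land here
    cv=5,                       # cross-validation folds (supervised)
    avg_steps=10,               # repetitions for stochastic selectors
    supervised_iter=5,          # classifier runs with different seeds
    unsupervised_iter=10,       # clustering runs with different seeds
    eval_type=["supervised"],   # run the supervised protocol only
    metrics=["ACC", "AUC"],     # subset of the default metric list
    experiments=["10Percent"],  # feature-ratio grid to evaluate
    save_all=True,              # keep per-run results for stochastic methods
)
```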

### ⚙️ `run(datasets, methods, classifier=None)`

Runs the full evaluation and benchmark over the given datasets and methods.

| Argument | Type | Description |
| :--- | :--- | :--- |
| **`datasets`** | List[str] | Dataset names loadable via `load_dataset()`. |
| **`methods`** | List[dict] | `[{"name": str, "func": callable, "stochastic": bool}, ...]` |
| **`classifier`** | sklearn classifier | Classifier for supervised evaluation (default: `RandomForestClassifier`). |
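
A minimal sketch of a `run` call with a custom classifier, reusing the `evaluator` from above. The method-dict schema follows the table; that a method with `'type': 'supervised'` receives `(X, y)` rather than `X` alone (as in the unsupervised Quick Example) is an assumption:

```python
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LogisticRegression

methods = [
    {
        "name": "ANOVA_F",
        "type": "supervised",
        "stochastic": False,
        # assumed signature for supervised scorers: (X, y) -> per-feature scores
        "func": lambda X, y: f_classif(X, y)[0],
    },
]

evaluator.run(
    ["colon"],
    methods,
    classifier=LogisticRegression(max_iter=1000),  # replaces the default RandomForestClassifier
)
```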

### ⚙️ `timer(methods, vary_param='features', time_limit=3600)`

Runs a runtime (scalability) analysis of the methods.

| Argument | Type | Description |
| :--- | :--- | :--- |
| **`methods`** | List[dict] | `[{"name": str, "func": callable, "stochastic": bool}, ...]` |
| **`vary_param`** | str | `"features"`, `"instances"`, or `"both"`. |
| **`time_limit`** | int (seconds) | Stop a method after the first recorded run that exceeds this limit. |
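
For example, timing only the deterministic methods from the Quick Example while varying the feature dimension (a sketch against the signature above):

```python
# Reuses methods_list and evaluator from the Quick Example.
evaluator.timer(
    methods=[m for m in methods_list if not m["stochastic"]],  # deterministic methods only
    vary_param="features",  # grow the feature dimension only
    time_limit=600,         # stop a method once a run first exceeds 10 minutes
)
```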

# Dashboard

A Feature Selection Evaluation Dashboard, built on the benchmarks produced by FSEVAL, is available at:

https://fseval.imada.sdu.dk/

The dashboard offers a collection of analytic tools that provide comprehensive, comparative insights into the performance of your feature selection method(s). The dashboard source code is also included in the `dashboard` folder of the GitHub repository, so you can run it locally.

# Citation

If you use FSEVAL in your research, please cite the original paper:

```
CITATION WILL BE PROVIDED UPON PUBLICATION.
```
