Metadata-Version: 2.3
Name: transcendent-multiclass-CDD__Wdis
Version: 1.1.1
Summary: Transcendent adaptation for multiclass problems
License: BSD 3-Clause License
Author: Luca Fabri
Author-email: luca.fabri1999@gmail.com
Requires-Python: >=3.10,<4.0
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: contourpy (==1.3.2)
Requires-Dist: coverage (>=7.8.0,<8.0.0)
Requires-Dist: cycler (==0.12.1)
Requires-Dist: fonttools (==4.58.0)
Requires-Dist: joblib (==1.5.0)
Requires-Dist: kiwisolver (==1.4.8)
Requires-Dist: lightgbm (>=4.6.0,<5.0.0)
Requires-Dist: matplotlib (==3.10.3)
Requires-Dist: mypy (>=1.15.0,<2.0.0)
Requires-Dist: numpy (==2.2.5)
Requires-Dist: packaging (==25.0)
Requires-Dist: pandas (==2.2.3)
Requires-Dist: pillow (==11.2.1)
Requires-Dist: pyparsing (==3.2.3)
Requires-Dist: pytest (>=8.3.5,<9.0.0)
Requires-Dist: python-dateutil (==2.9.0.post0)
Requires-Dist: pytz (==2025.2)
Requires-Dist: ruff (>=0.11.6,<0.12.0)
Requires-Dist: scikit-learn (==1.6.1)
Requires-Dist: scipy (==1.15.3)
Requires-Dist: seaborn (==0.13.2)
Requires-Dist: six (==1.17.0)
Requires-Dist: termcolor (==3.1.0)
Requires-Dist: tesseract (==0.1.3)
Requires-Dist: threadpoolctl (==3.6.0)
Requires-Dist: tqdm (==4.67.1)
Requires-Dist: tzdata (==2025.2)
Requires-Dist: ujson (==5.10.0)
Requires-Dist: xgboost (>=3.0.1,<4.0.0)
Description-Content-Type: text/markdown

# Transcendent Multiclass

![CI status](https://github.com/malware-concept-drift-detection/transcendent-multiclass/actions/workflows/check.yml/badge.svg) 
![Version](https://img.shields.io/github/v/release/malware-concept-drift-detection/transcendent-multiclass?style=plastic)

This repository enables users to apply Transcendent-like concept drift detection to both binary and multiclass problems.

Modifications have been made only to the ICE (Inductive Conformal Evaluator) implementation; the other evaluators (e.g., TCE, CCE) are out of scope.
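
As background on what ICE computes, the following is a minimal sketch of a label-conditional ICE $p$-value, together with the credibility/confidence pair that Transcendent-style evaluators report (credibility is the $p$-value of the predicted label; confidence is one minus the largest $p$-value among the other labels). It is an illustration, not this repository's code: the function names are hypothetical, and the `+1` smoothing used here may differ from the actual implementation.

```python
import numpy as np


def ice_p_value(cal_scores, cal_labels, test_score, label):
    """Label-conditional ICE p-value (hypothetical helper).

    Fraction of calibration samples with the same label whose
    nonconformity score is at least as large as the test sample's.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    cal_labels = np.asarray(cal_labels)
    same = cal_scores[cal_labels == label]
    if same.size == 0:
        raise ValueError(f"no calibration samples for label {label!r}")
    # (count + 1) / (n + 1) smoothing: an assumption, keeps p-values > 0
    return (np.sum(same >= test_score) + 1) / (same.size + 1)


def credibility_confidence(p_values, predicted):
    """p_values maps each candidate label to its p-value for one sample."""
    credibility = p_values[predicted]
    others = [p for label, p in p_values.items() if label != predicted]
    # Confidence: 1 minus the strongest competing label's p-value
    confidence = 1.0 - max(others) if others else 1.0
    return credibility, confidence
```

For example, with calibration scores `[0.1, 0.2, 0.3]` for label `"a"` and a test score of `0.15`, two of the three calibration scores are at least as large, giving $(2 + 1)/(3 + 1) = 0.75$.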

This project adapts [Transcendent](https://github.com/s2labres/transcendent-release/tree/main) to multiclass problems by implementing two *Nonconformity Measures* (NCMs) for Random Forest and LightGBM classifiers.
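
To illustrate what an NCM looks like for a probabilistic multiclass classifier, here is a minimal sketch of a common probability-based measure, $\alpha(x, y) = 1 - P(y \mid x)$. This is not necessarily the NCM implemented in this repository (which may, for instance, use per-tree votes); the function name `probability_ncm` is hypothetical.

```python
import numpy as np


def probability_ncm(proba, labels, classes):
    """Hypothetical probability-based NCM: 1 - P(label | x) per sample.

    proba   : (n_samples, n_classes) array, e.g. model.predict_proba(X)
    labels  : length-n sequence of labels to score each sample against
    classes : column order of `proba`, e.g. model.classes_
    """
    proba = np.asarray(proba, dtype=float)
    index = {c: j for j, c in enumerate(classes)}
    cols = np.array([index[y] for y in labels])
    # Low probability for the label => high nonconformity
    return 1.0 - proba[np.arange(len(cols)), cols]
```

With a fitted scikit-learn `RandomForestClassifier` or LightGBM `LGBMClassifier` named `model`, calibration scores could be obtained as `probability_ncm(model.predict_proba(X_cal), y_cal, model.classes_)`, since both estimators expose `predict_proba` and `classes_`.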

## Prerequisites

- *Set up* the train/test split directory, which should contain the following files:
    ```plaintext
    time_split/
    ├── X_train.pkl
    ├── X_test.pkl
    ├── X_proper_train.pkl
    ├── X_cal.pkl
    ├── y_train.pkl
    ├── y_test.pkl
    ├── y_proper_train.pkl
    └── y_cal.pkl
    ```

- *Make sure* [Docker](https://docs.docker.com/engine/install/) is installed and running.
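
The split directory can be produced in many ways. The sketch below assumes that `X_proper_train`/`X_cal` (and the matching `y_*` files) partition the training set, with the most recent fraction of a time-ordered training set held out for calibration, and that plain `pickle` files are acceptable to the container. These are assumptions, not documented requirements; the helper `write_time_split` and its `cal_fraction` parameter are hypothetical. Slicing is positional, so pass lists or NumPy arrays (for pandas objects, adapt it to use `.iloc`).

```python
import pickle
from pathlib import Path


def write_time_split(out_dir, X_train, y_train, X_test, y_test, cal_fraction=0.2):
    """Hypothetical helper: hold out the last `cal_fraction` of the
    (time-ordered) training set for calibration and pickle all eight files."""
    if not 0 < cal_fraction < 1:
        raise ValueError("cal_fraction must be in (0, 1)")
    n_cal = max(1, int(round(len(X_train) * cal_fraction)))
    split = len(X_train) - n_cal
    files = {
        "X_train": X_train,
        "X_test": X_test,
        "X_proper_train": X_train[:split],  # older samples fit the model
        "X_cal": X_train[split:],           # newest samples calibrate ICE
        "y_train": y_train,
        "y_test": y_test,
        "y_proper_train": y_train[:split],
        "y_cal": y_train[split:],
    }
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, obj in files.items():
        with open(out / f"{name}.pkl", "wb") as fh:
            pickle.dump(obj, fh)
    return out
```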

## Usage

1. *Clone* the repository and change directory:
    ```bash
    git clone git@github.com:w-disaster/transcendent-multiclass.git && cd transcendent-multiclass
    ```

2. *Configure* the environment variables and *run* the Inductive Conformal Evaluator:

    ```bash
    PE_DATASET_NAME=<YOUR_PE_DATASET_NAME>
    SPLITTED_MPH_DATASET_PATH=<YOUR_PRE_SPLITTED_DATA>
    BEST_HYP_DIR=<YOUR_BEST_HYP_DIR> # Based on format produced by overfitting-analysis

    docker run -d \
    --name mph-feature-extraction-$PE_DATASET_NAME \
    -e BASE_DATASET_PATH=/usr/app/dataset/ \
    -e PE_DATASET_TYPE=${PE_DATASET_NAME}_mph \
    -e SPLITTED_MPH_DATASET_PATH=/usr/input_data/splitted_dataset/ \
    -e BEST_HYP_DIR=/usr/input_data/best_hyp/ \
    -e FEATURE_TYPE=dts \
    -v "$BEST_HYP_DIR":/usr/input_data/best_hyp/ \
    -v "$SPLITTED_MPH_DATASET_PATH":/usr/input_data/splitted_dataset/ \
    -v "$(pwd)/results_multiclass/":/usr/app/models/ \
    ghcr.io/malware-concept-drift-detection/transcendent-multiclass:main
    ```

    A `results_multiclass/` directory will be created locally, containing the credibility ($p$-values) and confidence scores for both the calibration and test sets.

3. *Analyze* the ICE results:

    Check whether samples from novel families in the test set receive smaller $p$-values than samples from families seen during training, and can thus be distinguished from them.
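
One way to quantify that check is sketched below. It assumes you have already loaded the test-set credibility scores from `results_multiclass/` into an array (the file layout is not described here) and built a boolean mask marking samples whose family was absent from training. The helper `novelty_report` and its default `threshold=0.1` are hypothetical choices; the AUROC is computed by direct pairwise comparison, i.e. the probability that a random novel sample has lower credibility than a random seen one (ties count as half).

```python
import numpy as np


def novelty_report(credibility, is_novel, threshold=0.1):
    """Hypothetical summary of how well credibility separates novel families.

    credibility : test-set credibility scores (p-values)
    is_novel    : True where the sample's family was absent from training
    threshold   : assumed rejection threshold on credibility
    """
    cred = np.asarray(credibility, dtype=float)
    novel = np.asarray(is_novel, dtype=bool)
    if novel.all() or not novel.any():
        raise ValueError("need both novel and seen samples")
    rejected = cred < threshold
    # Pairwise (seen - novel) differences: > 0 means the novel sample ranks lower
    diff = cred[~novel][:, None] - cred[novel][None, :]
    auroc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size
    return {
        "median_cred_seen": float(np.median(cred[~novel])),
        "median_cred_novel": float(np.median(cred[novel])),
        "novel_rejected": float(rejected[novel].mean()),
        "seen_rejected": float(rejected[~novel].mean()),
        "auroc": float(auroc),
    }
```

An AUROC close to 1 indicates that novel families consistently receive lower credibility than seen ones; a value near 0.5 indicates no separation.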
