Metadata-Version: 2.4
Name: jensenshannondivergence
Version: 0.1.0
Summary: Jensen-Shannon divergence estimation for tabular data using discriminator-based methods
Author-email: Alba Garrido López <alba.garrido.lopez@upm.es>
License: MIT License
        
        Copyright (c) 2026 Alba Garrido López
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Source, https://github.com/AlbaGarridoLopezz/jensenshannondivergence
Keywords: jensen-shannon,divergence,statistics,machine-learning,synthetic-data
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.26
Requires-Dist: pandas>=2.2
Requires-Dist: scipy>=1.11
Requires-Dist: scikit-learn>=1.5
Requires-Dist: scikit-optimize>=0.10
Requires-Dist: colorama>=0.4
Requires-Dist: packaging>=24
Provides-Extra: torch
Requires-Dist: torch>=2.2; extra == "torch"
Provides-Extra: ml
Requires-Dist: xgboost>=2.0; extra == "ml"
Provides-Extra: tabular
Requires-Dist: syndat>=0.13; extra == "tabular"
Requires-Dist: synthcity>=0.2.12; extra == "tabular"
Requires-Dist: tabpfn>=6.4.1; extra == "tabular"
Provides-Extra: viz
Requires-Dist: matplotlib>=3.8; extra == "viz"
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == "dev"
Requires-Dist: ruff>=0.6; extra == "dev"
Requires-Dist: build>=1.0; extra == "dev"
Provides-Extra: all
Requires-Dist: torch>=2.2; extra == "all"
Requires-Dist: xgboost>=2.0; extra == "all"
Requires-Dist: syndat>=0.13; extra == "all"
Requires-Dist: synthcity>=0.2.12; extra == "all"
Requires-Dist: tabpfn>=6.4.1; extra == "all"
Requires-Dist: matplotlib>=3.8; extra == "all"
Dynamic: license-file

# JensenShannonDivergence

Python package for Jensen-Shannon divergence estimation on tabular data.

The core API works with NumPy arrays, pandas DataFrames, PyTorch tensors, or numeric array-like values.
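As a reminder of the quantity being estimated, the Jensen-Shannon divergence between two distributions P and Q is the average of the KL divergences from each distribution to their mixture M = (P + Q) / 2. The sketch below computes it exactly for small discrete distributions in pure Python. It uses natural logarithms; the log base and any normalization applied by this package are not stated in this README, so treat the scale as an assumption. `js_divergence` is an illustrative helper, not part of the package.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (in nats) between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of outcomes")
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing (0 * log 0 is taken as 0).
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

identical = js_divergence([0.5, 0.5], [0.5, 0.5])  # identical distributions
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])   # non-overlapping distributions
```

With natural logarithms the divergence is 0 for identical distributions and reaches its maximum, `log(2)`, when the two distributions do not overlap.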

## Installation

### From GitHub

If the package is only published on GitHub, install it directly from the repository:

```bash
pip install "git+https://github.com/AlbaGarridoLopezz/jensenshannondivergence.git"
```

To install a specific branch or tag:

```bash
pip install "git+https://github.com/AlbaGarridoLopezz/jensenshannondivergence.git@main"
```

### From PyPI

Once the package is published on PyPI, install it with:

```bash
pip install jensenshannondivergence
```

Optional extras can be installed with:

```bash
pip install "jensenshannondivergence[all]"
```

### Local Development

Clone the repository and install it in editable mode:

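```bash
git clone https://github.com/AlbaGarridoLopezz/jensenshannondivergence.git
cd jensenshannondivergence
python -m venv venv
source venv/bin/activate
pip install -e .
```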

To include the optional extras (quoted so that shells such as zsh do not expand the brackets):

```bash
pip install -e ".[all]"
```

## What Users Should Import

The library is *custom-first*: call it with your own real/reference samples and generated/synthetic samples.

### Simple API

`estimate_jensen_shannon` returns a `float` with the estimated Jensen-Shannon divergence.

```python
import numpy as np
from jensenshannondivergence import estimate_jensen_shannon

x_reference = np.random.normal(size=(1000, 10))
x_synthetic = np.random.normal(loc=0.2, size=(1000, 10))

js = estimate_jensen_shannon(
    x_reference,
    x_synthetic,
    discriminator_type="MLP",  # MLP, RF, XGBoost, LogReg, LogRegPol, TabPFN
    n_iter=30,                 # used by RF/XGBoost/LogReg optimizers
    seed=0,
)

print(js)
```

Use `return_result=True` if you need the full evaluator and output path:

```python
from jensenshannondivergence import estimate_jensen_shannon

# Reuses x_reference and x_synthetic from the previous example.
result = estimate_jensen_shannon(
    x_reference,
    x_synthetic,
    discriminator_type="RF",
    m=500,
    l=250,
    n_iter=20,
    return_result=True,
)

print(result.evaluator.disc_js)
print(result.results_path)
```
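The package summary describes the estimator as discriminator-based. This README does not spell out the estimator itself, so the following is only a sketch of the standard identity that such methods usually build on, not the library's implementation. For a balanced real-versus-synthetic classification problem, the optimal discriminator is D(x) = p(x) / (p(x) + q(x)), and JS(P, Q) = log 2 + 0.5 * (E_P[log D] + E_Q[log(1 - D)]). The helper name `js_from_discriminator` and the toy distributions are assumptions made for illustration.

```python
import math

def js_from_discriminator(p, q, d):
    """JS(P, Q) in nats from discriminator outputs d[i] = Pr(real | outcome i)."""
    exp_real = sum(pi * math.log(di) for pi, di in zip(p, d) if pi > 0)
    exp_synth = sum(qi * math.log(1 - di) for qi, di in zip(q, d) if qi > 0)
    return math.log(2) + 0.5 * (exp_real + exp_synth)

# Toy discrete distributions standing in for reference and synthetic data.
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

# Optimal discriminator for balanced classes.
d_opt = [pi / (pi + qi) for pi, qi in zip(p, q)]
js_disc = js_from_discriminator(p, q, d_opt)

# Direct computation from the definition, for comparison.
m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
js_direct = 0.5 * sum(pi * math.log(pi / mi) for pi, mi in zip(p, m)) \
          + 0.5 * sum(qi * math.log(qi / mi) for qi, mi in zip(q, m))

# A discriminator that cannot tell the samples apart yields zero.
js_uninformative = js_from_discriminator(p, q, [0.5, 0.5, 0.5])
```

With the optimal discriminator the two values agree. With any other discriminator the expression is a lower bound on the true divergence, which is one reason the choice and tuning of the discriminator matters in practice.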

### Predefined experiments (repository)

Predefined experiment loaders live under the `experiments/` folder and are intended
for repository-based development. The library package no longer exposes a
`use_predefined` runtime option. To run a predefined use case from the repo,
load the experiment data and call the library API with the returned tensors:

```python
# run from the repository root (so `experiments` is importable)
from experiments import data as exp_data
from jensenshannondivergence import estimate_jensen_shannon

# load tensors for a use case
x_r, x_s, dist_r, dist_s = exp_data.load_data('use_case_7', n=10, m=2000, l=2000, seed=0)

js = estimate_jensen_shannon(
    x_r,
    x_s,
    discriminator_type='MLP',
    m=2000,
    l=2000,
    seed=0,
)
```

If you prefer orchestration, use `main_experiments.py`, which loads experiment
data and calls the library for you (see the `Experiments CLI` section below).

## CLI For Your Own Data

After installation, `jsd-estimate` estimates JS directly from two CSV files:

```bash
jsd-estimate --x-p real_data.csv --x-q gen_data.csv --discriminator MLP --epochs 100
```

Useful arguments:

- `--discriminator` / `--classifier`
- `--m`, `--l`
- `--n-iter`
- `--epochs`
- `--ratio-correction-mode`
- `--results-root`
- `--save-plots`
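To script several CLI runs, the argument list can be assembled in Python. This is a minimal sketch that uses only the flags listed above; `build_jsd_command` is a hypothetical helper, and it assumes that the CLI accepts the same discriminator names as the Python API and that `--n-iter` and `--epochs` take integer values.

```python
import shlex

def build_jsd_command(real_csv, synthetic_csv, discriminator, n_iter=None, epochs=None):
    """Assemble a `jsd-estimate` argument list from the documented flags."""
    cmd = [
        "jsd-estimate",
        "--x-p", real_csv,
        "--x-q", synthetic_csv,
        "--discriminator", discriminator,
    ]
    if n_iter is not None:
        cmd += ["--n-iter", str(n_iter)]
    if epochs is not None:
        cmd += ["--epochs", str(epochs)]
    return cmd

commands = [
    build_jsd_command("real_data.csv", "gen_data.csv", "MLP", epochs=100),
    build_jsd_command("real_data.csv", "gen_data.csv", "RF", n_iter=30),
]

for cmd in commands:
    # Print the shell-quoted command; it is not executed here.
    print(shlex.join(cmd))
```

To execute a command rather than print it, pass the list to `subprocess.run(cmd, check=True)` in an environment where the package is installed.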

## Experiments CLI

`main_experiments.py` is only for predefined experiments and is intended for repository development runs.

List available use cases:

```bash
python main_experiments.py list
```

Train selected experiments:

```bash
python main_experiments.py train --discriminators MLP RF --experiments use_case_1 use_case_3 --n-iter 30
```

Run discriminator tests across classifiers:

```bash
python main_experiments.py test-discriminators --experiments use_case_1 --n-iter 20
```

Use `--models`, `--datasets-use-case-4`, and `--datasets-use-case-11` for the real-data use cases.

## Interactive Tutorial Notebook

Use the tutorial notebook for a minimal end-to-end run:

- `Tutorial.ipynb`

The tutorial saves run outputs to:

- `experiments/tutorials_outputs/`

If you edit code under `src/`, restart the notebook kernel before re-running cells.

## Experiments Path Convention

All read/write experiment paths are centralized under `experiments/`, including:

- `experiments/data/`
- `experiments/results_MLP/`
- `experiments/results_RF/`
- `experiments/results_XGBoost/`
- `experiments/results_LogReg/`
- `experiments/results_LogRegPol/`
- `experiments/results_TabPFN/`
- `experiments/results_discriminators/`
- `experiments/calibration_audit_results/`
- `experiments/tutorials_outputs/`

You can override this root with the environment variable `JSD_EXPERIMENTS_ROOT`.
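The override can be pictured with the following sketch. It is not the package's own resolution code: the helper names are hypothetical, and the default of a relative `experiments` directory is an assumption based on the paths listed above.

```python
import os
from pathlib import Path

DEFAULT_ROOT = Path("experiments")  # assumed default, inferred from the listed paths

def experiments_root():
    """Return the experiments root, honoring JSD_EXPERIMENTS_ROOT when it is set."""
    override = os.environ.get("JSD_EXPERIMENTS_ROOT")
    return Path(override) if override else DEFAULT_ROOT

def results_dir(discriminator):
    """Per-discriminator results folder, following the results_<name> pattern."""
    return experiments_root() / f"results_{discriminator}"

# Redirect all experiment outputs for the current process.
os.environ["JSD_EXPERIMENTS_ROOT"] = "/tmp/jsd_runs"
rf_dir = results_dir("RF")
print(rf_dir)
```

Set the variable before launching the scripts or the notebook kernel so that every path is resolved against the same root.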

Library functions do not produce plots by default; plotting is performed by the experiment scripts and notebooks. Experiment outputs (tables, CSVs, and optional plots) are written under the experiments root. To control plot saving, use the `save_plots` argument in the experiments CLI or set `save_plots=True` in the notebooks.

## Notes

- For best performance, run on a GPU when one is available.
- Some baselines require optional dependencies (`syndat`, `synthcity`, `tabpfn`).
- If using TabPFN, make sure your PyTorch version is compatible.
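Before running a baseline, it can help to check which optional dependencies are importable. The sketch below uses only the standard library. The grouping mirrors the extras declared in the package metadata, and it assumes that each distribution's import name equals its distribution name; `missing_optional_modules` is an illustrative helper, not part of the package.

```python
import importlib.util

# Extras and their modules (import names assumed equal to distribution names).
OPTIONAL_MODULES = {
    "torch": ["torch"],
    "ml": ["xgboost"],
    "tabular": ["syndat", "synthcity", "tabpfn"],
    "viz": ["matplotlib"],
}

def missing_optional_modules(extras=OPTIONAL_MODULES):
    """Return {extra: [modules that cannot be found]} for incomplete extras."""
    missing = {}
    for extra, modules in extras.items():
        absent = [name for name in modules if importlib.util.find_spec(name) is None]
        if absent:
            missing[extra] = absent
    return missing

for extra, modules in missing_optional_modules().items():
    print(f"extra '{extra}' is missing: {', '.join(modules)}")
```

This only checks that the modules can be located; it does not verify version compatibility between TabPFN and PyTorch.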
