Metadata-Version: 2.4
Name: fcvopt
Version: 0.5.2
Summary: Fractional K-fold cross-validation for hyperparameter optimization
Project-URL: Homepage, https://github.com/syerramilli/fcvopt
Project-URL: Documentation, https://syerramilli.github.io/fcvopt/
Project-URL: Repository, https://github.com/syerramilli/fcvopt.git
Project-URL: Issues, https://github.com/syerramilli/fcvopt/issues
Author: Suraj Yerramilli, Daniel W. Apley
Maintainer-email: Suraj Yerramilli <surajyerramilli@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: automl,bayesian-optimization,cross-validation,gaussian-processes,hyperparameter-optimization,machine-learning
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.10
Requires-Dist: botorch>=0.12
Requires-Dist: configspace<2.0,>=1.0
Requires-Dist: gpytorch>=1.14
Requires-Dist: joblib>=1.3
Requires-Dist: mlflow<3.6,>=3.0
Requires-Dist: numpy>=2.0
Requires-Dist: pandas>=2.2.2
Requires-Dist: scikit-learn>=1.4.2
Requires-Dist: scipy>=1.12
Requires-Dist: skorch>=0.15
Requires-Dist: torch>=2.3.1
Provides-Extra: dev
Requires-Dist: black; extra == 'dev'
Requires-Dist: flake8; extra == 'dev'
Requires-Dist: isort; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Provides-Extra: docs
Requires-Dist: ipykernel; extra == 'docs'
Requires-Dist: nbsphinx; extra == 'docs'
Requires-Dist: sphinx-autodoc-typehints; extra == 'docs'
Requires-Dist: sphinx-rtd-theme; extra == 'docs'
Requires-Dist: sphinx>=4.0; extra == 'docs'
Requires-Dist: sphinxcontrib-napoleon; extra == 'docs'
Provides-Extra: experiments
Requires-Dist: lightgbm>=4.0; extra == 'experiments'
Requires-Dist: matplotlib>=3.7.0; extra == 'experiments'
Requires-Dist: optuna<4.0.0,>=3.6.0; extra == 'experiments'
Requires-Dist: seaborn>=0.12.2; extra == 'experiments'
Requires-Dist: smac<3.0,>=2.0; extra == 'experiments'
Requires-Dist: xgboost>=2.0; extra == 'experiments'
Description-Content-Type: text/markdown

# fcvopt: Fractional cross-validation for hyperparameter optimization

FCVOpt is a Python package for hyperparameter optimization via fractional cross-validation. It implements the methodology from ["Fractional cross-validation for optimizing hyperparameters of supervised learning algorithms"](https://doi.org/10.1080/00401706.2025.2515926), which uses hierarchical Gaussian processes to optimize ML models efficiently by evaluating only a fraction of the CV folds per hyperparameter configuration.

K-fold cross-validation is more robust than holdout validation, but it requires fitting K models per hyperparameter configuration, which makes it expensive inside an optimization loop. FCVOpt sidesteps this cost by modeling the correlation structure of fold-wise losses across the hyperparameter space with a hierarchical GP, so that most configurations require evaluating only a single fold.
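To make the potential savings concrete, here is a back-of-the-envelope count of model fits. The numbers and the schedule are illustrative assumptions, not measured behavior: we suppose the first `n_init` configurations get a full CV pass and each later configuration gets a single fold, which may differ from the fold schedule fcvopt actually follows.

```python
# Illustrative cost comparison: model fits per tuning strategy.
# All numbers here are hypothetical; real savings depend on the
# optimizer's fold-selection behavior.
K = 5          # CV folds
n_trials = 50  # hyperparameter configurations evaluated
n_init = 10    # configurations assumed to receive a full CV pass

full_cv_fits = n_trials * K                         # every config, every fold
fractional_fits = n_init * K + (n_trials - n_init)  # one fold per later config

print(full_cv_fits)     # 250
print(fractional_fits)  # 90
```

Under these assumptions, the fractional scheme fits 90 models instead of 250, roughly a 2.8x reduction in training cost for the same number of trials.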

The documentation is available at [https://syerramilli.github.io/fcvopt/](https://syerramilli.github.io/fcvopt/).

## Features

* Fractional CV optimization via hierarchical Gaussian processes, with support for repeated K-fold cross-validation
* Standard Bayesian optimization with holdout loss, available for both hyperparameter tuning and general black-box optimization
* Fold selection via variance reduction, which chooses the most informative fold to evaluate at each step
* MLflow integration for experiment tracking and model checkpointing
* Acquisition functions: Knowledge Gradient and Lower Confidence Bound
* Works with scikit-learn estimators, XGBoost, and PyTorch neural networks (via skorch)

## Installation

```bash
pip install fcvopt
```
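The package also declares optional dependency groups in its metadata (`dev`, `docs`, and `experiments`). For example, to install the extras used for the paper's benchmark comparisons:

```shell
# Optional extras declared in the package metadata
pip install "fcvopt[experiments]"  # lightgbm, xgboost, optuna, smac, plotting
pip install "fcvopt[dev]"          # test and lint tooling
```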

## Quick Start

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import zero_one_loss
from ConfigSpace import Float, Integer

from fcvopt.configspace import ConfigurationSpace
from fcvopt.crossvalidation import SklearnCVObj
from fcvopt.optimizers import FCVOpt

# Example data; replace with your own dataset
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Define the CV objective
cv_obj = SklearnCVObj(
    estimator=RandomForestClassifier(),
    X=X, y=y,
    loss_metric=zero_one_loss,
    task='binary-classification',
    n_splits=5,
    rng_seed=42
)

# Define the hyperparameter search space
config = ConfigurationSpace()
config.add([
    Integer('n_estimators', bounds=(10, 1000), log=True),
    Integer('max_depth', bounds=(1, 12), log=True),
    Float('max_features', bounds=(0.1, 1), log=True),
])
config.generate_indices()

# Set up the optimizer
optimizer = FCVOpt(
    obj=cv_obj.cvloss,
    n_folds=cv_obj.cv.get_n_splits(),
    config=config,
    acq_function='LCB',  # 'KG' tends to work better but is slower
    fold_selection_criterion='variance_reduction',
    tracking_dir='./hpt_opt_runs/',
    experiment_name='rf_hpt'
)

# Run 50 trials, using 10 random initializations before switching to acquisition
best_conf = optimizer.optimize(n_trials=50, n_init=10)
optimizer.end_run()
```

## Research

FCVOpt implements the algorithm described in:

> "Fractional cross-validation for optimizing hyperparameters of supervised learning algorithms"
> Suraj Yerramilli and Daniel W. Apley
> *Technometrics* (2025)
> DOI: [10.1080/00401706.2025.2515926](https://doi.org/10.1080/00401706.2025.2515926)

## Citing

If you use this code in your research, please cite the following paper:

```bibtex
@article{yerramilli2025fractional,
    author = {Suraj Yerramilli and Daniel W. Apley},
    title = {Fractional Cross-Validation for Optimizing Hyperparameters of Supervised Learning Algorithms},
    journal = {Technometrics},
    year = {2025},
    doi = {10.1080/00401706.2025.2515926},
}
```
