Metadata-Version: 2.4
Name: mlfastopt
Version: 0.0.9b4
Summary: ML Fast Opt - Advanced ensemble optimization system for LightGBM hyperparameter tuning
Author-email: GenX AI Lab <contact@genxai.cc>
License: GENX AI LAB COMMUNITY LICENSE AGREEMENT
        Version 1.0
        
        Copyright (c) 2025 GenX AI Lab. All Rights Reserved.
        
        1. GRANT OF LICENSE
        GenX AI Lab ("Licensor") hereby grants to you ("Licensee") a non-exclusive, non-transferable, revocable license to install and use the software "mlfastopt" (the "Software") for personal, educational, research, and internal business purposes, free of charge.
        
        2. RESTRICTIONS
        You may NOT:
        (a) Distribute, sub-license, rent, lease, or sell the Software or any derivative works thereof;
        (b) Modify, reverse engineer, decompile, or disassemble the Software, except to the extent that such activity is expressly permitted by applicable law notwithstanding this limitation;
        (c) Remove or alter any copyright, trademark, or other proprietary notices from the Software.
        
        3. OWNERSHIP
        The Software is licensed, not sold. GenX AI Lab retains all ownership, title, copyright, and other intellectual property rights in the Software. This license does not grant you any rights to use GenX AI Lab's trademarks or service marks.
        
        4. DISCLAIMER OF WARRANTY
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT, OR OTHERWISE, ARISING FROM, OUT OF, OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
        
        5. TERMINATION
        This license is effective until terminated. Your rights under this license will terminate automatically without notice from the Licensor if you fail to comply with any term(s) of this license. Upon termination, you shall cease all use of the Software and destroy all copies, full or partial, of the Software.
        
Project-URL: Homepage, https://github.com/example/mlfastopt
Project-URL: Documentation, https://github.com/example/mlfastopt/docs
Project-URL: Repository, https://github.com/example/mlfastopt
Project-URL: Issues, https://github.com/example/mlfastopt/issues
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: ax-platform>=1.0.0
Requires-Dist: sqlalchemy<2.0.0
Requires-Dist: lightgbm<5.0.0,>=4.6.0
Requires-Dist: polars>=1.31.0
Requires-Dist: pandas<3.0.0,>=2.3.0
Requires-Dist: numpy>=2.3.1
Requires-Dist: scikit-learn<2.0.0,>=1.7.0
Requires-Dist: matplotlib>=3.10.3
Requires-Dist: joblib>=1.5.1
Requires-Dist: flask<4.0.0,>=3.1.1
Requires-Dist: plotly<7.0.0,>=6.2.0
Requires-Dist: seaborn>=0.13.2
Requires-Dist: pyarrow>=20.0.0
Requires-Dist: keyring>=25.5.0
Requires-Dist: build>=1.2.2.post1
Requires-Dist: PyYAML>=6.0.2
Requires-Dist: gcsfs>=2025.12.0
Requires-Dist: fastparquet>=2025.12.0
Requires-Dist: openpyxl>=3.1.5
Requires-Dist: xgboost>=3.1.2
Requires-Dist: shap>=0.50.0
Provides-Extra: dev
Requires-Dist: pytest>=9.0.2; extra == "dev"
Requires-Dist: pytest-cov>=7.0.0; extra == "dev"
Requires-Dist: coverage>=7.13.0; extra == "dev"
Requires-Dist: black>=25.12.0; extra == "dev"
Requires-Dist: flake8>=7.3.0; extra == "dev"
Requires-Dist: mypy>=1.19.1; extra == "dev"
Dynamic: license-file

# MLFastOpt

[![PyPI version](https://badge.fury.io/py/mlfastopt.svg)](https://badge.fury.io/py/mlfastopt)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: Proprietary](https://img.shields.io/badge/License-Proprietary-lightgrey.svg)](https://github.com/example/mlfastopt)

MLFastOpt is a high-speed ensemble optimization system for Bayesian hyperparameter tuning of **LightGBM, XGBoost, and Random Forest models**.

## Features

- 🚀 **Fast Optimization**: Advanced Bayesian optimization algorithms (Sobol + BoTorch).
- 🧩 **Multi-Model Support**: Tune LightGBM, XGBoost, or Random Forest ensembles.
- ⚙️ **Simple Config**: Hierarchical JSON configuration and YAML/Python search spaces.
- 📊 **Rich Analytics**: Built-in web dashboards and visualization tools.

### Prerequisites

- Python 3.8+
- **macOS Users**: Install the OpenMP runtime (`libomp`), which LightGBM and XGBoost require:
  ```bash
  brew install libomp
  ```
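A missing `libomp` typically only surfaces at import time, so a quick generic check (not an mlfastopt utility) can confirm the native dependencies resolve before you start a run:

```python
import importlib

def import_ok(name):
    """Try importing; native-library problems (e.g. missing libomp) surface here."""
    try:
        importlib.import_module(name)
        return True
    except Exception:
        return False

for mod in ("lightgbm", "xgboost"):
    hint = "ok" if import_ok(mod) else "missing -- did you `brew install libomp` and reinstall?"
    print(f"{mod}: {hint}")
```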

## Installation

End users can install the latest release from PyPI:

```bash
pip install mlfastopt
```

For development, clone the repository and then:

1.  **Create and Activate a Virtual Environment**:
    ```bash
    python3 -m venv .venv
    source .venv/bin/activate
    ```

2.  **Install the Package in Editable Mode**:
    ```bash
    pip install -e ".[dev]"  # quotes keep zsh from globbing the extras
    ```

## Quick Start (End Users)

If you installed the package via `pip install mlfastopt`, follow these steps:

1.  **Create Configuration Files**:
    You need a `config.json` and a hyperparameter space file (e.g., `hyperparameters.yaml`).
    
    *config.json*:
    ```json
    {
      "data": { "path": "train.parquet", "label_column": "target", "features": "features.yaml" },
      "model": { "type": "xgboost", "hyperparameter_path": "config/hyperparameters/xgboost.yaml" },
      "training": { "metric": "f1", "total_trials": 20 },
      "output": { "dir": "outputs" }
    }
    ```

2.  **Run Optimization**:
    ```bash
    export OMP_NUM_THREADS=1
    mlfastopt-optimize --config config.json
    ```
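A typo in `config.json` usually only surfaces mid-run. A quick standard-library check (a hypothetical helper, not shipped with mlfastopt) can catch missing sections up front:

```python
import json

# The example config.json from this README, inlined so the check is self-contained.
cfg = json.loads("""
{
  "data": { "path": "train.parquet", "label_column": "target", "features": "features.yaml" },
  "model": { "type": "xgboost", "hyperparameter_path": "config/hyperparameters/xgboost.yaml" },
  "training": { "metric": "f1", "total_trials": 20 },
  "output": { "dir": "outputs" }
}
""")

# Sections and keys the examples in this README rely on; adjust for your own setup.
REQUIRED = {
    "data": ["path", "label_column", "features"],
    "model": ["type", "hyperparameter_path"],
    "training": ["metric", "total_trials"],
    "output": ["dir"],
}

def missing_keys(cfg):
    """Return 'section.key' entries absent from cfg (empty list means it passes)."""
    out = []
    for section, keys in REQUIRED.items():
        if section not in cfg:
            out.append(section)
            continue
        out.extend(f"{section}.{k}" for k in keys if k not in cfg[section])
    return out

print(missing_keys(cfg))  # → []
```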

## Quick Start (Developers)

**Prerequisite**: Input data must be preprocessed and numerical. Handle all categorical encoding (e.g., one-hot or label encoding) before using MLFastOpt; LightGBM and XGBoost offer limited native categorical support, but other models do not.

### 1. Setup
Create the required directory structure:
```bash
mkdir -p config/hyperparameters data
```

### 2. Define Parameter Space
We recommend using YAML for parameter spaces. Create `config/hyperparameters/my_space.yaml`:

```yaml
parameters:
  - name: learning_rate
    type: range
    bounds: [0.01, 0.3]
    value_type: float
    log_scale: true

  - name: max_depth
    type: range
    bounds: [3, 10]
    value_type: int
```
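Range definitions like those above are easy to get wrong (inverted bounds, or non-positive bounds on a log scale). A minimal sanity check over the structure `yaml.safe_load` produces, shown here as an assumed helper rather than an mlfastopt feature:

```python
# Structure produced by yaml.safe_load on my_space.yaml (inlined here so the
# check runs without the file).
space = {
    "parameters": [
        {"name": "learning_rate", "type": "range", "bounds": [0.01, 0.3],
         "value_type": "float", "log_scale": True},
        {"name": "max_depth", "type": "range", "bounds": [3, 10],
         "value_type": "int"},
    ]
}

def validate_space(space):
    """Raise ValueError on obviously bad range definitions; return the names."""
    for p in space["parameters"]:
        lo, hi = p["bounds"]
        if lo >= hi:
            raise ValueError(f"{p['name']}: bounds must be (low, high)")
        if p.get("log_scale") and lo <= 0:
            raise ValueError(f"{p['name']}: log_scale needs positive bounds")
    return [p["name"] for p in space["parameters"]]

print(validate_space(space))  # → ['learning_rate', 'max_depth']
```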

### 3. Configure
Create `my_config.json` using the nested structure:

```json
{
  "data": {
    "path": "data/your_dataset.parquet",
    "label_column": "target",
    "features": ["feature1", "feature2"],
    "class_weight": { "0": 1, "1": 5 },
    "under_sample_majority_ratio": 1.0
  },
  "model": {
    "type": "lightgbm",
    "hyperparameter_path": "config/hyperparameters/my_space.yaml",
    "ensemble_size": 5
  },
  "training": {
    "total_trials": 20,
    "sobol_trials": 5,
    "metric": "soft_recall",
    "parallel": true,
    "n_jobs": -1
  },
  "output": {
    "dir": "outputs/runs"
  }
}
```

### 4. Run
Execute optimization (ensure single-threading for LightGBM/XGBoost to avoid deadlocks):

```bash
OMP_NUM_THREADS=1 python -m mlfastopt.cli --config my_config.json
```

## Configuration Reference

### Data Section (`data`)
| Parameter | Description | Default |
|-----------|-------------|---------|
| `path` | Path to dataset (CSV/Parquet). | Required |
| `label_column` | Name of target column. | Required |
| `features` | List of features or path to YAML file. | Required |
| `class_weight` | Dictionary of class weights (e.g., `{"0": 1, "1": 10}`). | `None` |
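If you are unsure what to put in `class_weight`, a common starting point is inverse-frequency weighting, normalized so the majority class gets weight 1 (matching the `{"0": 1, "1": 10}` style above). A sketch using only the standard library, not an mlfastopt API:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class inversely to its frequency; the majority class gets 1."""
    counts = Counter(labels)
    majority = max(counts.values())
    return {str(cls): round(majority / n, 2) for cls, n in counts.items()}

# 90 negatives, 10 positives → positives weighted 9x
labels = [0] * 90 + [1] * 10
print(inverse_frequency_weights(labels))  # → {'0': 1.0, '1': 9.0}
```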

### Model Section (`model`)
| Parameter | Description | Default |
|-----------|-------------|---------|
| `type` | Model type: `lightgbm`, `xgboost`, `random_forest`. | `lightgbm` |
| `hyperparameter_path` | Path to parameter space file. | Required |
| `ensemble_size` | Models per ensemble. | `1` |

### Training Section (`training`)
| Parameter | Description | Default |
|-----------|-------------|---------|
| `total_trials` | Total optimization trials. | `20` |
| `metric` | Metric to maximize (`soft_recall`, `soft_f1_score`, etc.). | `soft_recall` |
| `parallel` | Enable parallel training of ensemble members. | `false` |

## Outputs

Results are saved to `outputs/`:
- **`runs/`**: Detailed logs and models for each run.
- **`best_trials/`**: JSON configurations of the best performing trials.
- **`visualizations/`**: Generated plots.
