Metadata-Version: 2.4
Name: gloss-opt
Version: 1.0.0
Summary: GLOSS: Global-Local-Unexplored Sampling Strategy for batch surrogate optimization in vast chemical search spaces
Author-email: zhangbc <zbc@ustc.edu.cn>
License: MIT
Project-URL: Homepage, https://github.com/zbc0315/gloss
Project-URL: Repository, https://github.com/zbc0315/gloss
Project-URL: Issues, https://github.com/zbc0315/gloss/issues
Keywords: bayesian-optimization,surrogate-models,autonomous-chemistry,batch-sampling,exploration-exploitation
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Chemistry
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21
Requires-Dist: scipy>=1.7
Requires-Dist: scikit-learn>=1.0
Provides-Extra: nn
Requires-Dist: torch>=1.9; extra == "nn"
Provides-Extra: ml
Requires-Dist: xgboost>=1.5; extra == "ml"
Requires-Dist: lightgbm>=3.3; extra == "ml"
Provides-Extra: all
Requires-Dist: torch>=1.9; extra == "all"
Requires-Dist: xgboost>=1.5; extra == "all"
Requires-Dist: lightgbm>=3.3; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Dynamic: license-file

# GLOSS

**Global–Local–Unexplored Sampling Strategy** — a multi-strategy batch
recommender for surrogate-based optimization in vast chemical search
spaces.

[![PyPI](https://img.shields.io/pypi/v/gloss-opt.svg?label=PyPI&color=blue)](https://pypi.org/project/gloss-opt/)
[![Python](https://img.shields.io/pypi/pyversions/gloss-opt.svg?logo=python&logoColor=white)](https://pypi.org/project/gloss-opt/)
[![License](https://img.shields.io/github/license/zbc0315/gloss.svg?color=green)](LICENSE)
[![Downloads](https://static.pepy.tech/badge/gloss-opt)](https://pepy.tech/project/gloss-opt)
[![GitHub stars](https://img.shields.io/github/stars/zbc0315/gloss.svg?style=social)](https://github.com/zbc0315/gloss/stargazers)
[![Last commit](https://img.shields.io/github/last-commit/zbc0315/gloss.svg)](https://github.com/zbc0315/gloss/commits/main)
[![Docs](https://readthedocs.org/projects/gloss/badge/?version=latest)](https://gloss.readthedocs.io/en/latest/)

---

## What it does

Standard batch Bayesian optimization (BO) picks all *q* candidates per
round by greedily maximizing a single acquisition function. When the
surrogate—fit on scarce data—has locked onto a secondary peak rather
than the global optimum, the whole batch is wasted.

GLOSS decomposes each *q*-point batch across three complementary
streams that share **one** surrogate:

| Stream | Role | Selection |
|---|---|---|
| **Global** | Exploitation | UCB acquisition `s·μ(x) + κ·σ(x)` |
| **Local** | Refinement | BallTree neighborhood around current best `x*` (top-*K* truncation, *K* = 500 by default; `O(K)` per round) |
| **Unexplored** | Exploration | Maximizes geometric distance to observed points; **uses no surrogate signal** |

The Unexplored stream targets the failure mode described above, in
which a surrogate fit on scarce data locks onto a secondary peak. It
forces every round to deposit data in regions the surrogate has not yet
seen, so blind spots get filled even when the μ/σ predictions are
unreliable.
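
The three selection rules in the table can be sketched in a few lines of NumPy and scikit-learn. This is a minimal illustration, not the package's implementation: the helper names (`ucb_scores`, `pick_global`, `pick_local`, `pick_unexplored`) are hypothetical, `s` is assumed to be +1 for maximization and -1 for minimization, the Local stream is assumed to rank the *K* nearest neighbors of `x*` by the same UCB score, and the surrogate predictions are stand-in arrays.

```python
import numpy as np
from sklearn.neighbors import BallTree


def ucb_scores(mu, sigma, kappa=2.0, direction="maximize"):
    """s*mu + kappa*sigma, with s = +1 (maximize) or -1 (minimize); assumed meaning of s."""
    s = 1.0 if direction == "maximize" else -1.0
    return s * np.asarray(mu) + kappa * np.asarray(sigma)


def pick_global(scores, n):
    """Indices of the n highest-scoring candidates in the whole pool."""
    return np.argsort(scores)[::-1][:n]


def pick_local(candidates, scores, x_best, n, k=500):
    """Best n candidates by score among the k nearest neighbors of x_best."""
    tree = BallTree(candidates)
    _, idx = tree.query(x_best.reshape(1, -1), k=min(k, len(candidates)))
    neighbors = idx[0]
    return neighbors[np.argsort(scores[neighbors])[::-1][:n]]


def pick_unexplored(candidates, X_obs, n):
    """Greedy farthest-point picks: maximize distance to the nearest observed point."""
    tree = BallTree(X_obs)
    min_dist = tree.query(candidates, k=1)[0][:, 0]
    chosen = []
    for _ in range(n):
        i = int(np.argmax(min_dist))
        chosen.append(i)
        # Count the new pick as observed so that later picks spread out
        d_new = np.linalg.norm(candidates - candidates[i], axis=1)
        min_dist = np.minimum(min_dist, d_new)
    return np.array(chosen)


rng = np.random.default_rng(0)
pool = rng.random((2_000, 5))
obs_idx = rng.choice(len(pool), 8, replace=False)
X_obs = pool[obs_idx]

# Stand-in surrogate output; a real run would use the fitted model's mean and std
mu = -np.sum((pool - 0.5) ** 2, axis=1)
sigma = np.full(len(pool), 0.1)
scores = ucb_scores(mu, sigma, kappa=2.0)

x_best = X_obs[np.argmax(mu[obs_idx])]   # best observed point so far
batch_idx = np.concatenate([
    pick_global(scores, 4),
    pick_local(pool, scores, x_best, 2),
    pick_unexplored(pool, X_obs, 2),
])
```

The sketch omits de-duplication across streams and the exclusion of already observed points, both of which a practical recommender would need. Note that `pick_unexplored` never reads `mu` or `sigma`, which mirrors the "uses no surrogate signal" property of the Unexplored stream.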

---

## Install

```bash
pip install gloss-opt
```

The PyPI distribution is `gloss-opt` (the bare `gloss` name was
already taken on PyPI), but the Python **import** name is still
`gloss`:

```python
from gloss import GLOSS
```

Optional extras:

```bash
pip install "gloss-opt[nn]"   # + torch (for NN surrogate)
pip install "gloss-opt[ml]"   # + xgboost, lightgbm
pip install "gloss-opt[all]"  # everything above
```

Install from source:

```bash
git clone https://github.com/zbc0315/gloss.git
cd gloss
pip install -e ".[all]"
```

Python 3.9+ required.

---

## Quick start

```python
import numpy as np
from gloss import GLOSS

# A candidate pool of 10,000 points in 5 dimensions
candidates = np.random.rand(10_000, 5)

g = GLOSS(
    space={"candidates": candidates},
    direction="maximize",
    ratio={"global_best": 4, "local_best": 2, "unexplored": 2},
    ucb_kappa=2.0,
    diversity_radius=0.02,
)

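# Placeholder objective so the snippet runs end to end; replace it with
# your own evaluation (experiment, simulation, or lookup)
def my_oracle(X):
    return -np.sum((X - 0.5) ** 2, axis=1)

# Bootstrap with a few initial measurements
X_obs = candidates[np.random.choice(len(candidates), 8, replace=False)]
y_obs = my_oracle(X_obs)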

# Round-by-round recommendation
for _ in range(20):
    batch = g.recommend(X_obs, y_obs, n_points=8)
    y_new = my_oracle(batch)
    X_obs = np.vstack([X_obs, batch])
    y_obs = np.concatenate([y_obs, y_new])
```
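
In the example above the `ratio` weights (4:2:2) sum to `n_points=8`, so each stream contributes exactly its weight. For other batch sizes, one plausible way to turn weights into per-stream counts is proportional allocation with largest-remainder rounding. The sketch below illustrates that idea only; `split_batch` is a hypothetical helper, and GLOSS may allocate points differently.

```python
def split_batch(ratio, n_points):
    """Split n_points across streams in proportion to ratio (largest-remainder rounding)."""
    total = sum(ratio.values())
    exact = {name: n_points * w / total for name, w in ratio.items()}
    counts = {name: int(v) for name, v in exact.items()}
    leftover = n_points - sum(counts.values())
    # Hand remaining points to the streams with the largest fractional parts
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


ratio = {"global_best": 4, "local_best": 2, "unexplored": 2}
print(split_batch(ratio, 8))   # {'global_best': 4, 'local_best': 2, 'unexplored': 2}
print(split_batch(ratio, 5))   # {'global_best': 3, 'local_best': 1, 'unexplored': 1}
```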

See `benchmarks/bench_main.py` for end-to-end runnable examples on
QM9, Buchwald–Hartwig and a virtual reaction surface.

---

## Reproducing the benchmark

```bash
git clone https://github.com/zbc0315/gloss.git
cd gloss
pip install -e ".[all]"
python -m benchmarks.bench_main --study all
```

The benchmark compares GLOSS against four baselines (UCB-BO, BO with
expected improvement (EI), a genetic algorithm (GA), and random
sampling) on three chemistry datasets across 5 seeds × 20 rounds:

| Dataset | n | Source |
|---|---:|---|
| Buchwald–Hartwig | 3,955 | Experimental yields |
| QM9 HOMO–LUMO gap | 100,000 | DFT, 20 RDKit descriptors |
| Arrhenius-2D | 10,000 | Virtual reaction surface |

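Headline numbers on QM9-100k over 5 seeds. *t*₉₅ is the mean number of rounds needed to reach the 95% threshold; "Reach 95%" counts the seeds that reached it: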

| Algorithm | *t*₉₅ (rounds) | Reach 95% |
|---|---:|---:|
| **GLOSS (4:2:2)** | **7.2** | **5/5** |
| UCB-BO | 16.6 | 3/5 |
| BO(EI) | 18.4 | 2/5 |

→ In mean *t*₉₅, GLOSS is **2.31×** faster than UCB-BO and **2.56×** faster than BO(EI).
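
The speedup figures follow directly from the mean *t*₉₅ values in the table above. The sketch below reproduces that arithmetic and shows one possible way to read a *t*₉₅ value off a best-so-far trace. The helper `rounds_to_threshold`, the assumption that the threshold is 95% of a known positive optimum under maximization, and the example trace are all illustrative and are not taken from the benchmark code.

```python
import numpy as np


def rounds_to_threshold(best_so_far, optimum, frac=0.95):
    """First round (1-indexed) whose best-so-far value reaches frac * optimum, else None."""
    hits = np.flatnonzero(np.asarray(best_so_far) >= frac * optimum)
    return int(hits[0]) + 1 if hits.size else None


# Hypothetical best-so-far trace for a single seed (not benchmark data)
trace = [0.60, 0.82, 0.91, 0.96, 0.96, 0.99]
print(rounds_to_threshold(trace, optimum=1.0))   # 4

# Speedups from the mean t95 values reported in the table
t95 = {"GLOSS": 7.2, "UCB-BO": 16.6, "BO(EI)": 18.4}
for name in ("UCB-BO", "BO(EI)"):
    print(f"{name}: {t95[name] / t95['GLOSS']:.2f}x")
```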

---

## Documentation

* Algorithm details, design decisions, and a per-stream walkthrough are
  in the paper (link to be added on submission).
* Per-class API: `gloss/gloss.py` (top-level `GLOSS` class),
  `gloss/strategies/` (the three streams),
  `gloss/surrogate/` (RF / GP / NN backends).
* Benchmark scripts: `benchmarks/bench_*.py`.

---

## Citation

If you use GLOSS in your research, please cite:

```bibtex
@article{gloss2026,
  title  = {GLOSS: A Multi-Strategy Sampling Framework for Optimization in Vast Chemical Search Spaces},
  author = {Zhang, Baicheng and Zhang, Guoqing and Luo, Yi and Jiang, Jun and Zhu, Zhuoying},
  year   = {2026},
  note   = {Submitted}
}
```

---

## License

MIT. See [LICENSE](LICENSE).
