Metadata-Version: 2.4
Name: honeybee-ml
Version: 0.2.1
Summary: A Scalable Modular Framework for Multimodal AI in Oncology
Author-email: Aakash Tripathi <aakash.tripathi@moffitt.org>, Lab Rasool <ghulam.rasool@moffitt.org>
Maintainer-email: Aakash Tripathi <aakash.tripathi@moffitt.org>
Project-URL: Homepage, https://github.com/lab-rasool/HoneyBee
Project-URL: Documentation, https://lab-rasool.github.io/HoneyBee/
Project-URL: Repository, https://github.com/lab-rasool/HoneyBee
Project-URL: Bug Tracker, https://github.com/lab-rasool/HoneyBee/issues
Project-URL: Paper, https://www.nature.com/articles/s41746-025-02003-4
Keywords: multimodal AI,oncology,cancer research,medical imaging,clinical NLP,machine learning,pathology,biomedical,healthcare
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Healthcare Industry
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.20.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: torch>=2.0.0
Requires-Dist: torchvision>=0.15.0
Requires-Dist: torchaudio>=2.0.0
Requires-Dist: transformers>=4.30.0
Requires-Dist: accelerate>=0.20.0
Requires-Dist: bitsandbytes>=0.40.0
Requires-Dist: scikit-image>=0.19.0
Requires-Dist: scipy>=1.7.0
Requires-Dist: matplotlib>=3.3.0
Requires-Dist: imageio>=2.9.0
Requires-Dist: tqdm>=4.60.0
Requires-Dist: Pillow>=9.0.0
Requires-Dist: opencv-python>=4.5.0
Requires-Dist: huggingface_hub>=0.16.0
Provides-Extra: clinical
Requires-Dist: unstructured[image,pdf]>=0.10.0; extra == "clinical"
Requires-Dist: pymupdf>=1.23.0; extra == "clinical"
Requires-Dist: langchain-text-splitters>=0.3.0; extra == "clinical"
Requires-Dist: pdf2image>=1.16.0; extra == "clinical"
Requires-Dist: PyPDF2>=3.0.0; extra == "clinical"
Requires-Dist: scispacy>=0.5.0; extra == "clinical"
Requires-Dist: spacy>=3.5.0; extra == "clinical"
Requires-Dist: fhir.resources>=7.0.0; extra == "clinical"
Requires-Dist: hl7apy>=1.3.0; extra == "clinical"
Requires-Dist: requests>=2.28.0; extra == "clinical"
Requires-Dist: dateparser>=1.1.0; extra == "clinical"
Requires-Dist: litellm>=1.0.0; extra == "clinical"
Requires-Dist: pytesseract>=0.3.8; extra == "clinical"
Provides-Extra: clinical-full
Requires-Dist: honeybee-ml[clinical]; extra == "clinical-full"
Requires-Dist: medspacy>=1.0.0; extra == "clinical-full"
Requires-Dist: medcat>=1.9.0; extra == "clinical-full"
Provides-Extra: pathology
Requires-Dist: openslide-python>=1.2.0; extra == "pathology"
Requires-Dist: colour-science>=0.4.0; extra == "pathology"
Requires-Dist: albumentations>=1.3.0; extra == "pathology"
Requires-Dist: cucim>=23.0.0; extra == "pathology"
Requires-Dist: umap-learn>=0.5.0; extra == "pathology"
Requires-Dist: scikit-learn>=1.0.0; extra == "pathology"
Provides-Extra: radiology
Requires-Dist: SimpleITK>=2.2.0; extra == "radiology"
Requires-Dist: monai>=1.0.0; extra == "radiology"
Requires-Dist: torchio>=0.19.0; extra == "radiology"
Requires-Dist: pydicom>=2.3.0; extra == "radiology"
Requires-Dist: nibabel>=5.0.0; extra == "radiology"
Requires-Dist: lungmask>=0.2.0; extra == "radiology"
Requires-Dist: TotalSegmentator>=2.0.0; extra == "radiology"
Requires-Dist: dipy>=1.7.0; extra == "radiology"
Requires-Dist: intensity-normalization>=2.0.0; extra == "radiology"
Requires-Dist: neuroCombat>=0.2.0; extra == "radiology"
Requires-Dist: nnunetv2>=2.0; extra == "radiology"
Provides-Extra: molecular
Requires-Dist: pyarrow>=10.0.0; extra == "molecular"
Requires-Dist: fastparquet>=2023.0.0; extra == "molecular"
Provides-Extra: database
Requires-Dist: pymongo>=4.0.0; extra == "database"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Provides-Extra: visualization
Requires-Dist: ipykernel>=6.0.0; extra == "visualization"
Requires-Dist: ipywidgets>=8.0.0; extra == "visualization"
Provides-Extra: models
Requires-Dist: timm>=0.9.0; extra == "models"
Requires-Dist: onnxruntime>=1.15.0; extra == "models"
Requires-Dist: peft>=0.5.0; extra == "models"
Provides-Extra: all
Requires-Dist: honeybee-ml[clinical,database,models,molecular,pathology,radiology,visualization]; extra == "all"
Dynamic: license-file

<div align="center">
  <img src="website/public/images/logo.png" alt="HoneyBee Logo" width="200">

  # HoneyBee

  **A Scalable Modular Framework for Multimodal AI in Oncology**

  [![npj Digital Medicine](https://img.shields.io/badge/npj%20Digital%20Medicine-Published-success.svg)](https://www.nature.com/articles/s41746-025-02003-4)
  [![PyPI version](https://img.shields.io/pypi/v/honeybee-ml.svg)](https://pypi.org/project/honeybee-ml/)
  [![PyPI Downloads](https://static.pepy.tech/badge/honeybee-ml)](https://pepy.tech/projects/honeybee-ml)
  [![GitHub stars](https://img.shields.io/github/stars/lab-rasool/HoneyBee?style=social)](https://github.com/lab-rasool/HoneyBee/stargazers)
  [![Python](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
  [![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-ee4c2c.svg)](https://pytorch.org/)

  [Documentation & Examples](https://lab-rasool.github.io/HoneyBee/) | [Paper](https://www.nature.com/articles/s41746-025-02003-4)
</div>

## Publication

**HoneyBee is published in [npj Digital Medicine](https://www.nature.com/articles/s41746-025-02003-4).**

> Tripathi, A., Waqas, A., Schabath, M.B. et al. HONeYBEE: enabling scalable multimodal AI in oncology through foundation model-driven embeddings. *npj Digit. Med.* **8**, 622 (2025). https://doi.org/10.1038/s41746-025-02003-4

## Overview

HoneyBee is a multimodal AI framework for oncology research and clinical applications. It processes and integrates four kinds of medical data (clinical text, radiology images, pathology slides, and molecular data) through a unified, modular architecture. The framework is designed to scale and to be extended, so that researchers can build models for cancer diagnosis, prognosis, and treatment planning.

> [!WARNING]
> **Alpha Release**: This framework is currently in alpha. APIs may change, and some features are still under development.

## Key Features

- **Multimodal data support**: clinical text, radiology (DICOM/NIfTI), pathology (WSI), and molecular data
- **3-layer modular architecture**: clean separation between loaders, processors, and embedding models
- **Clinical NLP pipeline**: OCR, cancer entity extraction, temporal parsing, and medical ontology mapping
- **Whole Slide Image processing**: tissue detection, patch extraction, stain normalization, and quality filtering
- **State-of-the-art embedding models**: GatorTron, BioBERT, PubMedBERT, UNI, REMEDIS, RadImageNet, and more
- **Cross-modal integration**: unified patient-level representations from multiple data modalities
- **Survival analysis**: Cox PH, Random Survival Forest, and DeepSurv
- **Similar patient retrieval**: find patients with matching clinical profiles
- **Interactive visualization**: t-SNE dashboards for embedding exploration
- **GPU-accelerated**: cuCIM backend for WSI processing, with an OpenSlide fallback
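
To make the cross-modal integration and similar patient retrieval features more concrete, here is a minimal sketch in plain NumPy. It does not use HoneyBee's API: the function names `fuse_modalities` and `top_k_similar` are hypothetical, the embeddings are random stand-ins, and concatenation with cosine similarity is only one simple way to build and compare patient-level vectors, not necessarily the method the framework uses.

```python
import numpy as np


def fuse_modalities(*modality_embeddings: np.ndarray) -> np.ndarray:
    """Concatenate per-modality embeddings into one vector per patient.

    Each input has shape (n_patients, dim_m), with rows referring to the same
    patients in the same order. Every modality is L2-normalized first so that
    no single modality dominates because of its scale.
    """
    normalized = []
    for emb in modality_embeddings:
        norms = np.linalg.norm(emb, axis=1, keepdims=True)
        normalized.append(emb / np.clip(norms, 1e-12, None))
    return np.concatenate(normalized, axis=1)


def top_k_similar(patients: np.ndarray, query_index: int, k: int = 3) -> np.ndarray:
    """Return the indices of the k patients closest to the query (cosine similarity)."""
    norms = np.linalg.norm(patients, axis=1, keepdims=True)
    unit = patients / np.clip(norms, 1e-12, None)
    scores = unit @ unit[query_index]
    scores[query_index] = -np.inf  # exclude the query patient itself
    return np.argsort(-scores)[:k]


# Synthetic stand-ins for real embeddings (random data, for illustration only).
rng = np.random.default_rng(0)
clinical = rng.normal(size=(5, 8))
pathology = rng.normal(size=(5, 16))
# Make patient 3 a near-duplicate of patient 0 so the expected match is known.
clinical[3] = clinical[0] + 0.01
pathology[3] = pathology[0] + 0.01

patient_vectors = fuse_modalities(clinical, pathology)
neighbours = top_k_similar(patient_vectors, query_index=0, k=2)
print(neighbours)
```

In practice the random arrays would be replaced by embeddings produced by the models listed above, one array per modality.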

## Quick Start

### System Dependencies

```bash
# Ubuntu/Debian
sudo apt-get install -y openslide-tools tesseract-ocr

# macOS
brew install openslide tesseract
```

### Installation

```bash
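# Install the core package from PyPI
pip install honeybee-ml
# Download the NLTK tokenizer data (requires nltk to be importable)
python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab')"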
```
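
One way to confirm that the install succeeded is to query the package metadata from Python. This is a small sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution name `honeybee-ml` is the one used in the `pip` command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str = "honeybee-ml"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
print(f"honeybee-ml {found}" if found else "honeybee-ml is not installed")
```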

### Optional Extras

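| Extra | Command | Includes |
|-------|---------|----------|
| Clinical | `pip install "honeybee-ml[clinical]"` | NLP, OCR, and text processing dependencies |
| Clinical (full) | `pip install "honeybee-ml[clinical-full]"` | Clinical extra plus medspaCy and MedCAT |
| Pathology | `pip install "honeybee-ml[pathology]"` | WSI loading and image processing |
| Radiology | `pip install "honeybee-ml[radiology]"` | DICOM/NIfTI I/O, MONAI, TorchIO, and segmentation tools |
| Molecular | `pip install "honeybee-ml[molecular]"` | Genomics and expression data support |
| Models | `pip install "honeybee-ml[models]"` | timm, ONNX Runtime, and PEFT |
| Database | `pip install "honeybee-ml[database]"` | MongoDB client (pymongo) |
| Visualization | `pip install "honeybee-ml[visualization]"` | Jupyter kernel and widgets |
| Dev | `pip install "honeybee-ml[dev]"` | pytest, pytest-cov, black, ruff, and mypy |
| All | `pip install "honeybee-ml[all]"` | Every extra above except Clinical (full) and Dev |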
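
To see which extras are usable in the current environment, you can probe for one representative module per extra. The sketch below relies only on the standard library; the `PROBES` mapping and the `available_extras` helper are illustrative choices made here, not part of HoneyBee, and the module names are the usual import names of packages listed in the project's dependency metadata.

```python
from importlib.util import find_spec

# One representative import per extra. The mapping is an illustrative choice
# made for this sketch, not something HoneyBee defines.
PROBES = {
    "clinical": "spacy",
    "pathology": "openslide",
    "radiology": "pydicom",
    "molecular": "pyarrow",
    "database": "pymongo",
    "models": "timm",
}


def available_extras(probes=PROBES):
    """Map each extra name to True if its probe module can be found."""
    return {extra: find_spec(module) is not None for extra, module in probes.items()}


for extra, ok in available_extras().items():
    print(f"{extra:<10} {'available' if ok else 'missing'}")
```

A module being found only shows that the Python package is installed. Native libraries such as OpenSlide and Tesseract still have to be present on the system, as described under System Dependencies.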

## Research Applications

HoneyBee has been successfully applied to:

- **Cancer Subtype Classification**: Automated identification of cancer subtypes from multimodal data
- **Survival Prediction**: Risk stratification and outcome prediction for treatment planning
- **Similar Patient Retrieval**: Finding patients with similar clinical profiles for precision medicine
- **Biomarker Discovery**: Identifying multimodal patterns associated with treatment response

## License

See the [LICENSE](LICENSE) file for details.

## Citation

If you use HoneyBee in your research, please cite our paper:

```
Tripathi, A., Waqas, A., Schabath, M.B. et al. HONeYBEE: enabling scalable multimodal AI in
oncology through foundation model-driven embeddings. npj Digit. Med. 8, 622 (2025).
https://doi.org/10.1038/s41746-025-02003-4
```
