Metadata-Version: 2.4
Name: cadence-core
Version: 0.1.0
Summary: Flat-MLP with PubMedBERT-enriched self-distillation for clinical next-event prediction
Project-URL: Homepage, https://github.com/amirrouh/cadence
Project-URL: Repository, https://github.com/amirrouh/cadence
Project-URL: Issues, https://github.com/amirrouh/cadence/issues
Author-email: Amir Rouhollahi <arouhollahi@bwh.harvard.edu>
License: MIT
License-File: LICENSE
Keywords: clinical,ehr,healthcare-ml,next-event-prediction,pubmedbert
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Requires-Python: >=3.10
Requires-Dist: huggingface-hub>=0.23
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: scikit-learn>=1.3
Requires-Dist: sentence-transformers>=2.7
Requires-Dist: torch>=2.1
Requires-Dist: tqdm>=4.66
Requires-Dist: transformers>=4.40
Provides-Extra: dev
Requires-Dist: build; extra == 'dev'
Requires-Dist: pytest>=7; extra == 'dev'
Requires-Dist: ruff>=0.5; extra == 'dev'
Requires-Dist: twine; extra == 'dev'
Description-Content-Type: text/markdown

# Cadence

Clinical next-event prediction: a flat-MLP with PubMedBERT-enriched features and self-knowledge distillation, trained on EHR event sequences.

## Install

```bash
pip install cadence-core
```

## Quickstart

### Inference with a pretrained model

```python
from cadence import Cadence

model = Cadence.from_pretrained("amirrouh/cadence-mimic-100k")
# patient_events: one patient's event history (see "Input data format" below)
next_event, days_until = model.predict(patient_events)
```

### Training on your own data

```python
from cadence import Cadence

model = Cadence()
model.fit(events_df)
model.save("my-model/")
```

## Input data format

`events_df` is a pandas DataFrame with the following columns:

- `patient_id` — patient identifier (any hashable type)
- `timestamp` — event time (datetime or ISO string; coerced via `pd.to_datetime`)
- `event_text` — free-text event description (e.g. "Patient admitted with chest pain")
- `cluster_id` — integer event cluster (optional; auto-assigned via sentence-transformers + KMeans if omitted)

Example:

| patient_id | timestamp           | event_text                          | cluster_id |
|------------|---------------------|-------------------------------------|------------|
| P001       | 2024-01-15 09:30    | Patient admitted with chest pain    | 3          |
| P001       | 2024-01-15 11:45    | ECG performed, ST elevation         | 7          |
| P002       | 2024-02-03 14:20    | Routine check-up, vitals normal     | 1          |
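
As a minimal sketch, the example table above could be built and sanity-checked like this before being passed to `fit`. The `check_events` helper and the `REQUIRED_COLUMNS` name are hypothetical and not part of the Cadence API. The explicit `pd.to_datetime` call and the sort are optional conveniences assumed here, since the library is described as coercing timestamps itself.

```python
import pandas as pd

# Columns the README lists as required; cluster_id is optional.
REQUIRED_COLUMNS = ["patient_id", "timestamp", "event_text"]


def check_events(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: verify required columns and normalize timestamps."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"events_df is missing columns: {missing}")
    out = df.copy()
    # Same coercion the README says Cadence applies internally.
    out["timestamp"] = pd.to_datetime(out["timestamp"])
    # Assumption: chronological order per patient is a sensible default.
    return out.sort_values(["patient_id", "timestamp"]).reset_index(drop=True)


events_df = check_events(
    pd.DataFrame(
        {
            "patient_id": ["P001", "P001", "P002"],
            "timestamp": ["2024-01-15 09:30", "2024-01-15 11:45", "2024-02-03 14:20"],
            "event_text": [
                "Patient admitted with chest pain",
                "ECG performed, ST elevation",
                "Routine check-up, vitals normal",
            ],
            "cluster_id": [3, 7, 1],  # optional column
        }
    )
)
```

Dropping the `cluster_id` key from the dictionary gives the form in which clusters are auto-assigned.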

`model.predict(patient_events)` returns a `(next_event_label, days_until)` tuple when `top_k=1`, or a dict of the top-k predictions with their confidences when `top_k > 1`.

## Architecture

Cadence implements the NVC-Clean v14 champion model:

- **Feature engineering**: 884-d handcrafted features (population anomaly scores, narrative velocity, temporal-gap statistics, cluster bag-of-words)
- **Optional enrichment**: PubMedBERT embeddings (mean-pooled + last-token, 2 × 768 = 1536-d) appended to the handcrafted features, giving a 2420-d total input (884 + 1536)
- **Backbone**: flat-MLP with BatchNorm (Linear 884→1024→1024→512 with residual skip)
- **Classification head**: Asymmetric Loss (ASL, Ridnik et al. 2021)
- **Regression head**: quantile-bin softmax expectation for time-to-next-event
- **Training**: Phase 1 (frozen) + Phase 2 (full), MixUp augmentation, Stochastic Weight Averaging, self-knowledge distillation
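
To make the two heads above more concrete, here is a rough pure-Python sketch of the ideas they name. It is not the Cadence implementation. The ASL function follows the per-class sigmoid form from Ridnik et al. (2021), with `gamma_pos`, `gamma_neg`, and `margin` set to illustrative values rather than Cadence's settings. The regression function assumes each quantile bin has a representative value in days (`bin_centers`, a hypothetical name) and returns the probability-weighted average.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_days(logits, bin_centers):
    """Quantile-bin softmax expectation: sum_k p_k * center_k.

    bin_centers are assumed representative values (in days) of bins whose
    edges would be quantiles of the training time gaps.
    """
    probs = softmax(logits)
    return sum(p * c for p, c in zip(probs, bin_centers))


def asymmetric_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, eps=1e-8):
    """Asymmetric Loss over per-class sigmoid probabilities.

    Positives: -(1 - p)^gamma_pos * log(p)
    Negatives: -(p_m)^gamma_neg * log(1 - p_m), with p_m = max(p - margin, 0)
    The margin shift zeroes out the loss of easy negatives entirely.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            total -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            p_m = max(p - margin, 0.0)
            total -= p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
    return total
```

With uniform logits, `expected_days` reduces to the plain mean of the bin centers, and a negative class predicted below the margin contributes no loss at all, which is the asymmetry the loss is named for.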

## Citation

Manuscript in preparation; citation forthcoming.

## License

MIT. Copyright 2026 Amir Rouhollahi.

## Links

- GitHub: https://github.com/amirrouh/cadence
- Issues: https://github.com/amirrouh/cadence/issues
