Metadata-Version: 2.4
Name: seizure-eeg-detector
Version: 0.1.0
Summary: Prepare detector features and run seizure EEG cross-validation on CHB-MIT and EU extracted records.
License-Expression: MIT
Project-URL: Homepage, https://github.com/jamiekoe/seizure-eeg-detector
Project-URL: Repository, https://github.com/jamiekoe/seizure-eeg-detector
Keywords: eeg,seizure,chb-mit,epilepsy,classification
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: lightgbm>=4.1
Requires-Dist: natsort>=8.4
Requires-Dist: numpy>=1.23
Requires-Dist: scikit-learn>=1.3
Requires-Dist: scipy>=1.10
Requires-Dist: tqdm>=4.64
Provides-Extra: examples
Requires-Dist: matplotlib>=3.6; extra == "examples"
Requires-Dist: pandas>=2; extra == "examples"
Provides-Extra: dev
Requires-Dist: build>=1; extra == "dev"
Requires-Dist: pytest>=7; extra == "dev"
Dynamic: license-file

# Seizure EEG Detector

Prepare detector-ready EEG features and run seizure detection
cross-validation on extracted CHB-MIT and EU Epilepsy records.

This package is the companion detector pipeline for
[`seizure-eeg-extractor`](https://github.com/jamiekoe/seizure-eeg-extractor).
Use the extractor first to convert raw dataset files into `eeg.npy` and
`info.pkl` record folders. This package then computes detector feature arrays,
creates seizure/interictal arrays, and trains simple baseline classifiers.

Two temporal feature encodings are implemented. Choose one explicitly with
`--feature-method` when preparing features:

- Energy-decay-memory (`--feature-method edm`): O'Leary, G., Groppe,
  D. M., Valiante, T. A., Verma, N., and Genov, R. (2018). "NURIP:
  Neural Interface Processor for Brain-State Classification and
  Programmable-Waveform Neurostimulation." IEEE Journal of Solid-State
  Circuits, 53(11), 3150-3162. https://doi.org/10.1109/JSSC.2018.2869579
- Windowed spectral features (`--feature-method windowed`): Shoeb, A. H.
  (2009). "Application of Machine Learning to Epileptic Seizure Onset
  Detection and Treatment." PhD thesis, Massachusetts Institute of Technology.

The raw EEG datasets are not included. Download and use CHB-MIT and EU
Epilepsy/EPILEPSIAE data according to their own access, citation, privacy, and
data-use terms.

## Installation

Use Python 3.10 or newer.

```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install seizure-eeg-detector
```

For local development:

```bash
python -m pip install -e ".[dev]"
```

## Expected Input

`seizure-eeg-detector` expects the NumPy record layout produced by
[`seizure-eeg-extractor`](https://github.com/jamiekoe/seizure-eeg-extractor),
which is also available on
[PyPI](https://pypi.org/project/seizure-eeg-extractor/):

```text
extracted_data/
  <patient_id>/
    record_0/
      eeg.npy
      info.pkl
    record_1/
      eeg.npy
      info.pkl
```

Each `info.pkl` must include `fs`, `channel_names`, `num_seizures`, and
`seizure_times`. Seizure intervals use record-local sample indices:
`onset_index` and `offset_index`.
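
Before running preparation, it can help to check that a record's metadata has
the required fields. The following is a minimal sketch, not part of the
package API: the helper names `check_record_info` and `load_record_info` are
hypothetical, and it assumes that `info.pkl` unpickles to a dictionary whose
`seizure_times` value is a sequence of mappings with `onset_index` and
`offset_index` keys.

```python
import pickle
from pathlib import Path

REQUIRED_KEYS = ("fs", "channel_names", "num_seizures", "seizure_times")


def check_record_info(info: dict) -> list[str]:
    """Return a list of problems found in an info dictionary (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in info]
    if problems:
        return problems
    if info["num_seizures"] != len(info["seizure_times"]):
        problems.append("num_seizures does not match len(seizure_times)")
    # Assumption: each seizure entry is a mapping with onset/offset sample indices.
    for i, seizure in enumerate(info["seizure_times"]):
        if seizure["onset_index"] >= seizure["offset_index"]:
            problems.append(f"seizure {i}: onset_index is not before offset_index")
    return problems


def load_record_info(record_dir: str) -> dict:
    """Unpickle info.pkl from a record folder (only load files you trust)."""
    with open(Path(record_dir) / "info.pkl", "rb") as handle:
        return pickle.load(handle)
```

For example, `check_record_info(load_record_info("extracted_data/chb01/record_0"))`
would return an empty list for a record that passes these checks.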

## Command Line Usage

Prepare detector feature arrays for selected CHB-MIT patients:

```bash
eeg-detect prepare chbmit /path/to/extracted_data \
  --patients chb01 chb02 \
  --feature-method edm \
  --downsample-factor 256
```

Prepare windowed spectral features:

```bash
eeg-detect prepare chbmit /path/to/extracted_data \
  --patients chb01 chb02 \
  --feature-method windowed \
  --window-seconds 2 \
  --window-count 3 \
  --downsample-factor 256
```

Prepare selected EU patients using manually chosen channels:

```bash
eeg-detect prepare eu /path/to/extracted_data \
  --patients pat_FR_548 pat_FR_1096 pat_FR_1125 \
  --channels HL1 HL2 HL3 HL4 HL5 HL6 HL7 HL8 \
  --channel-mode within-patient \
  --feature-method edm \
  --downsample-factor 1024
```

`--downsample-factor` keeps every Nth prepared feature sample when writing
`seizure_<n>.npy` and `interictal.npy`. With EDM features, the raw EEG is
filtered and the EDM state is updated at the original sampling rate before this
output downsampling. For example, with 256 Hz CHB-MIT data,
`--downsample-factor 256` creates one detector feature row per second
(`effective_fs = 1 Hz`).
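
The arithmetic behind the downsampling factor can be sketched as follows. This
is an illustration only: the function names are hypothetical, and it assumes
that "every Nth sample" means stride slicing that starts at the first row.

```python
def effective_sampling_rate(fs: float, downsample_factor: int) -> float:
    """Rate of the prepared feature rows after output downsampling."""
    return fs / downsample_factor


def keep_every_nth(rows, downsample_factor: int):
    """Keep every Nth row; the same slice works on lists and NumPy arrays."""
    return rows[::downsample_factor]


fs = 256                                # CHB-MIT sampling rate from the example above
rows = list(range(fs * 10))             # stand-in for 10 seconds of feature rows
downsampled = keep_every_nth(rows, 256)
```

Here `effective_sampling_rate(256, 256)` is 1.0 Hz, and ten seconds of input
leave ten feature rows.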

During preparation, the CLI reports the selected channels, whether it had to
fall back to the largest compatible record subset, whether
`--channel-mode across-patients` removed patient-available channels, and any
records skipped because they do not contain the selected channel set.

Run cross-validation after preparation:

```bash
eeg-detect cross-validate chbmit /path/to/extracted_data /path/to/results \
  --patients chb01 \
  --model lgbm \
  --num-trees 1024 \
  --threshold 0.5 \
  --balance-training undersample
```

Apply additional thresholds to saved raw prediction scores without retraining:

```bash
eeg-detect threshold-sweep /path/to/results --thresholds 0.1 0.3 0.5
```

Summarize patient-level and overall results:

```bash
eeg-detect summarize /path/to/results
```

This writes:

- `summary/summary.json`
- `summary/patient_summary.csv`
- `summary/overall_summary.csv`

The summary includes the decision threshold, detected seizures, missed seizures,
sensitivity, latency statistics for detected seizures, total false positives,
and FPR/hour. Overall rows use `pooled_*` for metrics computed after combining
all patients, and `mean_patient_*` for unweighted means of patient-level
metrics.
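
The difference between the two kinds of overall rows is easiest to see with
numbers. The sketch below uses made-up counts for two hypothetical patients and
hypothetical function names; it is not the package's summarizer.

```python
from statistics import mean


def pooled_sensitivity(patients):
    """Sensitivity after combining seizure counts from all patients."""
    detected = sum(p["detected"] for p in patients)
    total = sum(p["detected"] + p["missed"] for p in patients)
    return detected / total


def mean_patient_sensitivity(patients):
    """Unweighted mean of each patient's own sensitivity."""
    return mean(p["detected"] / (p["detected"] + p["missed"]) for p in patients)


# Illustrative counts only, not real results.
patients = [
    {"detected": 9, "missed": 1},  # 10 seizures, sensitivity 0.9
    {"detected": 1, "missed": 1},  # 2 seizures, sensitivity 0.5
]
pooled = pooled_sensitivity(patients)              # 10 / 12
mean_patient = mean_patient_sensitivity(patients)  # (0.9 + 0.5) / 2
```

The pooled value is dominated by patients with many seizures, while the
unweighted mean gives each patient equal influence.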

Supported models are `lgbm`, `svm`, and `adaboost`. The CLI defaults to
LightGBM on CPU for portability; pass `--lgbm-device gpu` only when your local
LightGBM build supports GPU training.

By default, cross-validation trains on every prepared seizure and interictal
feature row. Pass `--balance-training undersample` to randomly downsample the
majority class inside each training fold so the model sees equal numbers of
seizure and interictal rows. This does not change the held-out test records or
the reported metrics. Balanced result directories include the balance mode in
the path, for example `lgbm/trees_1024/balance_undersample/threshold_0.5`.
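
The idea behind undersampling can be sketched as below. This is a generic
illustration under the assumption of binary labels (1 for seizure rows, 0 for
interictal rows); the function name is hypothetical and the package's own
sampling code may differ in detail.

```python
import random


def undersample_indices(labels, seed=0):
    """Return sorted row indices with equal numbers of 0 and 1 labels."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    # The minority class size determines how many rows each class keeps.
    n = min(len(positives), len(negatives))
    keep = rng.sample(positives, n) + rng.sample(negatives, n)
    return sorted(keep)


labels = [1] * 3 + [0] * 20          # 3 seizure rows, 20 interictal rows
train_rows = undersample_indices(labels)
```

Only the training rows are selected this way; the held-out fold keeps all of
its rows.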

## Python Usage

```python
from seizure_eeg_detector import (
    CHBPatient,
    CVTraining,
    DataPrepper,
    FeatureExtractor,
    ModelType,
    TrainingBalanceMode,
)

input_path = "/path/to/extracted_data"
patient = CHBPatient("chb01", input_path)

feature_extractor = FeatureExtractor(
    bands=[
        (0.5, 3.5),
        (3.5, 6.5),
        (6.5, 9.5),
        (9.5, 12.5),
        (12.5, 15.5),
        (15.5, 18.5),
        (18.5, 21.5),
        (21.5, 24.5),
    ],
    alphas=[7, 9, 10, 11, 12, 16],
    feature_method="edm",
)
dataprepper = DataPrepper(feature_extractor, downsample_factor=256)
channels = dataprepper.select_channels(patient.record_paths)
dataprepper.prep_data(patient.record_paths, channels)

trainer = CVTraining(
    "/path/to/results",
    threshold=0.5,
    balance_training=TrainingBalanceMode.UNDERSAMPLE,
)
trainer.run_cv(patient, ModelType.LGBM, params={"objective": "binary"}, num_trees=1024)
```

```python
from seizure_eeg_detector import summarize_results, write_summary_outputs

summary = summarize_results("/path/to/results")
write_summary_outputs(summary, "/path/to/results/summary")
```

```python
from seizure_eeg_detector import sweep_thresholds

sweep_thresholds("/path/to/results", [0.1, 0.3, 0.5])
```

## Processing Method

Feature preparation always starts by computing spectral features:

- Spectral energy: selected EEG channels are filtered into configurable
  frequency bands with FIR bandpass filters, and the absolute amplitude of
  each filtered signal is used as the per-band feature.
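
The spectral-energy step can be sketched with SciPy as follows. The filter
length (`numtaps=513`), the default Hamming window, and the function name
`band_amplitude` are assumptions made for illustration; the README does not
state the package's actual filter design.

```python
import numpy as np
from scipy.signal import firwin, lfilter


def band_amplitude(signal, fs, band, numtaps=513):
    """Bandpass one channel with an FIR filter and return absolute amplitudes."""
    # pass_zero=False with two cutoff frequencies designs a bandpass filter.
    taps = firwin(numtaps, band, pass_zero=False, fs=fs)
    return np.abs(lfilter(taps, 1.0, signal))


fs = 256
t = np.arange(10 * fs) / fs
signal = np.sin(2 * np.pi * 2.0 * t)  # synthetic 2 Hz tone, not real EEG

low_band = band_amplitude(signal, fs, (0.5, 3.5))     # contains the tone
high_band = band_amplitude(signal, fs, (21.5, 24.5))  # should be near zero
```

A 2 Hz tone produces a large amplitude in the 0.5 to 3.5 Hz band and almost
none in the 21.5 to 24.5 Hz band.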

The temporal feature encoding must be selected with `--feature-method`:

- `edm`: each spectral-energy feature is expanded with a set of exponential
  decay traces controlled by `alphas`, following the EDM feature approach
  described by O'Leary et al. (2018).
- `windowed`: spectral-energy features are averaged over `--window-seconds`
  epochs, then the most recent `--window-count` epoch vectors are concatenated.
  This follows the temporal feature-vector design described by Shoeb (2009),
  where `L = 2` seconds and `W = 3` are the usual settings.
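
Both encodings can be sketched in a few lines of NumPy. These are
illustrations under stated assumptions, not the package's implementation: the
EDM sketch assumes each alpha sets a decay rate of `2**-alpha`, and the
windowed sketch assumes non-overlapping epochs concatenated oldest first. The
function names are hypothetical.

```python
import numpy as np


def edm_traces(energy, alphas):
    """Expand (n_samples, n_features) energy into exponential decay traces."""
    energy = np.asarray(energy, dtype=float)
    # Assumption: alpha is a shift amount, so the decay rate is 2**-alpha.
    rates = np.array([2.0 ** -a for a in alphas]).reshape(-1, 1)
    state = np.zeros((len(alphas), energy.shape[1]))
    out = np.empty((energy.shape[0], state.size))
    for t, x in enumerate(energy):
        # Each trace moves toward the current input at its own rate.
        state = state + rates * (x - state)
        out[t] = state.ravel()
    return out


def windowed_features(energy, fs, window_seconds=2, window_count=3):
    """Average energy per epoch, then concatenate the most recent epochs."""
    energy = np.asarray(energy, dtype=float)
    epoch_len = int(window_seconds * fs)
    n_epochs = energy.shape[0] // epoch_len
    # Mean over each non-overlapping epoch: shape (n_epochs, n_features).
    epochs = energy[: n_epochs * epoch_len].reshape(n_epochs, epoch_len, -1).mean(axis=1)
    rows = [
        epochs[i - window_count + 1 : i + 1].ravel()
        for i in range(window_count - 1, n_epochs)
    ]
    return np.array(rows)


energy = np.ones((256 * 12, 4))            # 12 s of constant stand-in energy
edm = edm_traces(energy[:1000], [7, 9])    # 2 alphas x 4 features = 8 columns
win = windowed_features(energy, fs=256)    # 6 epochs give 4 rows of 3 x 4 values
```

Smaller alphas react faster to changes in the input, while larger alphas keep
a longer memory.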

Prepared files are saved into each record directory:

- `seizure_<n>.npy` for each labeled seizure interval.
- `interictal.npy` for records with no labeled seizures.
- Optional `se.npy` files when `--save-se` is passed, and optional `edm.npy`
  files for EDM runs when `--save-edm` is passed.

Cross-validation leaves out one seizure record or one interictal record at a
time, trains on the remaining prepared arrays, writes metrics, and saves raw
and thresholded predictions under the results directory. The default decision
threshold is `0.5`, and result directories include it in the run path, for
example `lgbm/trees_1024/threshold_0.5`. If training balancing is enabled, only
the training fold is resampled; held-out seizure and interictal arrays are
evaluated unchanged. The summarizer treats `latency: nan` as a missed seizure
when computing sensitivity, and latency statistics are computed over detected
seizures only. The `fpr` metric reported by this package is false positives per
hour, counted from positive prediction samples rather than event-collapsed
false alarms.
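
The two metric conventions above can be sketched as follows. The function
names and the example numbers are hypothetical; the sketch only illustrates
counting a NaN latency as a miss and counting false positives per sample.

```python
import math


def sensitivity(latencies):
    """Fraction of seizures detected; a NaN latency counts as a miss."""
    detected = [x for x in latencies if not math.isnan(x)]
    return len(detected) / len(latencies)


def false_positives_per_hour(interictal_predictions, effective_fs):
    """Count positive samples in interictal predictions, normalized to hours."""
    false_positives = sum(1 for p in interictal_predictions if p == 1)
    hours = len(interictal_predictions) / effective_fs / 3600
    return false_positives / hours


# Illustrative values only: four seizures, one missed.
sens = sensitivity([3.0, float("nan"), 8.0, 5.0])
# Two hours of 1 Hz predictions with ten positive samples.
fpr = false_positives_per_hour([0] * 7190 + [1] * 10, effective_fs=1.0)
```

Because every positive sample is counted, a single long false alarm
contributes many false positives to this figure.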

Threshold sweeps reuse the saved `preds_*.txt` raw score files and write
sibling `threshold_<value>` result directories with recomputed metrics and
binary predictions. Cross-validation saves the effective prediction sampling
rate in `run_config.json`, and prediction files must contain complete raw score
arrays because omitted scores cannot be recovered during threshold sweeps.
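
Conceptually, a threshold sweep only re-binarizes stored scores. The sketch
below works on an in-memory list because the layout of the `preds_*.txt` files
is not described here; the function name is hypothetical, and the `>=`
comparison is an assumption about how ties at the threshold are handled.

```python
def apply_thresholds(scores, thresholds):
    """Map each threshold to binary predictions derived from raw scores."""
    # Assumption: a score equal to the threshold counts as positive.
    return {th: [1 if s >= th else 0 for s in scores] for th in thresholds}


raw_scores = [0.05, 0.2, 0.4, 0.9]   # illustrative scores, not real output
binary = apply_thresholds(raw_scores, [0.1, 0.3, 0.5])
```

No retraining is needed because the model output does not change, only the
cut-off applied to it.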

## Citation

If you use this package in academic work, please cite the detector software:

```bibtex
@software{koerner2026seizure_eeg_detector,
  author = {Koerner, Jamie},
  title = {seizure-eeg-detector},
  year = {2026},
  url = {https://github.com/jamiekoe/seizure-eeg-detector}
}
```

The repository also includes `CITATION.cff` for citation managers and GitHub's
citation UI.

## License

This project is distributed under the MIT License.
