Metadata-Version: 2.4
Name: things_eeg2_dataset
Version: 0.0.6
Summary: A simple and fast CLI for downloading, processing, and loading the THINGS-EEG2 dataset.
License: # PolyForm Noncommercial License 1.0.0
        
        <https://polyformproject.org/licenses/noncommercial/1.0.0>
        
        ## Acceptance
        
        In order to get any license under these terms, you must agree
        to them as both strict obligations and conditions to all
        your licenses.
        
        ## Copyright License
        
        The licensor grants you a copyright license for the
        software to do everything you might do with the software
        that would otherwise infringe the licensor's copyright
        in it for any permitted purpose.  However, you may
        only distribute the software according to [Distribution
        License](#distribution-license) and make changes or new works
        based on the software according to [Changes and New Works
        License](#changes-and-new-works-license).
        
        ## Distribution License
        
        The licensor grants you an additional copyright license
        to distribute copies of the software.  Your license
        to distribute covers distributing the software with
        changes and new works permitted by [Changes and New Works
        License](#changes-and-new-works-license).
        
        ## Notices
        
        You must ensure that anyone who gets a copy of any part of
        the software from you also gets a copy of these terms or the
        URL for them above, as well as copies of any plain-text lines
        beginning with `Required Notice:` that the licensor provided
        with the software.  For example:
        
        > Required Notice: Copyright Yoyodyne, Inc. (http://example.com)
        
        ## Changes and New Works License
        
        The licensor grants you an additional copyright license to
        make changes and new works based on the software for any
        permitted purpose.
        
        ## Patent License
        
        The licensor grants you a patent license for the software that
        covers patent claims the licensor can license, or becomes able
        to license, that you would infringe by using the software.
        
        ## Noncommercial Purposes
        
        Any noncommercial purpose is a permitted purpose.
        
        ## Personal Uses
        
        Personal use for research, experiment, and testing for
        the benefit of public knowledge, personal study, private
        entertainment, hobby projects, amateur pursuits, or religious
        observance, without any anticipated commercial application,
        is use for a permitted purpose.
        
        ## Noncommercial Organizations
        
        Use by any charitable organization, educational institution,
        public research organization, public safety or health
        organization, environmental protection organization,
        or government institution is use for a permitted purpose
        regardless of the source of funding or obligations resulting
        from the funding.
        
        ## Fair Use
        
        You may have "fair use" rights for the software under the
        law. These terms do not limit them.
        
        ## No Other Rights
        
        These terms do not allow you to sublicense or transfer any of
        your licenses to anyone else, or prevent the licensor from
        granting licenses to anyone else.  These terms do not imply
        any other licenses.
        
        ## Patent Defense
        
        If you make any written claim that the software infringes or
        contributes to infringement of any patent, your patent license
        for the software granted under these terms ends immediately. If
        your company makes such a claim, your patent license ends
        immediately for work on behalf of your company.
        
        ## Violations
        
        The first time you are notified in writing that you have
        violated any of these terms, or done anything with the software
        not covered by your licenses, your licenses can nonetheless
        continue if you come into full compliance with these terms,
        and take practical steps to correct past violations, within
        32 days of receiving notice.  Otherwise, all your licenses
        end immediately.
        
        ## No Liability
        
        ***As far as the law allows, the software comes as is, without
        any warranty or condition, and the licensor will not be liable
        to you for any damages arising out of these terms or the use
        or nature of the software, under any kind of legal claim.***
        
        ## Definitions
        
        The **licensor** is the individual or entity offering these
        terms, and the **software** is the software the licensor makes
        available under these terms.
        
        **You** refers to the individual or entity agreeing to these
        terms.
        
        **Your company** is any legal entity, sole proprietorship,
        or other kind of organization that you work for, plus all
        organizations that have control over, are under the control of,
        or are under common control with that organization.  **Control**
        means ownership of substantially all the assets of an entity,
        or the power to direct its management and policies by vote,
        contract, or otherwise.  Control can be direct or indirect.
        
        **Your licenses** are all the licenses granted to you for the
        software under these terms.
        
        **Use** means anything you do with the software requiring one
        of your licenses.
License-File: LICENSE
Requires-Python: >=3.10
Requires-Dist: accelerate>=1.12.0
Requires-Dist: diffusers<0.38.0,>=0.37.0
Requires-Dist: einops>=0.8.1
Requires-Dist: gdown>=5.2.0
Requires-Dist: lightning>=2.6.0
Requires-Dist: mne>=1.11.0
Requires-Dist: osfclient>=0.0.5
Requires-Dist: packaging
Requires-Dist: pandas>=2.3.3
Requires-Dist: plotly>=6.5.0
Requires-Dist: psutil>=5.9.0
Requires-Dist: pyinstrument>=5.1.1
Requires-Dist: safetensors>=0.6.2
Requires-Dist: scikit-learn>=1.7.2
Requires-Dist: sentencepiece
Requires-Dist: streamlit>=1.52.1
Requires-Dist: torch>=2.5.0
Requires-Dist: torchvision>=0.24.1
Requires-Dist: tqdm
Requires-Dist: transformers<6.0.0,>=4.56.2
Requires-Dist: typer<0.25.0,>=0.20.0
Description-Content-Type: text/markdown

<picture>
  <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/ZEISS/things_eeg2_dataset/refs/heads/main/.github/assets/things_eeg2_dataset-banner-dark.png">
  <source media="(prefers-color-scheme: light)" srcset="https://raw.githubusercontent.com/ZEISS/things_eeg2_dataset/refs/heads/main/.github/assets/things_eeg2_dataset-banner-light.png">
  <img alt="things_eeg2_dataset" src="https://raw.githubusercontent.com/ZEISS/things_eeg2_dataset/refs/heads/main/.github/assets/things_eeg2_dataset-banner-light.png">
</picture>

<div align="center">

[![PyPI][pypi-badge]][pypi]
[![Conda Platform][conda-badge]][conda-url]
[![License][license-badge]][license-url]
[![CI Status][ci-badge]][ci-url]

[pypi-badge]: https://img.shields.io/pypi/v/things_eeg2_dataset?style=flat-square&label=PyPI
[pypi]: https://pypi.org/project/things-eeg2-dataset/

[license-badge]: https://img.shields.io/badge/License-PolyForm%20Noncommercial%201.0.0-yellow.svg?style=flat-square
[license-url]: LICENSE

[ci-badge]: https://img.shields.io/github/actions/workflow/status/zeiss/things_eeg2_dataset/ci.yml?branch=main&style=flat-square&label=CI
[ci-url]: https://github.com/zeiss/things_eeg2_dataset/actions/workflows/ci.yml

[conda-badge]: https://img.shields.io/conda/vn/conda-forge/things_eeg2_dataset?style=flat-square
[conda-url]: https://prefix.dev/channels/conda-forge/packages/things_eeg2_dataset

</div>

# Introduction

This package provides tools for downloading and preprocessing the raw THINGS-EEG2 data, and for generating image embeddings with several vision models.

> [!WARNING]
> This repository builds upon the original data processing by [Gifford et al. (2022)](https://github.com/gifale95/eeg_encoding).
> Please check out their original code and the [corresponding paper](https://www.sciencedirect.com/science/article/pii/S1053811922008758?via%3Dihub).
>
> We are in no way associated with the authors.
> Nonetheless, we hope that this makes things easier (pun intended) to use.

## Installation

### CLI-only

If you only need the CLI, you can run it with a single command, without installing the package into a project:

#### Using the PyPI package (with uv)

```bash
uvx --from things_eeg2_dataset things-eeg2
```

#### Using the conda package (with pixi)

```bash
pixi exec --with things_eeg2_dataset things-eeg2
```

### From GitHub

```bash
git clone git@github.com:ZEISS/things_eeg2_dataset.git
cd things_eeg2_dataset

uv sync
uv pip install --editable .
source .venv/bin/activate

things-eeg2 --help
things-eeg2 --install-completion

# Then restart your shell
# Example for zsh:
source ~/.zshrc
```

### From PyPI

```bash
# Using uv
uv init
uv add things_eeg2_dataset
source .venv/bin/activate

things-eeg2 --help
things-eeg2 --install-completion

# Then restart your shell
# Example for zsh:
source ~/.zshrc
```

### Using the conda package

```bash
# Using pixi
pixi init
pixi add things_eeg2_dataset
pixi shell

things-eeg2 --help
things-eeg2 --install-completion

# Then restart your shell
# Example for zsh:
source ~/.zshrc
```

## Usage

![things_eeg2_dataset demo](https://raw.githubusercontent.com/ZEISS/things_eeg2_dataset/refs/heads/main/.github/assets/demo/demo-light.gif#gh-light-mode-only)
![things_eeg2_dataset demo](https://raw.githubusercontent.com/ZEISS/things_eeg2_dataset/refs/heads/main/.github/assets/demo/demo-dark.gif#gh-dark-mode-only)

## Data Structure

The directory layout that the CLI creates is defined in [paths.py](src/things_eeg2_dataset/paths.py).
That module is the single source of truth for the data structure used throughout the project.

### Embedding Generation (`embedding_processing/`)

The package supports multiple state-of-the-art vision models for generating image embeddings:

| Model | Embedder Class | Description |
|-------|----------------|-------------|
| `open-clip-vit-h-14` | `OpenClipViTH14Embedder` | OpenCLIP ViT-H/14 (SDXL image encoder) |
| `openai-clip-vit-l-14` | `OpenAIClipVitL14Embedder` | OpenAI CLIP ViT-L/14 |
| `dinov2` | `DinoV2Embedder` | DINOv2 with registers (self-supervised) |
| `ip-adapter` | `IPAdapterEmbedder` | IP-Adapter Plus projections |

Each embedder generates:

- **Pooled embeddings**: Single vector per image (e.g., `(1024,)` for ViT-H-14)
- **Full sequence embeddings**: All tokens (e.g., `(257, 1280)` for ViT-H-14)
- **Text embeddings**: Corresponding text features from image captions
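
The `(257, 1280)` shape of the full sequence follows from the patch grid of the vision transformer. The sketch below works through that arithmetic, assuming a 224×224 input, 14×14 patches, and a single class token. The helper name `vit_token_count` is illustrative and not part of this package.

```python
def vit_token_count(image_size: int = 224, patch_size: int = 14, extra_tokens: int = 1) -> int:
    """Number of tokens a ViT emits for a square image.

    Assumes non-overlapping square patches plus `extra_tokens`
    additional tokens (for example one class token).
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size  # 224 // 14 = 16
    return patches_per_side ** 2 + extra_tokens  # 16 * 16 + 1 = 257


print(vit_token_count())  # 257
```

The second dimension (1280) is the hidden width of ViT-H, while the pooled vector lives in the 1024-dimensional projection space, which is why the two shapes differ in their last axis.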

**Output Files:**

```text
embeddings/
├── ViT-H-14_features_training.safetensors           # Pooled embeddings
├── ViT-H-14_features_training_full.safetensors      # Full token sequences
├── ViT-H-14_features_test.safetensors
└── ViT-H-14_features_test_full.safetensors
```
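
The tensor names stored inside these files are not listed here, so it can help to inspect a file before loading it. The following is a minimal sketch that reads only the header of a `.safetensors` file using the standard library, relying on the public file layout (an 8-byte little-endian header length followed by a JSON header). The helper names and the example path are assumptions made for illustration.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path: Path) -> dict:
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as an unsigned little-endian 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def describe_embeddings(path: Path) -> dict:
    """Map each tensor name to its (dtype, shape)."""
    return {
        name: (info["dtype"], tuple(info["shape"]))
        for name, info in read_safetensors_header(path).items()
    }


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever the CLI wrote the embeddings.
    example = Path("embeddings") / "ViT-H-14_features_test.safetensors"
    if example.exists():
        for name, (dtype, shape) in describe_embeddings(example).items():
            print(name, dtype, shape)
```

To load the actual tensors, `safetensors.torch.load_file(path)` returns a dictionary of tensors keyed by the same names.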

### Using the dataloader

```python
from things_eeg2_dataset.dataloader import ThingsEEGDataset

dataset = ThingsEEGDataset(
    image_model="ViT-H-14",
    data_path="/path/to/processed_data",
    img_directory_training="/path/to/images/train",
    img_directory_test="/path/to/images/test",
    embeddings_dir="/path/to/embeddings",
    train=True,
    time_window=(0.0, 1.0),
)
```

See `things_eeg2_dataloader/README.md` for detailed usage.
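
The `time_window` argument above is given in seconds. As a rough illustration of what such a window means in terms of samples, the sketch below converts it into an index range. The sampling rate and epoch start used here are hypothetical values, and `window_to_slice` is not a function of this package; the dataset class may handle the window differently.

```python
def window_to_slice(time_window: tuple[float, float], sfreq: float, epoch_start: float) -> slice:
    """Convert a (start, stop) window in seconds into a slice over the time axis.

    sfreq:       sampling rate in Hz (assumed value, check your preprocessing)
    epoch_start: time in seconds of the first sample relative to stimulus onset
    """
    t_start, t_stop = time_window
    if t_stop <= t_start:
        raise ValueError("time_window must satisfy start < stop")
    start = round((t_start - epoch_start) * sfreq)
    stop = round((t_stop - epoch_start) * sfreq)
    if start < 0:
        raise ValueError("time_window starts before the epoch does")
    return slice(start, stop)


# Hypothetical settings: 250 Hz data, epochs starting 200 ms before stimulus onset.
print(window_to_slice((0.0, 1.0), sfreq=250.0, epoch_start=-0.2))  # slice(50, 300, None)
```

Applying the resulting slice to the last axis of an array shaped `(channels, time)` keeps only the samples inside the window.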

## References & Citation

We are happy users of the [THINGS-EEG2 dataset](https://things-initiative.org/), but not associated with the original authors.
If you use this code, please cite the [THINGS-EEG2 paper](https://www.sciencedirect.com/science/article/pii/S1053811922008758?via%3Dihub):
> Gifford, A. T., Dwivedi, K., Roig, G., & Cichy, R. M. (2022). A large and rich EEG dataset for modeling human visual object recognition. NeuroImage, 264, 119754.
