Metadata-Version: 2.4
Name: radiobject
Version: 0.1.1
Summary: TileDB-backed data structure for radiology data at scale
Project-URL: Repository, https://github.com/samueldsouza/radiobject
Author: Samuel D'Souza
License-Expression: MIT
License-File: LICENSE
Keywords: dicom,machine-learning,medical-imaging,nifti,radiology,tiledb
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Requires-Python: >=3.11
Requires-Dist: boto3>=1.34.0
Requires-Dist: botocore[crt]>=1.42.36
Requires-Dist: matplotlib>=3.8.0
Requires-Dist: nibabel>=5.2.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: psutil>=5.9.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pydicom>=2.4.0
Requires-Dist: tiledb>=0.36.0
Requires-Dist: tqdm>=4.66.0
Provides-Extra: dev
Requires-Dist: jupyter>=1.1.1; extra == 'dev'
Requires-Dist: nbconvert>=7.16.6; extra == 'dev'
Requires-Dist: pre-commit>=3.0.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5.0; extra == 'docs'
Requires-Dist: mkdocs>=1.5.0; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.24.0; extra == 'docs'
Provides-Extra: download
Requires-Dist: requests>=2.31.0; extra == 'download'
Requires-Dist: tcia-utils>=0.5.0; extra == 'download'
Provides-Extra: ml
Requires-Dist: monai>=1.3.0; extra == 'ml'
Requires-Dist: scikit-learn>=1.3.0; extra == 'ml'
Requires-Dist: simpleitk>=2.3.0; extra == 'ml'
Requires-Dist: tcia-utils>=0.5.0; extra == 'ml'
Requires-Dist: torch>=2.0.0; extra == 'ml'
Requires-Dist: torchio>=0.19.0; extra == 'ml'
Provides-Extra: monai
Requires-Dist: monai>=1.3.0; extra == 'monai'
Requires-Dist: torch>=2.0.0; extra == 'monai'
Provides-Extra: torchio
Requires-Dist: torch>=2.0.0; extra == 'torchio'
Requires-Dist: torchio>=0.19.0; extra == 'torchio'
Provides-Extra: tutorials
Requires-Dist: jupyter>=1.1.1; extra == 'tutorials'
Requires-Dist: requests>=2.31.0; extra == 'tutorials'
Requires-Dist: tcia-utils>=0.5.0; extra == 'tutorials'
Description-Content-Type: text/markdown

# RadiObject

**What?** A TileDB-backed data structure for radiology data at scale.

**Why?** NIfTI and DICOM files must be read from local disk and do not support partial reads.
TileDB enables cloud-native storage (S3), efficient partial reads, and
hierarchical organization of multi-volume datasets.

*[Thoughts](https://souzy.up.railway.app/thoughts/radiology-object)*

## Installation

```bash
pip install radiobject
```

## Quick Start

```python
from radiobject import RadiObject

# Create from NIfTI files using images dict (recommended)
radi = RadiObject.from_niftis(
    uri="./my-dataset",
    images={
        "CT": "./imagesTr/*.nii.gz",      # Glob pattern
        "seg": "./labelsTr",               # Directory path
    },
    validate_alignment=True,               # Ensure matching subjects across collections
    obs_meta=metadata_df,                  # Optional subject-level metadata (a pandas DataFrame defined beforehand)
)

# Access data (pandas-like)
vol = radi.CT.iloc[0]            # First CT volume
data = vol[100:200, :, :]        # Partial read (only loads needed tiles)

# Filtering (returns views)
subset = radi.filter("age > 40")       # Query expression
subset = radi.head(10)                 # First 10 subjects
subset.materialize("./subset")         # Write to storage
```

Works with local paths or S3 URIs (`s3://bucket/dataset`).
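
The `validate_alignment=True` flag in the Quick Start ensures that subjects match across collections. The sketch below illustrates one way such a check could work; it is not RadiObject's implementation. The helper names `subject_id` and `misaligned_subjects`, and the rule that a subject ID is the filename minus its NIfTI extension, are assumptions made for illustration.

```python
from pathlib import PurePath


def subject_id(path: str) -> str:
    """Derive a subject ID from a filename by dropping NIfTI extensions.

    Assumption: the ID is the filename stem, e.g. "sub-01.nii.gz" -> "sub-01".
    """
    name = PurePath(path).name
    for suffix in (".nii.gz", ".nii"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def misaligned_subjects(collections: dict[str, list[str]]) -> dict[str, set[str]]:
    """Return, per collection, the subject IDs it lacks relative to the union."""
    ids = {name: {subject_id(p) for p in paths} for name, paths in collections.items()}
    all_ids = set().union(*ids.values()) if ids else set()
    # Keep only collections that are missing at least one subject.
    return {name: all_ids - found for name, found in ids.items() if all_ids - found}


# Hypothetical file listing: the "seg" collection lacks a label for sub-02.
files = {
    "CT": ["imagesTr/sub-01.nii.gz", "imagesTr/sub-02.nii.gz"],
    "seg": ["labelsTr/sub-01.nii.gz"],
}
missing = misaligned_subjects(files)
print(missing)  # {'seg': {'sub-02'}}
```

An empty result means every collection covers the same set of subjects.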

## How It Works

Reading part of a NIfTI volume requires decompressing the entire volume, whereas TileDB reads only the tiles that a request touches.
In the project's benchmarks, this makes partial reads **200-660x faster**. [See benchmarks →](docs/BENCHMARKS.md)
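
To make the tile argument concrete, here is a minimal sketch that counts how many tiles a slice request overlaps. The volume shape and tile extents are hypothetical values chosen for illustration, not RadiObject's actual tiling configuration, and the functions `tiles_touched` and `total_tiles` are not part of the library. The count ignores compression and I/O latency, so it indicates the amount of data touched rather than a measured speedup.

```python
from math import ceil, prod


def tiles_touched(request, tile_shape):
    """Count tiles overlapping a request given as (start, stop) per axis."""
    per_axis = []
    for (start, stop), extent in zip(request, tile_shape):
        first = start // extent        # index of the first tile hit on this axis
        last = (stop - 1) // extent    # index of the last tile hit (stop is exclusive)
        per_axis.append(last - first + 1)
    return prod(per_axis)


def total_tiles(volume_shape, tile_shape):
    """Count all tiles needed to cover the volume."""
    return prod(ceil(n / t) for n, t in zip(volume_shape, tile_shape))


volume_shape = (240, 240, 155)  # hypothetical volume dimensions
tile_shape = (64, 64, 64)       # hypothetical tile extents

# One axial slice: all of x and y, a single z index.
request = ((0, 240), (0, 240), (77, 78))

touched = tiles_touched(request, tile_shape)
total = total_tiles(volume_shape, tile_shape)
print(touched, total)  # 16 48
```

Under these assumed extents, the slice touches 16 of 48 tiles, while a format that must be decompressed in full reads the whole volume regardless of the request size.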

## Sample Data

Download sample datasets for tutorials and testing:

```bash
# Install download dependencies
pip install "radiobject[download]"

# Download BraTS brain tumor data (for tutorials 00-04)
python scripts/download_dataset.py msd-brain-tumour

# List all available datasets
python scripts/download_dataset.py --list
```

## Documentation

- **[Tutorials](notebooks/README.md)** - Interactive notebooks
- **[Data Access](docs/DATA_ACCESS.md)** - Ingestion, queries, filtering
- **[ML Integration](docs/ML_INTEGRATION.md)** - MONAI/TorchIO setup
- **[Design](docs/DESIGN.md)** - Architecture decisions
- **[Benchmarks](docs/BENCHMARKS.md)** - Performance analysis
- **[Datasets](docs/DATASETS.md)** - Available datasets and download instructions

## License

MIT
