Metadata-Version: 2.4
Name: caideface
Version: 0.4.0
Summary: MRI defacing pipeline with skull-stripping and affine registration from cai4cai
Author-email: Lorena Garcia-Foncillas <lorenagarfon00@gmail.com>
License: Apache-2.0
Project-URL: Homepage, https://github.com/cai4cai/defacing_pipeline
Project-URL: Repository, https://github.com/cai4cai/defacing_pipeline
Keywords: MRI,defacing,anonymisation,skull-stripping,neuroimaging,NER,text-anonymization
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: nibabel>=4.0
Requires-Dist: numpy<2,>=1.22
Requires-Dist: scipy>=1.9
Requires-Dist: SimpleITK>=2.2
Requires-Dist: pandas>=1.5
Requires-Dist: natsort>=8.0
Requires-Dist: tqdm>=4.60
Requires-Dist: hd-bet
Requires-Dist: spacy>=3.5
Requires-Dist: faker>=18.0
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Provides-Extra: ct
Requires-Dist: TotalSegmentator; extra == "ct"

# caideface

**MRI and CT defacing and text anonymisation toolkit** from the [cai4cai](https://cai4cai.ml/) research group (Contextual Artificial Intelligence for Computer Assisted Interventions).

This package provides two complementary anonymisation capabilities:

- **Image defacing** -- removes facial features from head MRI and CT scans while preserving brain structures. The MRI pipeline is described in the paper *"A Generalisable Head MRI Defacing Pipeline: Evaluation on 2,566 Meningioma Scans"* ([arXiv:2505.12999](https://arxiv.org/abs/2505.12999)).
- **Text anonymisation** -- detects personal names in medical reports using a trained spaCy NER model and replaces them with realistic fake names (Hiding in Plain Sight / HIPS technique), as described in *"Evaluation of Named Entity Recognition for Automated Extraction of Present Tumor Size and Personal Names from Radiology Reports Using Spacy"* ([DOI:10.1055/s-0045-1803715](https://doi.org/10.1055/s-0045-1803715)).

## Pipeline overview

### Image defacing pipeline

The defacing pipeline supports both **MRI** and **CT** modalities, selected via the required `--modality {mri,ct}` flag. The pipeline consists of three steps, with modality-specific backends:

| Component | MRI | CT |
|------|-----|-----|
| 1. Reorientation | → RAS | → RAS |
| 2. Brain extraction | [HD-BET](https://github.com/MIC-DKFZ/HD-BET) | [TotalSegmentator](https://github.com/wasserth/TotalSegmentator) |
| 3. Registration template | MNI152 T1 (bundled) | CT brain atlas (from TotalSegmentator) |
| Background value | Always 0 | Auto-detected per volume |
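
The last row of the table says the CT background value is estimated per volume rather than fixed. The NumPy sketch below shows one histogram-based way to do this. `estimate_background_value` is a hypothetical helper written for illustration. It is not caideface's `detect_background_value`, whose exact logic may differ. The sketch assumes that background (air) is the single most frequent intensity range, which usually holds for head scans with a generous field of view.

```python
import numpy as np


def estimate_background_value(volume: np.ndarray, bins: int = 256) -> float:
    """Estimate the background intensity from the most populated histogram bin.

    Hypothetical helper, not caideface's detect_background_value.
    Assumes background voxels dominate the intensity histogram.
    """
    values = volume[np.isfinite(volume)]
    counts, edges = np.histogram(values, bins=bins)
    peak = int(np.argmax(counts))
    # Refine the bin estimate: median of the voxels that fell into the peak bin.
    in_peak = values[(values >= edges[peak]) & (values <= edges[peak + 1])]
    return float(np.median(in_peak))


# Synthetic "CT": air at -1000 HU surrounding a soft-tissue block at 40 HU.
ct = np.full((40, 40, 40), -1000.0)
ct[10:30, 10:30, 10:30] = 40.0
print(estimate_background_value(ct))  # -1000.0
```

Because the estimate comes from the data itself, the sketch also handles scans stored with an intensity offset, where air is not at -1000.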

1. **Reorientation** -- Aligns NIfTI scans to RAS canonical orientation (MNI152 standard) using nibabel, equivalent to FSL's `fslreorient2std`.
2. **Skull-stripping** -- Extracts a brain mask, then applies dynamic dilation (14 mm by default; see `--dilation-mm`) so that peripheral brain structures are preserved. MRI uses [HD-BET](https://github.com/MIC-DKFZ/HD-BET); CT uses [TotalSegmentator](https://github.com/wasserth/TotalSegmentator) (the `brain` class from its `total` segmentation task).
3. **Registration & Defacing** -- Registers each scan to a modality-matched template with BRAINSFit (affine) and warps a template-space face mask into the scan's space. It then removes facial features by filling the masked voxels with the background value. For CT, the background value is detected automatically from the volume histogram; it is around -1000 HU (air) when the scan stores native Hounsfield units.
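
Once the face mask has been resampled onto the scan's voxel grid, applying it is an element-wise operation. The NumPy sketch below assumes that both masks already share the scan's grid. As an illustrative safeguard that may not match caideface's exact logic, it never blanks voxels inside the dilated brain mask. `apply_face_mask` is a hypothetical name.

```python
import numpy as np


def apply_face_mask(image, face_mask, brain_mask, background=0.0):
    """Replace face-mask voxels with the background value.

    Hypothetical helper. Assumes image, face_mask and brain_mask share one
    voxel grid (i.e. the face mask was already warped into scan space).
    Voxels inside the (dilated) brain mask are always kept.
    """
    if not (image.shape == face_mask.shape == brain_mask.shape):
        raise ValueError("image and masks must share the same voxel grid")
    remove = face_mask.astype(bool) & ~brain_mask.astype(bool)
    return np.where(remove, background, image)


# Toy example: the first slab is "face", but one voxel overlaps the brain mask.
image = np.full((4, 4, 4), 50.0)
face = np.zeros(image.shape, dtype=bool)
face[0] = True
brain = np.zeros(image.shape, dtype=bool)
brain[0, 0, 0] = True
defaced = apply_face_mask(image, face, brain, background=-1000.0)
print(defaced[0, 1, 1], defaced[0, 0, 0])  # -1000.0 50.0
```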

### Text anonymisation (NER + HIPS)

The text anonymisation module uses a trained spaCy Named Entity Recognition (NER) model to identify personal names (`PER` entities) in `.txt` files and replaces them with realistic fake names generated by the [Faker](https://faker.readthedocs.io/) library. This "Hiding in Plain Sight" (HIPS) approach produces anonymised reports that remain naturally readable. Consistent name mapping ensures that the same real name is always replaced with the same fake name within a document.
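
The consistent-mapping idea can be illustrated without loading a model. The sketch below takes two inputs:

- non-overlapping character spans of detected names (with spaCy, these would be `(ent.start_char, ent.end_char)` for entities where `ent.label_ == "PER"`);
- a pool of fake names.

It returns the rewritten text and the real-to-fake mapping. `replace_names` and the small fixed name pool are hypothetical stand-ins for illustration, not caideface's API; in the package, the pool comes from Faker.

```python
import random


def replace_names(text, spans, fake_pool, seed=None):
    """Replace each (start, end) span with a fake name.

    The same real name always receives the same fake name (consistent
    mapping). Spans are assumed not to overlap, as with NER entities.
    """
    rng = random.Random(seed)
    available = list(fake_pool)
    rng.shuffle(available)
    mapping, pieces, cursor = {}, [], 0
    for start, end in sorted(spans):
        real = text[start:end]
        if real not in mapping:
            if not available:
                raise ValueError("fake name pool exhausted")
            mapping[real] = available.pop()
        pieces.append(text[cursor:start])
        pieces.append(mapping[real])
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), mapping


report = "Reported by Danielle Smith and William Stuart. Cc: Danielle Smith."
# Stand-in for NER output: locate every occurrence of each known name.
spans = []
for name in ["Danielle Smith", "William Stuart"]:
    start = report.find(name)
    while start != -1:
        spans.append((start, start + len(name)))
        start = report.find(name, start + 1)

fake_pool = ["Ryan Munoz", "Holly Wood", "Alex Reed"]
anonymised, mapping = replace_names(report, spans, fake_pool, seed=42)
print(anonymised)
print(mapping)
```

Passing a fixed `seed` makes the choice of fake names reproducible across runs.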

All required models and data for MRI defacing and text anonymisation are **bundled with the package**. CT defacing requires installing the optional `[ct]` extra (see [Installation](#installation)).

## Requirements

### Python

- Python >= 3.9

### External tools (not pip-installable)

| Tool | Used in | Install |
|------|---------|---------|
| **BRAINSFit** & **BRAINSResample** | Step 3 | Bundled with [3D Slicer](https://www.slicer.org/) |

> **Note:** Step 1 (reorientation) no longer requires FSL -- it uses nibabel's orientation tools to reorient scans to RAS (equivalent to `fslreorient2std`).

#### Finding BRAINSFit and BRAINSResample

These executables are included with 3D Slicer. Common locations:

- **macOS**: `/Applications/Slicer.app/Contents/lib/Slicer-5.8/cli-modules/BRAINSFit`
- **Linux**: `/path/to/Slicer/lib/Slicer-5.8/cli-modules/BRAINSFit`

Replace `5.8` with your installed Slicer version if different. To verify the executables are found and working:

```bash
# Check they exist
ls /Applications/Slicer.app/Contents/lib/Slicer-5.8/cli-modules/BRAINSFit
ls /Applications/Slicer.app/Contents/lib/Slicer-5.8/cli-modules/BRAINSResample

# Check they run (should print usage/help info)
/Applications/Slicer.app/Contents/lib/Slicer-5.8/cli-modules/BRAINSFit --help
/Applications/Slicer.app/Contents/lib/Slicer-5.8/cli-modules/BRAINSResample --help
```
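
If several Slicer versions are installed, a small helper can locate the executable instead of hard-coding the version. This is a hypothetical convenience snippet, not part of caideface. It assumes the `lib/Slicer-<version>/cli-modules/` layout listed above. On Linux, pass your Slicer install directory as `slicer_root`.

```python
import glob
import os


def _slicer_version(path):
    """Parse (5, 8) from .../lib/Slicer-5.8/cli-modules/<tool>."""
    folder = os.path.basename(os.path.dirname(os.path.dirname(path)))
    parts = folder.split("-", 1)[1].split(".")
    return tuple(int(p) if p.isdigit() else 0 for p in parts)


def find_slicer_cli(tool, slicer_root="/Applications/Slicer.app/Contents"):
    """Return the executable for `tool` from the newest Slicer version found, or None."""
    pattern = os.path.join(slicer_root, "lib", "Slicer-*", "cli-modules", tool)
    candidates = [p for p in glob.glob(pattern) if os.access(p, os.X_OK)]
    if not candidates:
        return None
    # Compare versions numerically so that Slicer-5.10 ranks above Slicer-5.8.
    return max(candidates, key=_slicer_version)


brainsfit = find_slicer_cli("BRAINSFit")
print(brainsfit or "BRAINSFit not found; pass --brainsfit with the full path")
```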

You can also build them from source via [BRAINSTools](https://github.com/BRAINSia/BRAINSTools).

## Installation

We recommend using a conda environment:

```bash
conda create -n caideface python=3.10 -y
conda activate caideface

# MRI defacing only
pip install caideface

# MRI + CT defacing (includes TotalSegmentator)
pip install "caideface[ct]"   # quotes needed in zsh (macOS default shell)
```

Or install from GitHub:

```bash
pip install "caideface @ git+https://github.com/cai4cai/defacing_pipeline.git#subdirectory=caideface"

# With CT support
pip install "caideface[ct] @ git+https://github.com/cai4cai/defacing_pipeline.git#subdirectory=caideface"
```

Or install from source:

```bash
git clone https://github.com/cai4cai/defacing_pipeline.git
cd defacing_pipeline/caideface
pip install -e .        # MRI only
pip install -e ".[ct]"  # MRI + CT
```

> **Note:** caideface requires `numpy<2` (enforced automatically). Some dependencies (HD-BET / nnU-Net) are not yet compatible with NumPy 2.x.
>
> **Note:** CT support requires [TotalSegmentator](https://github.com/wasserth/TotalSegmentator), which downloads model weights (~1.5 GB) on first use to `~/.totalsegmentator/`. All inference runs locally -- no scan data is sent externally.

## Usage

### CLI -- Full defacing pipeline

Run all three steps in one command:

```bash
# MRI
caideface run ./input_nifti ./output \
  --modality mri \
  --brainsfit /path/to/BRAINSFit \
  --brainsresample /path/to/BRAINSResample

# CT
caideface run ./input_nifti ./output \
  --modality ct \
  --brainsfit /path/to/BRAINSFit \
  --brainsresample /path/to/BRAINSResample
```

This creates three subdirectories under `./output`:
- `reoriented/` -- Step 1 outputs
- `skullstripped/` -- Step 2 outputs (skull-stripped, masks, dilated)
- `defaced/` -- Step 3 outputs (final defaced scans)

#### Options

| Flag | Default | Description |
|------|---------|-------------|
| `--modality` | *required* | Image modality: `mri` or `ct` |
| `--device` | auto-detected | `cpu` or `cuda` for brain extraction |
| `--no-tta` | off (TTA enabled) | Disable HD-BET test-time augmentation (MRI only) |
| `--dilation-mm` | `14.0` | Brain mask dilation in mm |
| `--background` | `0` (MRI), auto-detected (CT) | Background fill value for removed voxels; pass an explicit value to override the default |
| `--template` | bundled | Custom skull-stripped template |
| `--face-mask` | bundled | Custom face mask in template space |
| `--steps` | `all` | Run specific steps: `reorient`, `skull_strip`, `deface` (comma-separated) |
| `-v` | off | Verbose/debug logging |
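
`--dilation-mm` is a physical distance, so the number of voxels it covers depends on each scan's voxel spacing. The NumPy sketch below shows one way to dilate by millimetres on anisotropic grids. It is an illustration under that assumption, not caideface's implementation. In practice, `scipy.ndimage.binary_dilation(mask, structure=footprint)` performs the same shifting much faster.

```python
import numpy as np


def ellipsoid_footprint(dilation_mm, spacing):
    """Boolean structuring element covering all voxels within dilation_mm.

    spacing is the voxel size in mm per axis (e.g. from the NIfTI header);
    anisotropic voxels give a correspondingly squashed ellipsoid.
    """
    radii = [int(np.floor(dilation_mm / s)) for s in spacing]
    grids = np.ogrid[tuple(slice(-r, r + 1) for r in radii)]
    dist2 = sum((g * s) ** 2 for g, s in zip(grids, spacing))
    return dist2 <= dilation_mm ** 2


def dilate_mm(mask, dilation_mm, spacing):
    """Binary dilation of a 3D mask by a physical radius (NumPy only, slow)."""
    mask = mask.astype(bool)
    fp = ellipsoid_footprint(dilation_mm, spacing)
    radii = [(n - 1) // 2 for n in fp.shape]
    padded = np.pad(mask, [(r, r) for r in radii])
    out = np.zeros_like(mask)
    # OR together shifted copies of the mask, one per footprint offset.
    for offset in np.argwhere(fp):
        window = tuple(slice(o, o + n) for o, n in zip(offset, mask.shape))
        out |= padded[window]
    return out


mask = np.zeros((21, 21, 21), dtype=bool)
mask[10, 10, 10] = True
grown = dilate_mm(mask, 3.0, spacing=(1.0, 1.0, 1.0))
print(int(grown.sum()), "voxels after dilation")
```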

### CLI -- Individual defacing steps

Run each step separately for more control:

```bash
# Step 1: Reorientation
caideface reorient ./raw_nifti ./reoriented --modality mri

# Step 2: Skull-stripping
caideface skull-strip ./reoriented ./skullstripped --modality mri --device cpu

# Step 3: Registration & Defacing
caideface deface ./reoriented ./skullstripped ./defaced \
  --modality mri \
  --brainsfit /path/to/BRAINSFit \
  --brainsresample /path/to/BRAINSResample
```
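
When debugging a failed Step 3, it can help to run the registration by hand. The sketch below assembles one plausible pair of BRAINSFit/BRAINSResample calls for `subprocess`. The argument choices are assumptions for illustration and may not match the exact options caideface passes:

- the scan is the fixed image and the template the moving one;
- registration is initialised by moments;
- the face mask is resampled with nearest-neighbour interpolation.

All file names are placeholders.

```python
import subprocess


def brainsfit_cmd(brainsfit, scan, template, transform_out):
    """Affine registration command (hypothetical argument choices).

    With the scan as fixed image, the output transform can resample
    template-space images directly onto the scan's grid.
    """
    return [
        brainsfit,
        "--fixedVolume", scan,
        "--movingVolume", template,
        "--transformType", "Affine",
        "--initializeTransformMode", "useMomentsAlign",
        "--outputTransform", transform_out,
    ]


def brainsresample_cmd(brainsresample, face_mask, scan, transform, mask_out):
    """Warp the template-space face mask onto the scan's voxel grid."""
    return [
        brainsresample,
        "--inputVolume", face_mask,
        "--referenceVolume", scan,
        "--warpTransform", transform,
        "--interpolationMode", "NearestNeighbor",
        "--outputVolume", mask_out,
    ]


def run(cmd):
    """Execute a command, raising CalledProcessError on failure."""
    subprocess.run(cmd, check=True)


fit = brainsfit_cmd("/path/to/BRAINSFit", "scan_dilated.nii.gz",
                    "template.nii.gz", "affine.h5")
warp = brainsresample_cmd("/path/to/BRAINSResample", "face_mask.nii.gz",
                          "scan.nii.gz", "affine.h5", "face_in_scan.nii.gz")
print(" ".join(fit))
print(" ".join(warp))
# run(fit); run(warp)  # uncomment once the paths point at real files
```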

### CLI -- Text anonymisation

#### Single file

```bash
caideface anonymize-single ./reports/report_1.txt ./anonymized/report_1.txt
```

#### Batch (all `.txt` files in a directory)

```bash
caideface anonymize ./reports ./anonymized_reports
```

#### Options

Both commands accept the same options:

| Flag | Default | Description |
|------|---------|-------------|
| `--model` | bundled | Path to a custom spaCy NER model directory |
| `--n-names` | `50` | Size of the fake name pool |
| `--seed` | none | Random seed for reproducible output |
| `-v` | off | Verbose/debug logging |

#### Example

**Input** (`reports/report_1550.txt`):
```
Reported by Danielle Smith and William Stuart on 03/10/2014
```

**Output** (`anonymized_reports/report_1550.txt`):
```
Reported by Ryan Munoz and Holly Wood on 03/10/2014
```

The batch command saves an `anonymization_log.csv` alongside the output files with a summary of replacements per file.
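
A file with zero replacements may be genuinely name-free, or it may contain a name the model missed, so such files are worth a manual look. The minimal sketch below reads the log with the standard `csv` module. It assumes the log has `file` and `replacements` columns, matching the batch DataFrame described in the Python API section; check the header of your own log before relying on it.

```python
import csv
from pathlib import Path


def summarise_log(log_path):
    """Return (total replacements, files with zero replacements).

    Assumes 'file' and 'replacements' columns in the CSV header.
    """
    with open(log_path, newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))
    total = sum(int(row["replacements"]) for row in rows)
    to_review = [row["file"] for row in rows if int(row["replacements"]) == 0]
    return total, to_review


log = Path("anonymized_reports/anonymization_log.csv")
if log.exists():
    total, to_review = summarise_log(log)
    print(f"{total} names replaced; files to review: {to_review}")
```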

### Python API -- Text anonymisation

#### Single file

```python
from caideface.anonymize import load_ner_model, generate_fake_names, anonymize_single

# Load model and generate fake name pool (do this once)
nlp = load_ner_model()                        # uses bundled model
fake_names = generate_fake_names(n=50, seed=42)

# Anonymise a single report
result = anonymize_single(
    input_file="reports/report_1.txt",
    output_file="anonymized/report_1.txt",
    nlp=nlp,
    fake_names=fake_names,
)
print(result["replacements"])   # number of names replaced
print(result["names_found"])    # list of original names detected
print(result["name_mapping"])   # {original_name: fake_name} mapping
```

#### Batch processing

```python
from caideface import anonymize_batch

# Anonymise all .txt files in a directory
log_df = anonymize_batch(
    input_dir="reports/",
    output_dir="anonymized_reports/",
    seed=42,
)
print(log_df)  # DataFrame with file, replacements, names_found per file
```

#### All available imports

```python
from caideface import (
    DefacePipeline,           # Full image defacing pipeline
    reorient_batch,           # Step 1
    skull_strip_batch,        # Step 2
    deface_batch,             # Step 3
    anonymize_batch,          # Text anonymisation (batch)
    anonymize_single,         # Text anonymisation (single file)
    default_ner_model_path,   # Path to bundled NER model
    detect_background_value,  # CT/MRI background detection
)
```

## Output structure

### Image defacing

```
output/
├── reoriented/
│   ├── reorientation_log.csv
│   └── <subject>/<scan>.nii.gz
├── skullstripped/
│   ├── skull_strip_log.csv
│   └── <subject>/
│       ├── <scan>_brain.nii.gz            # Brain-extracted
│       ├── <scan>_mask.nii.gz             # Dilated brain mask
│       └── <scan>_dilated.nii.gz          # Dilated skull-stripped
└── defaced/
    ├── not_defaced_scans.csv              # Only if failures occurred
    └── <subject>/
        └── <scan>_dilated_masked.nii.gz   # Final defaced scan
```
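
`not_defaced_scans.csv` is only written when failures occur, so after a large batch it can be useful to cross-check the two folders. The hypothetical sketch below assumes the naming convention shown in the tree: the `<scan>` stem is carried through unchanged from `reoriented/` to `defaced/`.

```python
from pathlib import Path


def missing_defaced(output_dir):
    """List reoriented scans with no matching defaced output.

    Assumes reoriented/<subject>/<scan>.nii.gz is expected to produce
    defaced/<subject>/<scan>_dilated_masked.nii.gz.
    """
    out = Path(output_dir)
    missing = []
    for scan in sorted((out / "reoriented").glob("*/*.nii.gz")):
        stem = scan.name[: -len(".nii.gz")]
        expected = out / "defaced" / scan.parent.name / f"{stem}_dilated_masked.nii.gz"
        if not expected.exists():
            missing.append(scan.relative_to(out))
    return missing


for rel in missing_defaced("output"):
    print("not defaced:", rel)
```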

### Text anonymisation

```
anonymized_reports/
├── anonymization_log.csv                  # Replacements per file
├── report_1.txt                           # Anonymised report
├── report_2.txt
└── ...
```

## Existing transforms

If you have pre-computed registration transforms (e.g. from 3D Slicer), place a file named `Transform_to_template.txt` in the same directory as the dilated skull-stripped scan. The pipeline will use it instead of running BRAINSFit. Both plain 4x4 text matrices and ITK/Slicer transform formats are supported.
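
The plain-matrix case can be checked before a run with a few lines of NumPy. This hypothetical sketch only validates that the file holds a 4x4 homogeneous matrix. It does not establish what the matrix maps from and to, or in which coordinate convention (RAS, or ITK's LPS). ITK/Slicer transform files, which begin with `#Insight Transform File`, can be read with SimpleITK's `ReadTransform` instead.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_existing_transform(scan_path):
    """Return the 4x4 matrix in Transform_to_template.txt next to the scan, or None.

    Only handles the plain-text 4x4 matrix format.
    """
    candidate = Path(scan_path).with_name("Transform_to_template.txt")
    if not candidate.exists():
        return None
    if candidate.read_text().lstrip().startswith("#Insight"):
        raise NotImplementedError("ITK transform file: read it with SimpleITK instead")
    matrix = np.loadtxt(candidate)
    if matrix.shape != (4, 4) or not np.allclose(matrix[3], [0, 0, 0, 1]):
        raise ValueError(f"{candidate} is not a 4x4 homogeneous matrix")
    return matrix


with tempfile.TemporaryDirectory() as d:
    scan = Path(d) / "sub1_dilated.nii.gz"
    (Path(d) / "Transform_to_template.txt").write_text(
        "1 0 0 5\n0 1 0 0\n0 0 1 0\n0 0 0 1\n")
    m = load_existing_transform(scan)
    print(m @ np.array([0.0, 0.0, 0.0, 1.0]))  # [5. 0. 0. 1.]
```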

## Citation

If you use this tool, please cite:

```bibtex
@misc{caideface2025,
  title={A Generalisable Head MRI Defacing Pipeline: Evaluation on 2,566 Meningioma Scans},
  year={2025},
  eprint={2505.12999},
  archivePrefix={arXiv},
  url={https://arxiv.org/abs/2505.12999}
}
```

If you use CT defacing (TotalSegmentator, Step 2), please also cite:

```bibtex
@article{Wasserthal2023,
  author={Wasserthal, Jakob and Breit, Hanns-Christian and Meyer, Manfred T. and Pradella, Maurice and Hinck, Daniel and Sauter, Alexander W. and Heye, Tobias and Boll, Daniel T. and Cyriac, Joshy and Yang, Shan and Bach, Michael and Segeroth, Martin},
  title={TotalSegmentator: Robust Segmentation of 104 Anatomic Structures in CT Images},
  journal={Radiology: Artificial Intelligence},
  volume={5},
  number={5},
  year={2023},
  doi={10.1148/ryai.230024}
}
```

If you use HD-BET (MRI skull-stripping, Step 2), please also cite:

```bibtex
@article{Isensee2019,
  author={Isensee, F. and Schell, M. and Tursunova, I. and Brugnara, G. and Bonekamp, D. and Neuberger, U. and Wick, A. and Schlemmer, H. P. and Heiland, S. and Wick, W. and Bendszus, M. and Maier-Hein, K. H. and Kickingereder, P.},
  title={Automated brain extraction of multi-sequence MRI using artificial neural networks},
  journal={Human Brain Mapping},
  volume={40},
  number={17},
  year={2019},
  pages={4952--4964},
  doi={10.1002/hbm.24750}
}
```

If you use the text anonymisation (NER + HIPS), please also cite:

```bibtex
@article{garcia2025ner,
  title={Evaluation of Named Entity Recognition for Automated Extraction of Present Tumor Size and Personal Names from Radiology Reports Using Spacy},
  author={Garcia-Foncillas Macias, Lorena and Barfoot, Theodore and Vercauteren, Tom and Shapey, Jonathan},
  journal={Journal of Neurological Surgery Part B: Skull Base},
  volume={86},
  number={S 01},
  year={2025},
  doi={10.1055/s-0045-1803715}
}
```



## License

This project is licensed under the Apache License 2.0 -- see the [LICENSE](https://github.com/cai4cai/defacing_pipeline/blob/main/LICENSE) file for details.
