Metadata-Version: 2.3
Name: fw-gear-deid-inplace
Version: 1.2.5
Summary: Deidentifies a file in place.
License: MIT
Keywords: Flywheel,Gears
Author: Flywheel
Author-email: support@flywheel.io
Requires-Python: >=3.12,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Requires-Dist: Jinja2 (>=3,<4)
Requires-Dist: flywheel-gear-toolkit (>=0.6,<0.7)
Requires-Dist: flywheel-migration[pixel] (>=13.10.2,<14.0.0)
Requires-Dist: flywheel-sdk (==16.19.2)
Requires-Dist: fw-gear-deid-export (>=1.8,<2.0)
Requires-Dist: rpds-py (>=0.20.0,<0.21.0)
Project-URL: Repository, https://gitlab.com/flywheel-io/scientific-solutions/gears/deid-inplace
Description-Content-Type: text/markdown

# Anonymized/De-identified In Place

## Overview

### Summary

Profile-based anonymization of a file in Flywheel.
Files are anonymized according to a de-id YAML profile, and the result either
overwrites the source file or is saved as a new version of it.

Currently supported files are:

* DICOM
* JPG
* PNG
* TIFF
* XML
* JSON
* Text files defining key/value pairs (e.g. MHD)
* CSV
* TSV

Currently supported field transformations are:

* ``remove``: Removes the field from the metadata.
* ``replace-with``: Replaces the contents of the field with the value provided.
* ``increment-date``: Offsets the date by the number of days set in ``date-increment``.
* ``increment-datetime``: Offsets the datetime by the number of days set in ``date-increment``.
* ``hash``: Replaces the contents of the field with a one-way cryptographic hash.
* ``hashuid``: Replaces a UID field with a hashed version of that field.
* ``jitter``: Shifts the value by a random amount.
* ``encrypt`` (non-DICOM): Encrypts the field in place with AES-EAX encryption.
* ``encrypt`` (DICOM): Removes the field from the DICOM and stores the original value
  in EncryptedAttributesSequence with CMS encryption.
* ``decrypt`` (non-DICOM): Decrypts the field in place with AES-EAX decryption.
* ``decrypt`` (DICOM): Replaces the contents of the field with the value stored in
  EncryptedAttributesSequence with CMS decryption.
* ``regex-sub``: Replaces the contents of the field with a value built from
  other fields and/or groups extracted from the field value.
* ``keep``: Do nothing.
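The date actions and ``hash`` can be pictured with a short standalone sketch. It assumes DICOM `DA` values in `YYYYMMDD` form and, purely for illustration, uses a SHA-256 hex digest truncated to 16 characters for ``hash``; the migration toolkit's actual hashing algorithm and output length may differ, and `increment_date` and `hash_value` are hypothetical helpers, not toolkit functions.

```python
import datetime
import hashlib

# DICOM DA value representation: YYYYMMDD
DICOM_DATE_FORMAT = "%Y%m%d"


def increment_date(value: str, date_increment: int) -> str:
    """Shift a DICOM DA string by `date_increment` days (negative moves it earlier)."""
    parsed = datetime.datetime.strptime(value, DICOM_DATE_FORMAT).date()
    shifted = parsed + datetime.timedelta(days=date_increment)
    return shifted.strftime(DICOM_DATE_FORMAT)


def hash_value(value: str, length: int = 16) -> str:
    """Illustrative one-way hash: SHA-256 hex digest truncated to `length` chars (assumed)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:length]


print(increment_date("20240115", -17))  # 20231229
print(hash_value("ACC-0001"))
```

A negative ``date-increment``, like the `-17` in the example profile further down, moves dates earlier. Because hashing is deterministic, the same input always produces the same output, so a value hashed in several files stays consistent across them.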

Additionally, for DICOM, pixel data masking is supported based on pre-defined
pixel coordinates ([doc](https://flywheel-io.gitlab.io/public/migration-toolkit/pages/pixels.html)).
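Conceptually, masking overwrites a fixed rectangle of pixel values with a constant. The sketch below shows only that idea on a plain list-of-rows image; `mask_region` and its `x`, `y`, `width`, `height` parameters are hypothetical, and the toolkit's real masking options are the ones described in the linked pixel documentation.

```python
def mask_region(pixels, x, y, width, height, fill=0):
    """Return a copy of a 2D pixel grid with the given rectangle set to `fill`."""
    masked = [list(row) for row in pixels]  # copy so the input is untouched
    # Clip the rectangle to the image bounds, then overwrite each pixel in it.
    for row in range(y, min(y + height, len(masked))):
        for col in range(x, min(x + width, len(masked[row]))):
            masked[row][col] = fill
    return masked


image = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
print(mask_region(image, x=1, y=0, width=2, height=2))
```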

The YAML profile extends the
[flywheel-migration-toolkit](https://gitlab.com/flywheel-io/public/migration-toolkit)
de-id profile to Flywheel metadata containers. Documentation on how to write the YAML
configuration for each supported file type can be found in the flywheel-migration
[doc](https://flywheel-io.gitlab.io/public/migration-toolkit/).

_NOTE:_ Metadata extraction must be rerun on the output file, as
the gear itself does not propagate or modify DICOM metadata.

### License

MIT

### Classification

utility

*Gear Level:*

* [x] Project
* [x] Subject
* [x] Session
* [x] Acquisition
* [ ] Analysis

----

[[_TOC_]]

----

### Inputs

* _deid-profile_
  * __Name__: deid-profile
  * __Type__: file
  * __Optional__: false
  * __Description__: A Flywheel de-identification profile specifying the
      de-identification actions to perform.

* _subject-csv_
  * __Name__: subject-csv
  * __Type__: file
  * __Optional__: true
  * __Description__: A CSV file that contains mapping values to apply for subjects
      during de-identification.
  
* _input-file_
  * __Name__: input-file
  * __Type__: file
  * __Optional__: false
  * __Description__: An input file to be de-identified.

#### deid-profile (required)

This YAML file describes how to de-identify `input-file`. It supports the
same functionality as Flywheel CLI de-identification.

NOTE: By default, Flywheel metadata is removed from the file. If you want
the file's metadata to be carried over to the new de-identified version of the
file, you MUST include the `flywheel` section:

```yaml
flywheel:
  file:
    all: true
```
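The rule above, together with the `all` and `fields` keys used in the longer example profile, can be modeled roughly as follows. This is a simplified sketch, not the gear's implementation: `apply_flywheel_section` is a hypothetical helper, metadata is treated as a flat dictionary (no dotted paths such as `info.sessiondate`), and only the `remove` and `replace-with` actions are handled.

```python
from typing import Any, Optional


def apply_flywheel_section(
    metadata: dict[str, Any], section: Optional[dict[str, Any]]
) -> dict[str, Any]:
    """Return the metadata carried over to the de-identified file (simplified model)."""
    if section is None:
        # No `flywheel` entry for this container: its metadata is dropped.
        return {}
    # `all: true` starts from every source field; otherwise start from nothing.
    result = dict(metadata) if section.get("all") else {}
    for field in section.get("fields", []):
        name = field["name"]
        if field.get("remove"):
            result.pop(name, None)
        elif "replace-with" in field:
            result[name] = field["replace-with"]
        elif name in metadata:
            # Listed without an action modeled here: keep the source value.
            result[name] = metadata[name]
    return result


source = {"modality": "MR", "operator": "Dr X"}
print(apply_flywheel_section(source, {"all": True}))
print(apply_flywheel_section(
    source, {"all": False, "fields": [{"name": "operator", "replace-with": "REDACTED"}]}
))
```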

A simple example deid_profile.yaml looks like this:

``` yaml
# Configuration for DICOM de-identification
dicom:
  # What date offset to use, in number of days
  date-increment: -17

  # Set patient age from date of birth
  patient-age-from-birthdate: true
  # Set patient age units as Years
  patient-age-units: Y
  # Remove private tags
  remove-private-tags: true

  # filenames block to manipulate output filename based on input filename
  filenames:
      # input regular expression that matches the source filename
    - input-regex: '.*'
      # formatter of the output filename
      output: '{SOPInstanceUID}.dcm'

  fields:
    # Remove a dicom field value (e.g. remove “StationName”)
    - name: StationName
      remove: true

    # Increment a date field by -17 days
    - name: StudyDate
      increment-date: true

    # Increment a datetime field by -17 days
    - name: AcquisitionDateTime
      increment-datetime: true

    # One-Way hash a dicom field to a unique string
    - name: AccessionNumber
      hash: true

    # One-Way hash the ConcatenationUID,
    # keeping the prefix (4 nodes) and suffix (2 nodes)
    - name: ConcatenationUID
      hashuid: true

# Zip profile to handle e.g. .dcm.zip archives. All member files are de-identified
# according to the same profile.
zip:
  fields:
  - name: comment
    replace-with: FLYWHEEL
  filenames:
  - input-regex: (?P<used>.*).dicom.zip$
    output: '{used}.dcm.zip'
  hash-subdirectories: true
  validate-zip-members: true

# The flywheel configuration to handle flywheel metadata de-id.
flywheel:
  # subject container
  subject:
    # If set to true, export all source container metadata to the destination container.
    all: true

  # session container
  session:
    # If set to false, only export to destination container the metadata defined
    # in the fields key.
    all: false
    date-increment: -17
    fields:
      - name: operator
        replace-with: REDACTED
      - name: info.sessiondate
        increment-date: true
      - name: tags
        replace-with: 
          - deid-exported

  acquisition:
    all: true

  file:
    all: true
    # If set to true, export the file info header to the destination container.
    # If set to false or missing, the file info header will be removed from the 
    # destination container.
    include-info-header: true
```
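The `filenames` blocks above pair an `input-regex` with an `output` formatter. A rough model of how one rule could be applied is sketched below. It assumes the regex is matched from the start of the filename, that named groups (such as `used`) and header attributes (such as `SOPInstanceUID`) are both available to a Python-style `str.format` template, and that group values win on a name clash; `render_output_filename` is hypothetical, and the toolkit's exact behavior may differ.

```python
import re
from typing import Optional


def render_output_filename(
    filename: str, input_regex: str, output: str, header: Optional[dict] = None
) -> Optional[str]:
    """Apply one `filenames` rule: match `input_regex`, then fill the `output` template."""
    match = re.match(input_regex, filename)
    if match is None:
        return None  # this rule does not apply to the file
    # Merge header attributes with named regex groups (groups take precedence here).
    values = dict(header or {})
    values.update(match.groupdict())
    return output.format(**values)


# zip rule from the example profile
print(render_output_filename("scan01.dicom.zip", r"(?P<used>.*).dicom.zip$", "{used}.dcm.zip"))
# dicom rule, using a header value
print(render_output_filename("img.dcm", ".*", "{SOPInstanceUID}.dcm", {"SOPInstanceUID": "1.2.3"}))
```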

#### subject-csv (optional)

The subject_csv enables subject-specific configuration of the
de-identification profile. It is a CSV file with a `subject.label` column
whose unique values correspond to the `subject.label` values in the project
to be exported. If a subject in the project is not listed in the
`subject.label` column of the provided subject_csv, that subject will not be
exported.

##### Subject-level customization with subject-csv and deid-profile

Requirements:

* To update subject fields, each field must appear both as a column header
  in the subject_csv and as a Jinja variable in the deid_profile
  (i.e., `"{{ var_name }}"`).
* If a field is represented in both the deid_profile and the
  subject_csv, the value in the deid_profile will be replaced with the
  value listed in the corresponding column of the subject_csv for each
  subject that has a label listed in the `subject.label` column.
* Fields represented in the deid_profile but not in the subject_csv will
  be the same for all subjects.

Let's walk through an example pairing of subject_csv and deid_profile
to illustrate.

The following table represents the subject_csv (../tests/data/example-csv-mapping.csv):

| subject.label | DATE_INCREMENT | SUBJECT_ID  | PATIENT_BD_BOOL |
|---------------|----------------|-------------|-----------------|
| 001           | -15            | Patient_IDA | false           |
| 002           | -20            | Patient_IDB | true            |
| 003           | -30            | Patient_IDC | true            |

The deid_profile:

``` yaml
dicom:
  # date-increment is filled in from the DATE_INCREMENT column of
  # example-csv-mapping.csv
  date-increment: "{{ DATE_INCREMENT }}"
  # since example-csv-mapping.csv doesn't define dicom.remove-private-tags,
  # all subjects will have private tags removed
  remove-private-tags: true
  fields:
    - name: PatientBirthDate
      # remove is filled in from the PATIENT_BD_BOOL column of
      # example-csv-mapping.csv
      remove: "{{ PATIENT_BD_BOOL }}"
    - name: PatientID
      # replace-with is filled in from the SUBJECT_ID column of
      # example-csv-mapping.csv
      replace-with: "{{ SUBJECT_ID }}"
```

The resulting profile for subject 003 given the above would be:

``` yaml
dicom:
  date-increment: -30
  remove-private-tags: true
  fields:
    - name: PatientBirthDate
      remove: true
    - name: PatientID
      replace-with: Patient_IDC
```
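The per-subject resolution can be approximated in a few lines: look up the subject's row by `subject.label`, then substitute each placeholder with that row's value. The gear lists Jinja2 as a dependency, but this sketch uses a regex substitution instead so it runs on the standard library alone. It also drops the quotes around each placeholder so the output matches the resolved profile for subject 003 shown above; how the gear itself coerces value types is an assumption here. `subject_row` and `render_profile` are hypothetical helpers.

```python
import csv
import io
import re
from typing import Optional

SUBJECT_CSV = """subject.label,DATE_INCREMENT,SUBJECT_ID,PATIENT_BD_BOOL
001,-15,Patient_IDA,false
002,-20,Patient_IDB,true
003,-30,Patient_IDC,true
"""

PROFILE_TEMPLATE = """dicom:
  date-increment: "{{ DATE_INCREMENT }}"
  remove-private-tags: true
  fields:
    - name: PatientBirthDate
      remove: "{{ PATIENT_BD_BOOL }}"
    - name: PatientID
      replace-with: "{{ SUBJECT_ID }}"
"""


def subject_row(csv_text: str, label: str) -> Optional[dict]:
    """Return the CSV row for `label`, or None if the subject is not listed."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["subject.label"] == label:
            return row
    return None


def render_profile(template: str, row: dict) -> str:
    """Replace each quoted "{{ VAR }}" placeholder with the row's value, unquoted."""
    return re.sub(r'"\{\{\s*(\w+)\s*\}\}"', lambda m: row[m.group(1)], template)


row = subject_row(SUBJECT_CSV, "003")
if row is not None:
    print(render_profile(PROFILE_TEMPLATE, row))
```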

### Config

* _debug_
  * __Name__: debug
  * __Type__: boolean
  * __Default__: false
  * __Description__: If true, the gear will print debug information to the log.

* _tag_
  * __Name__: tag
  * __Type__: string
  * __Default__: "deid-inplace"
  * __Description__: The tag prefix to append to the file after the gear runs.
    The tag will be `<prefix>-PASS` or `<prefix>-FAIL`, depending on the gear
    run status.

* _delete-original_
  * __Name__: delete-original
  * __Type__: boolean
  * __Default__: true
  * __Description__: If true, the original file is deleted and replaced with the
    de-identified file, rendering the original file unrecoverable. If false, the
    de-identified file overwrites the original as a new file version, so the
    change can be reversed.

* _private_key_
  * __Name__: private_key
  * __Type__: string
  * __Description__: Asymmetric decryption: the resolver path and filename of
    the private key pem file, formatted as `<group>/<project>/files/<filename>`
    (E.g., `flywheel/test/files/private_key.pem`) if the key is saved at the
    project level, or `<group>/<project>/<subject>/files/<filename>` if stored
    at the subject level, or
    `<group>/<project>/<subject>/<session>/<acquisition>/<filename>` if stored
    within an acquisition container.

* _public_key_
  * __Name__: public_key
  * __Type__: string
  * __Description__: Asymmetric encryption: the resolver path and filename(s)
    of the public key pem file(s), formatted as
    `<group>/<project>/files/<filename>`, with multiple key files separated by
    ', ' (E.g.
    `flywheel/test/files/public_key1.pem, flywheel/test/files/public_key2.pem`)
    if the key is saved at the project level, or
    `<group>/<project>/<subject>/files/<filename>`
    if stored at the subject level, or
    `<group>/<project>/<subject>/<session>/<acquisition>/<filename>`
    if stored within an acquisition container.

* _secret_key_
  * __Name__: secret_key
  * __Type__: string
  * __Description__: Symmetric encryption: the resolver path and filename(s) of
    the secret key txt file(s), formatted as
    `<group>/<project>/files/<filename>` (E.g.
    `flywheel/test/files/secret_key.txt`) if the key is saved at the project
    level, or `<group>/<project>/<subject>/files/<filename>` if stored at the
    subject level, or
    `<group>/<project>/<subject>/<session>/<acquisition>/<filename>` if stored
    within an acquisition container.

## Usage

1. User uploads or identifies a file in Flywheel to de-identify.
2. User runs deid-inplace (this utility gear) at the project, subject, session,
   or acquisition level and provides the following:
    * Files:
        * A de-identification profile specifying how to
          de-identify/anonymize each file type
        * An optional CSV that contains a column that maps to a
          Flywheel session or subject metadata field and columns that
          specify values with which to replace DICOM header tags
        * The desired input file to de-identify
    * Configuration options:
        * delete-original: True/False

3. The gear de-identifies the file.
4. Depending on `delete-original`, the gear either deletes the original file or
   overwrites it with a new version.

### Environment

This gear uses `poetry` as its virtual environment and dependency manager. You can
interact with the gear as follows:

1. [Install poetry](https://python-poetry.org/docs/#installation)
2. Install dependencies (from within gear directory): `poetry install`
3. Enter virtual environment: `poetry shell`

