Data Extraction is a two-step process:
1. File Validation
First, the module checks whether the file you expect is included in the upload.
If the associated File Uploader expects a ZIP upload, the module looks for the file
at the file path you defined (this step is skipped for single-file uploads).
Next, the module checks whether the file has the format defined in the Expected File Format
setting (and satisfies any further settings that apply to that file format).
Lastly, the module checks whether the identified file contains the fields
defined in the Expected Fields setting.
If any of these validation steps fails, the participant is shown an error message
explaining what went wrong, and the file upload and extraction are aborted.
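The three validation steps can be sketched in Python as follows. This is a minimal illustration, not the Data Donation Module's actual implementation: the function `validate_upload`, its parameters (`expects_zip`, `file_path`, `file_format`, `expected_fields`) and the `ValidationError` class are hypothetical stand-ins for the File Uploader settings, and only JSON (a list of objects) and CSV are assumed as file formats.

```python
import csv
import io
import json
import zipfile


class ValidationError(Exception):
    """Hypothetical error type; its message is what the participant would see."""


def validate_upload(data: bytes, expects_zip: bool, file_path: str,
                    file_format: str, expected_fields: list[str]) -> list[dict]:
    """Run the three validation steps and return the parsed records."""
    # Step 1: locate the expected file inside the ZIP (skipped for single-file uploads).
    if expects_zip:
        try:
            archive = zipfile.ZipFile(io.BytesIO(data))
        except zipfile.BadZipFile as exc:
            raise ValidationError("The upload is not a valid ZIP file.") from exc
        with archive:
            if file_path not in archive.namelist():
                raise ValidationError(f"Expected file '{file_path}' not found in the upload.")
            data = archive.read(file_path)

    # Step 2: check that the file can be parsed in the expected format
    # (assumption: only "json" and "csv" are supported in this sketch).
    try:
        text = data.decode("utf-8")
        if file_format == "json":
            records = json.loads(text)
        elif file_format == "csv":
            records = list(csv.DictReader(io.StringIO(text)))
        else:
            raise ValidationError(f"Unsupported file format '{file_format}'.")
    except (UnicodeDecodeError, json.JSONDecodeError, csv.Error) as exc:
        raise ValidationError(f"File is not valid {file_format}: {exc}") from exc
    if not isinstance(records, list) or not all(isinstance(r, dict) for r in records):
        raise ValidationError("Expected a list of records (rows).")

    # Step 3: check that every record contains all expected fields.
    for record in records:
        missing = [name for name in expected_fields if name not in record]
        if missing:
            raise ValidationError(f"Missing expected field(s): {', '.join(missing)}")
    return records
```

Raising at the first failing step mirrors the behaviour described above: as soon as one check fails, nothing further is parsed or extracted.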
2. Data Extraction
For data extraction, the Data Donation Module follows the data sparsity paradigm.
The base assumption is that you do not want any data from your participants,
so you must explicitly indicate which data fields should be included.
To keep data in the data donation, you must define Extraction Rules.
Each Extraction Rule relates to exactly one field/column in the uploaded data file,
and a data field is kept in a participant's donation only if at least one
extraction rule mentions it explicitly.
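The data sparsity principle amounts to an allow-list over field names. The sketch below is a simplified illustration that assumes records are plain dictionaries and that the rules are reduced to the names of the fields they mention; `apply_sparsity` is a hypothetical helper, not part of the module.

```python
def apply_sparsity(records: list[dict], rule_fields: list[str]) -> list[dict]:
    """Drop every field that no extraction rule mentions (data sparsity)."""
    kept = set(rule_fields)  # fields explicitly requested via extraction rules
    return [{name: value for name, value in record.items() if name in kept}
            for record in records]


rows = [{"date": "2021-01-01", "email": "a@b.com", "url": "https://example.org"}]
print(apply_sparsity(rows, ["date"]))  # [{'date': '2021-01-01'}]
print(apply_sparsity(rows, []))        # [{}]
```

With no rules at all, every record is emptied, which reflects the default assumption that no data is wanted.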
An extraction rule can do one of three things: keep a field in the donation
(by mentioning the field/column in a rule without defining a comparison operator),
use the data in a field to delete data entries (i.e., rows) from the donation
(e.g., delete all entries dated before 01.01.2020), or
alter the data in a field (e.g., anonymize an e-mail address by replacing "name@mail.com" with "EMAIL").
Several comparison and regex operations are available for this. For comparison operations, a match
means that the data entry is deleted. The rules are applied to the uploaded file in the indicated order.
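To make the interplay of the three rule types concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the module's code: `ExtractionRule`, `extract`, the operator strings (`"<"`, `"=="`, ..., `"regex_replace"`) are hypothetical, and dates are assumed to be stored as ISO-8601 strings (`YYYY-MM-DD`) so that string comparison matches date order.

```python
import operator
import re
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical comparison operators; a match on any of these deletes the row.
COMPARISONS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


@dataclass
class ExtractionRule:
    field: str
    op: Optional[str] = None   # None: only keep the field
    value: Any = None          # comparison value or regex pattern
    replacement: str = ""      # used by "regex_replace" only


def extract(records: list[dict], rules: list[ExtractionRule]) -> list[dict]:
    kept_fields = {rule.field for rule in rules}  # data sparsity allow-list
    donation = []
    for record in records:
        row = dict(record)
        deleted = False
        for rule in rules:  # rules are applied in the indicated order
            cell = row.get(rule.field)
            if rule.op is None or cell is None:
                continue
            if rule.op in COMPARISONS:
                if COMPARISONS[rule.op](cell, rule.value):
                    deleted = True  # a comparison match deletes the entry
                    break
            elif rule.op == "regex_replace":
                row[rule.field] = re.sub(rule.value, rule.replacement, str(cell))
            else:
                raise ValueError(f"Unknown operator '{rule.op}'")
        if not deleted:
            donation.append({k: v for k, v in row.items() if k in kept_fields})
    return donation


records = [
    {"date": "2019-12-31", "email": "old@mail.com", "url": "a"},
    {"date": "2020-03-01", "email": "name@mail.com", "url": "b"},
]
rules = [
    ExtractionRule("date", "<", "2020-01-01"),  # delete entries before 2020
    ExtractionRule("email", "regex_replace", r"[^@\s]+@[^@\s]+", "EMAIL"),
]
print(extract(records, rules))  # [{'date': '2020-03-01', 'email': 'EMAIL'}]
```

Because the rules run in order, placing a `regex_replace` before a comparison on the same field means the comparison sees the replaced value; the `url` field is dropped because no rule mentions it.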