This report provides statistics for all major pre-processing and filtering steps performed by the pipeline.
All charts are interactive, so hovering over areas of interest will provide additional information.
Statistics report generated on 2021-09-01 12:31:05.438138
Fastq files (after partitioning) are examined for fragments (R1 + R2) that appear to be PCR duplicates.
Duplicates are identified by comparing the concatenated R1 and R2 sequences and filtering out exact matches.
This is only the first pass of PCR duplicate removal, as fragments that differ by even a single base are not recognised as duplicates. The aim here is to remove as many duplicate fragments as possible to reduce the amount of downstream processing required.
Approximately 5-20% of fragments are typically removed by this step.
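The exact-match comparison described above can be sketched as follows. This is a minimal illustration and not the pipeline's own implementation: the function name `remove_exact_duplicates` and the `(name, R1, R2)` tuple layout are assumptions made for the example.

```python
from typing import Iterable, Iterator, Set, Tuple

# Hypothetical record layout: (read name, R1 sequence, R2 sequence).
ReadPair = Tuple[str, str, str]


def remove_exact_duplicates(pairs: Iterable[ReadPair]) -> Iterator[ReadPair]:
    """Yield the first read pair seen for each concatenated R1 + R2 sequence."""
    seen: Set[str] = set()
    for name, r1, r2 in pairs:
        key = r1 + r2  # concatenated sequence used as the duplicate key
        if key in seen:
            continue  # exact match to an earlier fragment, so drop it
        seen.add(key)
        yield name, r1, r2


pairs = [
    ("a", "ACGT", "TTGA"),
    ("b", "ACGT", "TTGA"),  # exact duplicate of "a"
    ("c", "ACGT", "TTGC"),  # single base difference, so it is kept
]
unique = list(remove_exact_duplicates(pairs))
```

In this sketch fragment "c" survives because it differs from "a" by one base, which is why this step is only a first pass.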
Following initial PCR duplicate removal, fastq files are trimmed to remove sequencing adapters.
These plots provide a brief summary of the number of adapters identified and removed.
After the removal of adapters, read pairs are combined (if any overlap exists) using FLASh to generate combined fragments (referred to as flashed). Non-combined read pairs that do not have a sufficient overlap (referred to as paired-end or pe) are maintained as read pairs in separate fastq files.
Following read pair combination, the combined or non-combined fragments are examined for recognition sites of the restriction enzyme used for the assay. A valid digestion of a fragment (above the minimum threshold set) results in one or more restriction fragments, referred to as slices.
Flashed read pairs are treated differently from paired-end read pairs as we expect to observe the ligation junction in the flashed fragment. Therefore, if no recognition sites are identified, the fragment is marked as invalid and is discarded. Non-combined (paired-end) reads are unlikely to contain the ligation junction and therefore if no restriction sites are identified, the individual read pairs are not discarded.
All identified slices must be longer than the minimum length specified (default 18 bp) to be considered valid.
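The digestion rules above can be illustrated with a short sketch. It rests on several assumptions that are not taken from the pipeline: the function name `digest`, the example recognition site `GATC`, the removal of the site itself when splitting, and treating a slice of exactly the minimum length as valid.

```python
import re
from typing import List


def digest(sequence: str, site: str = "GATC", min_length: int = 18,
           flashed: bool = True) -> List[str]:
    """Split a fragment at each recognition site and keep sufficiently long slices.

    Simplification: the recognition site is removed from the slices.
    """
    pieces = re.split(re.escape(site), sequence)
    if len(pieces) == 1 and flashed:
        # No recognition site in a flashed fragment: marked invalid and discarded.
        return []
    # Paired-end reads without a site fall through and are kept undigested.
    return [piece for piece in pieces if len(piece) >= min_length]


fragment = "A" * 20 + "GATC" + "C" * 25 + "GATC" + "T" * 5
slices = digest(fragment)                      # the 5 bp slice is dropped
flashed_uncut = digest("A" * 30, flashed=True)
pe_uncut = digest("A" * 30, flashed=False)
```

Here the flashed fragment without a site yields no slices, whereas the equivalent paired-end read is retained as a single undigested slice.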
Unfiltered read pairs = The number of read pairs containing at least one restriction site
Filtered read pairs = The number of read pairs containing at least one restriction site and at least one slice above the minimum length
This plot shows the number of valid slices identified per fragment, separated by flashed status. For the PE reads, an undigested read is considered valid; therefore, all PE reads with > 1 slice contain a recognition site.
After alignment to the reference genome and annotation with capture probes, excluded regions and restriction fragments, aligned slices are filtered. All fragments that do not contain one capture slice and one or more reporter slice(s) (i.e. slices that are neither captured nor located in excluded regions) are removed.
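The fragment-level rule can be sketched as below. The `Slice` record, its field names and the function `filter_fragments` are hypothetical, and "one capture slice" is interpreted here as exactly one, which is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Slice:
    fragment: str                  # name of the parent read pair
    chrom: str                     # chromosome the slice aligned to
    capture: Optional[str] = None  # capture probe name if the slice overlaps a probe
    excluded: bool = False         # True if the slice lies in an excluded region


def is_reporter(s: Slice) -> bool:
    """A reporter is neither a capture slice nor inside an excluded region."""
    return s.capture is None and not s.excluded


def filter_fragments(slices: Iterable[Slice]) -> Dict[str, List[Slice]]:
    """Keep fragments with exactly one capture slice and at least one reporter."""
    by_fragment: Dict[str, List[Slice]] = {}
    for s in slices:
        by_fragment.setdefault(s.fragment, []).append(s)
    kept: Dict[str, List[Slice]] = {}
    for name, group in by_fragment.items():
        n_capture = sum(s.capture is not None for s in group)
        n_reporter = sum(is_reporter(s) for s in group)
        if n_capture == 1 and n_reporter >= 1:
            kept[name] = group
    return kept


example = [
    Slice("f1", "chr1", capture="probeA"), Slice("f1", "chr1"),
    Slice("f2", "chr1", capture="probeA"), Slice("f2", "chr2", excluded=True),
    Slice("f3", "chr1"), Slice("f3", "chr3"),
]
kept = filter_fragments(example)
```

Only "f1" passes: "f2" has no reporter outside an excluded region and "f3" has no capture slice.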
This chart shows the number of read pairs removed at each stage of the filtering, split by flashed/pe status.
This chart shows the number of cis (same chromosome as capture) or trans (different chromosome to capture) reporters identified. This is separated by capture probe.
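A minimal sketch of the cis/trans tally, assuming each reporter is available as a `(probe, capture chromosome, reporter chromosome)` tuple. The tuple layout, the function name `count_cis_trans` and the probe names are illustrative only.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical record layout: (capture probe, capture chromosome, reporter chromosome).
Reporter = Tuple[str, str, str]


def count_cis_trans(reporters: Iterable[Reporter]) -> Counter:
    """Count reporters per capture probe, split into cis and trans."""
    counts: Counter = Counter()
    for probe, capture_chrom, reporter_chrom in reporters:
        # cis: reporter on the same chromosome as the capture; trans: otherwise
        kind = "cis" if reporter_chrom == capture_chrom else "trans"
        counts[(probe, kind)] += 1
    return counts


reporters = [
    ("probeA", "chr1", "chr1"),
    ("probeA", "chr1", "chr1"),
    ("probeA", "chr1", "chr7"),
    ("probeB", "chr2", "chr2"),
]
counts = count_cis_trans(reporters)
```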
This chart displays the combined statistics from the entire pipeline run summarised at the read pair level.