Pre-processing data to use with itpseq#
Pre-requisite#
The pre-requisite to use itpseq is to have one fastq file per replicate.
The fastq files are text files with four lines per read:
@E00490:646:H7HGCCCX2:8:1101:5203:2206 1:N:0:CACGAT
GTATAAGGAGGAAAAAATATGCCAGCATCCGTCTAGTCCACGCGGTATCTCGGTGTGACTGACTGAGAATTTCTGTAGGCACCATCAAT
+
\V\MaMDRe``jVeVo``JJV\`VRNVNM9\eHf`RIRjkaV\\fjoo\eoaojjssoVosVVkaaVsssssVsfojossee`j\jeeM
If paired-end sequencing was performed, it is necessary to merge the files.
To be able to automatically import the files in a
DataSet, it is best to follow a naming convention.
Merging paired-end sequencing files#
In case of paired-end sequencing, it is first necessary to pair the two reads. This can be done using tools such as PEAR or Pandaseq.
For instance, with PEAR, assuming a forward nnn15_noa1_1.fastq and reverse
nnn15_noa1_2.fastq, those can be merged into a single
nnn15_noa1.assembled.fastq using:
pear -u0 -c126 -f nnn15_noa1_1.fastq -r nnn15_noa1_2.fastq -o nnn15_noa1
After pairing of the reads, a dataset with a control (noa) and an
antibiotic-treated condition (tcx), with 3 replicates each might look like:
nnn15_noa1.assembled.fastq
nnn15_noa2.assembled.fastq
nnn15_noa3.assembled.fastq
nnn15_tcx1.assembled.fastq
nnn15_tcx2.assembled.fastq
nnn15_tcx3.assembled.fastq
Naming conventions#
itpseq relies on filenames to match and load data files. Although not
strictly required we recommend to use filenames without spaces and with short
descriptors of the sample properties and groups.
For instance, we use nnn15 as a prefix to indicate that a random NNN15
library was used. In case of an antibiotic treatment vs no-antibiotic control we
use noa for the control and a short identifier such as tcx for the
antibiotic-treated samples. We use a number to indicate the replicate. A
filename for the second replicate of an erythromycin-treated sample, with a
NNN15 library might be named nnn15_ery2.assembled.fastq.
During the analysis, itpseq can combine automatically the replicates into a
Sample, and assign a reference to it. See Loading iTP-Seq data for more
details.