Cross Referencing Guides to CRISPRs
The Index stage of the pipeline creates a binary file containing the gRNA guides. Because Search and Align return only a CRISPR ID, we recommend using a database to cross-reference the guides with the CRISPRs found in the Gather stage. For most applications an SQLite database is sufficient (the data is only read once it has been imported), but other databases can be used.
To set up an SQLite database, first install SQLite3 and then create a database file:
sqlite3 crispr.db
We will use a table to hold the CRISPR data as follows:
CREATE TABLE crisprs (
id integer primary key,
chr_name text,
chr_start integer,
seq text,
pam_right integer
);
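For reference, the same table can be created and queried from Python with the standard-library sqlite3 module. The following is a minimal sketch rather than part of the pipeline: the helper names open_db and lookup_crispr are hypothetical, and the inserted row is made-up illustration data. It shows how a CRISPR ID returned by Search or Align could be resolved to its chromosome, position and sequence.

```python
import sqlite3

# Same columns as the CREATE TABLE statement above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS crisprs (
    id        INTEGER PRIMARY KEY,
    chr_name  TEXT,
    chr_start INTEGER,
    seq       TEXT,
    pam_right INTEGER
)
"""

COLUMNS = ("id", "chr_name", "chr_start", "seq", "pam_right")


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def lookup_crispr(conn, crispr_id):
    """Return the row for one CRISPR ID as a dict, or None if it is absent."""
    cur = conn.execute(
        "SELECT id, chr_name, chr_start, seq, pam_right "
        "FROM crisprs WHERE id = ?",
        (crispr_id,),
    )
    row = cur.fetchone()
    return None if row is None else dict(zip(COLUMNS, row))


if __name__ == "__main__":
    conn = open_db()  # in-memory here; pass "crispr.db" to use a file
    # Made-up row, for illustration only.
    conn.execute(
        "INSERT INTO crisprs VALUES (?, ?, ?, ?, ?)",
        (1, "1", 10000, "ACGTACGTACGTACGTACGTAGG", 1),
    )
    print(lookup_crispr(conn, 1))
```

Using a parameterised query (the ? placeholder) keeps the lookup safe if the IDs come from another program's output.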
We include a script to automate the database setup for you using SQLite3:
python scripts/index_database.py -d crispr.db \
-i chromosome.1.csv \
-i chromosome.2.csv \
-i chromosome.3.csv \
...
Note that the offset is 0 by default. If you set a different offset in the Index stage, you will need to set it here as well with the -o flag.
Also note that the order in which the -i flags are given determines the order in which the CRISPRs are imported into the database.
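To illustrate why the file order and the offset matter, here is a small sketch of an importer. It is not the code of scripts/index_database.py. It rests on two assumptions that should be checked against your own data: that each CSV row holds chr_name, chr_start, seq and pam_right in that order, and that IDs are assigned sequentially with the first CRISPR receiving offset + 1. The names make_db and import_crisprs are hypothetical.

```python
import csv
import io
import sqlite3


def make_db():
    """Create an in-memory database with the crisprs table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE crisprs ("
        "id INTEGER PRIMARY KEY, chr_name TEXT, chr_start INTEGER, "
        "seq TEXT, pam_right INTEGER)"
    )
    return conn


def import_crisprs(conn, csv_files, offset=0):
    """Insert rows file by file; IDs keep counting across files.

    Returns the last ID that was assigned.
    """
    next_id = offset + 1  # ASSUMPTION: first CRISPR gets ID offset + 1
    for handle in csv_files:
        # ASSUMPTION: columns are chr_name, chr_start, seq, pam_right
        for chr_name, chr_start, seq, pam_right in csv.reader(handle):
            conn.execute(
                "INSERT INTO crisprs VALUES (?, ?, ?, ?, ?)",
                (next_id, chr_name, int(chr_start), seq, int(pam_right)),
            )
            next_id += 1
    conn.commit()
    return next_id - 1


if __name__ == "__main__":
    # Made-up data standing in for two chromosome CSV files.
    chrom_1 = io.StringIO("1,100,ACGT,1\n1,250,TTGA,0\n")
    chrom_2 = io.StringIO("2,75,GGCA,1\n")
    conn = make_db()
    import_crisprs(conn, [chrom_1, chrom_2], offset=0)
    for row in conn.execute("SELECT id, chr_name FROM crisprs ORDER BY id"):
        print(row)
```

Under these assumptions, swapping the two files would give the chromosome 2 row the lowest ID, so the files must be passed in the same order as they were given to the Index stage for the IDs to line up.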