Installation

Automated installation

GEMINI includes an automated installation script that installs GEMINI along with the required Python dependencies, third-party software, and data files.

$ wget https://raw.github.com/arq5x/gemini/master/gemini/scripts/gemini_install.py

If wget isn’t available, you can use curl instead:

$ curl \
https://raw.github.com/arq5x/gemini/master/gemini/scripts/gemini_install.py \
> gemini_install.py

Once you have downloaded the above installation script, you can proceed as follows:

$ python gemini_install.py /usr/local /usr/local/share/gemini
$ export PATH=$PATH:/usr/local/gemini/bin
# it would be wise to add the above line to your .bashrc or .bash_profile

This installs the GEMINI executable as /usr/local/bin/gemini, other required third-party dependencies in /usr/local/bin, and the associated data files in /usr/local/share/gemini. This is merely an example: you can place the GEMINI executables and annotation files in any directories you wish.

Note

The automated installation script typically requires ~30 minutes, primarily owing to the time required to download the GEMINI genome annotation files. The annotation files also require ~15 GB of storage, so please ensure that the data directory (/usr/local/share/gemini in the example above) has sufficient space.
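The disk-space requirement can be checked before starting the download. The following is a minimal sketch in Python 3 (not part of GEMINI; the helper names are hypothetical) that tests whether the filesystem that will hold the data directory has roughly 15 GB free. Because the data directory may not exist yet, it walks up to the nearest existing parent directory first.

```python
import os
import shutil

# ~15 GB, taken from the note above; treat it as a rough lower bound.
REQUIRED_BYTES = 15 * 1024 ** 3


def nearest_existing_dir(path):
    """Walk up from `path` until an existing directory is found."""
    path = os.path.abspath(path)
    while not os.path.isdir(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def has_enough_space(data_dir, required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem that will hold `data_dir` has enough free space."""
    usage = shutil.disk_usage(nearest_existing_dir(data_dir))
    return usage.free >= required_bytes


if __name__ == "__main__":
    target = "/usr/local/share/gemini"  # data directory from the example above
    print(target, "OK" if has_enough_space(target) else "needs more free space")
```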

Tip

Some tips and tricks for installation issues:

  1. Some older versions of wget have certificate problems with GitHub files. If you run into this problem, you can instead download the install script using wget --no-check-certificate or curl -O.
  2. The installation script is idempotent and you can re-run it multiple times without any issues. If you experience internet connectivity or other transient errors during installation, a re-run can often solve the problem (fingers crossed).
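Because the installer can safely be re-run, retrying after a transient failure is easy to automate. The sketch below (Python 3) is one possible wrapper, not something GEMINI provides: the function name, the retry count, and the delay are all illustrative assumptions. The `runner` parameter exists only so the logic can be exercised without launching a real process.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, delay_seconds=30.0, runner=subprocess.call):
    """Run `cmd` until it exits with status 0 or `attempts` runs have failed.

    Returns True on success, False if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if runner(cmd) == 0:
            return True
        if attempt < attempts:
            time.sleep(delay_seconds)  # brief pause before the next try
    return False
```

With the example paths used above, a call might look like `run_with_retries(["python", "gemini_install.py", "/usr/local", "/usr/local/share/gemini"])`.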

Dependencies

The installer requires:

  • Python 2.7.x
  • git
  • a working C / C++ compiler such as gcc
  • zlib (including headers)

These dependencies already exist on most UNIX/Linux/OS X machines. However, on minimal machines such as fresh Amazon cloud instances, you may have to install them before running the automated installer. On Amazon instances, the following command should take care of all of the above dependencies:

$ sudo yum -y install python27 git gcc gcc-c++ zlib-devel
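On other systems, it can help to confirm which of the command-line prerequisites are already available before running the installer. This is a small sketch in Python 3, assuming the tools are installed under the executable names listed in `REQUIRED_TOOLS` (these names are assumptions and vary between systems). It does not check for the zlib headers, which have no executable to look up.

```python
import shutil

# Executable names are assumptions; adjust them for your system
# (for example "python2.7" or "cc").
REQUIRED_TOOLS = ["python", "git", "gcc"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All listed tools were found on the PATH.")
```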

Installing without root access

As many users do not have root or sudo access, the automated installer also has options to install in “non-root” environments:

$ python gemini_install.py ~/gemini ~/gemini --nosudo

Updating your PATH to find the GEMINI executable

At this point, you will have a self-contained installation of GEMINI, including both the software and its associated genome annotations. However, if you have done a custom install in a “non-root” environment, you will first need to update your PATH environment variable to include the bin directory that the automated installer just created.

For example, if, as above, you placed your custom install in ~/gemini, you would need to update your PATH as follows:

$ export PATH=$PATH:~/gemini/bin

Note that this change lasts only for the life of your current terminal session. To make it permanent, add the export line to your .bashrc or .bash_profile so that it is applied each time you log in.
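Adding the export line by hand works fine; if you script your setup, the step can be made repeatable so that running it twice does not duplicate the entry. The following Python 3 sketch shows one way to do that. The function name `ensure_line` is hypothetical, and the demo writes to a temporary file rather than a real profile.

```python
import os
import tempfile


def ensure_line(profile_path, line):
    """Append `line` to `profile_path` unless it is already there.

    Returns True if the file was modified, False otherwise.
    """
    text = ""
    if os.path.exists(profile_path):
        with open(profile_path) as handle:
            text = handle.read()
    if line in text.splitlines():
        return False
    with open(profile_path, "a") as handle:
        if text and not text.endswith("\n"):
            handle.write("\n")  # keep the new entry on its own line
        handle.write(line + "\n")
    return True


if __name__ == "__main__":
    # Demonstrate on a throwaway file; the second call is a no-op.
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "bash_profile")
        print(ensure_line(demo, "export PATH=$PATH:~/gemini/bin"))
        print(ensure_line(demo, "export PATH=$PATH:~/gemini/bin"))
```

For a real profile, the call would be along the lines of `ensure_line(os.path.expanduser("~/.bash_profile"), "export PATH=$PATH:~/gemini/bin")`.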

If successful, you should be able to run the following command from anywhere on your system:

$ gemini -v
gemini 0.3.0b

Running unit tests

If the installation succeeded, you can run a script of unit tests to confirm that GEMINI works correctly on your system. The script lives in the gemini directory inside the directory where the GEMINI source code was installed. For example, if, as above, you installed GEMINI to ~/gemini, you would issue the following commands to run the unit tests:

$ cd ~/gemini/gemini
$ bash master-test.sh

Updating the GEMINI executables and annotations

Once installed with the automated installer, it is easy to upgrade the GEMINI programs and associated data files to the latest released version with:

$ gemini update

Software dependencies

GEMINI depends upon several widely used genomics command-line tools as well as multiple Python packages. We recognize that the dependency stack is quite deep and are working on ways to minimize dependencies in the interest of the most streamlined installation process possible. Nonetheless, the following are core dependencies:

Manual installation

Once the above dependencies have been installed, one can begin installing GEMINI itself. To install you should download the latest source code from GitHub, either by going to:

http://github.com/arq5x/gemini

and clicking on “Downloads”, or by cloning the git repository with:

$ git clone https://github.com/arq5x/gemini.git

Once you have the source code, run:

$ cd gemini
$ sudo python setup.py install

to install it. If you don’t have permission to install it in the default directory, you can simply build the source in-place and use the package from the git repository:

$ python setup.py build_ext --inplace

Installing annotation files

One of the more appealing features of GEMINI is that it automatically annotates variants in a VCF file with several genome annotations. However, you must first install these data files on your system. To do so, run the following script and give it the full path to the directory where the data files should be installed. The recommended path is /usr/local/share, but you can install the data files wherever you want.

$ python gemini/install-data.py /usr/local/share/

Using previously installed annotation files

If you have installed GEMINI with the annotation files on a server and you can NFS mount the annotation files, you can tell a local install of GEMINI where those annotation files are by creating the file ~/.gemini/gemini-config.yaml:

annotation_dir: /path/to/nfs_mounted/gemini/data
versions:
  GRCh37-gms-mappability.vcf.gz: 2
  hg19.rmsk.bed.gz: 2
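If you set up several machines against the same NFS mount, the file can be generated rather than typed. The sketch below (Python 3) renders text with the same layout as the example above. It is not part of GEMINI, the function name is hypothetical, and the file names and version numbers are simply copied from that example rather than taken from a real installation.

```python
def render_config(annotation_dir, versions):
    """Render gemini-config.yaml text in the layout shown in the example above."""
    lines = ["annotation_dir: {}".format(annotation_dir), "versions:"]
    for name in sorted(versions):
        # Two-space indentation nests each entry under "versions:".
        lines.append("  {}: {}".format(name, versions[name]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Values copied from the example above; replace them with your own.
    print(render_config(
        "/path/to/nfs_mounted/gemini/data",
        {"GRCh37-gms-mappability.vcf.gz": 2, "hg19.rmsk.bed.gz": 2},
    ), end="")
```

The returned text can then be written to ~/.gemini/gemini-config.yaml.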

Running the testing suite

GEMINI comes with a full test suite to make sure that everything has installed correctly on your system. We strongly encourage you to run these tests.

$ bash master-test.sh

GEMINI depends upon external tools to predict the functional consequence of variants in a VCF file. We currently support annotations produced by both SnpEff and VEP. Recommended instructions for annotating existing VCF files with these tools are available here. In addition, we have attempted to standardize the terms used to describe the functional consequence of a given variant, as each annotation tool uses different vocabulary.

The variant consequence columns in the variant table are populated from either snpEff or VEP annotations, as selected by the user with the -t option of gemini load. To populate these columns, the input VCF file must already have been annotated by either snpEff or VEP:

$ gemini load -v my.vcf -t VEP -d my.db
$ gemini load -v my.vcf -t snpEff -d my.db

By default, the following columns in the variant table are set to null:

  • anno_id
  • gene
  • affected_gene
  • affected_transcript
  • affected_exon
  • is_exonic
  • is_lof
  • is_coding
  • codon_change
  • aa_change
  • aa_length
  • biotype
  • most_severe_impact
  • impact_severity
  • polyphen_pred
  • polyphen_score
  • sift_pred
  • sift_score

The table below maps the consequence terms reported by snpEff and VEP to the standardized GEMINI terms to use in SQL queries. The last column gives the severity associated with each impact:

Gemini terms                     snpEff terms                       VEP terms                          Impact severity
splice_acceptor                  SPLICE_SITE_ACCEPTOR               splice_acceptor_variant            HIGH
splice_donor                     SPLICE_SITE_DONOR                  splice_donor_variant               HIGH
stop_gain                        STOP_GAINED                        stop_gained                        HIGH
stop_loss                        STOP_LOST                          stop_lost                          HIGH
frame_shift                      FRAME_SHIFT                        frameshift_variant                 HIGH
start_loss                       START_LOST                         null                               HIGH
exon_deleted                     EXON_DELETED                       null                               HIGH
non_synonymous_start             NON_SYNONYMOUS_START               null                               HIGH
transcript_codon_change          null                               initiator_codon_variant            HIGH
non_syn_coding                   NON_SYNONYMOUS_CODING              missense_variant                   MED
inframe_codon_gain               CODON_INSERTION                    inframe_insertion                  MED
inframe_codon_loss               CODON_DELETION                     inframe_deletion                   MED
inframe_codon_change             CODON_CHANGE                       null                               MED
codon_change_del                 CODON_CHANGE_PLUS_CODON_DELETION   null                               MED
codon_change_ins                 CODON_CHANGE_PLUS_CODON_INSERTION  null                               MED
UTR_5_del                        UTR_5_DELETED                      null                               MED
UTR_3_del                        UTR_3_DELETED                      null                               MED
other_splice_variant             null                               splice_region_variant              MED
mature_miRNA                     null                               mature_miRNA_variant               MED
regulatory_region                null                               regulatory_region_variant          MED
TF_binding_site                  null                               TF_binding_site_variant            MED
regulatory_region_ablation       null                               regulatory_region_ablation         MED
regulatory_region_amplification  null                               regulatory_region_amplification    MED
TFBS_ablation                    null                               TFBS_ablation                      MED
TFBS_amplification               null                               TFBS_amplification                 MED
synonymous_stop                  SYNONYMOUS_STOP                    stop_retained_variant              LOW
synonymous_coding                SYNONYMOUS_CODING                  synonymous_variant                 LOW
UTR_5_prime                      UTR_5_PRIME                        5_prime_UTR_variant                LOW
UTR_3_prime                      UTR_3_PRIME                        3_prime_UTR_variant                LOW
intron                           INTRON                             intron_variant                     LOW
CDS                              CDS                                coding_sequence_variant            LOW
upstream                         UPSTREAM                           upstream_gene_variant              LOW
downstream                       DOWNSTREAM                         downstream_gene_variant            LOW
intergenic                       INTERGENIC, INTERGENIC_CONSERVED   intergenic_variant                 LOW
intragenic                       INTRAGENIC                         null                               LOW
gene                             GENE                               null                               LOW
transcript                       TRANSCRIPT                         null                               LOW
exon                             EXON                               null                               LOW
start_gain                       START_GAINED                       null                               LOW
synonymous_start                 SYNONYMOUS_START                   null                               LOW
intron_conserved                 INTRON_CONSERVED                   null                               LOW
nc_transcript                    null                               nc_transcript_variant              LOW
NMD_transcript                   null                               NMD_transcript_variant             LOW
incomplete_terminal_codon        null                               incomplete_terminal_codon_variant  LOW
nc_exon                          null                               non_coding_exon_variant            LOW
transcript_ablation              null                               transcript_ablation                LOW
transcript_amplification         null                               transcript_amplification           LOW
feature elongation               null                               feature elongation                 LOW
feature truncation               null                               feature truncation                 LOW

Note: “null” refers to the absence of the corresponding term in the alternate database
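To illustrate how the standardized terms and severities support SQL queries, here is a self-contained sketch in Python 3 using the standard sqlite3 module. It builds a toy in-memory table, not a real GEMINI database. The table name `variants`, the gene names, and the assumption that `most_severe_impact` stores the GEMINI term are illustrative only, and the mapping dictionary is a four-row excerpt of the table above.

```python
import sqlite3

# Excerpt of the snpEff -> (GEMINI term, severity) mapping from the table above.
SNPEFF_TO_GEMINI = {
    "STOP_GAINED": ("stop_gain", "HIGH"),
    "FRAME_SHIFT": ("frame_shift", "HIGH"),
    "NON_SYNONYMOUS_CODING": ("non_syn_coding", "MED"),
    "SYNONYMOUS_CODING": ("synonymous_coding", "LOW"),
}


def build_toy_db(rows):
    """Create an in-memory table from (gene, snpEff term) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variants "
        "(gene TEXT, most_severe_impact TEXT, impact_severity TEXT)"
    )
    for gene, snpeff_term in rows:
        term, severity = SNPEFF_TO_GEMINI[snpeff_term]
        conn.execute("INSERT INTO variants VALUES (?, ?, ?)", (gene, term, severity))
    return conn


def genes_with_severity(conn, severity):
    """Return (gene, GEMINI term) pairs with the given severity, sorted by gene."""
    cursor = conn.execute(
        "SELECT gene, most_severe_impact FROM variants "
        "WHERE impact_severity = ? ORDER BY gene",
        (severity,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    # Hypothetical gene names, used only to populate the toy table.
    db = build_toy_db([
        ("GENE_A", "STOP_GAINED"),
        ("GENE_B", "SYNONYMOUS_CODING"),
        ("GENE_C", "FRAME_SHIFT"),
    ])
    print(genes_with_severity(db, "HIGH"))
```

Because the severity is stored alongside the standardized term, the same query works regardless of which annotation tool produced the original consequence.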
