Quick start

GEMINI is designed to let researchers explore the genetic variation contained in a VCF file. The basic workflow is outlined below.

Importing VCF files into gemini.

Before we can use GEMINI to explore genetic variation, we must first load our VCF file into the GEMINI database framework. We expect you to have first annotated the functional consequence of each variant in your VCF using either VEP or snpEff (note that snpEff v3.0+ is required to track the amino acid length of each impacted transcript). The loading step is done with the gemini load command. Below are two examples based on a VCF file that we creatively name my.vcf. The first example assumes that the VCF has been pre-annotated with VEP and the second assumes snpEff.

# VEP-annotated VCF
$ gemini load -v my.vcf -t VEP my.db

# snpEff-annotated VCF
$ gemini load -v my.vcf -t snpEff my.db
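
If you are unsure which annotator was run on a VCF, its header can give a hint. The sketch below is a heuristic, not part of gemini: detect_annotation is a hypothetical helper, and it assumes an uncompressed VCF in which VEP wrote its annotations to an INFO field named CSQ and snpEff wrote to EFF (older releases) or ANN (newer releases). These are the tools' usual defaults, but both can be configured differently, so treat the result as a suggestion.

```shell
# Guess the -t value for `gemini load` from the VCF header.
# Assumptions: plain-text VCF; VEP annotations live in INFO/CSQ;
# snpEff annotations live in INFO/EFF or INFO/ANN.
detect_annotation() {
    vcf=$1
    # Read only the meta-information lines; stop at the #CHROM column header.
    header=$(sed -n -e '/^##/p' -e '/^#CHROM/q' "$vcf")
    case "$header" in
        *'##INFO=<ID=CSQ,'*) echo VEP ;;
        *'##INFO=<ID=EFF,'*|*'##INFO=<ID=ANN,'*) echo snpEff ;;
        *) echo none ;;
    esac
}

# Example use (not run here):
#   ann=$(detect_annotation my.vcf)
#   [ "$ann" != none ] && gemini load -v my.vcf -t "$ann" my.db
```

Stopping at the #CHROM line keeps the check fast on large files, since the variant records themselves are never scanned.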

If you have a valid VCF file produced by a standard variant discovery program (e.g., GATK or FreeBayes), you can load it into the gemini framework with the load submodule:

$ gemini load -v my.vcf my.db

In this step, gemini reads the my.vcf file and loads it into a SQLite database named my.db, whose structure is described in the database schema documentation. While loading the database, gemini computes many additional population genetics statistics that support downstream analyses. It also stores the genotypes for each sample at each variant in an efficient data structure that minimizes the database size.

Loading is by far the slowest step in GEMINI. Spreading the work across multiple CPUs with the --cores option can greatly speed it up:

$ gemini load -v my.vcf --cores 8 my.db
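
Rather than hard-coding the core count, a wrapper script can ask the system how many CPUs are online. This is a minimal sketch that assumes getconf _NPROCESSORS_ONLN is available (it is on typical Linux and macOS systems). It only prints the resulting command instead of running it.

```shell
# Count the CPUs currently online; fall back to 1 if the lookup fails.
# Assumption: `getconf _NPROCESSORS_ONLN` is supported on this system.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Guard against an empty or non-numeric answer.
case "$cores" in
    ''|*[!0-9]*) cores=1 ;;
esac

# Print the load command that would be run with that many cores.
echo "gemini load -v my.vcf --cores $cores my.db"
```

On a shared machine you may prefer a smaller number than the full count, so that loading does not starve other jobs.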

Querying the gemini database.

If you are familiar with SQL, gemini lets you query the database directly in search of interesting variants via the -q option of the query command. For example, here is a query to identify all novel, loss-of-function variants in your database:

$ gemini query -q "select * from variants where is_lof = 1 and in_dbsnp = 0" my.db

Or, we can ask for all variants that substantially deviate from Hardy-Weinberg equilibrium:

$ gemini query -q "select * from variants where hwe < 0.01" my.db
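
The two filters above can also be combined in a single WHERE clause. As a sketch, the hypothetical helper build_query below assembles such a query with an adjustable Hardy-Weinberg cutoff. It uses only the columns that appear in the examples above (is_lof, in_dbsnp, hwe) and builds the SQL text without calling gemini.

```shell
# Build the SQL text for novel, loss-of-function variants whose hwe value
# falls below a chosen cutoff. build_query is a hypothetical helper.
build_query() {
    hwe_cutoff=$1
    printf 'select * from variants where is_lof = 1 and in_dbsnp = 0 and hwe < %s' "$hwe_cutoff"
}

# Example use (not run here):
#   gemini query -q "$(build_query 0.01)" my.db
```

Keeping the cutoff as an argument makes it easy to rerun the same filter at several thresholds from a shell loop.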