Using the GEMINI API

The GeminiQuery class

class gemini.GeminiQuery(db, include_gt_cols=False, out_format='default')

An interface to submit queries to an existing Gemini database and iterate over the results of the query.

We create a GeminiQuery object by specifying the database to which to connect:

from gemini import GeminiQuery
gq = GeminiQuery("my.db")

We can then issue a query against the database and iterate through the results by using the run() method:

gq.run("select chrom, start, end from variants")
for row in gq:
    print row

Instead of printing the entire row, one can access specific columns:

gq.run("select chrom, start, end from variants")
for row in gq:
    print row['chrom']

Also, all of the underlying numpy genotype arrays are always available:

gq.run("select chrom, start, end from variants")
for row in gq:
    gts = row.gts
    print row['chrom'], gts
    # yields "chr1" ['A/G' 'G/G' ... 'A/G']
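Because each position in the genotype array corresponds to one sample, per-variant summaries can be computed directly from it. The following is a minimal sketch that uses a plain Python list as a stand-in for row.gts, so it runs without a database. The helper name count_alleles and the example genotypes are hypothetical, and the sketch assumes genotypes are strings whose alleles are separated by "/" or "|".

```python
from collections import Counter


def count_alleles(gts):
    """Tally alleles across samples for one variant.

    `gts` is any iterable of genotype strings such as 'A/G'.
    Hypothetical helper, not part of the GEMINI API.
    """
    counts = Counter()
    for gt in gts:
        # treat phased ('|') and unphased ('/') separators alike
        for allele in gt.replace('|', '/').split('/'):
            counts[allele] += 1
    return counts


# stand-in for row.gts; a real array would come from a GeminiQuery row
example_gts = ['A/G', 'G/G', 'A/A', 'A/G']
allele_counts = count_alleles(example_gts)
print(dict(allele_counts))
```

In a real loop, the same function could presumably be called as count_alleles(row.gts) for each row.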

The run() method also accepts a genotype filter:

query = "select chrom, start, end from variants"
gt_filter = "gt_types.NA20814 == HET"
gq.run(query, gt_filter)
for row in gq:
    print row
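Conceptually, a genotype filter such as "gt_types.NA20814 == HET" keeps only the variants at which the named sample has the requested genotype type. The sketch below imitates that behaviour in plain Python on made-up data. It is not how GEMINI implements the filter. The numeric genotype codes, the second sample name, the variant records, and the helper filter_by_gt_type are all assumptions introduced for illustration.

```python
# Integer genotype-type codes; the specific values are an assumption of this sketch
HOM_REF, HET, UNKNOWN, HOM_ALT = 0, 1, 2, 3

# hypothetical sample order and per-variant genotype-type arrays
sample_to_idx = {'NA20814': 0, 'NA19000': 1}
variants = [
    {'chrom': 'chr1', 'start': 100, 'end': 101, 'gt_types': [HET, HOM_REF]},
    {'chrom': 'chr1', 'start': 200, 'end': 201, 'gt_types': [HOM_ALT, HET]},
    {'chrom': 'chr2', 'start': 300, 'end': 301, 'gt_types': [HET, HET]},
]


def filter_by_gt_type(rows, sample, gt_type):
    """Yield the rows in which `sample` has genotype type `gt_type`."""
    idx = sample_to_idx[sample]
    for row in rows:
        if row['gt_types'][idx] == gt_type:
            yield row


# similar in spirit to gt_filter = "gt_types.NA20814 == HET"
het_rows = list(filter_by_gt_type(variants, 'NA20814', HET))
for row in het_rows:
    print(row['chrom'], row['start'], row['end'])
```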

Lastly, one can use the sample_to_idx and idx_to_sample dictionaries to gain access to sample-level genotype information either by sample name or by sample index:

# grab dict mapping sample to genotype array indices
smp2idx = gq.sample_to_idx

query  = "select chrom, start, end from variants"
gt_filter  = "gt_types.NA20814 == HET"
gq.run(query, gt_filter)

# print a header listing the selected columns
print gq.header
for row in gq:
    # access a NUMPY array of the sample genotypes.
    gts = row['gts']
    # use the smp2idx dict to access sample genotypes
    idx = smp2idx['NA20814']
    print row, gts[idx]

run(query, gt_filter=None, show_variant_samples=False, variant_samples_delim=', ', predicates=None, needs_genotypes=False)

Execute a query against a Gemini database. The user may specify:

  1. (reqd.) an SQL query.
  2. (opt.) a genotype filter.

header

Return a header describing the columns that were selected in the query issued to a GeminiQuery object.

sample2index

Return a dictionary mapping sample names to genotype array offsets:

gq = GeminiQuery("my.db")
s2i = gq.sample2index

print s2i['NA20814']
# yields 1088

index2sample

Return a dictionary mapping genotype array offsets to sample names:

gq = GeminiQuery("my.db")
i2s = gq.index2sample

print i2s[1088]
# yields "NA20814"
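The two dictionaries are inverses of one another, which makes it easy to label every entry of a genotype array with its sample name. The following is a minimal sketch that builds equivalent mappings from a hypothetical sample list, so it runs without a database. The sample order and genotypes are invented, and the plain list stands in for a row's genotype array.

```python
# hypothetical sample order; in practice this comes from the database
samples = ['NA18500', 'NA20814', 'NA19000']

# name -> offset, analogous to sample2index
s2i = {name: idx for idx, name in enumerate(samples)}
# offset -> name, analogous to index2sample
i2s = {idx: name for name, idx in s2i.items()}

# stand-in for one row's genotype array
gts = ['A/A', 'A/G', 'G/G']

# label every genotype with the name of the sample it belongs to
labelled = {i2s[idx]: gt for idx, gt in enumerate(gts)}
print(labelled['NA20814'])
```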