pygenome package

Submodules

pygenome.saccharomyces_cerevisiae module

This module provides access to the Saccharomyces cerevisiae genome from Python. Sequences can be accessed as Bio.SeqRecord objects provided by Biopython.

class pygenome.saccharomyces_cerevisiae.pretty_str

Bases: str

This class provides nicer output of strings in the IPython shell.

class pygenome.saccharomyces_cerevisiae.saccharomyces_cerevisiae_genome
bidirectional(gene)

Returns True if a gene is not expressed in the same direction on the chromosome as the gene immediately upstream.

Tandem genes:

    Gene1   Gene2
    ---->   ---->

Bidirectional genes:

    Gene1   Gene2
    ---->   <----

    Gene1   Gene2
    <----   ---->

gene : str
standard name (eg. CYC1 or TDH3)
out : bool
Boolean; True or False
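
The orientation test described above can be illustrated without the genome data. In yeast systematic names the seventh character marks the strand: W for Watson, C for Crick. The sketch below compares that character for a gene and its upstream neighbour. The helpers `strand_letter` and `same_direction` are hypothetical names used only for illustration; this is not how pygenome itself implements bidirectional or tandem.

```python
def strand_letter(systematic_name):
    """Return 'W' or 'C', the seventh character of a name such as 'YGR192C'.

    Hypothetical helper, not part of pygenome.
    """
    letter = systematic_name.upper()[6:7]
    if letter not in ("W", "C"):
        raise ValueError("not a systematic ORF name: %r" % (systematic_name,))
    return letter


def same_direction(gene, upstream):
    """True if both systematic names lie on the same strand (tandem case)."""
    return strand_letter(gene) == strand_letter(upstream)


# YAR007C (RFA1) and its upstream neighbour YAR008W lie on opposite strands,
# so this pair counts as bidirectional under the definition above.
print(same_direction("YAR007C", "YAR008W"))
```
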
cache = <percache.Cache object at 0x7f79e3f8d310>
cds(gene)

Returns the coding sequence associated with a standard name (e.g. CYC1) or a systematic name (e.g. YJR048W).

gene : str
standard name (eg. CYC1) or a systematic name (eg. YJR048W)
out : Bio.SeqRecord
Bio.SeqRecord object

See also: cds_genbank_accession, cds_pydna_code

>>> from pygenome import sg
>>> sg.cds("TDH3")
SeqRecord(seq=Seq('ATGGTTAGAGTTGCTATTAACGGTTTCGGTAGAATCGGTAGATTGGTCATGAGA...TAA', IUPACAmbiguousDNA()), id='<unknown id>', name='<unknown name>', description='<unknown description>', dbxrefs=[])
>>> len(sg.cds("TDH3"))
999
>>> sg.cds("YJR048W")
SeqRecord(seq=Seq('ATGACTGAATTCAAGGCCGGTTCTGCTAAGAAAGGTGCTACACTTTTCAAGACT...TAA', IUPACAmbiguousDNA()), id='BK006943.2', name='BK006943', description='TPA: Saccharomyces cerevisiae S288c chromosome X.', dbxrefs=[])
>>> len(sg.cds("YJR048W"))
330
>>>
cds_genbank_accession(gene)

Same as the cds method, but returns a string representing a portion of a Genbank file.

>>> from pygenome import sg
>>> sg.cds_genbank_accession("TDH3")
'BK006941.2 REGION: complement(882812..883810)'
>>>
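
The returned string can be split into its parts with a regular expression. The following is a minimal sketch, assuming the string always has the shape shown in the example above (accession, the word REGION, and a start..end range optionally wrapped in complement()). `parse_region` is a hypothetical helper, not part of pygenome.

```python
import re

# Matches strings such as 'BK006941.2 REGION: complement(882812..883810)'
_REGION = re.compile(
    r"^(?P<acc>\S+) REGION: (?P<comp>complement\()?"
    r"(?P<start>\d+)\.\.(?P<end>\d+)\)?$"
)


def parse_region(text):
    """Return (accession, start, end, strand) for a region string.

    Hypothetical helper; strand is -1 for complement(...) and 1 otherwise.
    """
    m = _REGION.match(text.strip())
    if m is None:
        raise ValueError("unrecognized region string: %r" % (text,))
    strand = -1 if m.group("comp") else 1
    return m.group("acc"), int(m.group("start")), int(m.group("end")), strand


acc, start, end, strand = parse_region(
    "BK006941.2 REGION: complement(882812..883810)"
)
# GenBank coordinates are 1-based and inclusive
length = end - start + 1
print(acc, strand, length)
```

For the TDH3 string the computed length is 999, which agrees with len(sg.cds("TDH3")) in the cds example.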
cds_pydna_code(gene)
chromosome(id)

Returns the chromosome associated with the number id

id : int or str
chromosome number (1-16) or (“A”-“P”)
out : Bio.SeqRecord
Bio.SeqRecord object

See also: chromosomes

Some of the yeast chromosomes return large sequences:

=====  ========
chr    size(bp)
=====  ========
chr A  230218
chr B  316620
chr C  813184
chr D  576874
chr E  1531933
chr F  1090940
chr G  270161
chr H  439888
chr I  562643
chr J  666816
chr K  745751
chr L  924431
chr M  1078177
chr N  1091291
chr O  784333
=====  ========

>>> from pygenome import sg
>>> len(sg.chromosome(1))
230218
>>> sg.chromosome(1)
SeqRecord(seq=Seq('CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACCCACA...GGG', IUPACAmbiguousDNA()), id='BK006935.2', name='BK006935', description='TPA: Saccharomyces cerevisiae S288c chromosome I.', dbxrefs=[])
>>> len(sg.chromosome(16))
948066
>>> len(sg.chromosome("A"))
230218
>>>
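
The two accepted forms of id (1-16 or "A"-"P") refer to the same sixteen chromosomes. A minimal sketch of how such an argument could be normalized to an integer is shown below. `chromosome_number` is a hypothetical helper written for illustration, and the actual handling inside pygenome may differ.

```python
import string

# The sixteen chromosome letters, 'A' through 'P'
_LETTERS = string.ascii_uppercase[:16]


def chromosome_number(chrom_id):
    """Map 1-16 or 'A'-'P' (case-insensitive) to an integer 1-16.

    Hypothetical helper, not part of pygenome.
    """
    text = str(chrom_id).strip().upper()
    if text.isdigit():
        number = int(text)
    elif len(text) == 1 and text in _LETTERS:
        number = _LETTERS.index(text) + 1
    else:
        raise ValueError("unrecognized chromosome id: %r" % (chrom_id,))
    if not 1 <= number <= 16:
        raise ValueError("chromosome id out of range: %r" % (chrom_id,))
    return number


print(chromosome_number("A"), chromosome_number(16), chromosome_number("P"))
```
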
chromosomes()

Returns a generator containing all yeast chromosomes in the form of Bio.SeqRecord objects

out : Generator
Generator of Bio.SeqRecord object

See also: chromosome

>>> from pygenome import sg
>>> next(sg.chromosomes())
SeqRecord(seq=Seq('CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACCCACA...GGG', IUPACAmbiguousDNA()), id='BK006935.2', name='BK006935', description='TPA: Saccharomyces cerevisiae S288c chromosome I.', dbxrefs=[])
>>>
data_dir = '/home/bjorn/.local/share/sgd_genome_data_files'
download(missing_files=None)

Downloads the sequence files from the Saccharomyces Genome Database (www.sgd.org). This is typically only done once.

downstream_gene(gene)

Returns the gene downstream of gene. This is defined as the gene on the chromosome located 3’ of the transcription stop point of gene. The gene can be given as a standard name (e.g. CYC1) or a systematic name (e.g. YJR048W). In the examples below, the result is the systematic name of the downstream gene.

>>> from pygenome import sg
>>> sg.downstream_gene("RFA1")
'YAR003W'
>>> sg.systematic_name("RFA1")
'YAR007C'
>>> sg.downstream_gene("CYC3")
'YAL040C'
>>> sg.systematic_name("CYC3")
'YAL039C'
>>>
intergenic_sequence(*args, **kwargs)

Function wrapping the decorated function.

intergenic_sequence_genbank_accession(upgene, dngene)

Same as the intergenic_sequence method, but returns a string representing a portion of a Genbank file.

>>> from pygenome import sg
>>> sg.intergenic_sequence_genbank_accession("YGR192C", "YGR193C")
'BK006941.2 REGION: 883811..884508'
intergenic_sequence_pydna_code(upgene, dngene)
locus(*args, **kwargs)

Function wrapping the decorated function.

promoter(*args, **kwargs)

Function wrapping the decorated function.

promoter_genbank_accession(gene)

Same as the promoter method, but returns a string representing a portion of a Genbank file.

>>> from pygenome import sg
>>> sg.promoter_genbank_accession("TDH3")
'BK006941.2 REGION: complement(883811..884508)'
>>>
promoter_pydna_code(gene)
systematic_name(gene)

Returns the systematic name associated with a standard name.

gene : str
standard name (eg. CYC1 or TDH3)
out : str
String
>>> from pygenome import sg
>>> sg.systematic_name("GAL1")
'YBR020W'
>>> sg.systematic_name("CYC1")
'YJR048W'
>>> sg.systematic_name("TDH3")
'YGR192C'
>>> 
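
A systematic name encodes where the gene sits. Under the standard naming convention for nuclear ORFs, YJR048W reads as Y (yeast), J (tenth chromosome), R (right arm), 048 (ORF number counted from the centromere) and W (Watson strand). The sketch below decodes these fields. `OrfName` and `decode` are hypothetical names introduced for illustration and are not part of pygenome.

```python
import re
from collections import namedtuple

# Hypothetical container for the decoded fields
OrfName = namedtuple("OrfName", "chromosome arm number strand")

# Y + chromosome letter A-P + arm L/R + three digits + strand W/C,
# optionally followed by a suffix such as '-A'
_PATTERN = re.compile(r"^Y([A-P])([LR])(\d{3})([WC])(-[A-Z])?$")


def decode(systematic_name):
    """Split a systematic ORF name into chromosome, arm, number and strand."""
    m = _PATTERN.match(systematic_name.upper())
    if m is None:
        raise ValueError("not a systematic ORF name: %r" % (systematic_name,))
    letter, arm, number, strand = m.group(1, 2, 3, 4)
    return OrfName(ord(letter) - ord("A") + 1, arm, int(number), strand)


print(decode("YJR048W"))  # CYC1
print(decode("YGR192C"))  # TDH3
```

The chromosome number decoded for YJR048W is 10, consistent with the "chromosome X" description in the cds example.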
tandem(gene)

Returns True if a gene is expressed in the same direction on the chromosome as the gene immediately upstream.

Tandem genes:

    Gene1   Gene2
    ---->   ---->

Bidirectional genes:

    Gene1   Gene2
    ---->   <----

gene : str
standard name (eg. CYC1 or TDH3)
out : bool
Boolean; True or False
terminator(*args, **kwargs)

Function wrapping the decorated function.

upstream_gene(gene)

Returns the gene upstream of gene. This is defined as the gene on the chromosome located 5’ of the transcription start point of gene. The gene can be given as a standard name (e.g. CYC1) or a systematic name (e.g. YJR048W). In the examples below, the result is the systematic name of the upstream gene.

>>> from pygenome import sg
>>> sg.systematic_name("RFA1")
'YAR007C'
>>> sg.upstream_gene("RFA1")
'YAR008W'
>>> sg.systematic_name("CYC3")
'YAL039C'
>>>

Module contents
