pydna Package

dsdna Module

Provides two classes, Dseq and Dseqrecord, for handling double stranded DNA sequences. Dseq and Dseqrecord are subclasses of Biopythons Seq and SeqRecord classes, respectively. These classes support the notion of circular and linear DNA.

class pydna.dsdna.Dseq(watson, crick=None, ovhg=None, linear=None, circular=None, alphabet=IUPACAmbiguousDNA())[source]

Bases: Bio.Seq.Seq

Dseq is a class designed to hold information for a double stranded DNA fragment. Dseq also holds information describing the topology of the DNA fragment (linear or circular).

Dseq is a subclass of the Biopython Seq object. It stores two strings representing the watson (sense) and crick(antisense) strands. two properties called linear and circular, and a numeric value ovhg (overhang) describing the stagger for the watson and crick strand in the 5’ end of the fragment.

The most common usage is probably to create a Dseq object as a part of a Dseqrecord object (see Dseqrecord).

There are three ways of creating a Dseq object directly:

Only one argument (string):

>>> import pydna
>>> pydna.Dseq("aaa")
Dseq(-3)
aaa
ttt

The given string will be interpreted as the watson strand of a blunt, linear double stranded sequence object. The crick strand is created automatically from the watson strand.

Two arguments (string, string):

>>> import pydna
>>> pydna.Dseq("gggaaat","ttt")
Dseq(-7)
gggaaat
   ttt

If both watson and crick are given, but not ovhg an attempt will be made to find the best annealing between the strands. There are limitations to this! For long fragments it is quite slow. The length of the annealing sequences have to be at least half the length of the shortest of the strands.

Three arguments (string, string, ovhg=int):

The ovhg parameter controls the stagger at the five prime end:

ovhg=-2

XXXXX
  XXXXX

ovhg=-1

XXXXX
 XXXXX


ovhg=0

XXXXX
XXXXX

ovhg=1

 XXXXX
XXXXX

ovhg=2

  XXXXX
XXXXX

Example of creating Dseq objects with different amounts of stagger:

>>> pydna.Dseq(watson="agt",crick="actta",ovhg=-2)
Dseq(-7)
agt
  attca
>>> pydna.Dseq(watson="agt",crick="actta",ovhg=-1)
Dseq(-6)
agt
 attca
>>> pydna.Dseq(watson="agt",crick="actta",ovhg=0)
Dseq(-5)
agt
attca
>>> pydna.Dseq(watson="agt",crick="actta",ovhg=1)
Dseq(-5)
 agt
attca
>>> pydna.Dseq(watson="agt",crick="actta",ovhg=2)
Dseq(-5)
  agt
attca

the ovhg parameter has to be given with both watson and crick, otherwise an exception is raised.

>>> pydna.Dseq(watson="agt",ovhg=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pydna_/dsdna.py", line 169, in __init__
    else:
Exception: ovhg defined without crick strand!

The default alphabet is set to Biopython IUPACAmbiguousDNA

The shape of the fragment is set by either:

linear = False, True

or

circular = True, False

Note that both ends of the DNA fragment has to be blunt to set circular = True (or linear = False).

>>> pydna.Dseq("aaa","ttt")
Dseq(-3)
aaa
ttt
>>> pydna.Dseq("aaa","ttt",ovhg=0)
Dseq(-3)
aaa
ttt
>>> pydna.Dseq("aaa", "ttt", linear = False ,ovhg=0)
Dseq(o3)
aaa
ttt
>>> pydna.Dseq("aaa", "ttt", circular = True , ovhg=0)
Dseq(o3)
aaa
ttt

Coercing to string

>>> a=pydna.Dseq("tttcccc","aaacccc")
>>> a
Dseq(-11)
    tttcccc
ccccaaa
>>> str(a)
'ggggtttcccc'

The double stranded part is accessible through the dsdata property:

>>> a.dsdata
'ttt'

Dseqrecord and Dseq share the same concept of length

<-- length -->
GATCCTTT
     AAAGCCTAG

The slicing of a linear Dseq object works mostly as it does for a string.

>>> s="ggatcc"
>>> s[2:3]
'a'
>>> s[2:4]
'at'
>>> s[2:4:-1]
''
>>> s[::2]
'gac'
>>> import pydna
>>> d=pydna.Dseq(s, linear=True)
>>> d[2:3]
Dseq(-1)
a
t
>>> d[2:4]
Dseq(-2)
at
ta
>>> d[2:4:-1]
Dseq(-0)


>>> d[::2]
Dseq(-3)
gac
ctg

The slicing of a circular Dseq object has a slightly different meaning.

>>> s="ggAtCc"
>>> d=pydna.Dseq(s, circular=True)
>>> d
Dseq(o6)
ggAtCc
ccTaGg
>>> d[4:3]
Dseq(-5)
CcggA
GgccT

The slice [X:X] produces an empty slice for a string, while this will return the linearized sequence starting at X:

>>> s="ggatcc"
>>> d=pydna.Dseq(s, circular=True)
>>> d
Dseq(o6)
ggatcc
cctagg
>>> d[3:3]
Dseq(-6)
tccgga
aggcct
>>>
T4(nucleotides=None)[source]

Fill in of five prime protruding ends and chewing back of three prime protruding ends by a DNA polymerase providing both 5’-3’ DNA polymerase activity and 3’-5’ nuclease acitivty (such as T4 DNA polymerase). This in presence of any combination of A, G, C or T. Default are all four nucleotides together.

>>> import pydna
>>> a=pydna.Dseq("gatcgatc")
>>> a
Dseq(-8)
gatcgatc
ctagctag
>>> a.T4()
Dseq(-8)
gatcgatc
ctagctag
>>> a.T4("t")
Dseq(-8)
gatcgat
 tagctag
>>> a.T4("a")
Dseq(-8)
gatcga
  agctag
>>> a.T4("g")
Dseq(-8)
gatcg
   gctag
>>>
circular[source]

The circular property

cut(*enzymes)[source]

Returns a list of linear Dseq fragments produced in the digestion. If there is not cut, the whole sequence is returned.

>>> from pydna import Dseq
>>> seq=Dseq("ggatccnnngaattc")
>>> seq
Dseq(-15)
ggatccnnngaattc
cctaggnnncttaag
>>> from Bio.Restriction import BamHI,EcoRI
>>> type(seq.cut(BamHI))
<type 'list'>
>>> for frag in seq.cut(BamHI):
...     print frag.fig()
Dseq(-5)
g
cctag
Dseq(-14)
gatccnnngaattc
    gnnncttaag
>>> seq.cut(EcoRI, BamHI) ==  seq.cut(BamHI, EcoRI)
True
>>> a,b,c = seq.cut(EcoRI, BamHI)
>>> a+b+c
Dseq(-15)
ggatccnnngaattc
cctaggnnncttaag
>>>
fig()[source]

Returns a representation of the sequence, truncated if longer than 40 bp:

>>> import pydna
>>> a=pydna.Dseq("atcgcttactagcgtactgatcatctgactgactagcgtga")
>>> a
Dseq(-41)
atcg..gtga
tagc..cact
>>>
fill_in(nucleotides=None)[source]

Fill in of five prime protruding end with a DNA polymerase that has only DNA polymerase activity (such as exo-klenow) and any combination of A, G, C or T. Default are all four nucleotides together.

>>> import pydna
>>> a=pydna.Dseq("aaa", "ttt")
>>> a
Dseq(-3)
aaa
ttt
>>> a.fill_in()
Dseq(-3)
aaa
ttt
>>> b=pydna.Dseq("caaa", "cttt")
>>> b
Dseq(-5)
caaa
 tttc
>>> b.fill_in()
Dseq(-5)
caaag
gtttc
>>> b.fill_in("g")
Dseq(-5)
caaag
gtttc
>>> b.fill_in("tac")
Dseq(-5)
caaa
 tttc
>>> b=pydna.Dseq("aaac", "tttg")
>>> c=pydna.Dseq("aaac", "tttg")
>>> c
Dseq(-5)
 aaac
gttt
>>> c.fill_in()
Dseq(-5)
 aaac
gttt
>>>
find(sub, start=0, end=9223372036854775807)[source]

Find method, like that of a python string.

This behaves like the python string method of the same name.

Returns an integer, the index of the first occurrence of substring argument sub in the (sub)sequence given by [start:end].

Arguments:
  • sub - a string or another Seq object to look for
  • start - optional integer, slice start
  • end - optional integer, slice end

Returns -1 if the subsequence is NOT found.

e.g. Locating the first typical start codon, AUG, in an RNA sequence:

>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.find("AUG")
3
five_prime_end()[source]

Returns a tuple describing the structure of the 5’ end of the DNA fragment

>>> import pydna
>>> a=pydna.Dseq("aaa", "ttt")
>>> a
Dseq(-3)
aaa
ttt
>>> a.five_prime_end()
('blunt', '')
>>> a=pydna.Dseq("aaa", "ttt", ovhg=1)
>>> a
Dseq(-4)
 aaa
ttt
>>> a.five_prime_end()
("3'", 't')
>>> a=pydna.Dseq("aaa", "ttt", ovhg=-1)
>>> a
Dseq(-4)
aaa
 ttt
>>> a.five_prime_end()
("5'", 'a')
>>>
linear[source]

The linear property

looped()[source]

Returns a circularized Dseq object. This can only be done if the two ends are compatible, otherwise a TypeError is raised.

>>> import pydna
>>> a=pydna.Dseq("catcgatc")
>>> a
Dseq(-8)
catcgatc
gtagctag
>>> a.looped()
Dseq(o8)
catcgatc
gtagctag
>>> a.T4("t")
Dseq(-8)
catcgat
 tagctag
>>> a.T4("t").looped()
Dseq(o7)
catcgat
gtagcta
>>> a.T4("a")
Dseq(-8)
catcga
  agctag
>>> a.T4("a").looped()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pydna/dsdna.py", line 357, in looped
    if type5 == type3 and str(sticky5) == str(rc(sticky3)):
TypeError: DNA cannot be circularized.
5' and 3' sticky ends not compatible!
>>>
mung()[source]

Simulates treatment a nuclease with 5’-3’ and 3’-5’ single strand specific exonuclease activity (such as mung bean nuclease):

    ggatcc    ->     gatcc
     ctaggg          ctagg

     ggatcc   ->      ggatc
    tcctag            cctag

>>> import pydna
>>> b=pydna.Dseq("caaa", "cttt")
>>> b
Dseq(-5)
caaa
 tttc
>>> b.mung()
Dseq(-3)
aaa
ttt
>>> c=pydna.Dseq("aaac", "tttg")
>>> c
Dseq(-5)
 aaac
gttt
>>> c.mung()
Dseq(-3)
aaa
ttt
ovhg[source]

The ovhg property

rc()[source]

Alias of the reverse_complement method

reverse_complement()[source]

Returns a Dseq object where watson and crick have switched places.

t4(*args, **kwargs)[source]

Alias for the T4 method

three_prime_end()[source]

Returns a tuple describing the structure of the 5’ end of the DNA fragment

>>> import pydna
>>> a=pydna.Dseq("aaa", "ttt")
>>> a
Dseq(-3)
aaa
ttt
>>> a.three_prime_end()
('blunt', '')
>>> a=pydna.Dseq("aaa", "ttt", ovhg=1)
>>> a
Dseq(-4)
 aaa
ttt
>>> a.three_prime_end()
("3'", 'a')
>>> a=pydna.Dseq("aaa", "ttt", ovhg=-1)
>>> a
Dseq(-4)
aaa
 ttt
>>> a.three_prime_end()
("5'", 't')
>>>
tolinear()[source]

Returns a blunt, linear copy of a circular Dseq object. This can only be done if the Dseq object is circular, otherwise a TypeError is raised.

>>> import pydna
>>> a=pydna.Dseq("catcgatc", circular=True)
>>> a
Dseq(o8)
catcgatc
gtagctag
>>> a.tolinear()
Dseq(-8)
catcgatc
gtagctag
>>>
class pydna.dsdna.Dseqrecord(record, circular=None, linear=None, *args, **kwargs)[source]

Bases: Bio.SeqRecord.SeqRecord

Dseqrecord is a double stranded version of the Biopython SeqRecord class. The Dseqrecord object holds a Dseq object describing the sequence. Additionally, Dseqrecord hold meta information about the sequence in the from of a list of SeqFeatures, in the same way as the SeqRecord does. The Dseqrecord can be initialized with a string, Seq, Dseq, SeqRecord or another Dseqrecord. The sequence information will be stored in a Dseq object in all cases. Dseqrecord objects can be read or parsed from sequences in Fasta, Embl or Genbank format.

There is a short representation associated with the Dseqrecord. Dseqrecord(-3) represents a linear sequence of length 2 while Dseqrecord(o7) represents a circular sequence of length 7.

Dseqrecord and Dseq share the same concept of length

<-- length -->
GATCCTTT
     AAAGCCTAG
record : string, Seq, SeqRecord, Dseq or other Dseqrecord object
This data will be used to form the seq property
circular : bool, optional
True or False reflecting the shape of the DNA molecule
linear : bool, optional
True or False reflecting the shape of the DNA molecule
>>> from pydna import Dseqrecord
>>> a=Dseqrecord("aaa")
>>> a
Dseqrecord(-3)
>>> a.seq
Dseq(-3)
aaa
ttt
>>> from Bio.Seq import Seq
>>> b=Dseqrecord(Seq("aaa"))
>>> b
Dseqrecord(-3)
>>> b.seq
Dseq(-3)
aaa
ttt
>>> from Bio.SeqRecord import SeqRecord
>>> c=Dseqrecord(SeqRecord(Seq("aaa")))
>>> c
Dseqrecord(-3)
>>> c.seq
Dseq(-3)
aaa
ttt
>>> a.seq.alphabet
IUPACAmbiguousDNA()
>>> b.seq.alphabet
IUPACAmbiguousDNA()
>>> c.seq.alphabet
IUPACAmbiguousDNA()
>>>
circular[source]

Not really a method, but the circular property

cut(*enzymes)[source]

Digest the Dseqrecord object with one or more restriction enzymes. returns a list of linear Dseqrecords.

enzymes : iterable object
iterable containing Biopython restriction enzyme objects
fragments : list
list of Dseqrecord objects formed by the digestion
>>> import pydna
>>> a=pydna.Dseqrecord("ggatcc")
>>> from Bio.Restriction import BamHI
>>> a.cut(BamHI)
[Dseqrecord(-5), Dseqrecord(-5)]
>>> frag1, frag2 = a.cut(BamHI)
>>> frag1.seq
Dseq(-5)
g
cctag
>>> frag2.seq
Dseq(-5)
gatcc
    g
format(f='gb')[source]

Returns the sequence as a string using a format supported by Biopython SeqIO. Default is “gb” which is short for Genbank.

>>> import pydna
>>> a=pydna.Dseqrecord("aaa")
>>> a.annotations['date'] = '02-FEB-2013'
>>> a
Dseqrecord(-3)
>>> print a.format()
LOCUS       .                          3 bp    DNA     linear   UNK 02-FEB-2013
DEFINITION  .
ACCESSION   <unknown id>
VERSION     <unknown id>
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
ORIGIN
        1 aaa
//
linear[source]

Not really a method, but the linear property

looped()[source]

Returns a circular version of the Dseqrecord object. The underlying Dseq object has to have compatible ends.

>>> import pydna
>>> a=pydna.Dseqrecord("aaa")
>>> a
Dseqrecord(-3)
>>> b=a.looped()
>>> b
Dseqrecord(o3)
>>>
rc()[source]

alias of the reverse_complement method

reverse_complement()[source]

Returns a new Dseqrecord object which is the reverse complement.

>>> import pydna
>>> a=pydna.Dseqrecord("ggaatt")
>>> a
Dseqrecord(-6)
>>> a.seq
Dseq(-6)
ggaatt
ccttaa
>>> a.reverse_complement().seq
Dseq(-6)
aattcc
ttaagg
>>>
seguid()[source]

Returns the SEGUID for the sequence

>>> import pydna
>>> a=pydna.Dseqrecord("aaa")
>>> a.seguid()
'YG7G6b2Kj/KtFOX63j8mRHHoIlE'
shifted(shift)[source]

Returns a circular Dseqrecord with a new origin <shift>. This only works on circular Dseqrecords. If we consider the following circular sequence:

GAAAT   <-- watson strand
CTTTA   <-- crick strand

The T and the G on the watson strand are linked together as well as the A and the C of the of the crick strand.

if shift is 1, this indicates a new origin at position 1:

new origin

G|AAAT
C|TTTA

new sequence:

AAATG
TTTAC

Shift is always positive and 0<shift<length, so in the example below, permissible values of shift are 1,2 and 3

>>> import pydna
>>> a=pydna.Dseqrecord("aaat",circular=True)
>>> a
Dseqrecord(o4)
>>> a.seq
Dseq(o4)
aaat
ttta
>>> b=a.shifted(1)
>>> b
Dseqrecord(o4)
>>> b.seq
Dseq(o4)
aata
ttat
stamp()[source]

Adds a seguid stamp to the description property. This will show in the genbank format. The following string:

SEGUID <seguid>

will be appended to the description property of the Dseqrecord object (string).

>>> import pydna
>>> a=pydna.Dseqrecord("aaa")
>>> a.stamp()
>>> a.description
'<unknown description> SEGUID YG7G6b2Kj/KtFOX63j8mRHHoIlE'
>>> a.verify_stamp()
True
synced(ref, limit=25)[source]

This function returns a new circular sequence (Dseqrecord object), which has ben rotated in such a way that there is maximum overlap between the sequence and ref, which may be a string, Biopython Seq, SeqRecord object or another Dseqrecord object.

The reason for using this could be to rotate a recombinant plasmid so that it starts at the same position after cloning. See the example below:

>>> import pydna
>>> a=pydna.Dseqrecord("gaat",circular=True)
>>> a.seq
Dseq(o4)
gaat
ctta
>>> d = a[2:] + a[:2]
>>> d.seq
Dseq(-4)
atga
tact
>>> insert=pydna.Dseqrecord("CCC")
>>> recombinant = (d+insert).looped()
>>> recombinant.seq
Dseq(o7)
atgaCCC
tactGGG
>>> recombinant.synced(a).seq
Dseq(o7)
gaCCCat
ctGGGta
tolinear()[source]

Returns a linear, blunt of a circular Dseqrecord object. The underlying Dseq object has to be circular.

>>> import pydna
>>> a=pydna.Dseqrecord("aaa", circular = True)
>>> a
Dseqrecord(o3)
>>> b=a.tolinear()
>>> b
Dseqrecord(-3)
>>>
verify_stamp()[source]

Verifies the SEGUID stamp in the description property is valid. True if stamp match the sequid calculated from the sequence. Exception raised if no stamp can be found.

>>> import pydna
>>> a=pydna.Dseqrecord("aaa")
>>> a.annotations['date'] = '02-FEB-2013'
>>> a.seguid()
'YG7G6b2Kj/KtFOX63j8mRHHoIlE'
>>> print a.format("gb")
LOCUS       .                          3 bp    DNA     linear   UNK 02-FEB-2013
DEFINITION  .
ACCESSION   <unknown id>
VERSION     <unknown id>
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
ORIGIN
        1 aaa
//
>>> a.stamp()
>>> a
Dseqrecord(-3)
>>> print a.format("gb")
LOCUS       .                          3 bp    DNA     linear   UNK 02-FEB-2013
DEFINITION  <unknown description> SEGUID YG7G6b2Kj/KtFOX63j8mRHHoIlE
ACCESSION   <unknown id>
VERSION     <unknown id>
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
ORIGIN
        1 aaa
//
>>> a.verify_stamp()
True
>>>
write(filename=None, f='gb')[source]

Writes the Dseqrecord to a file using the format f, which must be a format supported by Biopython SeqIO. Default is “gb” which is short for Genbank.

Filename is the path to the file where the sequece is to be written. The filename is optional, if it is not given, the description property (string) is used together with the format:

If obj is the Dseqrecord object, the default file name will be:

<obj.description>.<f>

If the filename already exists and the sequence it contains is different, a new file name will be used:

<obj.description>_NEW.<f>

pydna.dsdna.parse(data, filter=False, ds=True)[source]

This function returns all DNA sequences found in data. If no sequences are found, an empty list is returned. This is a greedy function, use carefully.

data : string or iterable
see below
filter : bool
if filter == True, sequences will be silently filtered for allowed characters (see docs for Dseqrecord)
ds : bool
Double stranded or single stranded DNA, Return “Dseqrecord” or “SeqRecord” objects.
list
contains Dseqrecord or SeqRecord objects

The data parameter is a string containing:

  1. an absolute path to a local file. The file will be read in text mode and parsed for EMBL, FASTA and Genbank sequences.
  2. an absolute path to a local directory. all files in the directory will be read and parsed as in 1.
  3. a string containing one or more sequences in EMBL, GENBANK, or FASTA format. Mixed formats are allowed.
  4. data can be a list or other iterable of 1 - 3
pydna.dsdna.rc(sequence)[source]

returns the reverse complement of sequence (string) accepts mixed DNA/RNA

pydna.dsdna.read(data, filter=False, ds=True)[source]

This function is similar the parse funtion but returns only the first sequence found.

data : string
see below
filter : bool
if filter == True, sequences will be silently filtered for allowed characters (see docs for Dseqrecord)
ds : bool
Double stranded or single stranded DNA, Return “Dseqrecord” or “SeqRecord” objects.
Dseqrecord
contains the first Dseqrecord or SeqRecord object parsed.

The data parameter is similar to the data parameter for parse.

parse

amplify Module

This module provides functions for PCR.

Primers with 5’ tails as well as inverse PCR on circular templates are handled correctly.

class pydna.amplify.Amplicon(forward_primer, reverse_primer, template, saltc=50, fc=1000, rc=1000)[source]

Bases: object

Holds information about a PCR reaction involving two primers and one template. This class is used by the Anneal class and is not meant to be instantiated directly.

forward_primer : SeqRecord(Biopython)
SeqRecord object holding the forward (sense) primer
reverse_primer : SeqRecord(Biopython)
SeqRecord object holding the reverse (antisense) primer
template : Dseqrecord
Dseqrecord object holding the template (circular or linear)
saltc : float, optional
saltc = monovalent cations (mM) (Na,K..) default value is 50mM This is used for Tm calculations.
fc : float, optional
primer concentration (nM) default set to 1000nM = 1µM This is used for Tm calculations.
rc : float, optional
primer concentration (nM) default set to 1000nM = 1µM This is used for Tm calculations.
detailed_figure()[source]
5gctactacacacgtactgactg3
 |||||||||||||||||||||| tm 52.6 (dbd) 58.3
5gctactacacacgtactgactg...caagatagagtcagtaaccaca3
3cgatgatgtgtgcatgactgac...gttctatctcagtcattggtgt5
                          |||||||||||||||||||||| tm 49.1 (dbd) 57.7
                         3gttctatctcagtcattggtgt5
figure:string
A string containing a text representation of the primers annealing on the template.

tm is the melting temperature (Tm) calculated according to SantaLucia 1998 [1] for each primer.

(dbd) is Tm calculation for enzymes with dsDNA binding domains like Pfu-Sso7d [2]. See [3] for more information.

[1]
  1. SantaLucia, “A Unified View of Polymer, Dumbbell, and Oligonucleotide DNA Nearest-neighbor Thermodynamics,” Proceedings of the National Academy of Sciences 95, no. 4 (1998): 1460.
[2]
  1. Nørholm, “A Mutant Pfu DNA Polymerase Designed for Advanced Uracil-excision DNA Engineering,” BMC Biotechnology 10, no. 1 (2010): 21, doi:10.1186/1472-6750-10-21.
[3]http://www.thermoscientificbio.com/webtools/tmc/
flankdn(flankdnlength=50)[source]

Returns a Dseqrecord object containing flankdnlength bases downstream of the reverse primer footprint. Truncated if the template is not long enough.

                               <---- flankdn ------>

                3actactgactatctTAATAA5
                 ||||||||||||||
acgcattcagctactgtactactgactatctatcgtacatgtactatcgtat
flankup(flankuplength=50)[source]

Returns a Dseqrecord object containing flankuplength bases upstream of the forward primer footprint, Truncated if the template is not long enough.

<--- flankup --->

          5TAATAAactactgactatct3
                 ||||||||||||||
acgcattcagctactgtactactgactatctatcg
pcr_product()[source]

Returns the PCR product as a Dseqrecord object. Primers are marked as SeqFeatures(Biopython) associated with the sequence.

pcr_program()[source]

Returns a string containing a text representation of two proposed PCR programs. The first program is adapted for Taq poymerase, while the second is adapted for Pfu-Sso7d.

Taq (rate 30 nt/s)
Three-step|         30 cycles     |      |SantaLucia 1998
94.0°C    |94.0°C                 |      |SaltC 50mM
__________|_____          72.0°C  |72.0°C|
04min00s  |30s  \         ________|______|
          |      \ 46.0°C/ 0min 1s|10min |
          |       \_____/         |      |
          |         30s           |      |4-8°C

Pfu-Sso7d (rate 15s/kb)
Three-step|          30 cycles   |      |Breslauer1986,SantaLucia1998
98.0°C    |98.0°C                |      |SaltC 50mM
__________|_____          72.0°C |72.0°C|Primer1C   1µM
00min30s  |10s  \ 61.0°C ________|______|Primer2C   1µM
          |      \______/ 0min 0s|10min |
          |        10s           |      |4-8°C
class pydna.amplify.Anneal(primers, template, homology_limit=13, max_product_size=15000)[source]

Bases: object

primers : iterable containing SeqRecord objects
Primer sequences 5’-3’.
template : Dseqrecord object
The template sequence 5’-3’.
homology_limit : int, optional
limit length of the annealing part of the primers.
max_product_size: int, optional
PCR products longer than max_product_size are not reported
amplicons: list
A list of Amplicon objects, one for each primer pair that may form a PCR product.
>>> import pydna
>>> template = pydna.Dseqrecord("tacactcaccgtctatcattatctactatcgactgtatcatctgatagcac")
>>> from Bio.SeqRecord import SeqRecord
>>> p1=pydna.read(">p1\ntacactcaccgtctatcattatc", ds = False)
>>> p2 = pydna.read(">p2\ncgactgtatcatctgatagcac", ds = False).reverse_complement()
>>> pydna.Anneal((p1,p2), template)
Anneal(amplicons = 1)
>>> pydna.Anneal((p1,p2), template).amplicons
[Amplicon(51)]
>>> amplicon_list = pydna.Anneal((p1,p2), template).amplicons
>>> amplicon = amplicon_list.pop()
>>> amplicon
Amplicon(51)
>>> print amplicon.detailed_figure()
5tacactcaccgtctatcattatc...cgactgtatcatctgatagcac3
                           |||||||||||||||||||||| tm 50.6 (dbd) 60.5
                          3gctgacatagtagactatcgtg5
5tacactcaccgtctatcattatc3
 ||||||||||||||||||||||| tm 49.4 (dbd) 58.8
3atgtgagtggcagatagtaatag...gctgacatagtagactatcgtg5
>>> prod = amplicon.pcr_product()
>>> prod.annotations['date'] = '02-FEB-2013'
>>> print prod
Dseqrecord
circular: False
size: 51
ID: 51bp U96+TO06Y6pFs74SQx8M1IVTBiY
Name: 51bp_PCR_prod
Description: Primers p1 <unknown name>
Number of features: 2
/date=02-FEB-2013
Dseq(-51)
taca..gcac
atgt..cgtg
>>>
featured_template()[source]

returns the template Dseqrecord object with the footprint of all annealing primers marked as SeqFeatures.

products()[source]

returns all pcr product as a list of Dseqrecords

report()[source]

This method is an alias of str(Annealobj). Returns a short string representation.

pydna.amplify.basictm(primer, *args, **kwargs)[source]

Returns the melting temperature (Tm) of the primer using the basic formula.

Tm = (wA+xT)*2 + (yG+zC)*4 assumed 50mM monovalent cations

w = number of A in primer
x = number of T in primer
y = number of G in primer
z = number of C in primer
primer : string
Primer sequence 5’-3’

tm : int

>>> from pydna.amplify import basictm
>>> basictm("ggatcc")
20
>>>
pydna.amplify.pcr(*args, **kwargs)[source]

pcr is a convenience function for Anneal to simplify its usage, especially from the command line. If more than one PCR product is formed, an exception is raised.

args is any iterable of sequences or an iterable of iterables of sequences. args will be greedily flattened.

args : iterable containing sequence objects
Several arguments are also accepted.
limit : int = 13, optional
limit length of the annealing part of the primers.

sequences in args could be of type:

string Seq SeqRecord Dseqrecord

The last sequence will be interpreted as the template all preceeding sequences as primers.

This is a powerful function, use with care!

product : Dseqrecord
a Dseqrecord object representing the PCR product. The direction of the PCR product will be the same as for the template sequence.
>>> import pydna
>>> template = pydna.Dseqrecord("tacactcaccgtctatcattatctactatcgactgtatcatctgatagcac")
>>> from Bio.SeqRecord import SeqRecord
>>> p1 = pydna.read(">p1\ntacactcaccgtctatcattatc", ds = False)
>>> p2 = pydna.read(">p2\ncgactgtatcatctgatagcac", ds = False).reverse_complement()
>>> pydna.pcr(p1, p2, template)
Dseqrecord(-51)
>>> pydna.pcr([p1, p2], template)
Dseqrecord(-51)
>>> pydna.pcr((p1,p2,), template)
Dseqrecord(-51)
>>>
pydna.amplify.tmbreslauer86(primer, dnac=500.0, saltc=50, thermodynamics=False)[source]

Returns the melting temperature (Tm) of the primer using the nearest neighbour algorithm. Formula and thermodynamic data is taken from Breslauer 1986. These data are no longer widely used.

Breslauer 1986, table 2 [1]

pair dH dS dG
AA/TT 9.1 24.0 1.9
AT/TA 8.6 23.9 1.5
TA/AT 6.0 16.9 0.9
CA/GT 5.8 12.9 1.9
GT/CA 6.5 17.3 1.3
CT/GA 7.8 20.8 1.6
GA/CT 5.6 13.5 1.6
CG/GC 11.9 27.8 3.6
GC/CG 11.1 26.7 3.1
GG/CC 11.0 26.6 3.1
primer : string
Primer sequence 5’-3’ in UPPERCASE

tm : float

[1]K.J. Breslauer et al., “Predicting DNA Duplex Stability from the Base Sequence,” Proceedings of the National Academy of Sciences 83, no. 11 (1986): 3746.
>>> from pydna.amplify import tmbreslauer86
>>> tmbreslauer86("ACGTCATCGACACTATCATCGAC")
64.28863985851899
pydna.amplify.tmbresluc(primer, primerc=500.0, saltc=50, thermodynamics=False)[source]

Returns the tm for a primer using a formula adapted to polymerases with a DNA binding domain.

primer : string
primer sequence 5’-3’
primerc : float
concentration (nM)
saltc : float, optional
Monovalent cation concentration (mM)
thermodynamics : bool, optional
prints details of the thermodynamic data to stdout. For debugging only.
tm : float
the tm of the primer
tm,dH,dS : tuple
tm and dH and dS used for the calculation
pydna.amplify.tmstaluc98(primer, dnac=50, saltc=50, **kwargs)[source]

Returns the melting temperature (Tm) of the primer using the nearest neighbour algorithm. Formula and thermodynamic data is taken from SantaLucia 1998. This implementation gives the same answer as the one provided by Biopython (See Examples).

Thermodynamic data used:

pair dH dS
AA/TT 7.9 22.2
AT/TA 7.2 20.4
TA/AT 7.2 21.3
CA/GT 8.5 22.7
GT/CA 8.4 22.4
CT/GA 7.8 21.0
GA/CT 8.2 22.2
CG/GC 10.6 27.2
GC/CG 9.8 24.4
GG/CC 8.0 19.9
primer : string
Primer sequence 5’-3’ in UPPERCASE
tm : float
tm of the primer
[1]
  1. SantaLucia, “A Unified View of Polymer, Dumbbell, and Oligonucleotide DNA Nearest-neighbor Thermodynamics,” Proceedings of the National Academy of Sciences 95, no. 4 (1998): 1460.
>>> from pydna.amplify import tmstaluc98
>>> from Bio.SeqUtils.MeltingTemp import Tm_staluc
>>> tmstaluc98("ACGTCATCGACACTATCATCGAC")
54.55597724052518
>>> Tm_staluc("ACGTCATCGACACTATCATCGAC")
54.555977240525124
>>>

primer_design Module

pydna.primer_design.cloning_primers(template, minlength=16, maxlength=29, fp=None, rp=None, fp_tail='', rp_tail='', target_tm=55.0, primerc=1000.0, saltc=50.0, formula='tmbresluc')[source]

This function can design one or two primers for PCR amplification of a given sequence. This function accepts a Dseqrecord object containing the template sequence and returns a pydna amplicon object.

The amplicon object contains the primers, a figure describing the how the primers anneal and two suggested PCR programmes.

template : Dseqrecord
a Dseqrecord object.
minlength : int, optional
Minimum length of the annealing part of the primer
maxlength : int, optional
Maximum length (including tail) for designed primers.
fp, rp : SeqRecord, optional
optional Biopython SeqRecord objects containing one primer each.
fp_tail, rp_tail : string, optional
optional tails to be added to the forwars or reverse primers
target_tm : float, optional
target tm for the primers
primerc, saltc : float, optional
limit is set to 25 by default.
formula : string

formula used for tm calculation this is the name of a function. built in options are:

tmbresluc basictm tmstaluc98 tmbreslauer86

These functions are imported from the pydna.amplify module, but can be substituted for some other custom mad function.

frecs, cp : tuple

frecs are the same Dseqrecords as given as arguments, but with the regions of homology added to the features.

cp is a list of Dseqrecords representing the circular products sorted by length (long -> short).

assembly Module

Provides functions for assembly of sequences by homologous recombination. Given a list of sequences (Dseqrecords), all sequences will be analyzed for overlapping regions of DNA (common substrings).

The assembly algorithm is based on forming a network where each overlapping sequence forms a node and intervening sequences form edges.

Then all possible linear or circular assemblies will be returned in the order of their length.

pydna.assembly.circular_assembly(form_rec_list, limit=25)[source]

Accepts a list of Dseqrecords and tries to assemble them into a circular assembly by homologous recombination based on shared regions of homology with a minimum length given by limit.

form_rec_list : list
a list of Dseqrecord objects.
limit : int, optional
limit is set to 25 by default.
frecs, cp : tuple

frecs are the same Dseqrecords as given as arguments, but with the regions of homology added to the features.

cp is a list of Dseqrecords representing the circular products sorted by length (long -> short).

pydna.assembly.linear_assembly(form_rec_list, limit=25)[source]

Accepts a list of Dseqrecords and tries to assemble them into a linear assembly by homologous recombination based on shared regions of homology with a minimum length given by limit.

form_rec_list : list
a list of Dseqrecord objects.
limit : int, optional
limit is set to 25 by default.
frecs, lp : tuple

frecs are the same Dseqrecords as given as arguments, but with the regions of homology added to the features.

lp is a list of Dseqrecords representing the linear products sorted by length (long -> short).

assembly2 Module

pydna.assembly2.fusion_pcr_assembly(dsrecs, limit=15)[source]

Accepts a list of Dseqrecords and tries to assemble them into linear assemblies based on shared regions of homology at the extremities of each fragment with a minimum length given by limit.

dsrecs : list
a list of Dseqrecord objects.
limit : int, optional
limit is set to 15 nucleotides by default.
linear: list

Each object in linear is a named tuple with the following fields:

Field Type Contains
result Dseqrecord Assembeled sequence
fragments list of Dseqrecords list of fragments
offsets list of integers list of cumulative offsets aligning each source fragment
overlap_sizes list of integers list of the length of each overlap joining the sequences

Source fragments are the double stranded blunt DNA fragments originally added as the dsrecs argument. The parwise overlaps found are added ar features to the source fragments.

Offsets is a list with the cumulative stagger between each fragment.

            <--6->        <--6->         overlap_sizes = [6,6]
                        
cggcggcgggccTGCCTC                      \
gccgccgcccggACGGAG                      | 
                                        |
            TGCCTCaccattgcAAAAAA        | fragments
            ACGGAGtggtaacgTTTTTT        |
                                        | 
                          AAAAAAcatcata | 
                          TTTTTTgtagtat /

<---------->
<----------------------->               offsets = [12,26]
pydna.assembly2.gibson_assembly(dsrecs, limit=15)[source]

Accepts a list of Dseqrecords and tries to assemble them into linear or circular assemblies based on shared regions of homology at the extremities of each fragment with a minimum length given by limit.

This is a simulation of the DNA assembly method described in:

Gibson DG, Young L, Chuang R-Y, Venter JC, Hutchison CA 3rd, Smith HO (2009) Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods 6:343–345. doi: 10.1038/nmeth.1318

dsrecs : list
a list of Dseqrecord objects.
limit : int, optional
limit is set to 15 nucleotides by default.
linear, circular : tuple

Linear and circular are two lists containing linear and circular assemblies.

Each object in linear or circular is a named tuple with the following fields:

Field Type Contains
result Dseqrecord Assembeled sequence
fragments list of Dseqrecords list of fragments
offsets list of integers list of cumulative offsets aligning each source fragment
overlap_sizes list of integers list of the length of each overlap joining the sequences

Source fragments are the double stranded blunt DNA fragments originally added as the dsrecs argument. The parwise overlaps found are added ar features to the source fragments.

Offsets is a list with the cumulative stagger between each fragment.

            <--6->        <--6->        overlap_sizes = [6,6]
            
cggcggcgggcc                            \
gccgccgcccggACGGAG                      | 
                                        |
            TGCCTCaccattgc              | sticky_fragments = [12,26]
                  tggtaacgTTTTTT        |
                                        | 
                          AAAAAAcatcata | 
                                gtagtat /
                                
<---------->
<------------------------>              offsets
pydna.assembly2.recombination_assembly(dsrecs, limit=25)[source]

Accepts a list of Dseqrecords and tries to assemble them into linear and circular assemblies based on shared regions of homology with a minimum length given by limit.

dsrecs : list
a list of Dseqrecord objects.
limit : int, optional
limit is set to 25 nucleotides by default.
linear, circular: tuple

Linear and circular are two lists containing linear and circular assemblies.

Each object in linear or circular is a named tuple with the following fields

Field Type Contains
result Dseqrecord Assembeled sequence
source_fragments list of Dseqrecords list of fragments
sticky_fragments list of Dseqrecords list of processed fragments with a single stranded 5’ cohesive end
source_offsets list of integers list of cumulative offsets aligning each source fragment
sticky_offsets list of integers list of cumulative offsets aligning each sticky fragment
overlap_sizes list of integers list of the length of each overlap joining the sequences

Source fragments are the double stranded blunt DNA fragments originally added as the dsrecs argument. The parwise overlaps found are added ar features to the source fragments.

Sticky fragments are source fragments that have been processed so that they can be ligated together by the homologous repair DNA machinery (single-strand annealing pathway). This involves trimming DNA fragments so that the fragments are flanked by a 5’ single stranded overhang.

Source_offsets is a list with the cumulative stagger between each source fragment. Source offsets are different from sticky offsets when the overlapping sequences are located in the interior of the fragments.

<-------->
<--------------------->                 source_offsets = [10,23]


cggcggcgggccTGCCTCtc                    \
gccgccgcccggACGGAGag                    | 
                                        |
          taTGCCTCaccattgcAAAAAAtt      |  source_fragments
          atACGGAGtggtaacgTTTTTTaa      |
                                        | 
                       aatAAAAAAcatcata | 
                       ttaTTTTTTgtagtat /

            <--6->        <--6->        overlap_sizes = [6,6]
         
cggcggcgggcc                             \
gccgccgcccggACGGAG                       | 
                                         |
            TGCCTCaccattgc               | sticky_fragments = [12,26]
                  tggtaacgTTTTTT         |
                                         | 
                          AAAAAAcatcata  | 
                                gtagtat  /
<---------->
<------------------------>               sticky_offsets

download Module

Provides a class for downloading sequences from genbank.

class pydna.download.Genbank(users_email, proxy=None, tool='pydna')[source]

Class to facilitate download from genbank.

users_email : string
Has to be a valid email address. You should always tell Genbanks who you are, so that they can contact you.
proxy : string, optional
String containing a proxy url: “proxy = “http://umiho.proxy.com:3128
tool : string, optional
Default is “pydna”. This is to tell Genbank which tool you are using.

import pydna gb=pydna.Genbank(“me@mail.se”, proxy = “http://proxy.com:3128”) gb.nucleotide(“L09137”) <- this method does the downloading from genbank SeqRecord(seq=Seq(‘TCGCGCGTTTCGGTGATGACGGTGAAAACCTCT.....

nucleotide(item)[source]

Download a genbank record using a Genbank object.

item is a string containing one genbank acession number [1] for a nucleotide file:

A12345 = 1 letter + 5 numerals
AB123456 = 2 letters + 6 numerals
[1]http://www.dsimb.inserm.fr/~fuchs/M2BI/AnalSeq/Annexes/Sequences/Accession_Numbers.htm
test()[source]

Test downloading the pUC19 plasmid sequence from genbank returns True if successful. Can be used to test the proxy and other network settings.

editor Module

This module provides a class for opening a sequence using an editor that accepts a path as a command line argument.

ApE - A plasmid Editor [1]_ is such an editor.

Examples

>>> import pydna
>>> ape = pydna.Editor("tclsh8.6 /home/bjorn/.ApE/apeextractor/ApE.vfs/lib/app-AppMain/AppMain.tcl")
>>> ape.open("aaa")
>>>
class pydna.editor.Editor(shell_command_for_editor, tmpdir=None)[source]
open(*args, **kwargs)[source]

Open a sequence for editing in an editor.

args : sequence or iterable of sequences
sequences to to be opened by the editor.

utils Module

This module provides miscellaneous functions.

pydna.utils.copy_features(source_sr, target_sr, limit=10)[source]

This function tries to copy all features in source_seq and copy them to target_seq. Source_sr and target_sr are objects with a features property, such as Dseqrecord or Biopython SeqRecord.

source_seq : SeqRecord or Dseqrecord
The sequence to copy features from
target_seq : SeqRecord or Dseqrecord
The sequence to copy features to
bool : True
This function acts on target_seq in place. No data is returned.
pydna.utils.eq(*args, **kwargs)[source]

Compares two or more DNA sequences for equality i.e. they represent the same DNA molecule. Comparisons are case insensitive.

args : iterable
iterable containing sequences args can be strings, Biopython Seq or SeqRecord, Dseqrecord or dsDNA objects.
circular : bool, optional
Consider all molecules circular or linear
linear : bool, optional
Consider all molecules circular or linear
eq : bool
Returns True or False

Compares two or more DNA sequences for equality i.e. if they represent the same DNA molecule.

Two linear sequences are considiered equal if either:

  • They have the same sequence (case insensitive)
  • One sequence is the reverse complement of the other (case insensitive)

Two circular sequences are considiered equal if:

  1. They have the same length.

AND

  1. One sequence or its reverse complement can be found in the concatenation of the other sequence with itself (they are circular permutations).

The topology for the comparison can be set using one of the keywords linear or circular to True or False.

If circular or linear is not set, it will be deduced from the topology of each sequence for sequences that have a linear or circular attribute (like Dseq and Dseqrecord).

>>> from pydna import eq, Dseqrecord
>>> eq("aaa","AAA")
True
>>> eq("aaa","AAA","TTT")
True
>>> eq("aaa","AAA","TTT","tTt")
True
>>> eq("aaa","AAA","TTT","tTt", linear=True)
True
>>> eq("Taaa","aTaa", linear = True)
False
>>> eq("Taaa","aTaa", circular = True)
True
>>> a=Dseqrecord("Taaa")
>>> b=Dseqrecord("aTaa")
>>> eq(a,b)
False
>>> eq(a,b,circular=True)
True
>>> a=a.looped()
>>> b=b.looped()
>>> eq(a,b)
True
>>> eq(a,b,circular=False)
False
>>> eq(a,b,linear=True)
False
>>> eq(a,b,linear=False)
True
>>> eq("ggatcc","GGATCC")
True
>>> eq("ggatcca","GGATCCa")
True
>>> eq("ggatcca","tGGATCC")
True
pydna.utils.shift_origin(seq, shift)[source]

Shift the origin of seq which is assumed to be a circular sequence.

seq : string, Biopython Seq, Biopython SeqRecord, Dseq or Dseqrecord
sequence to be shifted.
new_seq : string, Biopython Seq, Biopython SeqRecord, Dseq or Dseqrecord
sequence with a new origin.
>>> import pydna
>>> pydna.shift_origin("taaa",1)
'aaat'
>>> pydna.shift_origin("taaa",0)
'taaa'
>>> pydna.shift_origin("taaa",2)
'aata'
>>> pydna.shift_origin("gatc",2)
'tcga'
pydna.utils.sync(seq, ref)[source]

Synchronize two circular sequences.

seq : string, Biopython Seq, Biopython SeqRecord, Dseq or Dseqrecord
sequence to be shifted.
ref : string, Biopython Seq, Biopython SeqRecord, Dseq or Dseqrecord
The reference sequence.
sequence : string, Biopython Seq, Biopython SeqRecord, Dseq or Dseqrecord
sequence with a new origin.

This function tries to rotate the circular sequence seq so that it has a maximum overlap with ref.

>>> import pydna
>>> pydna.sync("taaatc","aaataa")
'aaatct'
>>> pydna.sync("taaatc","aaataa")
'aaatct'
>>> pydna.sync("taaat","aaataa")
'aaatt'

Table Of Contents

This Page