cogent3.core.sequence.ProteinSequence

class ProteinSequence(seq='', name=None, info=None, check=True, preserve_case=False, gaps_allowed=True, wildcards_allowed=True, annotation_offset=0)

Holds the standard Protein sequence.

Attributes
annotation_db
annotation_offset

The offset between annotation coordinates and sequence coordinates.

line_wrap

Methods

add_feature(*, biotype, name, spans[, ...])

add a feature to annotation_db

annotate_from_gff(f[, offset])

copies annotations from a GFF file to self

annotate_matches_to(pattern, biotype, name)

Adds an annotation at sequence positions matching pattern.

can_match(other)

Returns True if every pos in self could match same pos in other.

can_mismatch(other)

Returns True if any position in self could mismatch with other.

can_mispair(other)

Returns True if any position in self could mispair with other.

can_pair(other)

Returns True if self and other could pair.

complement()

Returns complement of self, using data from MolType.

copy([exclude_annotations])

returns a copy of self

copy_annotations(seq_db)

copy annotations into attached annotation db

count(item)

count() delegates to self._seq.

count_degenerate()

Counts the degenerate bases in the specified sequence.

count_gaps()

Counts the gaps in the specified sequence.

counts([motif_length, include_ambiguity, ...])

returns dict of counts of motifs

degap()

Deletes all gap characters from sequence.

diff(other)

Returns number of differences between self and other.

disambiguate([method])

Returns a non-degenerate sequence from a degenerate one.

distance(other[, function])

Returns distance between self and other using function(i,j).

first_degenerate()

Returns the index of first degenerate symbol in sequence, or None.

first_gap()

Returns the index of the first gap in the sequence, or None.

first_invalid()

Returns the index of first invalid symbol in sequence, or None.

first_non_strict()

Returns the index of first non-strict symbol in sequence, or None.

frac_diff(other)

Returns fraction of positions where self and other differ.

frac_diff_gaps(other)

Returns fraction of positions where self and other differ in gap state.

frac_diff_non_gaps(other)

Returns fraction of non-gap positions where self differs from other.

frac_same(other)

Returns fraction of positions where self and other are the same.
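As a rough illustration of what frac_same and frac_diff measure, the sketch below compares two plain strings position by position. The helper names `frac_same_sketch` and `frac_diff_sketch` are hypothetical, and this is not the cogent3 implementation. Comparing only up to the shorter length and returning 0.0 for empty input are assumptions made for the sketch.

```python
def frac_same_sketch(a: str, b: str) -> float:
    """Fraction of compared positions where a and b hold the same symbol."""
    pairs = list(zip(a, b))  # assumption: compare up to the shorter length
    if not pairs:
        return 0.0  # assumption: empty input yields 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def frac_diff_sketch(a: str, b: str) -> float:
    """Fraction of compared positions where a and b hold different symbols."""
    pairs = list(zip(a, b))
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


same = frac_same_sketch("MKVLA", "MKILA")  # 4 of 5 positions agree -> 0.8
diff = frac_diff_sketch("MKVLA", "MKILA")  # 1 of 5 positions differ -> 0.2
```

For non-empty input the two values sum to 1, since every compared position is either the same or different.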

frac_same_gaps(other)

Returns fraction of positions where self and other share gap states.

frac_same_non_gaps(other)

Returns fraction of non-gap positions where self matches other.

frac_similar(other, similar_pairs)

Returns fraction of positions where self[i] is similar to other[i].

gap_indices()

Returns list of indices of all gaps in the sequence, or [].

gap_maps()

Returns dicts mapping between gapped and ungapped positions.
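The idea of mapping between gapped and ungapped coordinates can be sketched in plain Python as follows. The function name `gap_maps_sketch`, the gap characters `-` and `?`, and the order of the two returned dicts are assumptions for illustration, not a statement of the library's behaviour.

```python
def gap_maps_sketch(seq: str, gap_chars: str = "-?"):
    """Return (ungapped -> gapped, gapped -> ungapped) position dicts."""
    ungapped_to_gapped = {}
    gapped_to_ungapped = {}
    ungapped_index = 0
    for gapped_index, char in enumerate(seq):
        if char in gap_chars:
            continue  # gap positions have no ungapped coordinate
        ungapped_to_gapped[ungapped_index] = gapped_index
        gapped_to_ungapped[gapped_index] = ungapped_index
        ungapped_index += 1
    return ungapped_to_gapped, gapped_to_ungapped


u2g, g2u = gap_maps_sketch("M-KV--L")
# u2g == {0: 0, 1: 2, 2: 3, 3: 6}
# g2u == {0: 0, 2: 1, 3: 2, 6: 3}
```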

gap_vector()

Returns vector of True or False according to which pos are gaps.

get_drawable(*[, biotype, width, vertical])

make a figure from sequence features

get_drawables(*[, biotype])

returns a dict of drawables, keyed by type

get_features(*[, biotype, name, start, ...])

yields Feature instances

get_features_matching(**kwargs)

use .get_features()

get_in_motif_size([motif_length, log_warnings])

returns sequence as list of non-overlapping motifs

get_kmers(k[, strict])

return all overlapping k-mers
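The difference between non-overlapping motifs (get_in_motif_size) and overlapping k-mers (get_kmers) can be sketched on a plain string. Both helper names are hypothetical. Dropping an incomplete trailing motif is an assumption of this sketch, and the `strict` and `log_warnings` options are not modelled.

```python
def overlapping_kmers(seq: str, k: int) -> list:
    """Every window of length k, advancing one position at a time."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def non_overlapping_motifs(seq: str, motif_length: int) -> list:
    """Consecutive chunks of motif_length; an incomplete tail is dropped."""
    usable = len(seq) - len(seq) % motif_length
    return [seq[i:i + motif_length] for i in range(0, usable, motif_length)]


kmers = overlapping_kmers("MKVLAG", 3)        # ['MKV', 'KVL', 'VLA', 'LAG']
motifs = non_overlapping_motifs("MKVLA", 2)   # ['MK', 'VL']
```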

get_name()

Returns the sequence name; use the name attribute instead.

get_type()

Return the sequence type as moltype label.

gettype()

Return the sequence type.

is_annotated()

returns True if sequence has any annotations

is_degenerate()

Returns True if sequence contains degenerate characters.

is_gap([char])

Returns True if char is a gap.

is_gapped()

Returns True if sequence contains gaps.

is_strict()

Returns True if sequence contains only monomers.

is_valid()

Returns True if sequence contains no items absent from alphabet.

iter_kmers(k[, strict])

generates all overlapping k-mers.

make_feature(feature, *args)

return a Feature instance from feature data

matrix_distance(other, matrix)

Returns distance between self and other using a score matrix.

must_match(other)

Returns True if all positions in self must match positions in other.

must_pair(other)

Returns True if all positions in self must pair with other.

mw([method, delta])

Returns the molecular weight of (one strand of) the sequence.

parse_out_gaps()

returns Map corresponding to gap locations and ungapped Sequence

possibilities()

Counts number of possible sequences matching the sequence.
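To illustrate the counting idea, the number of concrete sequences compatible with a degenerate one is the product of the number of options at each position. The sketch below uses a deliberately small, hypothetical ambiguity table holding only the IUPAC codes B (D or N) and Z (E or Q). The table a MolType actually uses may differ, and `possibilities_sketch` is not a library function.

```python
from math import prod

# Assumption: a minimal ambiguity table for illustration only.
AMBIGUITIES = {"B": "DN", "Z": "EQ"}


def possibilities_sketch(seq: str) -> int:
    """Multiply the number of residues each symbol could stand for."""
    # Symbols absent from the table stand for themselves (one option).
    return prod(len(AMBIGUITIES.get(char, char)) for char in seq)


count = possibilities_sketch("MBZK")  # 1 * 2 * 2 * 1 -> 4
```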

rc()

Returns reverse complement of self w/ data from MolType.

replace(oldchar, newchar)

return new instance with oldchar replaced by newchar

replace_annotation_db(value[, check])

public interface to assigning the annotation_db

resolveambiguities()

Returns a list of tuples of strings.

resolved_ambiguities()

Returns a list of tuples of strings.

shuffle()

returns a randomized copy of the Sequence object

sliding_windows(window, step[, start, end])

Generator function that yields new sequence objects of a given length at a given interval.
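A minimal generator sketch of the windowing idea, operating on a plain string rather than a sequence object. The name `sliding_windows_sketch` is hypothetical. Treating `start` and `end` as bounds on the window start positions, and yielding only full-length windows, are assumptions of this sketch.

```python
from typing import Iterator, Optional


def sliding_windows_sketch(
    seq: str, window: int, step: int, start: int = 0, end: Optional[int] = None
) -> Iterator[str]:
    """Yield substrings of length `window`, moving `step` positions each time."""
    limit = len(seq) - window + 1  # one past the last valid window start
    stop = limit if end is None else min(end, limit)
    for pos in range(start, stop, step):
        yield seq[pos:pos + window]


windows = list(sliding_windows_sketch("MKVLAGT", window=3, step=2))
# ['MKV', 'VLA', 'AGT']
```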

strip_bad()

Removes any symbols not in the alphabet.

strip_bad_and_gaps()

Removes any symbols not in the alphabet, and any gaps.

strip_degenerate()

Removes degenerate bases by stripping them out of the sequence.

to_fasta([make_seqlabel, block_size])

Return string of self in FASTA format, no trailing newline
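The output shape described here (a label line, sequence wrapped into blocks, no trailing newline) can be sketched with a hypothetical helper. The default `block_size` of 60 and the plain `>name` label are assumptions; the `make_seqlabel` hook is not modelled.

```python
def to_fasta_sketch(name: str, seq: str, block_size: int = 60) -> str:
    """Format one record as FASTA text without a trailing newline."""
    lines = [f">{name}"]
    # Wrap the sequence into lines of at most block_size characters.
    lines.extend(seq[i:i + block_size] for i in range(0, len(seq), block_size))
    return "\n".join(lines)


record = to_fasta_sketch("demo", "MKVLAGTWQ", block_size=4)
# '>demo\nMKVL\nAGTW\nQ'
```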

to_html([wrap, limit, colors, font_size, ...])

returns html with embedded styles for sequence colouring

to_json()

returns a json formatted string

to_moltype(moltype)

returns a copy of self converted to the given moltype

to_rich_dict([exclude_annotations])

returns {'name': name, 'seq': sequence, 'moltype': moltype.label}

translate(*args, **kwargs)

returns the result of calling str.translate

with_masked_annotations(annot_types[, ...])

returns a sequence with annot_types regions replaced by mask_char if shadow is False, otherwise all other regions are masked.
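The effect of the shadow flag can be sketched on a plain string with explicit spans standing in for annotated regions. The helper `mask_regions_sketch`, the half-open `(start, end)` span convention, and the default mask character `?` are assumptions for illustration; the real method selects regions by annotation type rather than by coordinates.

```python
def mask_regions_sketch(seq, spans, mask_char="?", shadow=False):
    """Mask positions inside spans, or everything outside them if shadow."""
    in_region = [False] * len(seq)
    for start, end in spans:  # assumption: 0 <= start, half-open spans
        for i in range(start, min(end, len(seq))):
            in_region[i] = True
    # shadow=False masks flagged positions; shadow=True masks the rest.
    return "".join(
        mask_char if flag != shadow else char
        for char, flag in zip(seq, in_region)
    )


masked = mask_regions_sketch("MKVLAGT", [(2, 4)])                 # 'MK??AGT'
shadowed = mask_regions_sketch("MKVLAGT", [(2, 4)], shadow=True)  # '??VL???'
```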

with_termini_unknown()

Returns copy of sequence with terminal gaps remapped as missing.
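A plain-string sketch of remapping terminal gaps: runs of gaps at either end are replaced with a missing-data symbol while internal gaps are left alone. The helper name and the choice of `-` for gaps and `?` for missing data are assumptions of this sketch.

```python
def termini_unknown_sketch(seq: str, gap: str = "-", missing: str = "?") -> str:
    """Replace leading and trailing gap runs with the missing symbol."""
    without_lead = seq.lstrip(gap)
    lead = len(seq) - len(without_lead)    # number of leading gaps
    core = without_lead.rstrip(gap)
    trail = len(without_lead) - len(core)  # number of trailing gaps
    return missing * lead + core + missing * trail


remapped = termini_unknown_sketch("--MKV-L--")  # '??MKV-L??'
```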

gapped_by_map

gapped_by_map_motif_iter

gapped_by_map_segment_iter

strand_symmetry