cogent3.core.sequence.ProteinSequence
class ProteinSequence(seq='', name=None, info=None, check=True, preserve_case=False, gaps_allowed=True, wildcards_allowed=True, annotation_offset=0)
Holds the standard Protein sequence.
Attributes
annotation_db
annotation_offset
The offset between annotation coordinates and sequence coordinates.
line_wrap
Methods
add_feature(*, biotype, name, spans[, ...])
add a feature to annotation_db
annotate_from_gff(f[, offset])
copies annotations from a gff file to self
annotate_matches_to(pattern, biotype, name)
Adds an annotation at sequence positions matching pattern.
can_match(other)
Returns True if every pos in self could match same pos in other.
can_mismatch(other)
Returns True if any position in self could mismatch with other.
can_mispair(other)
Returns True if any position in self could mispair with other.
can_pair(other)
Returns True if self and other could pair.
complement()
Returns complement of self, using data from MolType.
copy([exclude_annotations])
returns a copy of self
copy_annotations(seq_db)
copy annotations into attached annotation db
count(item)
count() delegates to self._seq.
count_degenerate()
Counts the degenerate bases in the specified sequence.
count_gaps()
Counts the gaps in the specified sequence.
counts([motif_length, include_ambiguity, ...])
returns dict of counts of motifs
degap()
Deletes all gap characters from sequence.
diff(other)
Returns number of differences between self and other.
disambiguate([method])
Returns a non-degenerate sequence from a degenerate one.
distance(other[, function])
Returns distance between self and other using function(i,j).
first_degenerate()
Returns the index of first degenerate symbol in sequence, or None.
first_gap()
Returns the index of the first gap in the sequence, or None.
first_invalid()
Returns the index of first invalid symbol in sequence, or None.
first_non_strict()
Returns the index of first non-strict symbol in sequence, or None.
frac_diff(other)
Returns fraction of positions where self and other differ.
frac_diff_gaps(other)
Returns frac.
frac_diff_non_gaps(other)
Returns fraction of non-gap positions where self differs from other.
frac_same(other)
Returns fraction of positions where self and other are the same.
frac_same_gaps(other)
Returns fraction of positions where self and other share gap states.
frac_same_non_gaps(other)
Returns fraction of non-gap positions where self matches other.
frac_similar(other, similar_pairs)
Returns fraction of positions where self[i] is similar to other[i].
gap_indices()
Returns list of indices of all gaps in the sequence, or [].
gap_maps()
Returns dicts mapping between gapped and ungapped positions.
gap_vector()
Returns vector of True or False according to which pos are gaps.
get_drawable(*[, biotype, width, vertical])
make a figure from sequence features
get_drawables(*[, biotype])
returns a dict of drawables, keyed by type
get_features(*[, biotype, name, start, ...])
yields Feature instances
get_features_matching(**kwargs)
use .get_features()
get_in_motif_size([motif_length, log_warnings])
returns sequence as list of non-overlapping motifs
get_kmers(k[, strict])
return all overlapping k-mers
get_name()
Return the sequence name -- should just use name instead.
get_type()
Return the sequence type as moltype label.
gettype()
Return the sequence type.
is_annotated()
returns True if sequence has any annotations
is_degenerate()
Returns True if sequence contains degenerate characters.
is_gap([char])
Returns True if char is a gap.
is_gapped()
Returns True if sequence contains gaps.
is_strict()
Returns True if sequence contains only monomers.
is_valid()
Returns True if sequence contains no items absent from alphabet.
iter_kmers(k[, strict])
generates all overlapping k-mers.
make_feature(feature, *args)
return a Feature instance from feature data
matrix_distance(other, matrix)
Returns distance between self and other using a score matrix.
must_match(other)
Returns True if all positions in self must match positions in other.
must_pair(other)
Returns True if all positions in self must pair with other.
mw([method, delta])
Returns the molecular weight of (one strand of) the sequence.
parse_out_gaps()
returns Map corresponding to gap locations and ungapped Sequence
possibilities()
Counts number of possible sequences matching the sequence.
rc()
Returns reverse complement of self with data from MolType.
replace(oldchar, newchar)
return new instance with oldchar replaced by newchar
replace_annotation_db(value[, check])
public interface to assigning the annotation_db
resolveambiguities()
Returns a list of tuples of strings.
resolved_ambiguities()
Returns a list of tuples of strings.
shuffle()
returns a randomized copy of the Sequence object
sliding_windows(window, step[, start, end])
Generator function that yields new sequence objects of a given length at a given interval.
strip_bad()
Removes any symbols not in the alphabet.
strip_bad_and_gaps()
Removes any symbols not in the alphabet, and any gaps.
strip_degenerate()
Removes degenerate bases by stripping them out of the sequence.
to_fasta([make_seqlabel, block_size])
Return string of self in FASTA format, no trailing newline
to_html([wrap, limit, colors, font_size, ...])
returns html with embedded styles for sequence colouring
to_json()
returns a json formatted string
to_moltype(moltype)
returns copy of self with moltype seq
to_rich_dict([exclude_annotations])
returns {'name': name, 'seq': sequence, 'moltype': moltype.label}
translate(*args, **kwargs)
returns the result of calling str.translate
with_masked_annotations(annot_types[, ...])
returns a sequence with annot_types regions replaced by mask_char if shadow is False, otherwise all other regions are masked.
with_termini_unknown()
Returns copy of sequence with terminal gaps remapped as missing.
gapped_by_map
gapped_by_map_motif_iter
gapped_by_map_segment_iter
strand_symmetry
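
Example
The one-line summaries for gap_indices, degap, iter_kmers and frac_same can be illustrated on ordinary Python strings. The sketch below is not cogent3 code and does not import the library: the helper functions are hypothetical stand-ins that borrow the method names, "-" is assumed to be the only gap character, and the handling of unequal lengths and empty input in frac_same is an assumption that may differ from the real method.

```python
# Plain-Python sketch; these helpers are hypothetical and are NOT cogent3 code.
GAP = "-"  # assumption: "-" is the only gap character


def gap_indices(seq):
    """Return the indices of all gap characters in seq, or []."""
    return [i for i, char in enumerate(seq) if char == GAP]


def degap(seq):
    """Return seq with every gap character removed."""
    return seq.replace(GAP, "")


def frac_same(seq, other):
    """Return the fraction of compared positions where the symbols are equal.

    Assumption: only positions up to the shorter length are compared, and
    an empty comparison returns 0.0.
    """
    pairs = list(zip(seq, other))
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def iter_kmers(seq, k):
    """Yield every overlapping k-mer of seq, left to right."""
    for start in range(len(seq) - k + 1):
        yield seq[start : start + k]


gapped = "MK-LV--AT"  # made-up gapped protein string
ungapped = degap(gapped)

print(gap_indices(gapped))            # positions of the three gaps
print(ungapped)                       # the same residues without gaps
print(list(iter_kmers(ungapped, 3)))  # overlapping 3-mers
print(frac_same(ungapped, "MKIVAS"))  # identity against a second made-up string
```

With a real ProteinSequence the corresponding methods are called on the instance itself, for example seq.degap() or seq.frac_same(other), and return library objects rather than plain strings.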