cogent3.core.alphabet.Alphabet

class Alphabet(motifset, gap='-', moltype=None)

An ordered set of fixed-length strings, e.g. the 61 sense codons.

Ambiguities (e.g. N for any base in DNA) are not considered part of the alphabet itself, although a sequence containing ambiguities is still valid on the alphabet provided those ambiguities are known to the alphabet. A gap is considered a separate motif and is not part of the alphabet itself.
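The distinction between motifs, ambiguities, and the gap can be illustrated with a small plain-Python sketch. It does not use cogent3 and is not how the library implements validation; the names `DNA_MOTIFS`, `GAP`, `KNOWN_AMBIGUITIES`, and `is_valid_on` are hypothetical, and the ambiguity table is a partial IUPAC subset chosen for illustration.

```python
# Plain-Python illustration only; not the cogent3 implementation.
DNA_MOTIFS = ("T", "C", "A", "G")  # the alphabet proper, in a fixed order
GAP = "-"                          # a separate motif, kept outside the alphabet

# Partial IUPAC ambiguity table (an assumed subset, for illustration)
KNOWN_AMBIGUITIES = {
    "N": {"A", "C", "G", "T"},
    "R": {"A", "G"},
    "Y": {"C", "T"},
}


def is_valid_on(seq, motifs=DNA_MOTIFS, ambiguities=KNOWN_AMBIGUITIES):
    """Return True if every symbol is a motif or a known ambiguity code."""
    allowed = set(motifs) | set(ambiguities)
    return all(symbol in allowed for symbol in seq)


print("N" in DNA_MOTIFS)    # the ambiguity code is not a member of the alphabet
print(is_valid_on("ACGN"))  # but a sequence containing it is still valid
print(is_valid_on("ACGX"))  # an unknown symbol makes the sequence invalid
```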

The typical use is for the Alphabet to hold nucleic acid bases, amino acids, or codons.
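As a concrete picture of a codon alphabet, the 61 sense codons can be enumerated in plain Python. This is a sketch that assumes the standard genetic code, with stop codons TAA, TAG, and TGA; the variable names are hypothetical and no cogent3 calls are used.

```python
from itertools import product

BASES = "TCAG"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code

# All 4**3 = 64 fixed-length (length 3) motifs over the four bases
all_codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Removing the three stop codons leaves the sense codons, in a stable order
sense_codons = tuple(c for c in all_codons if c not in STOP_CODONS)

print(len(all_codons), len(sense_codons))
```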

The moltype, if supplied, handles ambiguities, coercion of the sequence to the correct data type, and complementation (if appropriate).

Methods

adapt_motif_probs(motif_probs)

Prepare an array or dictionary of probabilities for use with this alphabet by checking size and order.

count(value, /)

Return number of occurrences of value.

from_indices(data)

Returns sequence of elements from sequence of indices.

get_gap_motif()

Returns the motif that self is using as a gap.

get_matched_array(motifs[, dtype])

Returns an array in which rows are motifs, columns are items in self.

get_motif_len()

Returns the length of the items in self, or None if they differ.

get_subset(motif_subset[, excluded])

Returns a new Alphabet object containing a subset of motifs in self.

get_word_alphabet(word_length)

Returns a new Alphabet object with items as word_length strings.

includes_gap_motif()

Returns True if self includes the gap motif, False otherwise.

index(item)

Returns the index of a specified item.

is_valid(seq)

Returns True if seq contains only items in self.

resolve_ambiguity(ambig_motif)

Returns set of symbols corresponding to ambig_motif.

to_indices(data)

Returns sequence of indices from sequence of elements.

to_json()

Returns a JSON-formatted string representation of self.

with_gap_motif()

Returns an Alphabet object resembling self but including the gap.

AlphabetError

counts

to_rich_dict
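The behaviour described for to_indices, from_indices, and get_word_alphabet can be sketched with a small stand-in class. `TinyAlphabet` is hypothetical: it only borrows the method names listed above to show the idea of an ordered motif set, and it is not the cogent3 class and makes no claim about cogent3's return types or error handling.

```python
from itertools import product


class TinyAlphabet(tuple):
    """Hypothetical stand-in: an ordered, immutable collection of motifs."""

    def to_indices(self, data):
        # Map each element to its position in the alphabet
        lookup = {motif: i for i, motif in enumerate(self)}
        return [lookup[item] for item in data]

    def from_indices(self, data):
        # Inverse mapping: positions back to elements
        return [self[i] for i in data]

    def get_word_alphabet(self, word_length):
        # Every word of the given length, in alphabet order
        words = ("".join(w) for w in product(self, repeat=word_length))
        return TinyAlphabet(words)


bases = TinyAlphabet("TCAG")
indices = bases.to_indices("GATTACA")
restored = "".join(bases.from_indices(indices))
dinucleotides = bases.get_word_alphabet(2)

print(indices)             # [3, 2, 0, 0, 2, 1, 2]
print(restored)            # GATTACA
print(len(dinucleotides))  # 16
```

In this sketch, count and index come for free from tuple, and converting to indices and back recovers the original sequence.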