cogent3.core.alignment.SequenceCollection

class SequenceCollection(data, names=None, alphabet=None, moltype=None, name=None, info=None, conversion_f=None, is_array=False, force_same_data=False, remove_duplicate_names=False, label_to_name=None, suppress_named_seqs=False)

Container for unaligned sequences

Attributes
annotation_db
num_seqs

Returns the number of sequences in the collection.

seqs

Methods

add_feature(*, seqid, biotype, name, spans)

add feature on named sequence

add_seqs(other[, before_name, after_name])

Returns new object of class self with sequences from other added.

annotate_from_gff(f[, seq_ids])

copies annotations from a GFF file to a sequence in self

apply_pssm([pssm, path, background, ...])

scores sequences using the specified pssm

copy()

Returns deep copy of self.

copy_annotations(seq_db)

copy annotations into attached annotation db

counts([motif_length, include_ambiguity, ...])

counts of motifs

counts_per_seq([motif_length, ...])

counts of motifs per sequence

deepcopy([sliced])

returns deep copy of self.

degap(**kwargs)

Returns copy in which sequences have no gaps.

distance_matrix([calc])

Estimated pairwise distance between sequences

dotplot([name1, name2, window, threshold, ...])

make a dotplot between specified sequences.

entropy_per_seq([motif_length, ...])

Returns the Shannon entropy per sequence.

get_ambiguous_positions()

Returns dict of seq:{position:char} for ambiguous chars.

get_features(*[, seqid, biotype, name, ...])

yields Feature instances

get_identical_sets([mask_degen])

returns sets of names for sequences that are identical

get_lengths([include_ambiguity, allow_gap])

returns {name: seq length, ...}

get_motif_probs([alphabet, ...])

Return a dictionary of motif probs, calculated as the averaged frequency across sequences.

get_seq(seqname)

Return a sequence object for the specified seqname.

get_seq_indices(f[, negate])

Returns list of keys of seqs where f(row) is True.

get_similar(target[, min_similarity, ...])

Returns new Alignment containing sequences similar to target.

get_translation([gc, incomplete_ok, ...])

translate from nucleic acid to protein

has_terminal_stop([gc, strict, allow_partial])

Returns True if any sequence has a terminal stop codon.

has_terminal_stops(**kwargs)

deprecated

is_ragged()

Returns True if alignment has sequences of different lengths.

iter_selected([seq_order, pos_order])

Iterates over elements in the alignment.

iter_seqs([seq_order])

Iterates over values (sequences) in the alignment, in order.

make_feature(*, feature)

create a feature on named sequence, or on the alignment itself

omit_gap_runs([allowed_run])

Returns new alignment where all seqs have runs of gaps <= allowed_run.

omit_gap_seqs([allowed_gap_frac])

Returns new alignment with seqs whose gap fraction is <= allowed_gap_frac.

pad_seqs([pad_length])

Returns copy in which sequences are padded to same length.

probs_per_seq([motif_length, ...])

return MotifFreqsArray per sequence

rc()

Returns the reverse complement alignment

rename_seqs(renamer)

returns new instance with sequences renamed

reverse_complement()

Returns the reverse complement alignment.

set_repr_policy([num_seqs, num_pos, ...])

specify policy for repr(self)

strand_symmetry([motif_length])

returns dict of strand symmetry test results per seq

take_seqs(seqs[, negate])

Returns new Alignment containing only specified seqs.

take_seqs_if(f[, negate])

Returns new Alignment containing seqs where f(row) is True.

to_dict()

Returns the alignment as dict of names -> strings.

to_dna()

returns copy of self as an alignment of DNA moltype seqs

to_fasta()

Return alignment in FASTA format

to_json()

returns JSON formatted string

to_moltype(moltype)

returns copy of self with seqs converted to the given moltype

to_nexus(seq_type[, wrap])

Return alignment in NEXUS format and mapping to sequence ids

to_phylip()

Return alignment in PHYLIP format and mapping to sequence ids

to_protein()

returns copy of self as an alignment of PROTEIN moltype seqs

to_rich_dict()

returns detailed content including info and moltype attributes

to_rna()

returns copy of self as an alignment of RNA moltype seqs

trim_stop_codons([gc, strict, allow_partial])

Removes any terminal stop codons from the sequences

with_modified_termini()

Changes the termini to include termini char instead of gapmotif.

write([filename, format])

Write the alignment to a file, preserving order of sequences.
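
Several of the one-line summaries above (get_lengths, is_ragged, pad_seqs, get_identical_sets, to_fasta) describe simple operations on a mapping of names to sequences. The sketch below illustrates those ideas on a plain Python dict. It is not cogent3 code and does not call the library: the standalone functions only borrow the method names for readability, and details such as right-padding with "-" and dropping singleton groups are assumptions made for the illustration, not documented behaviour of SequenceCollection.

```python
from collections import defaultdict

# Hypothetical stand-in data: a plain dict of name -> sequence string.
# A real SequenceCollection holds sequence objects, not bare strings.
seqs = {"s1": "ACGGT", "s2": "ACG", "s3": "ACGGT"}


def get_lengths(seqs):
    """Map each sequence name to its length."""
    return {name: len(seq) for name, seq in seqs.items()}


def is_ragged(seqs):
    """True if the sequences are not all the same length."""
    return len({len(seq) for seq in seqs.values()}) > 1


def pad_seqs(seqs, pad_char="-"):
    """Right-pad every sequence to the longest length.

    Assumption: padding goes on the right and uses pad_char.
    """
    width = max(len(seq) for seq in seqs.values())
    return {name: seq.ljust(width, pad_char) for name, seq in seqs.items()}


def get_identical_sets(seqs):
    """Group names whose sequences are exactly equal.

    Assumption: names with no identical partner are left out.
    """
    groups = defaultdict(set)
    for name, seq in seqs.items():
        groups[seq].add(name)
    return [names for names in groups.values() if len(names) > 1]


def to_fasta(seqs):
    """Render the mapping as FASTA text, one record per sequence."""
    return "\n".join(f">{name}\n{seq}" for name, seq in seqs.items())


print(get_lengths(seqs))
print(is_ragged(seqs), is_ragged(pad_seqs(seqs)))
print(get_identical_sets(seqs))
print(to_fasta(seqs))
```

Each helper returns a new object rather than modifying its input, matching the "Returns new ..." and "Returns copy ..." wording used throughout the method summaries.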