cogent3.core.alignment.SequenceCollection
class SequenceCollection(*args, **kwargs)
Container for unaligned sequences
Attributes
- annotation_db
- named_seqs
- num_seqs: Returns the number of sequences in the alignment.
- seqs
Methods
- add_feature(*, seqid, biotype, name, spans): add feature on named sequence
- add_seqs(other[, before_name, after_name]): Returns new object of class self with sequences from other added.
- annotate_from_gff(f[, seq_ids]): copies annotations from a gff file to a sequence in self
- apply_pssm([pssm, path, background, ...]): scores sequences using the specified pssm
- copy(): Returns deep copy of self.
- copy_annotations(seq_db): copy annotations into attached annotation db
- counts([motif_length, include_ambiguity, ...]): counts of motifs
- counts_per_seq([motif_length, ...]): counts of motifs per sequence
- deepcopy([sliced]): returns deep copy of self.
- degap(**kwargs): Returns copy in which sequences have no gaps.
- distance_matrix([calc]): Estimated pairwise distance between sequences
- dotplot([name1, name2, window, threshold, ...]): make a dotplot between specified sequences.
- entropy_per_seq([motif_length, ...]): Returns the Shannon entropy per sequence.
- get_ambiguous_positions(): Returns dict of seq:{position:char} for ambiguous chars.
- get_features(*[, seqid, biotype, name, ...]): yields Feature instances
- get_identical_sets([mask_degen]): returns sets of names for sequences that are identical
- get_lengths([include_ambiguity, allow_gap]): returns {name: seq length, ...}
- get_motif_probs([alphabet, ...]): Return a dictionary of motif probs, calculated as the averaged frequency across sequences.
- get_seq(seqname): Return a sequence object for the specified seqname.
- get_seq_indices(f[, negate]): Returns list of keys of seqs where f(row) is True.
- get_similar(target[, min_similarity, ...]): Returns new Alignment containing sequences similar to target.
- get_translation([gc, incomplete_ok, ...]): translate from nucleic acid to protein
- has_terminal_stop([gc, strict]): Returns True if any sequence has a terminal stop codon.
- is_ragged(): Returns True if alignment has sequences of different lengths.
- iter_selected([seq_order, pos_order]): Iterates over elements in the alignment.
- iter_seqs([seq_order]): Iterates over values (sequences) in the alignment, in order.
- make_feature(*, feature): create a feature on named sequence, or on the alignment itself
- omit_gap_runs([allowed_run]): Returns new alignment where all seqs have runs of gaps <=allowed_run.
- omit_gap_seqs([allowed_gap_frac]): Returns new alignment with seqs that have <= allowed_gap_frac.
- pad_seqs([pad_length]): Returns copy in which sequences are padded to same length.
- probs_per_seq([motif_length, ...]): return MotifFreqsArray per sequence
- rc(): Returns the reverse complement alignment
- rename_seqs(renamer): returns new instance with sequences renamed
- reverse_complement(): Returns the reverse complement alignment.
- set_repr_policy([num_seqs, num_pos, ...]): specify policy for repr(self)
- strand_symmetry([motif_length]): returns dict of strand symmetry test results per seq
- take_seqs(seqs[, negate]): Returns new Alignment containing only specified seqs.
- take_seqs_if(f[, negate]): Returns new Alignment containing seqs where f(row) is True.
- to_dict(): Returns the alignment as dict of names -> strings.
- to_dna(): returns copy of self as an alignment of DNA moltype seqs
- to_fasta(): Return alignment in Fasta format
- to_json(): returns json formatted string
- to_moltype(moltype): returns copy of self with moltype seqs
- to_nexus(seq_type[, wrap]): Return alignment in NEXUS format and mapping to sequence ids
- to_phylip(): Return alignment in PHYLIP format and mapping to sequence ids
- to_protein(): returns copy of self as an alignment of PROTEIN moltype seqs
- to_rich_dict(): returns detailed content including info and moltype attributes
- to_rna(): returns copy of self as an alignment of RNA moltype seqs
- trim_stop_codons([gc, strict]): Removes any terminal stop codons from the sequences
- with_modified_termini(): Changes the termini to include termini char instead of gapmotif.
- write([filename, format]): Write the alignment to a file, preserving order of sequences.
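The one-line summaries above do not show what the results look like, so the following is a minimal pure-Python sketch of the kind of output a few of them describe (lengths, raggedness, reverse complement, Shannon entropy per sequence). It does not use cogent3 and is not the library's implementation. The dict `seqs`, the helper functions, the plain ACGT complement table, and the choice of log base 2 for entropy are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical stand-in for an unaligned collection: name -> sequence string.
seqs = {"seq1": "ACGGT", "seq2": "AAAA", "seq3": "ACG"}

# Assumption: unambiguous DNA only, so a four-letter complement table suffices.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def lengths(collection):
    """Return {name: sequence length}, the shape described for get_lengths."""
    return {name: len(seq) for name, seq in collection.items()}


def is_ragged(collection):
    """Return True if the sequences differ in length."""
    return len({len(seq) for seq in collection.values()}) > 1


def reverse_complement(collection):
    """Return a new dict with every sequence reverse complemented."""
    return {
        name: seq.translate(_COMPLEMENT)[::-1]
        for name, seq in collection.items()
    }


def entropy_per_seq(collection):
    """Return {name: Shannon entropy in bits} over single characters."""
    result = {}
    for name, seq in collection.items():
        total = len(seq)
        freqs = (count / total for count in Counter(seq).values())
        # H = sum(p * log2(1 / p)); a sequence of one repeated character gives 0.0
        result[name] = sum(p * math.log2(1 / p) for p in freqs)
    return result


print(lengths(seqs))
print(is_ragged(seqs))
print(reverse_complement(seqs))
print(entropy_per_seq(seqs))
```

With a real SequenceCollection the corresponding methods listed above take optional arguments such as motif_length and include_ambiguity, which this sketch ignores.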