Warning
Pyrex is no longer the accepted way to develop extensions. Use Cython instead.
Usage examples (see doc/) were added for the following:
- assertSameObj: use in place of ‘assert a is b’
- assertNotSameObj: use in place of ‘assert a is not b’
- assertIsPermutation: checks if observed is a permutation of items
- assertIsProb: checks whether a value or values are probabilities
- assertIsBetween: use in place of ‘assert a < obs < b’
- assertLessThan: use in place of ‘assert obs < value’
- assertGreaterThan: use in place of ‘assert obs > value’
- assertSimiliarFreqs: compares frequency distributions using a G-test
- assertSimiliarMeans: compares samples using a t-test
- _set_suite_pvalue: set a suite-wide pvalue
Note
Both similarity assertions can have a pvalue specified in the testing module. This pvalue can be overridden when running alltests.py by calling TestCase._set_suite_pvalue(pvalue).
Note
All of these new assert methods can take lists as well. For instance:
    obs = [1,2,3,4]
    value = 5
    self.assertLessThan(obs, value)
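One way an assertion can accept either a single value or a list is sketched below. It assumes a plain unittest.TestCase subclass; the class name ListAwareTestCase and its body are hypothetical and are not the cogent.util.unit_test implementation.

```python
import unittest


class ListAwareTestCase(unittest.TestCase):
    """Hypothetical stand-in for a TestCase with list-aware assertions."""

    def assertLessThan(self, observed, value, msg=None):
        # Treat a scalar as a one-item list so both cases share one code path.
        try:
            items = list(observed)
        except TypeError:
            items = [observed]
        for item in items:
            if not item < value:
                raise self.failureException(
                    msg or "%r is not less than %r" % (item, value))


class ExampleTests(ListAwareTestCase):
    def test_list_and_scalar(self):
        self.assertLessThan([1, 2, 3, 4], 5)
        self.assertLessThan(4, 5)

    def test_failure(self):
        # A single offending item is enough to fail the whole assertion.
        self.assertRaises(AssertionError, self.assertLessThan, [1, 6], 5)
```

The assertion fails on the first item that violates the condition, so the failure message names the offending value rather than the whole list.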
Alignment constructor now checks for iterators (e.g. results from parsers) and converts them to lists – this allows direct construction like Alignment(MinimalFastaParser(open('myfile.fasta'))). Applies to both dense and sparse alignments, and to SequenceCollections.
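The idea of materializing a one-shot iterator before using it can be sketched as follows. The helper as_reusable and the toy_parser generator are hypothetical names used only for illustration; they are not part of the Alignment or MinimalFastaParser code.

```python
from collections.abc import Iterator


def as_reusable(data):
    """Return a list if data is a one-shot iterator, else return it unchanged."""
    if isinstance(data, Iterator):
        return list(data)
    return data


def toy_parser(lines):
    """Hypothetical stand-in for a FASTA parser: yields (label, seq) pairs."""
    label = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            label = line[1:]
        elif label is not None:
            yield label, line


records = as_reusable(toy_parser([">a", "ACGT", ">b", "AC-T"]))
# The records can now be traversed more than once, e.g. once to collect
# labels and once to collect sequences.
labels = [label for label, _ in records]
seqs = [seq for _, seq in records]
```

Without the conversion, the second pass over a generator would see no records, because the first pass exhausts it.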
Parameterized LoadTree underscore stripping in node names, and turned it off by default.
Trivial edits of the code provided by Felix Schill for SQL-like table joining. Principally a unification of the different types of table joins (inner- and outer-join) between 2 tables, and porting of all testing code into test_table.rest. The method Table.joined provides the interface (see tests/test_table.rest for usage).
Simply counts the number of rows satisfying some condition. The method takes the same arguments as Table.filtered.
Functions for obtaining the rate matrix for 2 or 3 sequences using the Goldman method. Support for RNA and DNA.
- add_seqs_to_alignment
- align_two_alignments
- align_unaligned_seqs
- align_and_build_tree
- build_tree_from_alignment

- align_unaligned_seqs
- bootstrap_tree_from_alignment
- build_tree_from_alignment
- align_and_build_tree
App controllers for Clearcut, ClustalW, Mafft
Added midpoint rooting
Accept FloatingPointError as well as ZeroDivisionError to accommodate numpy.
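A minimal sketch of why both exceptions need handling: plain Python floats raise ZeroDivisionError, while numpy values raise FloatingPointError when numpy is configured to raise on division errors. The function name safe_ratio and its default argument are hypothetical.

```python
import numpy as np


def safe_ratio(numerator, denominator, default=0.0):
    """Return numerator / denominator, or default if the division fails."""
    try:
        # Ask numpy to raise instead of returning inf/nan with a warning.
        with np.errstate(divide="raise", invalid="raise"):
            return numerator / denominator
    except (ZeroDivisionError, FloatingPointError):
        return default
```

Catching only ZeroDivisionError would let the numpy case propagate as an unhandled FloatingPointError.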
subsets: compare based on fraction of subsets of labels defined by clades that are the same in the two trees.
tip_to_tip: compare based on correlations of tip_to_tip distances.
Both of these are fairly badly behaved statistically, so should always be compared to a distribution of values from random (e.g. label-permuted) trees using Monte Carlo.
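The Monte Carlo comparison recommended above can be sketched with a Mantel-style label permutation on two tip-to-tip distance matrices. This is an illustration under stated assumptions, not the cogent tree comparison code: the function names are hypothetical, both matrices are assumed to list the same tips in the same order, and a plain Pearson correlation stands in for whatever statistic the tree method reports.

```python
import numpy as np


def tip_distance_correlation(dists_a, dists_b):
    """Pearson correlation between the upper triangles of two distance matrices."""
    upper = np.triu_indices(dists_a.shape[0], k=1)
    return np.corrcoef(dists_a[upper], dists_b[upper])[0, 1]


def permutation_pvalue(dists_a, dists_b, n_permutations=999, seed=0):
    """Fraction of label permutations scoring at least as high as observed."""
    rng = np.random.default_rng(seed)
    observed = tip_distance_correlation(dists_a, dists_b)
    hits = 0
    for _ in range(n_permutations):
        order = rng.permutation(dists_b.shape[0])
        # Permuting rows and columns together relabels the tips of tree b.
        shuffled = dists_b[np.ix_(order, order)]
        if tip_distance_correlation(dists_a, shuffled) >= observed:
            hits += 1
    # Count the observed value itself so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Toy data: tips placed on a line, so all pairwise distances are distinct.
positions = np.array([0.0, 1.0, 3.0, 7.0, 15.0])
toy_dists = np.abs(positions[:, None] - positions[None, :])
```

The observed statistic is judged against the permuted distribution instead of being interpreted on its own scale.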
Added ability to exclude non-shared taxa from subsets tree cmp method.
Added Zongzhi’s combination and permutation implementations to transform.py.
Added some docs to UPGMA_cluster.
Added median in cogent.maths.stats.test. It was added because the numpy version does not support an axis parameter. This function now works like numpy functions (sum, mean, etc.) where you can specify an axis. It should be safe to use in place of numpy.median.
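A sketch of an axis-aware median follows. It is not the cogent.maths.stats.test code, only one way to obtain the described behaviour by sorting along the chosen axis. Current numpy releases do accept an axis argument in numpy.median, which makes it a convenient reference for checking results.

```python
import numpy as np


def median(values, axis=None):
    """Median along an axis; axis=None flattens the input first."""
    arr = np.asarray(values, dtype=float)
    if axis is None:
        arr = arr.ravel()
        axis = 0
    ordered = np.sort(arr, axis=axis)
    n = ordered.shape[axis]
    # Average the two middle elements; they coincide when n is odd.
    lower = np.take(ordered, (n - 1) // 2, axis=axis)
    upper = np.take(ordered, n // 2, axis=axis)
    return (lower + upper) / 2.0


data = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
column_medians = median(data, axis=0)
row_medians = median(data, axis=1)
```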
Many changes to the core objects, mainly for compatibility. Major changes in this update:
- ModelSequence now inherits from SequenceI and supports the various Sequence methods (e.g. nucleic acids can reverse-complement, etc.). Type checking is still performed using strings (e.g. for ambiguous characters, etc.) and could be improved, but everything seems to work. Bug # 1851959.
- ModelProteinSequence added. Bug # 1851961.
- DenseAlignment and ModelSequence can now handle the ‘?’ character, which is added to the Alphabet during install. Bug # 1851483.
- Fixed a severe bug in moltype constructors that mutated the dict of ambiguous states after construction of each of the standard moltypes (for example, preventing re-instantiation of a similar moltype after the initial install). Bug # 1851482. This would have been very confusing for anyone trying to experiment with custom MolTypes.
- DenseAlignment now implements many methods of Alignment (some of which have actually been moved into SequenceCollection), e.g. getGappedSeq() as per bug # 1816573.
Added parameter to MageListFromString and MageGroupFromString. Can now handle ‘on’ as well as ‘off’.
SequenceCollection, Alignment, etc. now check for duplicate seq labels and either raise an exception or strip duplicates, as desired. Added unit test to cover this case.
- SequenceCollection now also produces FASTA as default __str__ behavior like the other objects.
- DenseAlignment now iterates over its mapped items, not the indices for those items, by default. This allows API compatibility with Alignment but is slow: it may be worth optimizing this for cases such as detecting ambiguous chars, as I have already implemented for gaps.
Updated std in cogent.maths.stats.test
std now takes an axis parameter like numpy functions (sum, mean, etc.).
- Also added a docstring and tests.
Note
cogent.maths.stats.test now imports sqrt from numpy instead of math so that std works on arrays.
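The following sketch shows an axis-aware standard deviation and why numpy's sqrt is needed. It is illustrative only: the n - 1 (sample) denominator is an assumption, since the text does not say which denominator the cogent function uses.

```python
import numpy as np


def std(values, axis=None):
    """Sample standard deviation (n - 1 denominator) along an axis."""
    arr = np.asarray(values, dtype=float)
    if axis is None:
        arr = arr.ravel()
        axis = 0
    n = arr.shape[axis]
    mean = arr.sum(axis=axis) / n
    # Expand the reduced axis again so the mean broadcasts against arr.
    deviations = arr - np.expand_dims(mean, axis)
    variance = (deviations ** 2).sum(axis=axis) / (n - 1)
    # np.sqrt works element-wise on arrays; math.sqrt accepts only scalars.
    return np.sqrt(variance)


data = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
column_std = std(data, axis=0)
```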
Warning
If you do modify the tree while using traverse(), you will get undesired results. If you need to modify the tree, use traverse_recursive() instead. This only applies to the tree topology (e.g. if you are adding or deleting nodes, or moving nodes around; doesn’t apply if you are changing branch lengths, etc.). The only two uses I found in Cogent where the tree is modified during iteration are in rna2d (some of the structure tree operations) and the prune() method. I have changed both to use traverse_recursive for now. However, there might be issues with other code. It might be worth figuring out how to make the iterative method do the right thing when the tree is modified – suggestions are welcome provided they do not impose substantial performance penalties.
Made compatible with Python 2.4
Changed dev status in setup call
Dropping comments indicating windows support
This bug arose because the UPGMA algorithm picks the smallest distance between nodes at each step but should never pick an entry on the diagonal. To prevent a diagonal choice, the diagonal is set to a large number, but for very large matrices a diagonal entry is sometimes still chosen because that number decreases as the distances are averaged during node collapse. To prevent this error, the program now checks that the selected smallest_index is not on the diagonal. If it is, it reassigns the diagonal to the large number.
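The check described above can be sketched as follows. The function name, the LARGE sentinel value, and the use of a square numpy array are assumptions made for illustration; the actual UPGMA code is not shown in this entry.

```python
import numpy as np

LARGE = 1e305  # assumed sentinel; the real constant is not given in the text


def smallest_off_diagonal(matrix, large=LARGE):
    """Return (row, col) of the smallest entry, avoiding the diagonal.

    Assumes at least one off-diagonal entry is smaller than `large`.
    """
    n = matrix.shape[0]
    row, col = divmod(int(np.argmin(matrix)), n)
    if row == col:
        # A diagonal cell won: restore the sentinel on the diagonal and retry.
        np.fill_diagonal(matrix, large)
        row, col = divmod(int(np.argmin(matrix)), n)
    return row, col


# The (0, 0) entry has decayed below the real distances.
dists = np.array([[0.5, 3.0, 4.0],
                  [3.0, LARGE, 2.0],
                  [4.0, 2.0, LARGE]])
pair = smallest_off_diagonal(dists)
```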
If attribute string did not contain double quotes, find() returned -1, so the last character of the string was inadvertently omitted.
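The pitfall is easy to reproduce in isolation. Both functions below are hypothetical reductions of the problem, not the parser's actual code: str.find() returns -1 when the substring is absent, and using -1 as a slice end drops the last character.

```python
def attribute_value_buggy(text):
    """Slice up to the first double quote, ignoring a failed search."""
    end = text.find('"')
    # With no quote present, end is -1 and text[:-1] drops the last character.
    return text[:end]


def attribute_value_fixed(text):
    """Slice up to the first double quote, or keep everything if there is none."""
    end = text.find('"')
    if end == -1:
        return text
    return text[:end]
```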
This used to be masked by NCBI’s automatic conversion between protein and nucleotide ids, but apparently this conversion no longer operates in the tested cases.
Zongzhi noticed that assertFloatEqual would incorrectly pass when a shape (4,0) array was compared against a shape (4,4) array. I added tests for assertFloatEqual, assertFloatEqualAbs, assertFloatEqualRel and assertEqual. The same bug was noticed in assertFloatEqualRel. They are now fixed. These fixes exposed errors in maths.stats.test.std and correlation_matrix. The std function needed reworking, but the correlation_matrix error was a fault in the test case itself.
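A sketch of a shape-aware float comparison is given below. The function name assert_float_equal and the eps tolerance are hypothetical and this is not the cogent.util.unit_test implementation. It illustrates the general hazard: an array with a zero-length dimension has no elements, so an element-by-element check over it compares nothing and passes trivially.

```python
import numpy as np


def assert_float_equal(observed, expected, eps=1e-6):
    """Raise AssertionError unless both arrays match in shape and value."""
    obs = np.asarray(observed, dtype=float)
    exp = np.asarray(expected, dtype=float)
    # Compare shapes first, so an empty array cannot pass by default.
    if obs.shape != exp.shape:
        raise AssertionError(
            "shape mismatch: %r vs %r" % (obs.shape, exp.shape))
    if not np.all(np.abs(obs - exp) <= eps):
        raise AssertionError("values differ by more than %r" % eps)


def raises_assertion(observed, expected):
    """Helper for the example: report whether the comparison fails."""
    try:
        assert_float_equal(observed, expected)
    except AssertionError:
        return True
    return False
```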
A failure when a record had a missing observation in the last field has been fixed. Only line-feed characters are now stripped from lines.
numpy eig() produces an eigenvector array that is the transpose of the one produced by Numeric eig(). Therefore, any code that does not take this into account will produce results that are TOTALLY INCORRECT when fed to downstream analyses. Coordinates from this module prior to this patch are incorrect and are not to be trusted.
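The numpy convention can be checked directly: numpy.linalg.eig returns eigenvectors as the columns of its second return value. The small matrix below is an arbitrary example chosen for illustration.

```python
import numpy as np

matrix = np.array([[2.0, 0.0], [1.0, 3.0]])  # deliberately non-symmetric
eigenvalues, eigenvectors = np.linalg.eig(matrix)

# numpy returns eigenvectors as columns:
# eigenvectors[:, i] pairs with eigenvalues[i].
for i, value in enumerate(eigenvalues):
    column = eigenvectors[:, i]
    assert np.allclose(matrix @ column, value * column)

# Code written for a row-per-eigenvector layout can transpose once up front.
row_vectors = eigenvectors.T
```

Transposing once at the point of the call keeps downstream code, which indexes eigenvectors by row, unchanged.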
Fixed a typo in dialign test
tree __repr__ now more robust to non-str Name entries
seqsim.rangenode traverse now compatible with the base class.
Fixed line color bug in PR2 bias plots.
Added method to dump raw coords from dendrogram.
Fixed call to eigenvector when Pyrex is not available
Fixed bug in nonrecursive postorder traversal when starting from a node other than the root
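A nonrecursive postorder traversal that behaves the same from any starting node can be sketched with an explicit stack. The Node class here is a hypothetical minimal tree, not the cogent tree class, and the sketch does not reproduce the original bug or its fix. It only shows a traversal that never follows parent links and so cannot walk above the starting node.

```python
class Node:
    """Hypothetical minimal tree node; not the cogent tree class."""

    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def postorder(start):
    """Yield start's descendants, then start itself, without recursion."""
    stack = [(start, 0)]
    while stack:
        node, next_child = stack.pop()
        if next_child < len(node.children):
            # Come back to this node after the next child's subtree is done.
            stack.append((node, next_child + 1))
            stack.append((node.children[next_child], 0))
        else:
            yield node


tree = Node("root", [Node("a", [Node("b"), Node("c")]), Node("d")])
from_root = [n.name for n in postorder(tree)]
from_subtree = [n.name for n in postorder(tree.children[0])]
```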