Changelog
This document outlines the changes made to the project with each release.
Version 1.1.1 (2024-10-25)
Features
Updated tree parsing functionality and moved it to the TreeParser class in the analysis/tree_parser.py module to conform to the refactor, and added new functionality to parse, modify, draw, and save Newick and NEXUS tree files.
siterates and qmatrix files are now checked dynamically to determine whether they are in IQ-TREE format or in a simple tab-delimited or comma-delimited format.
site_rates and qmat are now read in as pandas DataFrames with less complex logic.
Added unit test for tree parsing.
Added integration test for tree parsing.
Added documentation for tree parsing.
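The format detection described above for siterates and qmatrix files can be illustrated with a minimal sketch. It uses only the standard library rather than pandas, and the function names detect_table_format and read_rows are hypothetical, not part of the project's API. It also assumes that an IQ-TREE style file begins with '#' comment lines, which may not match the check the project actually performs.

```python
import csv
import io


def detect_table_format(text):
    """Guess the layout of a site-rates or Q-matrix file from its text.

    Hypothetical helper: returns (kind, delimiter), where kind is
    "iqtree" or "simple".
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Assumption: IQ-TREE style output starts with '#' comment lines.
    is_iqtree = bool(lines) and lines[0].startswith("#")
    data_lines = [ln for ln in lines if not ln.startswith("#")]
    if not data_lines:
        raise ValueError("no data rows found")
    first = data_lines[0]
    # Pick whichever candidate delimiter is more frequent in the first row.
    delimiter = "\t" if first.count("\t") >= first.count(",") else ","
    return ("iqtree" if is_iqtree else "simple"), delimiter


def read_rows(text):
    """Return (kind, rows) with comment and blank lines skipped."""
    kind, delimiter = detect_table_format(text)
    data = "\n".join(
        ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")
    )
    return kind, list(csv.reader(io.StringIO(data), delimiter=delimiter))
```

In the project itself the parsed rows end up in pandas DataFrames; the sketch stops at lists of strings to stay dependency-free.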
Bug Fixes
Fixed bug where the PhylipReader and StructureReader classes did not have the verbose and debug attributes.
Changes
The q property is now called qmat for clarity and easier searching in files.
Removed redundant siterates_iqtree and qmatrix_iqtree arguments and attributes from the GenotypeData, VCFReader, PhylipReader, StructureReader, and TreeParser classes.
Added error handling for tree parsing.
Added error handling for siterates and qmatrix files.
Version 1.1.0 (2024-10-08)
Features
- Full refactor of the codebase to improve user-friendliness, maintainability and readability.
Method chaining: All functions now return the object itself, allowing for method chaining and custom filtering orders with NRemover2.
Most objects now just take a GenotypeData object as input, making the code more modular and easier to maintain.
Improved documentation and docstrings.
Improved error handling.
Improved logging. All logging is now done with the Python logging module via the custom LoggerManager class.
Improved testing.
- Improved performance.
Reduced memory usage.
Reduced disk usage.
Reduced CPU usage.
Reduced execution time, particularly for reading, loading, filtering, and processing large VCF files.
Improved plotting.
Improved data handling.
Improved file handling. All filenames now use pathlib.Path objects.
Code modularity: Many functions are now in separate modules for better organization.
Full unit tests for all functions.
Full integration tests for all functions.
Full documentation for all functions.
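The method-chaining pattern introduced in this release can be illustrated with a small, self-contained sketch. The LocusFilter class and its method names are hypothetical stand-ins and are not the NRemover2 API. The point is only that each method returns the object itself, so filters can be applied in any order in a single expression.

```python
class LocusFilter:
    """Hypothetical chainable filter; not the project's NRemover2 class."""

    def __init__(self, loci):
        self.loci = list(loci)

    def drop_missing_above(self, threshold):
        # Keep loci whose missing-data proportion is at or below the threshold.
        self.loci = [locus for locus in self.loci if locus["missing"] <= threshold]
        return self  # returning self is what enables chaining

    def drop_monomorphic(self):
        # Keep loci with more than one observed allele.
        self.loci = [locus for locus in self.loci if locus["n_alleles"] > 1]
        return self


loci = [
    {"id": "L1", "missing": 0.1, "n_alleles": 2},
    {"id": "L2", "missing": 0.6, "n_alleles": 2},
    {"id": "L3", "missing": 0.0, "n_alleles": 1},
]

# Filters run left to right; reordering the calls changes the filtering order.
kept = LocusFilter(loci).drop_missing_above(0.5).drop_monomorphic().loci
```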
Version 1.0.5 (2023-09-16)
Features
Added thin and random_subset options to the nremover() function.
thin removes loci within thin bases of the nearest locus.
random_subset randomly subsets the loci using an integer or proportion.
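As a rough illustration of the two options above, the following sketch works on plain position lists from a single chromosome. The helper names are hypothetical, and the greedy rule for thinning (keep a locus only if it lies more than thin bases after the last kept locus) is an assumed interpretation, not necessarily what nremover() does.

```python
import random


def thin_positions(positions, thin):
    """Greedy thinning under an assumed rule: keep a position only if it
    lies more than `thin` bases after the last kept position."""
    kept = []
    for pos in sorted(positions):
        if not kept or pos - kept[-1] > thin:
            kept.append(pos)
    return kept


def random_subset(items, size, seed=None):
    """Sample loci by count (int) or by proportion (float in (0, 1])."""
    items = list(items)
    if isinstance(size, float):
        if not 0 < size <= 1:
            raise ValueError("proportion must be in (0, 1]")
        k = max(1, round(size * len(items)))
    else:
        k = size
    if not 0 < k <= len(items):
        raise ValueError("subset size must be between 1 and the number of loci")
    rng = random.Random(seed)  # seeded generator for reproducible subsets
    return sorted(rng.sample(items, k))
```

For example, thin_positions([100, 105, 120, 300, 301], 10) keeps 100, 120, and 300 under this rule.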
Changes
Renamed the unlinked option to unlinked_only for clarity.
Version 1.0.4 (2023-09-10)
Features
Added functionality to filter out linked SNPs using CHROM and POS fields from VCF file.
Performance
Made the Sankey plot function more modular and dynamic for easier maintainability.
Bug Fixes
Fixed spacing of output printed to STDOUT.
Version 1.0.3.3 (2023-09-01)
Bug Fixes
Fixed bug where CHROM VCF field had strings cut off at 10 characters.
Version 1.0.3.2 (2023-08-28)
Bug Fixes
Fixed copy method for pysam.VariantHeader objects.
Version 1.0.3 (2023-08-27)
Features
Performance improvements for VCF files.
Load and write VCF file in chunks of loci to improve memory consumption.
New output directory structure for better organization.
VCF file attributes are now written to an HDF5 file instead of all being loaded into memory.
Increased usage of numpy to improve VCF IO.
Added AF INFO field when converting PHYLIP or STRUCTURE files to VCF format.
VCF file reading uses pysam instead of cyvcf2 now.
Bug Fixes
Fixed bug with search_threshold plots where the x-axis values would be sorted as strings instead of integers.
Fixed bugs where sampleIDs were out of order for VCF files.
Ensured correct order for all objects.
Fixed bugs when subsetting with popmaps files.
Fixes to documentation.
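The search_threshold x-axis bug fixed in this release belongs to a common class of error: numeric labels stored as strings sort lexicographically. A minimal illustration, using made-up values rather than the project's plotting code:

```python
# Made-up threshold labels stored as strings.
thresholds = ["10", "2", "1", "25"]

as_strings = sorted(thresholds)           # lexicographic: "10" sorts before "2"
as_numbers = sorted(thresholds, key=int)  # numeric order, labels stay strings
```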
Version 1.0.2 (2023-08-13)
Bug Fixes
Fix for VCF FORMAT field being in wrong order.
Version 1.0.1 (2023-08-09)
Bug Fixes
Band-aid fix for incorrect order of sampleIDs in VCF files.
Initial Release
Reads and writes PHYLIP, STRUCTURE, and VCF files.
Loads data into GenotypeData object.
- Filters DNA sequence alignments using NRemover2.
Filters by minor allele frequency, monomorphic sites, and non-biallelic sites.
Filters with global (whole columns) and per-population, per-locus missing data thresholds.
Makes informative plots.