Metadata-Version: 2.1
Name: biofile-kit
Version: 0.1.0
Summary: Bioinformatics File Operations Toolkit.
Home-page: https://github.com/wenlinXu-njfu/BioFileKit
License: MIT
Author: Wenlin Xu
Author-email: wenlinxu.njfu@outlook.com
Requires-Python: >=3.8
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: pybioinformatic (>=1.2.5)
Description-Content-Type: text/markdown

# Introduction
**BioFileKit is a Python command-line toolkit that simplifies reading, writing, converting, parsing,
and basic manipulation of various biological data file formats.**

# Install
```shell
pip install biofile-kit --upgrade
```

# Change log
## v0.1.0
- New
  - FASTA file tools ([fasta_tools](biofile_kit/bin/fasta_tools.py)):
    - [convert hard-masked sequences to soft-masked](biofile_kit/fasta_utils/hardmasked_to_softmasked.py)
    - [telomere finder](biofile_kit/fasta_utils/telomere_finder.py)
  - GFF file tools ([gff_tools](biofile_kit/bin/gff_tools.py)):
    - [statistical analysis of GFF files](biofile_kit/gff_utils/stats.py)
  - Genotype file tools ([gt_kit](biofile_kit/bin/gt_kit.py)):
    - [genotype consistency analysis](biofile_kit/gt_utils/genotype_consistency_analysis.py)
    - [genotype file merging](biofile_kit/gt_utils/merge.py)
    - [statistical analysis of SNP loci](biofile_kit/gt_utils/stat_gt.py)
  - VCF file tools ([vcf_tools](biofile_kit/bin/vcf_tools.py)):
    - [convert VCF files to genotype (GT) format](biofile_kit/vcf_utils/vcf2gt.py)
- Modified
  - None
- Deleted
  - None

# Usage examples
## Generate random nucleotide sequences
```shell
fasta_tools random_nucl -n seq1,seq2 -l 1000,1200
```
```
>seq1 length=1000
CGCCAGGCCTGCCCTGCGACGGAGGTTCCCCGTATGACTGCCCTATATCATTCCTGCTAAACTCAATCCACAAGATCAATTCACTCCGGGGAACAACTGCCACTAGAAACCGTAGGTTACCATCAATAGTTCCCCACTTGGAGGAAGAAGTCTTTGAAGCAGGTTGTCATCCAGCATTCTTTCTAAACGTCATTGGACATAGGGGTAAGCTCATATCCTCTCCCAACCATTCAGAAGTCCATGACCATGTCCGGTGCAAATTTGAAAGTCATGATGGTGAGGGAGCAAGAGAGCGCAGATCACGGATAAGTATTAAAAAGTGCTGTCGAGGCCGCAGTGGAAGTGACTAATTGGCTGATGCACGGACCTCCAGTGTACAGCTCATGTTTCAGGTGCGTCGGACTGTCAGTGACTCAATTTTCTGGGCCCAACTCCGCGTTCGGTGGATTAGTAACTATAGTGGTTGCATGAGGTACTGAGATTGAGCCGTGAAAAGCATTCAAAGTGCGGTTCCTCAACCTATTATTATTAAGACATAAGTTTGCTAGCGCTTTGTTGCAATCGTGTCGTGGAATGCGATTGATGCTTAGCAGTTTCCGGGAAGTACGGACTCATGCCGTTATGTGCGCCAACAAACAGCGCGTGTTTCATTTCGCGCCGGTCGCCTGGCGCGTGTTATGGGATCGCACTTCACCGTGCTGATATCGCTGAGGCGAGGGTTCCTCGAGATATTGGCTTGGCTCGCCAGGCAGTAGTCGTGGTCAGCCCGACTTGGCACGCTAAAGACGAGCCCACGTGCATTCGGTCGGAATCAGTTAGACGTCGAACGATTCGATCCAGCGTGAGGCCTATCCTTTGCCCATTTAACTCCGTATTCACGGTCTCCTTGATACATAGTGTACTTAGTGTTACCAGCGAACTCCGACGCGGACAGTGTCCTCGGAGTATTACCTCCAAAGAAATTCTCGGGCCGAACAGCGTAGTCTATACCGCCTGGGTG
>seq2 length=1200
ATAGGTGTAGTGTGTCTTCATCTTGATGTAAGTTCGTTCACCCAGATCTGCTAAAACGCATGGCATTTTTTTCGCATACGGTCCACTGGCACTATATGATTCCCAGTACTTCGCAGATTTGGGGGGGTAAGAGTCCGCGGAAGCGTTGTTCTGACGCGTACGCATGTTCGGTATTTTTTACGGGTGAGTTGCATCGGTTGTGTATTGGTCCATGTTAAGACGGTTATCGGGCAGGCTTCTCAATGCGGTGAGTCGGGAAGACACTAGCCAGCGAAATTATGTGATCGCTGGAATAGGATCGATGTAGCAACGACACTTTCCTGGCCTACAGACGGACTTGGACCGGATCAATCGTCTTATATAATAATACACGTCGCAGAACGGTCTGTGTATAGGACCGGTAGAATGAGTAGTTCATACTCCGGCCCGCAGGTACCCCTGTACGCATGAAAGTCCAAGCTCTCGCTGAACCGACACCTCTAGCCGAGGTACGTATGCATGACCTGGTTGTTCTCTTCGGGTCACGACAGTTGCCTATTTACGCTCGGATACCAGGAAACTTTGCCGGGAGTTCGCCCCCAGTAGTTCCCGGGTTGGGGTCGGGGTGTTCTGCCGATTACCGGATGTATCTCACCTGAGATTCAGCATCGGTGCGAACATCGTGAATCCTAAAGGTTGAACAAAGGAAGGCCTCCATGCGTTGGAAAGTCCTCGAAGTGGAGAAGTCTATCGTAGATCAACCGATAGGCAATGAAAAGAAAAAGCGCAACAGACGCCACGCTTCTAGATCGCAGTTGGCCTTTTAATGGCGAATCCATTTACCGAGCGAAGAAAAAGCCTGGCTAGCTTGTTTAAAACTGGTAACACTGAATCTCCGAAAGAGTAGCTATAGGCTCCCAGCACAGCCTGCGGCTGGCGCCAACGCCTAACGAAAATGCCAATCCACTTAGTTGTGTTAACTGTCTCCCCACTATATGCGGCTTACCAGGGAGTGTAATTTCTGGCGATGACCAGCGTTTCCTTTGGGTTCCGTCGAATTCCTTAGATCTAGGACAGCAGTTCGAATTACTTGGCGTGGTCGCATCAGGACTTCGCGTAGTGGCTATCCAGATCATAGACTGAGTCACGTATTTGACGCCAGACCTAAGACCCCACGATGGTTTCTAGTCGTAACTTGAGTGAGCTAGCTCGCCTCGTGTC
```
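
For readers who want similar test data from inside a Python script, the following is a minimal sketch of generating random nucleotide sequences as FASTA text. It is not BioFileKit's implementation: the function names `random_nucleotide` and `random_fasta`, the `seed` parameter, and the uniform choice over A, C, G, and T are assumptions made for illustration. Only the `>name length=N` header layout is taken from the output above.

```python
import random


def random_nucleotide(length, rng=None):
    """Return a random DNA string of `length` bases, drawn uniformly from ACGT."""
    if rng is None:
        rng = random.Random()
    return "".join(rng.choice("ACGT") for _ in range(length))


def random_fasta(names, lengths, seed=None):
    """Build FASTA text with one record per (name, length) pair.

    Hypothetical helper: `seed` makes the result reproducible and is not
    a documented option of `fasta_tools random_nucl`.
    """
    if len(names) != len(lengths):
        raise ValueError("names and lengths must have the same number of items")
    rng = random.Random(seed)
    records = [
        f">{name} length={length}\n{random_nucleotide(length, rng)}"
        for name, length in zip(names, lengths)
    ]
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    # Mirrors the shape of: fasta_tools random_nucl -n seq1,seq2 -l 1000,1200
    print(random_fasta(["seq1", "seq2"], [1000, 1200], seed=0), end="")
```

Passing a fixed `seed` in this sketch yields the same sequences on every run, which can be convenient when the generated records feed a test.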

## ORF prediction
```shell
fasta_tools random_nucl -n seq1,seq2 -l 1000,1200 | fasta_tools ORF_finder -c -
```
```
>seq1 length=42 ORF_prediction
MRLMLSSFREVRTHAVMCANKQRVFHFAPVAWRVLWDRTSPC*
>seq2 length=87 ORF_prediction
MIWIATTRSPDATTPSNSNCCPRSKEFDGTQRKRWSSPEITLPGKPHIVGRQLTQLSGLAFSLGVGASRRLCWEPIATLSEIQCYQF*
```
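
To illustrate what an ORF search of this kind does, here is a minimal sketch that scans all six reading frames for the longest stretch from an `ATG` start codon to a stop codon and translates it with the standard genetic code. It is a sketch under stated assumptions, not the algorithm used by `fasta_tools ORF_finder`: the names `CODON_TABLE`, `reverse_complement`, and `longest_orf` are hypothetical, and searching both strands, requiring a stop codon, and reporting only the longest ORF are choices made here for illustration.

```python
# Standard genetic code (NCBI translation table 1), built from the
# conventional TCAG ordering of the three codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_orf(seq):
    """Return the longest ATG..stop protein over six frames ('' if none).

    The stop codon is not included in the returned protein. ORFs that
    run off the end of the sequence without a stop codon are ignored.
    """
    seq = seq.upper()
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            protein = ""  # empty while no ORF is open in this frame
            for pos in range(frame, len(strand) - 2, 3):
                # Codons with ambiguous bases (e.g. N) translate to 'X'.
                aa = CODON_TABLE.get(strand[pos:pos + 3], "X")
                if not protein:
                    if aa == "M":
                        protein = "M"  # open a new ORF at the first ATG
                elif aa == "*":
                    if len(protein) > len(best):
                        best = protein
                    protein = ""  # close the ORF at the stop codon
                else:
                    protein += aa
    return best


if __name__ == "__main__":
    # Fragment of seq1 from the example above that encodes the reported ORF.
    fragment = (
        "ATGCGATTGATGCTTAGCAGTTTCCGGGAAGTACGGACTCATGCCGTTATGTGCGCCAACAAACAG"
        "CGCGTGTTTCATTTCGCGCCGGTCGCCTGGCGCGTGTTATGGGATCGCACTTCACCGTGCTGA"
    )
    protein = longest_orf(fragment)
    print(f">seq1 length={len(protein)} ORF_prediction\n{protein}*")
```

Run on that fragment of `seq1`, the sketch recovers the same 42-residue protein shown in the output above.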

