Metadata-Version: 2.4
Name: cpc2_standalone
Version: 1.0.9
Summary: CPC2: A fast and accurate coding potential calculator based on sequence intrinsic features. This package is maintained by Pranjal Pruthi, BioinformaticsOnLine organization.
Home-page: https://github.com/gao-lab/CPC2_standalone
Author: Kang Y. J., Yang D. C., Kong L., Hou M., Meng Y. Q., Wei L., Gao G.
Author-email: gaog@mail.cbi.pku.edu.cn
Maintainer: Pranjal Pruthi
Maintainer-email: mail@pranjal.work
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Intended Audience :: Science/Research
Requires-Python: >=3.9,<3.14
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: biopython
Requires-Dist: six
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: maintainer
Dynamic: maintainer-email
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

CPC2 standalone
====

* 2019-11-23 15:30, Yang Ding
   - CPC2 now supports both Python 2 and Python 3 (thanks to HyperOdin for help).

1 Prerequisites
----
a. Biopython package: a local copy can be downloaded from
http://biopython.org/wiki/Download

2 Install
----
a. Unpack the tarball:

	tom@linux$ gzip -dc CPC2-beta.tar.gz | tar xf -

b. Build the third-party package (libsvm):

	tom@linux$ cd CPC2-beta
	tom@linux$ export CPC_HOME="$PWD"
	tom@linux$ cd libs/libsvm
	tom@linux$ gzip -dc libsvm-3.18.tar.gz | tar xf -
	tom@linux$ cd libsvm-3.18
	tom@linux$ make clean && make

3 Run the prediction
----
	tom@linux$ cd $CPC_HOME
	tom@linux$ bin/CPC2.py -i (input_seq) -o (result_in_table)

Example:

	tom@linux$ bin/CPC2.py -i data/example.fa -o example_output

4 Output
----
The result is a plain-text, tab-delimited table.

Default output columns:

	#ID	transcript_length	peptide_length	Fickett_score	pI	ORF_integrity	coding_probability	label

Add '--ORF' to also output the start position of the longest ORF (ORF_Start):

	#ID	transcript_length	peptide_length	Fickett_score	pI	ORF_integrity	ORF_Start	coding_probability	label
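
The result is a tab-delimited table with a header line, so Python's standard `csv` module can load it. In this minimal sketch, `read_cpc2_table` and `ids_above` are hypothetical helpers, not part of CPC2. The column names come from the header itself, so the same code handles both the default layout and the `--ORF` layout with its extra `ORF_Start` column. The probability cutoff in `ids_above` is chosen by the caller and is not CPC2's own labeling rule.

```python
import csv
from typing import Dict, Iterable, List


def read_cpc2_table(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Read CPC2 tab-delimited output into a list of {column: value} dicts.

    Assumes the first line is the header, starting with '#ID' as documented.
    """
    # QUOTE_NONE: treat every field literally, even if an ID contains quotes.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    header[0] = header[0].lstrip("#")  # '#ID' -> 'ID'
    # Skip blank rows (e.g. a trailing empty line).
    return [dict(zip(header, row)) for row in reader if row]


def ids_above(rows: List[Dict[str, str]], min_prob: float) -> List[str]:
    """Return IDs whose coding_probability is >= min_prob (a caller-chosen, hypothetical cutoff)."""
    return [r["ID"] for r in rows if float(r["coding_probability"]) >= min_prob]
```

To use it, pass an open file handle (opened with `newline=""`, as the `csv` documentation recommends) to `read_cpc2_table`.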

Contact
----
> See the website (http://cpc2.cbi.pku.edu.cn) for a tutorial and more details.

> This is a beta version of CPC2; if you have any questions, please report them to us.

>Contact: cpc@mail.cbi.pku.edu.cn


## About CPC2

Here are some example commands:

*   **To run a basic test:** `cpc2 -i data/example.fa -o test_output`
*   **To check the reverse strand:** `cpc2 -i data/example.fa -o test_output -r`
*   **To also output the start position of the longest ORF:** `cpc2 -i data/example.fa -o test_output --ORF`
*   **To get help:** `cpc2 --help`
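
For scripted batch runs, you can wrap these commands in Python. This is a minimal sketch that assumes the `cpc2` executable is on `PATH`. The helper names `build_cpc2_command` and `run_cpc2` are hypothetical, and the sketch uses only the `-i`, `-o`, `-r` and `--ORF` options listed above.

```python
import shlex
import subprocess
from typing import List


def build_cpc2_command(input_fasta: str, output_name: str,
                       reverse: bool = False, orf: bool = False) -> List[str]:
    """Assemble a cpc2 argument list using only the documented options (hypothetical helper)."""
    cmd = ["cpc2", "-i", input_fasta, "-o", output_name]
    if reverse:
        cmd.append("-r")     # also check the reverse strand
    if orf:
        cmd.append("--ORF")  # add the ORF_Start column to the output table
    return cmd


def run_cpc2(input_fasta: str, output_name: str, **flags: bool) -> None:
    """Run cpc2; raises subprocess.CalledProcessError on a non-zero exit (hypothetical helper)."""
    cmd = build_cpc2_command(input_fasta, output_name, **flags)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints the command; call run_cpc2(...) to execute it.
    print(shlex.join(build_cpc2_command("data/example.fa", "test_output", orf=True)))
```

Passing an argument list rather than a shell string avoids quoting problems with file names that contain spaces.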

The Coding Potential Calculator (CPC) distinguishes protein-coding from non-coding RNAs based on the sequence features of the input transcripts. CPC2, the successor to CPC1, is designed to be faster and more accurate at discriminating coding from non-coding transcripts.

### Input Requirements

CPC2 accepts RNA transcripts either as sequences in FASTA format or as annotations in GTF/GFF/BED format.

**FASTA format:**
*   Size: Fewer than 100,000 lines in the input box (online); no line limit in batch mode. Maximum upload file size is 50 MB.
*   Name: Sequence names must begin with ‘>’. Characters after a blank space in the ID will be discarded.
*   Sequence: Only characters found in DNA and RNA sequences are allowed.
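
Before submitting a large batch, it can help to check a FASTA file against the rules above. The sketch below is a hypothetical pre-check: `read_fasta_ids` is not part of CPC2. Like CPC2, it keeps only the text before the first blank in each header. It also assumes that "characters found in DNA and RNA sequences" means the IUPAC nucleotide codes, including U and N, in either case.

```python
from typing import Dict, Iterable, List, Optional

# Assumption: the allowed alphabet is the IUPAC nucleotide codes (T and U both accepted).
ALLOWED = set("ACGTURYSWKMBDHVN")


def read_fasta_ids(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {id: sequence}, raising ValueError on rule violations.

    Hypothetical helper; a later record with a duplicate ID overwrites an earlier one.
    """
    records: Dict[str, str] = {}
    current: Optional[str] = None
    chunks: List[str] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                records[current] = "".join(chunks)
            fields = line[1:].split()
            if not fields:
                raise ValueError(f"line {lineno}: empty sequence name")
            current = fields[0]  # everything after the first blank is discarded
            chunks = []
        else:
            if current is None:
                raise ValueError(f"line {lineno}: sequence data before the first '>' header")
            bad = set(line.upper()) - ALLOWED
            if bad:
                raise ValueError(f"line {lineno}: unexpected characters {sorted(bad)}")
            chunks.append(line.upper())
    if current is not None:
        records[current] = "".join(chunks)
    return records
```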

**GTF/GFF/BED format:**
*   Supported formats: BED6, BED12, GTF, and GFF.
*   Size: Fewer than 50,000 lines. Maximum upload file size is 50 MB.
*   Supported genomes for GTF/GFF/BED: Human (hg38, hg19), Chimpanzee (panTro4), Mouse (mm10), Rat (rn6), Zebrafish (danRer7), Xenopus (xenTro3), Fruit fly (dm6).
*   Note: BED input may take longer to process.

### Features
*   **Speed and Accuracy:** CPC2 employs a novel discriminative model based on sequence intrinsic features, making it significantly faster than CPC1 and other popular tools, while also offering superior accuracy.
*   **Species-Neutral:** The model used in CPC2 is species-neutral, making it suitable for analyzing transcriptomes from a wide range of organisms, including non-model organisms.
*   **Output:** For each transcript, results include the sequence ID, coding/non-coding classification, coding probability, transcript length, putative peptide length, Fickett TESTCODE score, putative isoelectric point, and ORF integrity.
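
To illustrate what the putative peptide length and ORF integrity describe, the sketch below finds the longest ATG-initiated open reading frame on the forward strand of a transcript. It is a simplified stand-in, not CPC2's implementation. The start codon (ATG), the stop codons (TAA, TAG, TGA), and the reading of "integrity" as "an in-frame stop codon was found" are all assumptions made for this example.

```python
from typing import NamedTuple, Optional

# Assumption: standard-code stop codons; CPC2's own ORF rules may differ.
STOP_CODONS = {"TAA", "TAG", "TGA"}


class Orf(NamedTuple):
    start: int           # 0-based offset of the ATG in the transcript
    peptide_length: int  # number of codons translated, excluding the stop codon
    complete: bool       # assumed meaning of "integrity": an in-frame stop was found


def longest_orf(seq: str) -> Optional[Orf]:
    """Return the longest ATG-initiated ORF on the forward strand, or None if there is none.

    Simplified illustration only. An ORF that runs off the end of the sequence
    without a stop codon is reported with complete=False.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA input
    best: Optional[Orf] = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens an ORF; later in-frame ATGs are internal Met
            elif start is not None and codon in STOP_CODONS:
                orf = Orf(start, (i - start) // 3, True)
                if best is None or orf.peptide_length > best.peptide_length:
                    best = orf
                start = None
        if start is not None:  # reached the end of the sequence without a stop codon
            orf = Orf(start, (len(seq) - start) // 3, False)
            if best is None or orf.peptide_length > best.peptide_length:
                best = orf
    return best
```

For example, `longest_orf("CCATGGCCTAAGG")` finds ATG at offset 2 followed by an in-frame TAA, giving a two-residue peptide with `complete=True`.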

For more detailed information on the web server, input/output formats, and additional features like BLAST integration, please refer to the original CPC2 documentation and publication.

## Maintained for PyPI by:
Pranjal Pruthi, Project Scientist<br>
BioinformaticsOnLine organization<br>
Email: mail@pranjal.work

## Original Publication:
Kang Y. J., Yang D. C., Kong L., Hou M., Meng Y. Q., Wei L., Gao G. 2017. CPC2: a fast and accurate coding potential calculator based on sequence intrinsic features. Nucleic Acids Research 45(Web Server issue): W12–W16.
