Metadata-Version: 2.4
Name: rna-clique
Version: 0.3.0a1
Summary: Compute genetic distance matrices from RNA-seq data.
Author-email: Andrew Tapia <andrew.tapia@uky.edu>
License-Expression: MIT
Project-URL: Homepage, https://github.com/actapia/rna_clique
Project-URL: Bug Tracker, https://github.com/actapia/rna_clique
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: tqdm
Requires-Dist: pandas
Requires-Dist: joblib>=1.3.2
Requires-Dist: networkx
Requires-Dist: matplotlib
Requires-Dist: seaborn
Requires-Dist: scikit-bio
Requires-Dist: tables
Requires-Dist: biopython
Requires-Dist: more-itertools
Requires-Dist: dendropy
Requires-Dist: pyyaml
Requires-Dist: adjusttext
Requires-Dist: numpy
Requires-Dist: psutil
Requires-Dist: sympy
Requires-Dist: pysam
Requires-Dist: nice-colorsys
Requires-Dist: simple-blast>=0.7.9
Requires-Dist: python-sat[aiger,approxmc,cryptosat,pblib]
Requires-Dist: pyblast4_archive
Requires-Dist: name_conflict_resolver
Requires-Dist: multiset_key_dict
Dynamic: license-file

# RNA-clique

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.14890599.svg)](https://doi.org/10.5281/zenodo.14890599)

This is the repository for RNA-clique, a tool for computing pairwise genetic
distances from RNA-seq data. RNA-clique takes assembled transcriptomes from two
or more samples as input and outputs a matrix of pairwise distances between the
samples, with values ranging from 0 to 1.

## Installation

RNA-clique is written in Python and additionally requires NCBI BLAST+ and
several Python libraries. [Guides](#installation-guides) are provided for
installation on specific systems. For other
systems, see the <!--{{doc_link("requirements.md", "requirements", False) | comment_surround}}{{empty("-->[requirements](https://actapia.github.io/rna_clique/dev/requirements)<!--")}}-->.

### Installation guides

<!--{{doc_link("installation_guides/ubuntu.md", "Ubuntu", False) | bullet | comment_surround}}{{empty("-->
* [Ubuntu](https://actapia.github.io/rna_clique/dev/installation_guides/ubuntu)
<!--")}}-->
<!--{{doc_link("installation_guides/macos.md", "macOS", False) | bullet | comment_surround}}{{empty("-->
* [macOS](https://actapia.github.io/rna_clique/dev/installation_guides/macos)
<!--")}}-->

### Basic usage

To run RNA-clique on your assembled transcriptomes, first make sure that your
data are in a <!--{{doc_link("formats.md#transcriptomes", "format understood by
RNA-clique", False) | comment_surround}}{{empty("-->
[format understood by RNA-clique](https://actapia.github.io/rna_clique/dev/formats#transcriptomes).
<!--")}}-->

Then, run `rna-clique`, passing the directories containing your transcriptomes,
an output directory (`-O`), and the number of top genes to select (`-n`).

```bash
rna-clique -O my_rna_clique_out -n 50000 \
           path/to/transcriptome1_dir \
           path/to/transcriptome2_dir \
           path/to/transcriptome3_dir ...
```

RNA-clique produces an output matrix at `my_rna_clique_out/matrix.h5`. To see it
in a human-readable format, use `export_matrix`.

```bash
python -m rna_clique.export_matrix -m my_rna_clique_out/matrix.h5
```

More details about the usage of RNA-clique can be found in the <!--{{doc_link("usage.md", "Command-line usage guide", False) | comment_surround}}{{empty("-->[Command-line usage guide](https://actapia.github.io/rna_clique/dev/usage)<!--")}}-->.


### Downstream analyses

The `export_matrix` program prints the matrix to standard output, so you can
use shell redirection or pipes to save it to a file (for example, by appending
`> distances` to the command above). You can then use the matrix in any
downstream application that can load matrices from text files.
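If you drive RNA-clique from a Python script rather than a shell, the same
redirection can be reproduced with the standard library's `subprocess` module.
The following is a minimal sketch, not part of RNA-clique itself; the helper
name `save_command_output` is hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def save_command_output(cmd: list[str], dest: Path) -> None:
    """Run cmd and write its standard output to dest, like `cmd > dest`.

    Hypothetical helper for illustration; it is not part of RNA-clique.
    """
    with dest.open("w") as out:
        # The child process writes directly to the open file.
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(cmd, stdout=out, check=True)
```

For example, `save_command_output([sys.executable, "-m", "rna_clique.export_matrix", "-m", "my_rna_clique_out/matrix.h5"], Path("distances"))`
would write the exported matrix to `distances`, assuming RNA-clique is
installed in the environment of the running interpreter.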

For example, if you saved the matrix to a file named `distances`, you could
load it in R with the following code:

```R
dis <- as.matrix(read.table("distances", sep=" "))
```
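To load the same file in Python instead of R, a minimal sketch using NumPy
(already an RNA-clique dependency) is shown here. It assumes, as the R example
does, that `distances` contains only numbers, with one matrix row per line and
no header row or sample names; `load_distance_matrix` is a hypothetical helper,
not part of RNA-clique's API.

```python
from pathlib import Path

import numpy as np


def load_distance_matrix(path: Path) -> np.ndarray:
    """Load a whitespace-separated square distance matrix and sanity-check it.

    Hypothetical helper. Assumes the file holds only numeric values (no header
    row or sample names), one matrix row per line.
    """
    # delimiter defaults to any whitespace; ndmin=2 keeps a 1x1 matrix 2-D.
    dis = np.loadtxt(path, ndmin=2)
    if dis.shape[0] != dis.shape[1]:
        raise ValueError(f"expected a square matrix, got shape {dis.shape}")
    # RNA-clique distances range from 0 to 1.
    if np.any((dis < 0) | (dis > 1)):
        raise ValueError("found distances outside [0, 1]")
    return dis
```

Calling `load_distance_matrix(Path("distances"))` returns a NumPy array that
can be passed to other Python tools for clustering or tree building.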

## Using RNA-clique in Python code

You can also call RNA-clique directly from Python code. For example:

```python
from pathlib import Path

from rna_clique.rna_clique import rna_clique

out_dir = Path("rna_clique_out")
out_dir.mkdir(exist_ok=True)
# Get the SampleSimilarity object and a dict mapping paths to their sample
# names.
sim, path_to_sample = rna_clique(
    [
        Path("path/to/transcriptome1_dir"),
        Path("path/to/transcriptome2_dir"),
        Path("path/to/transcriptome3_dir"),
    ],
    out_dir_1=out_dir / "od1",
    out_dir_2=out_dir / "od2",
    cache_dir=out_dir / "db_cache",
    output_graph=out_dir / "graph.pkl",
    output_matrix=out_dir / "matrix.h5",
    top_genes=50000,
)
# Print the pairwise dissimilarity matrix.
print(sim.get_dissimilarity_df())
```

For information on finer-grained control via RNA-clique's Python API, see the <!--{{doc_link("api/README.md", "API Guide", False) | comment_surround}}{{empty("-->[API guide](https://actapia.github.io/rna_clique/dev/api).<!--")}}-->

## License

All code is licensed under the MIT license, which can be found in the `LICENSE`
file at the root of this repository.

A machine-readable copyright file in Debian format is also provided as
`copyright` at the root of this repository.

## Citation

If you use RNA-clique in your work, please cite ["RNA-clique: a method for
computing genetic distances from RNA-seq
data"](https://doi.org/10.1186/s12859-024-05811-9).

<!-- {% raw %}{{ -->
```tex
@article{tapia2024rna,
  title={{RNA-clique: a method for computing genetic distances from RNA-seq data}},
  author={Tapia, Andrew C and Jaromczyk, Jerzy W and Moore, Neil and Schardl, Christopher L},
  journal={BMC Bioinformatics},
  volume={25},
  year={2024},
  publisher={BioMed Central},
  keywords={pub}
}
```
<!-- }}{% endraw %} -->

## Additional documentation

<!--{{doc_link("usage.md", "Command-line usage guide", False) | bullet | comment_surround}}{{empty("-->
* [Command-line usage guide](https://actapia.github.io/rna_clique/dev/usage)
<!--")}}-->
<!--{{doc_link("config.md", "Configuration file guide", False) | bullet | comment_surround}}{{empty("-->
* [Configuration file guide](https://actapia.github.io/rna_clique/dev/config)
<!--")}}-->
<!--{{doc_link("formats.md", "File formats guide", False) | bullet | comment_surround}}{{empty("-->
* [File formats guide](https://actapia.github.io/rna_clique/dev/formats)
<!--")}}-->
<!--{{doc_link("api/README.md", "Python API Documentation", False) | bullet | comment_surround}}{{empty("-->
* [Python API Documentation](https://actapia.github.io/rna_clique/dev/api)
<!--")}}-->
<!--{{doc_link("tutorials/reads2tree/README.md", "Tutorial: From RNA-seq reads to a phylogenetic tree with RNA-clique", False) | bullet | comment_surround}}{{empty("-->
* [Tutorial: From RNA-seq reads to a phylogenetic tree with RNA-clique](https://actapia.github.io/rna_clique/dev/tutorials/reads2tree)
<!--")}}-->
<!--{{doc_link("tutorials/subsets/README.md", "Tutorial: Quickly computing subsets of existing analyses", False) | bullet | comment_surround}}{{empty("-->
* [Tutorial: Quickly computing subsets of existing analyses](https://actapia.github.io/rna_clique/dev/tutorials/subsets)
<!--")}}-->
<!--{{doc_link("tutorials/export_and_search/README.md", "Tutorial: Exporting and searching ideal components", False) | bullet | comment_surround}}{{empty("-->
* [Tutorial: Exporting and searching ideal components](https://actapia.github.io/rna_clique/dev/tutorials/export_and_search)
<!--")}}-->
<!--{{doc_link("tutorials/nonspades/README.md", "Tutorial: Using RNA-clique with non-SPAdes data", False) | bullet | comment_surround}}{{empty("-->
* [Tutorial: Using RNA-clique with non-SPAdes data](https://actapia.github.io/rna_clique/dev/tutorials/nonspades)
<!--")}}-->
