Metadata-Version: 2.4
Name: proleTRact
Version: 0.2.0
Summary: A user-friendly platform for interactive exploration, visualization, and analysis of tandem repeat findings from TandemTwister outputs
Author-email: Lion Ward Al Raei <lionward.alraei@gmail.com>
License: BSD 3-Clause Non-Commercial License
Project-URL: Homepage, https://github.com/Lionward/ProleTRact
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: streamlit>=1.30
Requires-Dist: pandas>=2.0
Requires-Dist: numpy>=1.24
Requires-Dist: altair>=5.0
Requires-Dist: plotly>=5.0
Requires-Dist: pysam>=0.22
Requires-Dist: scikit-learn>=1.2
Requires-Dist: cython>=0.29.0
Dynamic: license-file
Dynamic: requires-python

<br />
<div align="center">
  <img width="300" alt="ProleTRact logo" src="src/proletract/ProleTRact_logo.svg">
</div>
<br />
<p>This repository contains a <strong>Tandem Repeat Visualization Tool</strong> that serves as the companion to <a href="https://github.com/Lionward/TandemTwister"><strong>TandemTwister</strong></a>. The tool processes Variant Call Format (VCF) files generated by TandemTwister and visualizes tandem repeats in an intuitive, interactive format. Users can explore motifs, compare alleles with the reference sequence, and inspect the structure of tandem repeats, which makes genomic variation easier to interpret.</p>

<h2>Why ProleTRact?</h2>
<p>Tandem repeats (TRs) are complex: alleles can differ in motif composition, length, and interrupted blocks. ProleTRact visualizes TR regions with color-coded motifs, highlights interruptions, and provides intuitive navigation across regions and samples, so potentially pathogenic expansions or atypical structures can be spotted quickly.</p>

<h2>Key Features</h2>
<ul>
  <li><strong>Individual and Cohort modes:</strong> Analyze a single VCF or an entire directory of VCFs.</li>
  <li><strong>Dynamic sequence visualization:</strong> Color-coded motifs, clear interruption highlighting, and side-by-side allele comparison.</li>
  <li><strong>Pathogenic TR reference overlay:</strong> Built-in <code>pathogenic_TRs.bed</code> provides context for known loci (disease, gene, thresholds).</li>
  <li><strong>Fast navigation:</strong> Move across TR records with Previous/Next controls or jump to a specific region.</li>
</ul>

<h2>Installation Options</h2>
<p>Pick the workflow that fits your environment:</p>

<h3>Option A &mdash; Install from PyPI (recommended)</h3>
<pre><code>pip install proleTRact
proleTRact  # launches the Streamlit app</code></pre>
<p>When run locally, the launcher opens the app in your browser. On headless machines, set <code>STREAMLIT_SERVER_HEADLESS=true</code> before invoking <code>proleTRact</code>.</p>
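<p>On a remote server, one common pattern is to start the app headless and reach it through an SSH tunnel. The sketch below wraps both steps in two hypothetical shell functions (<code>start_remote</code> and <code>forward_port</code> are illustrative names, not part of ProleTRact). It assumes Streamlit's default port 8501 and that the <code>proleTRact</code> launcher honors Streamlit's <code>STREAMLIT_SERVER_PORT</code> environment variable; <code>user@remote-host</code> is a placeholder.</p>

```shell
# Run on the headless server. STREAMLIT_SERVER_HEADLESS stops Streamlit from
# trying to open a browser; STREAMLIT_SERVER_PORT (assumed to be honored by the
# proleTRact launcher) pins the port, defaulting here to Streamlit's 8501.
start_remote() {
  STREAMLIT_SERVER_HEADLESS=true STREAMLIT_SERVER_PORT="${1:-8501}" proleTRact
}

# Run on your own machine, then open http://localhost:<port> in a browser.
# -N: run no remote command, only forward; -L: map local <port> to the server's <port>.
forward_port() {
  ssh -N -L "${2:-8501}:localhost:${2:-8501}" "$1"
}

# Example (placeholders):
#   server$ start_remote 8501
#   laptop$ forward_port user@remote-host 8501
```

<p>Keeping the port identical on both ends means the URL Streamlit prints on the server also works on your laptop once the tunnel is up.</p>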

<h3>Option B &mdash; Clone and run locally (with conda)</h3>
<pre><code>git clone https://github.com/Lionward/ProleTRact.git
cd ProleTRact
conda create -n proletract python=3.9
conda activate proletract
pip install -r requirements.txt
pip install -e .
streamlit run src/proletract/app.py
</code></pre>

<h2>Quickstart</h2>
<ol>
  <li>Launch the app with one of the commands above.</li>
  <li>Open the app in your browser. Streamlit prints the URL in the terminal (by default <code>http://localhost:8501</code>); use it when running headless.</li>
  <li>Load an individual VCF or cohort folder from the sidebar and start exploring tandem repeats.</li>
</ol>

<h2>Usage</h2>
<h3>Individual mode 👤</h3>
<ol>
  <li>Select <strong>individual sample</strong> in the sidebar.</li>
  <li>Provide the absolute path to a bgzipped and tabix-indexed VCF (<code>.vcf.gz</code> with <code>.tbi</code>):
    <ul>
      <li>Enter the path in the sidebar input, then click <strong>Upload VCF File</strong>.</li>
      <li>The app will parse records and enable navigation across TR variants.</li>
    </ul>
  </li>
  <li>Use <strong>Previous</strong>/<strong>Next</strong> to step through records or jump to a region like <code>chr1:1000-2000</code>.</li>
  <li>Inspect motif blocks, interruptions, and per-allele differences.</li>
</ol>
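<p>Before loading a file, it can help to confirm from the shell that it meets the requirements above. The sketch below defines two hypothetical helpers (<code>check_vcf</code> and <code>prepare_vcf</code> are illustrative names, not part of ProleTRact). <code>prepare_vcf</code> assumes htslib's <code>bgzip</code> and <code>tabix</code> are on your <code>PATH</code> and that the input VCF is already coordinate-sorted.</p>

```shell
# Hypothetical helper: check the individual-mode requirements for one file
# (absolute path, .vcf.gz suffix, a matching .tbi index next to it).
check_vcf() {
  local vcf="$1"
  case "$vcf" in
    /*) ;;
    *) echo "not an absolute path: $vcf" >&2; return 1 ;;
  esac
  case "$vcf" in
    *.vcf.gz) ;;
    *) echo "expected a .vcf.gz file: $vcf" >&2; return 1 ;;
  esac
  [ -f "$vcf" ] || { echo "file not found: $vcf" >&2; return 1; }
  [ -f "$vcf.tbi" ] || { echo "missing tabix index: $vcf.tbi" >&2; return 1; }
  echo "ok: $vcf"
}

# Hypothetical helper: bgzip-compress a sorted plain-text VCF ($1) with htslib,
# writing "$1.gz", then build the tabix index "$1.gz.tbi".
prepare_vcf() {
  bgzip -c "$1" > "$1.gz" && tabix -p vcf "$1.gz"
}

# To preview the records in a region before jumping to it in the app:
#   tabix -h /abs/path/sample.vcf.gz chr1:1000-2000
```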

<h3>Cohort mode 👥👥</h3>
<ol>
  <li>Select <strong>Cohort</strong> in the sidebar and choose <em>Reads-based VCF</em> or <em>Assembly VCF</em> view.</li>
  <li>Provide the absolute path to a directory containing TandemTwister VCF files.</li>
  <li>Click <strong>Load Cohort</strong> to scan the directory and enable cohort navigation.</li>
  <li>Browse records and compare across samples.</li>
  <li>Use <strong>Previous</strong>/<strong>Next</strong> to step through records or jump to a region like <code>chr1:1000-2000</code>.</li>
  <li>Inspect motif blocks, interruptions, and per-allele differences.</li>
</ol>
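<p>To sanity-check a cohort directory before clicking <strong>Load Cohort</strong>, a small shell loop can count the <code>.vcf.gz</code> files it contains. The helper below (<code>scan_cohort</code>) is a hypothetical sketch, not part of ProleTRact. It also flags files without a <code>.tbi</code> index, on the assumption that cohort files, like individual-mode files, are best kept bgzipped and indexed, and it only inspects the top level of the directory.</p>

```shell
# Hypothetical helper: count the .vcf.gz files at the top level of a cohort
# directory and report any without a .tbi index. Returns non-zero if the
# path is not absolute or no VCFs are found.
scan_cohort() {
  local dir="$1" n=0 missing=0 f
  case "$dir" in
    /*) ;;
    *) echo "not an absolute path: $dir" >&2; return 1 ;;
  esac
  for f in "$dir"/*.vcf.gz; do
    [ -e "$f" ] || continue  # the glob matched nothing
    n=$((n + 1))
    if [ ! -f "$f.tbi" ]; then
      echo "missing index: $f" >&2
      missing=$((missing + 1))
    fi
  done
  echo "$n VCF(s), $missing without index"
  [ "$n" -gt 0 ]
}

# Example: scan_cohort /abs/path/to/cohort
```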

<h2>Input Requirements</h2>
<ul>
  <li><strong>VCF format:</strong> A standard VCF generated by TandemTwister. In individual mode it must be bgzipped and tabix-indexed (<code>.vcf.gz</code> with <code>.tbi</code>).</li>
  <li><strong>Cohort directory:</strong> Cohort mode requires a folder containing multiple <code>.vcf.gz</code> files generated by TandemTwister.</li>
</ul>


<h2>Demo / Examples</h2>
<p>Example screenshots and short walkthrough GIFs will be added here. For now, the example below (<code>src/proletract/assets/example.svg</code>) gives a preview:</p>
<img src="src/proletract/assets/example.svg" alt="Tandem Repeat Visualization Example" style="max-width: 100%; height: auto; border: 1px solid #ccc; padding: 10px;">
<ul>
  <li><em>Planned:</em> Individual-mode walkthrough</li>
  <li><em>Planned:</em> Cohort-mode walkthrough</li>
</ul>


<h2>Contributing</h2>
<p>Contributions are welcome! Please <a href="https://github.com/Lionward/ProleTRact/issues">open an issue</a> to discuss changes.</p>

<h2>License</h2>
<p>This project is licensed under the BSD 3-Clause Non-Commercial License — see <code>LICENSE</code> for details. Commercial use is prohibited. This software is intended for academic research, educational purposes, and personal/private use only. For commercial licensing inquiries, please contact the author.</p>

<h2>Citation</h2>
<p>If you use ProleTRact in your work, please cite this repository. A formal citation entry will be added once available.</p>
