Metadata-Version: 2.4
Name: metalncrna
Version: 2.0.1
Summary: A portable and modular meta-predictor for identifying Long Non-coding RNAs (lncRNAs).
Author-email: Fabiano Bezerra Menegidio <fabianomenegidio@umc.br>
License: MIT
Keywords: lncRNA,bioinformatics,genomics,ensemble
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click
Requires-Dist: pandas
Requires-Dist: biopython
Requires-Dist: plotly
Requires-Dist: pyyaml
Requires-Dist: rich
Requires-Dist: matplotlib
Provides-Extra: agent
Requires-Dist: ollama; extra == "agent"
Dynamic: license-file

# metaLncRNA v2.0.0 🧬🤖

<p align="center">
  <img src="https://raw.githubusercontent.com/LaBiOmicS/metaLncRNA/main/logo.png" alt="metaLncRNA Logo" width="70%">
</p>

<!-- Institutional Badges -->
[![DOI](https://zenodo.org/badge/1208858176.svg)](https://doi.org/10.5281/zenodo.19547230)
[![University: UMC](https://img.shields.io/badge/University-UMC-0D47A1.svg)](https://www.umc.br/)
[![Laboratory: LaBiOmicS](https://img.shields.io/badge/Laboratory-LaBiOmicS-7B1FA2.svg)](https://github.com/LaBiOmicS)
[![Bioinformatics](https://img.shields.io/badge/Bioinformatics-lncRNA-green.svg)](https://github.com/LaBiOmicS/metaLncRNA)

<!-- Open Science Badges -->
[![PyPI Version](https://img.shields.io/pypi/v/metalncrna.svg)](https://pypi.org/project/metalncrna/)
[![Open Source](https://img.shields.io/badge/Open-Source-brightgreen.svg)](https://github.com/LaBiOmicS/metaLncRNA)
[![Open Science](https://img.shields.io/badge/Open-Science-blue.svg)](https://github.com/LaBiOmicS/metaLncRNA)
[![Open Data](https://img.shields.io/badge/Open-Data-brightgreen.svg)](https://github.com/LaBiOmicS/metaLncRNA)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![JOSS Status](https://img.shields.io/badge/JOSS-Pre--submission-brightgreen.svg)](https://joss.theoj.org/)
[![CI Status](https://github.com/LaBiOmicS/metaLncRNA/actions/workflows/ci.yml/badge.svg)](https://github.com/LaBiOmicS/metaLncRNA/actions/workflows/ci.yml)

<!-- Tech & Method Badges -->
[![Python Version](https://img.shields.io/badge/python-3.8%2B-blue.svg)](https://www.python.org/)
[![Powered by Ollama](https://img.shields.io/badge/AI-Powered_by_Ollama-orange.svg)](https://ollama.com)
[![Ensemble Learning](https://img.shields.io/badge/Method-Ensemble_Learning-blueviolet.svg)](https://github.com/LaBiOmicS/metaLncRNA)

---

`metaLncRNA` is a modular, high-performance Python framework designed to identify Long Non-coding RNAs (lncRNAs) by orchestrating an ensemble of seven diverse computational tools. It resolves the "reproducibility gap" by automating environment management and providing a robust consensus prediction through weighted soft-voting.

---

<p align="center">
  <img src="https://raw.githubusercontent.com/LaBiOmicS/metaLncRNA/main/infografico.png" alt="metaLncRNA Infographic" width="100%">
</p>

---

## 📂 Repository Structure

```text
.
├── conda/                   # Bioconda recipe and metadata
├── deploy/                  # Containerization (Dockerfile, Singularity.def)
├── docs/                    # Technical documentation and user guides
├── examples/                # Quick-start samples (FASTA, config templates)
├── galaxy/                  # Galaxy Tool wrapper and test data
├── INPI_Registration/       # Legal software registration assets
├── paper/                   # JOSS publication manuscript and bibliography
├── scripts/                 # Bash scripts for HPC/Batch processing
├── src/
│   └── metalncrna/          # Main Python Package
│       ├── cli.py           # Command-line interface entry point
│       ├── adapters/        # Wrappers for 7 lncRNA predictors
│       ├── engine/          # Core logic (Consensus, Dispatcher, Trainer)
│       ├── utils/           # AI Agent, Env management, Reports, FASTA handling
│       ├── data/            # Built-in weights and default configurations
│       └── third_party/     # Bundled legacy tools (CNCI, CPPred, LGC)
├── tests/                   # Comprehensive Unit and Integration tests
├── pyproject.toml           # Build system and dependency definitions
└── pixi.toml                # Environment management configuration
```

### 🧩 Core Components Detail

- **`src/metalncrna/adapters/`**: Orchestrates external tools like RNAsamba, CPAT, CPC2, etc., providing a unified interface for prediction.
- **`src/metalncrna/engine/`**: 
    - `consensus.py`: Implements the weighted soft-voting algorithm.
    - `dispatcher.py`: Manages parallel execution of the ensemble.
- **`src/metalncrna/utils/agent.py`**: Integrates with local LLMs (Ollama) for automated biological interpretation of results.
- **`galaxy/`**: Allows `metaLncRNA` to be integrated into Galaxy instances, supporting reproducible web-based workflows.

---

## 🔧 Recent Fixes (v2.0.0)
- **CNCI Stability:** Fixed a critical hang in the CNCI legacy tool caused by non-canonical nucleotides (e.g., K, V, M) and a multiprocessing deadlock in the original Python 2.7 implementation.
- **Improved Filtering:** Implemented rigorous FASTA validation in the CNCI adapter to exclude sequences with ambiguous characters, preventing `KeyError` crashes.
- **Resource Optimization:** Optimized tool dispatching by limiting CNCI threads to 4, reducing I/O overhead and improving performance for small-to-medium files.
- **Debug Resiliency:** The dispatcher now preserves intermediate files automatically if a tool failure occurs, facilitating troubleshooting.

## 🔧 Previous Fixes (v1.2.1)
- **Consensus Logic:** Updated `consensus_support` to reflect the number of tools that agree with the final consensus label, providing better interpretability.
- **CPC2 Integration:** Fixed a critical parsing error where coding probability and label columns were mismatched (v1.1.8).
- **Cleanup:** Removed unimplemented/experimental adapters to ensure stability.

---

## ⚙️ Configuration

`metaLncRNA` follows a robust configuration loading order:
1. **Internal Defaults:** Built-in weights and paths in `src/metalncrna/data/default_config.yaml`.
2. **Local Config:** `metaLncRNA_config.yaml` in your current working directory.
3. **User Home:** `~/.metalncrna/config.yaml`.
4. **Explicit Path:** Provided via the `-c` or `--config` flag.

---

## 🚀 Key Features

- **Ensemble Prediction:** Combines 7 tools (RNAsamba, CPAT, CPC2, PLEK, CNCI, CPPred, LGC).
- **Interactive AI Agent:** Integrated local LLM assistant (**Llama-3.2** or **OpenBioLLM**) to interpret results and explain classification decisions.
- **Reproducibility First:** Built-in environment isolation via **Mamba** and **Pixi**.
- **Standardized Reports:** Comprehensive TSV reports with tool congruence metrics.
- **Publication Ready:** Formatted according to JOSS standards for scientific software.
