Nanopore Direct-RNA sequencing has revolutionized transcriptomics but is challenged by artificial chimeric reads that compromise data integrity. We present DeepChopper, a novel large language model tailored for biological sequences, which accurately identifies and removes artificial sequences in Nanopore Direct-RNA sequencing data without relying on alignment information. DeepChopper's hybrid architecture, which combines HyenaDNA for long-range dependency modeling with quality-aware processing, achieves both broad context understanding and single-nucleotide resolution. Across multiple cell lines and sequencing platforms, DeepChopper reduced chimeric reads by 62-84% and improved supporting rates from 8-19% to 43-55% compared to existing methods. In gene fusion detection specifically, DeepChopper reduced false positives by 89% while increasing the proportion of supported fusions from 2% to 17%. By improving data quality, DeepChopper makes downstream analyses significantly more reliable, particularly in cancer genomics and transcriptomics. This work demonstrates the potential of large language models for analyzing complex biological data, paving the way for advances in genomics and biotechnology.