
msmu

Python toolkit for LC-MS/MS proteomics analysis based on MuData


Overview

msmu is a Python package for scalable, modular, and reproducible LC-MS/MS proteomics data analysis.
It supports PSM-, peptide-, and protein-level processing, builds on the MuData (AnnData) data structure, and enables stepwise normalization, batch correction, and statistical testing for biomarker discovery and systems biology.


Key Features

  • Flexible data ingestion from DIA-NN and Sage (currently supported), with other database search tools planned
  • MuData/AnnData-compatible object structure for multi-level omics
  • Built-in QC: precursor purity, peptide length, charge, missed cleavages
  • Protein inference: infer protein with ... rule
  • Normalization options: log2, quantile, median centering, GIS/IRS
  • Statistical analysis: permutation-based DE test and FDR
  • PTM support, with stoichiometry adjustment against a global proteome dataset
  • Visualization: PCA, UMAP, volcano plots, heatmaps, QC metrics
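
To make the normalization options above concrete, here is a minimal sketch of two of them, a log2 transform followed by per-sample median centering. It uses only the Python standard library and is not msmu's API: the function name `log2_median_center` and the toy intensities are hypothetical, chosen only for illustration.

```python
import math
from statistics import median


def log2_median_center(intensities):
    """Log2-transform each sample, then shift it so its median is zero.

    `intensities` maps a sample name to a list of raw intensities.
    None or non-positive values are treated as missing and stay None.
    (Hypothetical helper for illustration, not part of msmu.)
    """
    normalized = {}
    for sample, values in intensities.items():
        logged = [
            math.log2(v) if v is not None and v > 0 else None
            for v in values
        ]
        observed = [v for v in logged if v is not None]
        # With no observed values there is nothing to center on.
        center = median(observed) if observed else 0.0
        normalized[sample] = [
            None if v is None else v - center for v in logged
        ]
    return normalized


# Toy data: three proteins measured in two samples (made-up numbers).
raw = {
    "sample_a": [1024.0, 2048.0, 4096.0],
    "sample_b": [512.0, None, 2048.0],
}
centered = log2_median_center(raw)
print(centered)
```

After centering, every sample has a median of zero on the log2 scale, so the remaining differences between samples reflect relative abundance rather than overall loading.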

File Structure and Input Format

Accepted inputs

  • Sage: folder with PSM tables
  • DIA-NN: output folder
  • MaxQuant
  • FragPipe (MSFragger)
  • DelPy

Output

  • Integrated multi-level MuData object (.h5mu)
  • Summary plots and statistics
  • Differentially expressed proteins/sites with FDR
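
As a rough illustration of how a list of differentially expressed proteins with FDR can be produced, the sketch below runs a two-sided permutation test on the difference in group means and then applies the Benjamini-Hochberg adjustment. It is a generic, standard-library example under assumed settings: the function names, the choice of test statistic, and the toy values are hypothetical, and msmu's own test may differ.

```python
import random
from statistics import mean


def permutation_pvalue(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an exact zero.
    return (hits + 1) / (n_permutations + 1)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy log2 abundances for two proteins in two conditions (made-up numbers).
proteins = {
    "P1": ([10.1, 10.3, 9.9, 10.2], [12.0, 12.2, 11.9, 12.1]),
    "P2": ([8.0, 8.2, 7.9, 8.1], [8.1, 7.9, 8.2, 8.0]),
}
names = list(proteins)
pvals = [permutation_pvalue(*proteins[name]) for name in names]
qvals = benjamini_hochberg(pvals)
for name, p, q in zip(names, pvals, qvals):
    print(name, round(p, 4), round(q, 4))
```

With only four replicates per group there are just 70 distinct label assignments, so the smallest attainable p-value is limited; real analyses need enough replicates for the permutation distribution to be informative.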

Roadmap

UPCOMING
  • ✅ Support DIA-NN and Sage formats
  • ✅ Normalize and aggregate pipeline
  • ✅ QC metric visualization
  • ✅ PTM stoichiometry inference

Citation

UPCOMING
If you use msmu in your work, please cite:

Choi and Lee et al., msmu: A Pythonic Framework for Modular Proteomics Analysis, in prep.

License

UPCOMING
MIT License. See LICENSE for details.