Metadata-Version: 2.4
Name: kardemumma
Version: 0.1.1
Summary: Key Analysis of Reproducible Data for Efficient Monitoring in Unified Mass Spectrometry Methods and Assays
Author-email: Thanadol Sutantiwanichkul <khunkoei@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/thanadol-git/kardemumma
Project-URL: Issue, https://github.com/thanadol-git/kardemumma/issues
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: lxml
Requires-Dist: matplotlib
Requires-Dist: seaborn
Requires-Dist: statsmodels
Requires-Dist: sdrf-pipelines[ontology]
Requires-Dist: matplotlib-venn
Requires-Dist: scikit-learn
Requires-Dist: umap-learn
Requires-Dist: nbformat
Requires-Dist: pyteomics
Dynamic: license-file

# KARDEMUMMA

**KARDEMUMMA** stands for **K**ey **A**nalysis of **R**eproducible **D**ata for **E**fficient **M**onitoring in **U**nified **M**ass **S**pectrometry **M**ethods and **A**ssays.

This repository contains the Python package for processing and quality-checking targeted mass spectrometry outputs (for example Skyline/OpenSWATH-style exports). The tool is built on the targeted proteomics assay at KTH Royal Institute of Technology and Science for Life Laboratory (SciLifeLab), Sweden. It aims to provide a simplified analysis pipeline for plasma proteomics and to bridge research and clinical applications.

> The repository has been renamed from `skyline_qc` to `kardemumma`, and the new name is used throughout the project.

## Installation

You can install the dependencies and set up the environment using [Conda](https://docs.conda.io/en/latest/):

1. Clone the repository:
   ```bash
   git clone https://github.com/thanadol-git/kardemumma.git
   cd kardemumma
   ```

2. Create the environment using the provided `environment.yml` file:
   ```bash
   conda env create -f environment.yml -p ./env
   ```

   Alternatively, to create a named environment instead of a path-based one:
   ```bash
   conda env create -f environment.yml -n kardemumma
   ```

3. Activate the environment:
   ```bash
   conda activate ./env
   ```
   or, if you used an environment name:
   ```bash
   conda activate kardemumma
   ```

4. Install the package in editable mode:
   ```bash
   pip install -e .
   ```

## Available pipelines
- [ ] `SDRF generation for plasma proteomics`
- [ ] `Targeted PRM with ProteomEdge AB`
- [ ] `Targeted SRM`


## Requirements

### Notes
- All required dependencies will be installed via Conda and pip as specified in `environment.yml`.
- Python 3.10 or later is required; 3.10 is recommended.
- For pip installs, make sure you have internet access.
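
The notes above can be checked from inside the activated environment. The following is a minimal sketch and not part of the package: `check_environment` is a hypothetical helper, and the distribution names are assumed to match the dependencies declared in the package metadata. It uses only the standard library.

```python
import sys
from importlib import metadata

# Distribution names copied from the package's declared dependencies
# (assumption: these match what environment.yml installs).
REQUIRED_DISTS = [
    "requests", "pandas", "numpy", "lxml", "matplotlib", "seaborn",
    "statsmodels", "sdrf-pipelines", "matplotlib-venn", "scikit-learn",
    "umap-learn", "nbformat", "pyteomics",
]

MIN_PYTHON = (3, 10)


def check_environment(dists=REQUIRED_DISTS, min_python=MIN_PYTHON):
    """Return (python_ok, missing) for the current interpreter.

    python_ok is True when the running Python meets the minimum version;
    missing lists the distributions that are not installed.
    """
    python_ok = sys.version_info[:2] >= tuple(min_python)
    missing = []
    for name in dists:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return python_ok, missing


if __name__ == "__main__":
    python_ok, missing = check_environment()
    print(f"Python >= {MIN_PYTHON[0]}.{MIN_PYTHON[1]}: {python_ok}")
    print("Missing distributions:", ", ".join(missing) or "none")
```

An empty "missing" list together with a passing Python check suggests the environment was created as intended; it does not verify that the installed versions are mutually compatible.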

## To Dos

### Phase 1 — Python Package & PyPI Release (Version 0.x.x)

**1. Code & API clean-up**
- [x] -1. Move plot functions from DA4K notebook to `prm.py` and expose via `kdm.*` (`map_peptide_sequence`, `plot_peptide_concentration_by_group`, `plot_median_peptide_concentration_by_group`, `plot_all_median_peptide_concentration_by_group`, `plot_all_peptide_concentration_by_group`)
- [ ] 0. Remove 3 under-QC samples from analysis
- [ ] 1. Check with Yasset on how to set up targeted SDRF
- [ ] 2. Integrate `prm-slider` to work with transition levels
- [ ] 3. Work on SRM support (`sdrf.py` + new `srm.py` module)
- [ ] 4. Combine output layer with OpenMS formats
- [ ] 5. Audit all public functions — consistent naming, type hints, docstrings
- [ ] 6. Ensure `__init__.py` exports a clean, stable public API
- [ ] Create landing logo and banners.

**2. Package metadata & build**
- [x] 7. Update `pyproject.toml`: add missing `pyteomics` dependency, bump version, add classifiers and metadata (`python_requires`, `install_requires`)
- [x] 8. Add `CHANGELOG.md` with initial release notes (included in release & GitHub Action)
- [x] 9. Add `LICENSE` file (MIT)
- [x] 10. Verify `pip install -e .` builds cleanly in a fresh environment
- [x] 11. Build distribution: `python -m build` → inspect `dist/`

**3. Testing & CI**
- [x] 12. Add unit tests with `pytest` for core modules (`prm.py`, `sdrf.py`)
- [x] 13. Add a GitHub Actions workflow (`.github/workflows/release.yml`) for releases (tests run on push/tag)
- [x] 14. Add a release workflow that publishes docs on version tag push (`.github/workflows/release_docs.yml`); PyPI publish pending

**4. PyPI release**
- [x] 15. Register package name on [PyPI](https://pypi.org) (check availability of `kardemumma`)
- [x] 16. Create API token on PyPI and store as `PYPI_API_TOKEN` GitHub secret
- [x] 17. Publish first release: `python -m twine upload dist/*` (or via GitHub Actions)
- [x] 18. Verify: `pip install kardemumma` works from PyPI

---

### Phase 2 — Nextflow Pipeline (Version 2.x.x)

**5. Pipeline design**
- [ ] 19. Define end-to-end workflow: raw input → SDRF validation → OpenSWATH/Skyline export → PRM QC → ratio/DA output
- [ ] 20. Sketch module boundaries as Nextflow `process` blocks (one process per major step)
- [ ] 21. Decide on container strategy: Docker images (or Singularity) per process, each with `kardemumma` installed from PyPI

**6. Implementation**
- [ ] 22. Scaffold repository structure: `nextflow/`, `modules/`, `conf/`, `assets/`
- [ ] 23. Write a `main.nf` entry workflow with configurable params (`--input`, `--outdir`, `--mode prm|srm`)
- [ ] 24. Implement individual processes wrapping `kardemumma` CLI calls or Python scripts
- [ ] 25. Add `nextflow.config` with profiles: `standard` (local), `cluster` (SLURM/HPC at SciLifeLab), `cloud`
- [ ] 26. Pin `kardemumma` version in each container/environment to match tested PyPI release
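
Items 23 and 24 imply a command-line entry point that a Nextflow process could call. No such CLI is documented here, so the following is only a sketch of what a wrapper script might look like: `build_parser` and `main` are hypothetical names, the default output directory is an assumption, and the actual `kardemumma` calls are deliberately left out. Only the parameter names `--input`, `--outdir`, and `--mode prm|srm` come from the list above.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser mirroring the planned pipeline parameters."""
    parser = argparse.ArgumentParser(
        description="Hypothetical wrapper script for a single pipeline step."
    )
    parser.add_argument("--input", required=True, type=Path,
                        help="Path to the targeted MS export to process.")
    parser.add_argument("--outdir", type=Path, default=Path("results"),
                        help="Output directory (default is an assumption).")
    parser.add_argument("--mode", choices=["prm", "srm"], default="prm",
                        help="Acquisition mode handled by this step.")
    return parser


def main(argv=None):
    """Parse arguments and return a summary of the requested run.

    A real wrapper would dispatch to the package here; that call is
    omitted because its interface is not specified in this document.
    """
    args = build_parser().parse_args(argv)
    return {
        "input": str(args.input),
        "outdir": str(args.outdir),
        "mode": args.mode,
    }
```

For example, `main(["--input", "export.csv", "--mode", "srm"])` returns a dictionary with the parsed values, and passing an unsupported mode makes `argparse` exit with a usage error, which a Nextflow process would see as a failed task.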

**7. Testing & docs**
- [ ] 27. Add small test dataset (synthetic or anonymised) to `tests/` for end-to-end pipeline testing
- [ ] 28. Add `nf-test` or a simple CI job that runs the pipeline on the test dataset
- [ ] 29. Write pipeline usage docs in `docs/pipeline.md` (input format, params, outputs)
- [ ] 30. Consider submission to [nf-core](https://nf-co.re) once pipeline is stable


## Issues
- iRT peptides: why do they contain Biognosys sequences?
- Oxidation
- Stats for PEP

## Key developers
- Thanadol Sutantiwanichkul
- Justin Sing
- Yuqi Zheng
- Khue Hua Tran Minh
- Maria-Jesus Iglesias Mareque
- Fredrik Edfors
