Metadata-Version: 2.4
Name: pylipidr
Version: 0.1.0
Summary: Pure-Python port of Bioconductor lipidr — lipidomics data analysis, normalization, differential analysis and lipid-set enrichment (LSEA).
Author-email: Zehua Zeng <starlitnightly@163.com>
License: MIT License
        
        This Python port is released under the same MIT license as the original
        Bioconductor lipidr package
        (https://bioconductor.org/packages/release/bioc/html/lipidr.html,
        by Ahmed Mohamed et al., J. Proteome Res. 2020, 19(7):2890-2897).
        
        Copyright (c) 2020 lipidr authors (original R package)
        Copyright (c) 2026 py-lipidr authors (Python port)
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/omicverse/py-lipidr
Project-URL: Repository, https://github.com/omicverse/py-lipidr
Project-URL: Issues, https://github.com/omicverse/py-lipidr/issues
Project-URL: Upstream Bioc package, https://bioconductor.org/packages/release/bioc/html/lipidr.html
Project-URL: Upstream (omicverse), https://github.com/Starlitnightly/omicverse
Keywords: lipidomics,lipidr,LSEA,lipid-set-enrichment,differential-expression,limma,Skyline,Metabolomics-Workbench,PQN,internal-standard-normalization,mass-spectrometry
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.23
Requires-Dist: scipy>=1.10
Requires-Dist: pandas>=1.5
Requires-Dist: anndata>=0.8
Requires-Dist: pygoslin>=2.0
Requires-Dist: python-limma>=0.1
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Provides-Extra: plotting
Requires-Dist: matplotlib>=3.6; extra == "plotting"
Dynamic: license-file

# py-lipidr

Pure-Python port of the Bioconductor **[lipidr](https://bioconductor.org/packages/release/bioc/html/lipidr.html)**
lipidomics analysis toolkit (Mohamed, Molendijk & Hill,
*J. Proteome Res.* 2020, 19(7):2890-2897).

`pylipidr` is a standalone, dependency-light implementation of lipidr's
**computational core**: data import, lipid-name annotation, QC,
normalization, differential analysis and Lipid Set Enrichment Analysis
(LSEA). It does not require R.

| | |
|---|---|
| PyPI / import name | `pylipidr` |
| License | MIT (same as upstream lipidr) |
| Upstream | Bioconductor lipidr 2.20.0 |

## Why this port reuses two existing engines

* **Lipid-name parsing** -> [`pygoslin`](https://github.com/lifs-tools/pygoslin),
  the reference Goslin lipid-name grammar. lipidr's regex-based name
  parser is replaced by `pygoslin`, which is more robust and standards-based.
* **Moderated-t differential analysis** -> [`python-limma`](https://pypi.org/project/python-limma/)
  (`pylimma`). lipidr's `de_analysis` calls limma under the hood in R;
  the Python port calls the published pure-Python limma port instead of
  reimplementing it.
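
As a rough illustration of what "moderated t" means, the sketch below computes a two-group moderated t-statistic for a single lipid using only the standard library. It is not the `pylimma` or `pylipidr` code: the function name `moderated_t` and its arguments are hypothetical, and the prior degrees of freedom and prior variance are passed in by hand, whereas limma estimates them from all features by empirical Bayes.

```python
import math
from statistics import mean, variance


def moderated_t(group_a, group_b, prior_df, prior_var):
    """Two-group moderated t-statistic for one lipid (illustrative only).

    prior_df (d0) and prior_var (s0^2) are supplied by the caller here;
    limma estimates both from the whole dataset.
    """
    n_a, n_b = len(group_a), len(group_b)
    resid_df = n_a + n_b - 2
    # Pooled residual variance of the two-group linear model.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / resid_df
    # Shrink the per-lipid variance toward the prior variance.
    post_var = (prior_df * prior_var + resid_df * pooled_var) / (prior_df + resid_df)
    log_fc = mean(group_a) - mean(group_b)
    t_mod = log_fc / math.sqrt(post_var * (1 / n_a + 1 / n_b))
    # The moderated statistic is referred to a t-distribution with d0 + d df.
    return log_fc, t_mod, prior_df + resid_df


log_fc, t_mod, total_df = moderated_t(
    [2.0, 2.2, 1.8], [1.0, 1.1, 0.9], prior_df=4, prior_var=0.05
)
```

With `prior_df=0` the function reduces to the ordinary pooled-variance t-statistic, which shows that the moderation comes entirely from the variance shrinkage step.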

## Install

```bash
pip install pylipidr            # once published
# or, from a checkout:
pip install -e .
```

Dependencies: `numpy`, `scipy`, `pandas`, `anndata`, `pygoslin`,
`python-limma`.

## Quick start

```python
import pylipidr as lp

# 1. read a Skyline CSV export -> a LipidomicsExperiment (AnnData-backed)
exp = lp.read_skyline("A1_data.csv")

# 2. attach clinical / sample metadata
exp = lp.add_sample_annotation(exp, "clin.csv")

# 3. collapse multiple transitions per lipid
exp = lp.summarize_transitions(exp, method="max")

# 4. QC + normalization
exp = lp.filter_by_cv(exp, cv_cutoff=20.0)
exp = lp.normalize_pqn(exp, measure="Area")        # log2 + PQN

# 5. moderated-t differential analysis (limma)
de = lp.de_analysis(exp, "HighFat - Normal", group_col="group")
hits = lp.significant_molecules(de, p_cutoff=0.05, logfc_cutoff=1.0)

# 6. Lipid Set Enrichment Analysis
enr = lp.lsea(de, rank_by="logFC")
sets = lp.significant_lipidsets(enr, p_cutoff=0.05)
```
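
To make step 4 more concrete, here is a minimal standard-library sketch of probabilistic quotient normalization on a small samples x lipids matrix. It is an illustration of the general PQN technique, not the `normalize_pqn` implementation: the function name `pqn_normalize` is hypothetical, the reference profile is assumed to be the per-lipid median over all samples, and the sketch scales on the raw intensity scale before taking log2.

```python
import math
from statistics import median


def pqn_normalize(matrix):
    """PQN on a samples x lipids matrix of positive raw intensities.

    Returns log2-transformed, PQN-scaled values (illustrative only).
    """
    n_lipids = len(matrix[0])
    # Reference profile: per-lipid median across all samples (an assumption).
    reference = [median(row[j] for row in matrix) for j in range(n_lipids)]
    normalized = []
    for row in matrix:
        # Dilution factor: median quotient of this sample against the reference.
        factor = median(x / ref for x, ref in zip(row, reference))
        normalized.append([math.log2(x / factor) for x in row])
    return normalized


# Three samples that differ only by an overall dilution factor.
raw = [
    [100.0, 200.0, 400.0],
    [200.0, 400.0, 800.0],
    [50.0, 100.0, 200.0],
]
norm = pqn_normalize(raw)
```

Because the three toy samples differ only by a constant factor, their normalized rows coincide, which is the behaviour PQN is designed to produce for pure dilution effects.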

## The `LipidomicsExperiment`

`LipidomicsExperiment` wraps an `anndata.AnnData` (samples x lipids):

* `.adata.var` -- per-lipid annotations (`Class`, `Category`,
  `total_cl`, `total_cs`, `chains`, `istd`, ...).
* `.adata.obs` -- per-sample clinical data.
* `.adata.X` / `.adata.layers` -- one or more intensity *measures*.
* processing-state flags `is_logged` / `is_normalized` / `is_summarized`
  are stored in `.adata.uns` and toggled with `set_logged` etc.

## What is ported

| lipidr (R) | pylipidr | notes |
|---|---|---|
| `LipidomicsExperiment`, `as_lipidomics_experiment` | `LipidomicsExperiment`, `as_lipidomics_experiment` | AnnData-backed |
| `read_skyline` | `read_skyline` | Skyline CSV export(s) |
| `read_mwTab` | `read_mwtab` | Metabolomics Workbench `mwTab` |
| `read_mw_datamatrix` | `read_mw_datamatrix` | MW data matrix TSV |
| `annotate_lipids` | `annotate_lipids`, `annotate_experiment` | pygoslin-backed |
| `non_parsed_molecules`, `remove_non_parsed_molecules`, `update_molecule_names` | same names | |
| `filter_by_cv` | `filter_by_cv` | CV filter |
| `impute_na` | `impute_na` | `knn` / `min` / `minDet` / `minProb` / `zero` |
| `summarize_transitions` | `summarize_transitions` | `max` / `average` |
| `normalize_pqn` | `normalize_pqn` | probabilistic quotient normalization |
| `normalize_istd` | `normalize_istd` | per-class internal-standard normalization |
| `de_design`, `de_analysis` | `de_design`, `de_analysis` | moderated-t via `pylimma` |
| `significant_molecules` | `significant_molecules` | |
| `top_lipids` | `top_lipids` | ranks DE result (see note below) |
| `gen_lipidsets` | `gen_lipidsets` | by class / chain length / unsaturation |
| `lsea` | `lsea` | preranked GSEA (fgsea-style) |
| `significant_lipidsets` | `significant_lipidsets` | |
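
The idea behind grouping lipids "by class / chain length / unsaturation" can be sketched with plain dictionaries. This is not the `gen_lipidsets` code: the function name `build_lipid_sets`, the `min_size` parameter and the set-name pattern (`Class_PC`, `total_cl_34`, ...) are assumptions made for illustration. Only the annotation columns `Class`, `total_cl` and `total_cs` come from the description of `.adata.var` above.

```python
from collections import defaultdict


def build_lipid_sets(annotations, min_size=2):
    """Group lipids into sets by class, total chain length and unsaturation.

    annotations: dict mapping lipid name -> dict with keys
    'Class', 'total_cl' and 'total_cs'.
    """
    sets = defaultdict(list)
    for name, ann in annotations.items():
        for key in ("Class", "total_cl", "total_cs"):
            # Set names such as 'Class_PC' are an illustrative convention.
            sets[f"{key}_{ann[key]}"].append(name)
    # Drop sets too small to be useful for enrichment testing.
    return {k: sorted(v) for k, v in sets.items() if len(v) >= min_size}


toy = {
    "PC 32:0": {"Class": "PC", "total_cl": 32, "total_cs": 0},
    "PC 34:1": {"Class": "PC", "total_cl": 34, "total_cs": 1},
    "PE 34:1": {"Class": "PE", "total_cl": 34, "total_cs": 1},
}
lipid_sets = build_lipid_sets(toy)
```

In this toy input, each lipid contributes to three candidate sets, and only sets with at least two members are kept.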

## What is NOT ported (deferred to v0.2)

The following lipidr features are deliberately out of scope for v0.1:

* **`mva`** -- PCA / PLS-DA / OPLS-DA multivariate analysis. omicverse
  already provides multivariate tooling. Because lipidr's `top_lipids`
  normally operates on `mva` loadings, the v0.1 `top_lipids` ranks the
  `de_analysis` result instead.
* **All `plot_*` functions** -- `plot_samples`, `plot_molecules`,
  `plot_lipidclass`, `plot_chain_distribution`, `plot_results_volcano`,
  `plot_enrichment`, `plot_trend`, `plot_heatmap`, etc.
* **`use_interactive_graphics`** -- interactive plotly toggling.
* **`fetch_mw_study` / `list_mw_studies`** -- network helpers for the
  Metabolomics Workbench REST API.

## R-parity

`pylipidr` is validated against Bioconductor lipidr 2.20.0 on lipidr's
own bundled Skyline example dataset (`extdata/A1_data.csv` + `clin.csv`),
so both languages analyse identical input. Numbers from
`examples/benchmark.py`:

| step | metric | result |
|---|---|---|
| `annotate_lipids` | lipid-class agreement | **0.99** |
| `normalize_pqn` | Pearson r of normalized values | **1.000** |
| `normalize_istd` | Pearson r of normalized values | **0.997** |
| `de_analysis` | Pearson r of logFC | **1.000** |
| `de_analysis` | Pearson r of p-values | **1.000** |
| `lsea` | Pearson r of enrichment scores | **0.95** |
| `lsea` | Pearson r of p-values | **0.91** |

`lsea` agrees within target tolerance. The small differences arise because
the `fgsea` package used by R lipidr applies an adaptive *multilevel*
permutation scheme, while `pylipidr` uses a fixed gene-permutation null.
Both implementations call the same lipid sets *significantly* enriched.
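
For readers unfamiliar with preranked enrichment, the following standard-library sketch shows a weighted running-sum enrichment score together with a fixed-size permutation null. It is a simplified illustration, not the `lsea` implementation: the function names, the `weight` and `n_perm` parameters, and the two-sided p-value based on absolute scores are all assumptions made for this example.

```python
import random


def enrichment_score(ranked_names, scores, members, weight=1.0):
    """Weighted running-sum enrichment score; ranked_names is sorted by score, highest first."""
    member_set = set(members)
    hit_weights = [
        abs(s) ** weight if n in member_set else 0.0
        for n, s in zip(ranked_names, scores)
    ]
    total_hit = sum(hit_weights)
    n_miss = sum(1 for n in ranked_names if n not in member_set)
    if total_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for w, n in zip(hit_weights, ranked_names):
        # Step up on set members, step down on everything else.
        running += w / total_hit if n in member_set else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_pvalue(ranked_names, scores, members, n_perm=1000, seed=0):
    """Compare the observed score with scores of random sets of the same size."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_names, scores, members)
    member_set = set(members)
    set_size = sum(1 for n in ranked_names if n in member_set)
    extreme = 0
    for _ in range(n_perm):
        random_set = rng.sample(ranked_names, set_size)
        if abs(enrichment_score(ranked_names, scores, random_set)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)


names = ["a", "b", "c", "d", "e", "f"]
log_fc = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es, p = permutation_pvalue(names, log_fc, ["a", "b"], n_perm=200, seed=1)
```

A fixed number of permutations bounds the smallest achievable p-value at `1 / (n_perm + 1)`, which is one reason p-values from a scheme like this can differ from those of an adaptive multilevel approach.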

Run the test suite (the R-parity tests skip gracefully if R is unavailable):

```bash
pytest tests/ -v
```

* `tests/test_smoke.py` -- 18 algorithmic tests, no R needed.
* `tests/test_r_parity.py` -- 8 tests vs Bioconductor lipidr.

## Benchmark

```bash
python examples/benchmark.py --runs 2
```

On the bundled example the full Python pipeline runs roughly **8x**
faster than the equivalent R pipeline (mostly by skipping Rscript /
Bioconductor startup). See `examples/compare_R_vs_Python.ipynb`.

## Citation

If you use `pylipidr`, please cite the original lipidr paper:

> Mohamed A, Molendijk J, Hill MM. **lipidr: A Software Tool for Data
> Mining and Analysis of Lipidomics Datasets.** *J. Proteome Res.* 2020,
> 19(7):2890-2897. doi:10.1021/acs.jproteome.0c00082

and, for the reused engines, the Goslin
(Kopczynski et al., *Anal. Chem.* 2020) and limma
(Ritchie et al., *Nucleic Acids Res.* 2015) papers.

## License

MIT -- the same license as upstream lipidr. See `LICENSE`.
