Metadata-Version: 2.4
Name: PyquetMS
Version: 0.1.0
Summary: Memory-efficient mzML to Parquet converter for mass spectrometry files
Author-email: Avni Badiwale <avnibadiwale@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/Avni2000/pyquet
Keywords: mass spectrometry,mzML,parquet,proteomics,metabolomics
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: pandas>=1.3.0
Requires-Dist: numpy>=1.21.0
Requires-Dist: pyarrow>=5.0.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Requires-Dist: mypy; extra == "dev"

# Pyquet

Memory-efficient mzML to Parquet converter for mass spectrometry files.

## Overview

Pyquet converts mzML files to Parquet in a streaming fashion, keeping memory usage low so that large mass spectrometry datasets can be processed without running out of memory. It started as a side project inspired by GSoC '25 with OpenMS, with the goal of providing a simple CLI for converting `.mzML` files to `.parquet`, a columnar format that is especially useful in big data workflows such as machine learning.

## Installation

### From PyPI

```bash
pip install PyquetMS
```

### From source

```bash
git clone https://github.com/Avni2000/pyquet.git
cd pyquet
pip install .
```

### Development installation

```bash
git clone https://github.com/Avni2000/pyquet.git
cd pyquet
pip install -e ".[dev]"
```

## Usage

### CLI

Basic conversion:
```bash
pyquet input.mzML
```
or
```bash
pyquet ~/Downloads/input.mzML
```

Specify the output file (by default, the output is written to the current working directory):
```bash
pyquet input.mzML -o output.parquet
```

Customize batch size and compression (these are the settings I recommend):
```bash
pyquet input.mzML --batch-size 5000 --compression gzip
```

Get file information without converting:
```bash
pyquet input.mzML --info
```

## Output Format

The columns in the converted Parquet file depend on the type of mzML file being converted.
Some columns may be empty, which is perfectly okay! It doesn't mean your mzML file is wrong.
The main expected values are time, m/z, and intensity.

## Contributions

It's quite a small project, so feel free to make a PR or open an issue!
