
msmu._preprocessing._summarise._summarise

to_peptide

to_peptide(mdata, agg_method='median', calculate_q=True, score_method='best_pep', purity_threshold=0.7, top_n=None, rank_method='total_intensity', _peptide_col='peptide', _protein_col='proteins')

Summarise feature-level data to peptide-level data.

Usage

mdata = mm.pp.to_peptide(
    mdata,
    agg_method="median",
    calculate_q=True,
    score_method="best_pep",
    purity_threshold=0.7,
)

Parameters:

mdata (MuData, required)
    MuData object containing feature-level data.

agg_method (Literal['median', 'mean', 'sum'], default 'median')
    Aggregation method used to combine feature-level quantities into peptide-level quantities.

calculate_q (bool, default True)
    Whether to calculate q-values.

score_method (Literal['best_pep'], default 'best_pep')
    Method used to combine feature scores into a peptide score.

purity_threshold (float | None, default 0.7)
    Purity threshold applied to TMT data during quantification aggregation. Features below the threshold are left out of the aggregated quantities but are not removed from the data. If None, no purity threshold is applied.

top_n (int | None, default None)
    Number of top-ranked features to use for summarisation. If None, all features are used.

rank_method (Literal['total_intensity', 'max_intensity', 'median_intensity'], default 'total_intensity')
    Method used to rank features when selecting top_n.

_peptide_col (str, default 'peptide')
    Name of the peptide column in the var DataFrame.

_protein_col (str, default 'proteins')
    Name of the protein column in the var DataFrame.
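How purity_threshold and agg_method might interact can be sketched in plain Python. This is a toy illustration, not msmu's implementation: the list-of-dicts input, the `summarise_to_peptide` helper and the assumption that features with purity >= threshold are kept (rather than strictly above it) are all hypothetical, and the data are assumed to have no missing values.

```python
import statistics
from collections import defaultdict

# Hypothetical toy input: each feature (for example a PSM) carries a peptide
# sequence, a precursor purity and one intensity per sample.
features = [
    {"peptide": "PEPTIDEK", "purity": 0.95, "intensities": [100.0, 200.0]},
    {"peptide": "PEPTIDEK", "purity": 0.80, "intensities": [120.0, 180.0]},
    {"peptide": "PEPTIDEK", "purity": 0.50, "intensities": [900.0, 10.0]},
    {"peptide": "ANOTHERR", "purity": 0.90, "intensities": [50.0, 70.0]},
]

AGGREGATORS = {"median": statistics.median, "mean": statistics.mean, "sum": sum}


def summarise_to_peptide(features, agg_method="median", purity_threshold=0.7):
    """Aggregate feature intensities per peptide, one value per sample."""
    agg = AGGREGATORS[agg_method]
    groups = defaultdict(list)
    for feature in features:
        groups[feature["peptide"]].append(feature)

    result = {}
    for peptide, members in groups.items():
        n_samples = len(members[0]["intensities"])
        # Low-purity features are skipped for quantification only; the input
        # list itself is never modified.
        usable = [
            f for f in members
            if purity_threshold is None or f["purity"] >= purity_threshold
        ]
        if not usable:
            result[peptide] = [None] * n_samples  # nothing passed the threshold
            continue
        result[peptide] = [
            agg([f["intensities"][i] for f in usable]) for i in range(n_samples)
        ]
    return result


print(summarise_to_peptide(features))
# {'PEPTIDEK': [110.0, 190.0], 'ANOTHERR': [50.0, 70.0]}
```

With purity_threshold=None the low-purity feature is included and the PEPTIDEK medians become [120.0, 180.0], which shows why the threshold matters for TMT quantification.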

Returns:

MuData
    MuData object containing peptide-level data.
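The score_method and calculate_q options can be illustrated with a small sketch. It assumes that "best_pep" keeps the lowest (most confident) posterior error probability among a peptide's features, and it estimates q-values from PEPs with one common approach (the FDR at rank k is the mean PEP of the k best entries). The source does not say how msmu computes q-values, so both helpers are hypothetical.

```python
def best_pep(feature_peps):
    """Keep the lowest (best) PEP per peptide from (peptide, pep) pairs."""
    best = {}
    for peptide, pep in feature_peps:
        if peptide not in best or pep < best[peptide]:
            best[peptide] = pep
    return best


def q_values_from_pep(peps):
    """Estimate q-values from PEPs (one common approach, assumed here).

    Entries are sorted by ascending PEP; the estimated FDR at rank k is the
    mean PEP of the k best entries, and each q-value is the smallest FDR at
    that rank or any later (less confident) rank.
    """
    order = sorted(peps, key=peps.get)
    fdrs = []
    running_sum = 0.0
    for k, key in enumerate(order, start=1):
        running_sum += peps[key]
        fdrs.append(running_sum / k)

    q_values = {}
    lowest = 1.0
    for key, fdr in zip(reversed(order), reversed(fdrs)):
        lowest = min(lowest, fdr)
        q_values[key] = lowest
    return q_values


peptide_peps = best_pep([("A", 0.01), ("A", 0.2), ("B", 0.05), ("C", 0.3)])
print(peptide_peps)  # {'A': 0.01, 'B': 0.05, 'C': 0.3}
print(q_values_from_pep(peptide_peps))
```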

to_protein

to_protein(mdata, agg_method='median', calculate_q=True, score_method='best_pep', top_n=3, rank_method='total_intensity', _protein_col='protein_group', _shared_peptide='discard')

Summarise feature-level data to protein-level data. By default, each protein group is quantified as the median of its top 3 unique peptides, ranked by total_intensity; shared peptides are discarded (_shared_peptide="discard").
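The default behaviour described above (discard shared peptides, keep the top 3 by total intensity, take the per-sample median) can be sketched as follows. This is a toy model, not msmu's code: the input layout and the convention that a shared peptide is marked by a ";" in its protein_group value are hypothetical, and complete data without missing values are assumed.

```python
import statistics
from collections import defaultdict

# Hypothetical toy input: one row per peptide with per-sample intensities.
peptides = [
    {"peptide": "AAAK", "protein_group": "P1", "intensities": [10.0, 20.0]},
    {"peptide": "CCCK", "protein_group": "P1", "intensities": [30.0, 40.0]},
    {"peptide": "DDDK", "protein_group": "P1", "intensities": [50.0, 60.0]},
    {"peptide": "EEEK", "protein_group": "P1", "intensities": [1.0, 2.0]},
    {"peptide": "FFFK", "protein_group": "P1;P2", "intensities": [999.0, 999.0]},
    {"peptide": "GGGK", "protein_group": "P2", "intensities": [5.0, 7.0]},
]


def summarise_to_protein(peptides, top_n=3, shared_sep=";"):
    """Median of the top_n unique peptides per protein group (toy sketch)."""
    groups = defaultdict(list)
    for p in peptides:
        if shared_sep in p["protein_group"]:
            continue  # shared peptide (hypothetical marker): discard it
        groups[p["protein_group"]].append(p)

    result = {}
    for group, members in groups.items():
        # Rank by total intensity across samples, highest first.
        ranked = sorted(members, key=lambda p: sum(p["intensities"]), reverse=True)
        chosen = ranked if top_n is None else ranked[:top_n]
        n_samples = len(chosen[0]["intensities"])
        result[group] = [
            statistics.median([p["intensities"][i] for p in chosen])
            for i in range(n_samples)
        ]
    return result


print(summarise_to_protein(peptides))
# {'P1': [30.0, 40.0], 'P2': [5.0, 7.0]}
```

Here EEEK falls outside the top 3 of P1 and FFFK is dropped as shared, so neither affects the P1 estimate.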

Parameters:

mdata (MuData, required)
    MuData object containing feature-level data.

agg_method (Literal['median', 'mean', 'sum'], default 'median')
    Aggregation method used for quantification.

calculate_q (bool, default True)
    Whether to calculate q-values.

score_method (Literal['best_pep'], default 'best_pep')
    Method used to combine scores (PEP).

top_n (int | None, default 3)
    Number of top-ranked peptides per protein group to use for summarisation. If None, all peptides are used.

rank_method (Literal['total_intensity', 'max_intensity', 'median_intensity'], default 'total_intensity')
    Method used to rank peptides when selecting top_n.

_protein_col (str, default 'protein_group')
    Name of the protein column in the var DataFrame.

_shared_peptide (Literal['discard'], default 'discard')
    How to handle shared peptides. Only "discard" is currently implemented.
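The three rank_method options can select different peptides for the same top_n. The sketch below assumes the names map to the sum, maximum and median of a peptide's intensities across samples; that mapping is inferred from the option names, and `select_top_n` is a hypothetical helper.

```python
import statistics

# Assumed meaning of each rank_method option.
RANKERS = {
    "total_intensity": sum,
    "max_intensity": max,
    "median_intensity": statistics.median,
}

# Hypothetical per-sample intensities for three peptides of one protein group.
features = {
    "steady": [45.0, 45.0, 45.0],  # total 135, max 45, median 45
    "spiky": [100.0, 5.0, 5.0],    # total 110, max 100, median 5
    "mixed": [10.0, 70.0, 50.0],   # total 130, max 70, median 50
}


def select_top_n(features, top_n=None, rank_method="total_intensity"):
    """Return feature names ordered by the chosen score, truncated to top_n."""
    score = RANKERS[rank_method]
    ranked = sorted(features, key=lambda name: score(features[name]), reverse=True)
    return ranked if top_n is None else ranked[:top_n]


for method in RANKERS:
    print(method, select_top_n(features, top_n=1, rank_method=method))
# total_intensity ['steady']
# max_intensity ['spiky']
# median_intensity ['mixed']
```

max_intensity favours peptides with one very bright sample, while total_intensity and median_intensity favour consistently detected ones.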

Returns:

MuData
    MuData object containing protein-level data.

to_ptm

to_ptm(mdata, modi_name, modification, agg_method='median', top_n=None, rank_method='total_intensity')

Summarise feature-level data to PTM-level data.

Parameters:

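mdata (MuData, required)
    MuData object containing peptide-level data.

modi_name (str, required)
    Name of the PTM to summarise (e.g., "phospho"). Also used to name the output modality (e.g., phospho_site).

modification (str, required)
    Modification string (e.g., "[+79.96633]", "(unimod:21)").

agg_method (Literal['median', 'mean', 'sum'], default 'median')
    Aggregation method to use.

top_n (int | None, default None)
    Number of top-ranked features to use for summarisation. If None, all features are used.

rank_method (Literal['total_intensity', 'max_intensity', 'median_intensity'], default 'total_intensity')
    Method used to rank features when selecting top_n.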
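Site-level summarisation needs each modification mapped to a residue position in the protein. The sketch below locates a given modification string in a modified peptide sequence and aggregates intensities per site. It is hypothetical throughout: the `start` field (1-based protein position of the peptide's first residue), the "P1_S15" site ID format and the assumption that the tag directly follows the modified residue are illustrative choices, not msmu's documented behaviour.

```python
import statistics
from collections import defaultdict

AGGREGATORS = {"median": statistics.median, "mean": statistics.mean, "sum": sum}


def localise_sites(sequence, modification, protein, start):
    """Return hypothetical site IDs such as 'P1_S15' for each `modification`."""
    sites = []
    residues = []  # plain residues read so far
    i = 0
    while i < len(sequence):
        if sequence.startswith(modification, i):
            if residues:  # ignore a tag with no preceding residue
                position = start + len(residues) - 1
                sites.append(f"{protein}_{residues[-1]}{position}")
            i += len(modification)
        elif sequence[i] in "[(":
            # A different modification tag: skip to its closing bracket.
            closing = "]" if sequence[i] == "[" else ")"
            i = sequence.index(closing, i) + 1
        else:
            residues.append(sequence[i])
            i += 1
    return sites


def summarise_to_ptm(peptides, modification, agg_method="median"):
    """Aggregate peptide intensities per modification site, sample by sample."""
    agg = AGGREGATORS[agg_method]
    by_site = defaultdict(list)
    for p in peptides:
        for site in localise_sites(p["sequence"], modification, p["protein"], p["start"]):
            by_site[site].append(p["intensities"])
    # zip(*rows) regroups the intensity rows into one tuple per sample.
    return {site: [agg(col) for col in zip(*rows)] for site, rows in by_site.items()}


peptides = [
    {"sequence": "PEPS[+79.96633]TIDEK", "protein": "P1", "start": 12, "intensities": [10.0, 20.0]},
    {"sequence": "PEPS[+79.96633]TIDEKR", "protein": "P1", "start": 12, "intensities": [30.0, 40.0]},
    {"sequence": "AT[+79.96633]M[+15.9949]S[+79.96633]K", "protein": "P2", "start": 3, "intensities": [5.0, 6.0]},
]
print(summarise_to_ptm(peptides, "[+79.96633]"))
# {'P1_S15': [20.0, 30.0], 'P2_T4': [5.0, 6.0], 'P2_S6': [5.0, 6.0]}
```

The two P1 peptides differ only by a missed cleavage, so they collapse onto the same site, while the doubly phosphorylated P2 peptide contributes to two sites.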

Returns:

MuData
    MuData object containing PTM-level data.