msmu._preprocessing._summarise._summarise
to_peptide
to_peptide(mdata, agg_method='median', calculate_q=True, score_method='best_pep', purity_threshold=0.7, top_n=None, rank_method='total_intensity', _peptide_col='peptide', _protein_col='proteins')
Summarise feature-level data to peptide-level data.
Usage
mdata = mm.pp.to_peptide(mdata, agg_method="median", calculate_q=True, score_method="best_pep", purity_threshold=0.7)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `mdata` | `MuData` | MuData object containing feature-level data. | required |
| `agg_method` | `Literal['median', 'mean', 'sum']` | Aggregation method for quantification to use. Defaults to "median". | `'median'` |
| `calculate_q` | `bool` | Whether to calculate q-values. Defaults to True. | `True` |
| `score_method` | `Literal['best_pep']` | Method to combine scores. Defaults to "best_pep". | `'best_pep'` |
| `purity_threshold` | `float \| None` | Purity threshold for TMT data quantification aggregation (does not filter out features). If None, no filtering is applied. Defaults to 0.7. | `0.7` |
| `top_n` | `int \| None` | Number of top features to consider for summarisation. If None, all features are used. Defaults to None. | `None` |
| `rank_method` | `Literal['total_intensity', 'max_intensity', 'median_intensity']` | Method to rank features when selecting top_n. Defaults to "total_intensity". | `'total_intensity'` |
| `_peptide_col` | `str` | Column name for peptides in var DataFrame. Defaults to "peptide". | `'peptide'` |
| `_protein_col` | `str` | Column name for proteins in var DataFrame. Defaults to "proteins". | `'proteins'` |
Returns:
| Name | Type | Description |
|---|---|---|
| `MuData` | `MuData` | MuData object containing peptide-level data. |
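To make the aggregation options concrete, the following is a minimal pandas sketch of what median aggregation and a "best PEP" score could look like on a toy table. It is not msmu's implementation: the table layout, the column names (`peptide`, `pep`, `sample_1`, `sample_2`), and the reading of `best_pep` as "lowest PEP per peptide" are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical feature-level table: one row per feature, with its peptide,
# a posterior error probability (PEP), and intensities in two samples.
features = pd.DataFrame(
    {
        "peptide": ["AAK", "AAK", "AAK", "CDR", "CDR"],
        "pep": [0.01, 0.20, 0.05, 0.30, 0.02],
        "sample_1": [10.0, 12.0, 20.0, 5.0, 7.0],
        "sample_2": [11.0, 13.0, 30.0, 6.0, 8.0],
    }
)
samples = ["sample_1", "sample_2"]

# Quantification: aggregate feature intensities per peptide with the median
# (the analogue of agg_method="median").
peptide_quant = features.groupby("peptide")[samples].median()

# Score: keep the lowest PEP among each peptide's features
# (assumed meaning of score_method="best_pep").
peptide_pep = features.groupby("peptide")["pep"].min()

print(peptide_quant)
print(peptide_pep)
```

Swapping `.median()` for `.mean()` or `.sum()` gives the toy equivalents of the other two `agg_method` values.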
to_protein
to_protein(mdata, agg_method='median', calculate_q=True, score_method='best_pep', top_n=3, rank_method='total_intensity', _protein_col='protein_group', _shared_peptide='discard')
Summarise feature-level data to protein-level data. By default, quantification for each protein_group is aggregated with the median over its top 3 peptides ranked by total_intensity, using unique peptides only (_shared_peptide="discard").
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `mdata` | `MuData` | MuData object containing feature-level data. | required |
| `agg_method` | `Literal['median', 'mean', 'sum']` | Aggregation method to use. Defaults to "median". | `'median'` |
| `calculate_q` | `bool` | Whether to calculate q-values. Defaults to True. | `True` |
| `score_method` | `Literal['best_pep']` | Method to combine scores (PEP). Defaults to "best_pep". | `'best_pep'` |
| `top_n` | `int \| None` | Number of top peptides to consider for summarisation. If None, all peptides are used. Defaults to 3. | `3` |
| `rank_method` | `Literal['total_intensity', 'max_intensity', 'median_intensity']` | Method to rank features when selecting top_n. Defaults to "total_intensity". | `'total_intensity'` |
| `_protein_col` | `str` | Column name for proteins in var DataFrame. Defaults to "protein_group". | `'protein_group'` |
| `_shared_peptide` | `Literal['discard']` | How to handle shared peptides. Currently only "discard" is implemented. Defaults to "discard". | `'discard'` |
Returns:
| Name | Type | Description |
|---|---|---|
| `MuData` | `MuData` | MuData object containing protein-level data. |
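The default behaviour described for to_protein (discard shared peptides, rank by total intensity, keep the top 3, aggregate with the median) can be sketched on a toy pandas table as below. This is an illustration under stated assumptions, not the library's code: the table layout, the sample column names, and the use of `;` to mark a peptide shared between protein groups are all hypothetical.

```python
import pandas as pd

# Hypothetical peptide-level table. "P1;P2" marks a peptide shared between
# two protein groups; the ";" separator is an assumption for this sketch.
peptides = pd.DataFrame(
    {
        "protein_group": ["P1", "P1", "P1", "P1", "P2", "P1;P2"],
        "sample_1": [1.0, 4.0, 2.0, 8.0, 3.0, 100.0],
        "sample_2": [1.0, 6.0, 2.0, 10.0, 5.0, 100.0],
    },
    index=["pepA", "pepB", "pepC", "pepD", "pepE", "pepF"],
)
samples = ["sample_1", "sample_2"]

# Discard shared peptides so that only unique peptides contribute.
is_shared = peptides["protein_group"].str.contains(";", regex=False)
unique = peptides[~is_shared].copy()

# Rank peptides by total intensity across samples and keep the top 3
# per protein group (the analogue of top_n=3, rank_method="total_intensity").
unique["total_intensity"] = unique[samples].sum(axis=1)
top3 = (
    unique.sort_values("total_intensity", ascending=False)
    .groupby("protein_group")
    .head(3)
)

# Aggregate the selected peptides per protein group with the median.
protein_quant = top3.groupby("protein_group")[samples].median()
print(protein_quant)
```

In this toy data the shared peptide `pepF` has by far the highest intensity, so discarding it before ranking changes which peptides are selected for `P1`.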
to_ptm
to_ptm(mdata, modi_name, modification, agg_method='median', top_n=None, rank_method='total_intensity')
Summarise feature-level data to PTM-level data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `mdata` | `MuData` | MuData object containing peptide-level data. | required |
| `modi_name` | `str` | Name of the PTM to summarise (e.g., "phospho"). Used in the output modality name (e.g., phospho_site). | required |
| `modification` | `str` | Modification string (e.g., "[+79.96633]", "(unimod:21)"). | required |
| `agg_method` | `Literal['median', 'mean', 'sum']` | Aggregation method to use. Defaults to "median". | `'median'` |
Returns:
| Name | Type | Description |
|---|---|---|
| `MuData` | `MuData` | MuData object containing PTM-level data. |
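As a rough illustration of how a `modification` string such as "[+79.96633]" identifies modified residues, the sketch below locates the residues that carry it in annotated peptide strings. It does not reproduce how to_ptm works internally. The peptide notation (annotation placed directly after the modified residue), the example sequences, and the helper `modified_positions` are all hypothetical.

```python
import re

# A token is either a single residue letter or a bracketed/parenthesised
# annotation such as "[+79.96633]" or "(unimod:21)".
TOKEN = re.compile(r"[A-Z]|\[[^\]]*\]|\([^)]*\)")


def modified_positions(peptide: str, modification: str) -> list:
    """Return 1-based positions of residues followed by `modification`."""
    positions = []
    residue_index = 0
    for token in TOKEN.findall(peptide):
        if len(token) == 1:
            # Plain residue: advance the position counter.
            residue_index += 1
        elif token == modification:
            # Annotation matching the requested modification.
            positions.append(residue_index)
    return positions


modification = "[+79.96633]"
peptides = ["AS[+79.96633]PK", "ASPK", "T[+79.96633]EY[+79.96633]R"]

# Keep only peptides that carry the modification, with their site positions.
sites = {p: modified_positions(p, modification) for p in peptides}
modified_only = {p: pos for p, pos in sites.items() if pos}
print(modified_only)
```

Peptides without the modification (here `ASPK`) drop out, and the remaining peptides map to the positions that a site-level summary would group on.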