itpseq.Sample.get_counts

Sample.get_counts(pos=None, **kwargs)

Counts the reads for each motif, or for each amino-acid/position combination, in every replicate of the sample.

Parameters:
  • pos (str, optional) – Position or range of positions to consider when counting the reads (for example 'E:A' for the E, P and A sites). If None, the returned DataFrame holds the counts of each amino acid at each position.

  • kwargs (optional) – Keyword arguments passed on to load_data (min_peptide, max_peptide, how, limit, sample).

Returns:

A DataFrame of read counts, indexed by amino acid or motif, with one column per replicate. If pos is None, the columns are a MultiIndex of replicate and position.

Return type:

DataFrame

Examples

Count the number of reads for each amino-acid/position combination:

>>> sample.get_counts()
     sample.1                        ...  sample.3
           -8         -7         -6  ...        -1         0         1
    2879961.0  2658485.0  2449526.0  ...  724998.0   34748.0       NaN
*         NaN        NaN        NaN  ...       NaN       NaN  880568.0
A         NaN    12240.0    25225.0  ...   92225.0  115164.0   85132.0
..        ...        ...        ...  ...       ...       ...       ...
W         NaN     2686.0     5059.0  ...   14313.0   23730.0   17656.0
Y         NaN     9522.0    19296.0  ...   57431.0   69162.0   81430.0
m    197624.0   221476.0   208959.0  ...  375644.0  690250.0   34748.0
[23 rows x 30 columns]

Count the number of reads for each motif in the E-P-A sites:

>>> sample.get_counts(pos='E:A')
     sample.1  sample.2  sample.3
 m*  254850.0  107060.0  258338.0
 mS   54993.0   20419.0   50959.0
  m   52640.0   17860.0   34748.0
 ..        ...       ...       ...
WFW       NaN       2.0       NaN
WWW       NaN       1.0       NaN
MMW       NaN       NaN       1.0
[8842 rows x 3 columns]
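
When pos is None, the result has two column levels (replicate, then position), so ordinary pandas MultiIndex selection applies. The sketch below does not call itpseq: it builds a small stand-in frame with invented values and shows how one might pick a replicate, pick a position, and sum over replicates. The integer position labels and the two-replicate layout are assumptions made for illustration; the labels in a real result may have a different type.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the frame returned by sample.get_counts() with pos=None.
# All values are invented; integer position labels are an assumption.
columns = pd.MultiIndex.from_product([["sample.1", "sample.2"], [-1, 0, 1]])
counts = pd.DataFrame(
    [
        [92225.0, 115164.0, 85132.0, 90110.0, 110020.0, 80050.0],
        [14313.0, 23730.0, 17656.0, 13000.0, np.nan, 16000.0],
    ],
    index=["A", "W"],
    columns=columns,
)

# All positions for one replicate: select on the first column level.
rep1 = counts["sample.1"]

# One position across all replicates: cross-section on the second level.
pos0 = counts.xs(0, axis=1, level=1)

# Counts summed over replicates for each position.
# sum() skips NaN, so a missing count contributes nothing to the total.
totals = counts.T.groupby(level=1).sum().T

print(rep1)
print(pos0)
print(totals)
```

Whether a NaN should be read as zero reads depends on how the data were loaded, so treating it as zero in the totals is a choice of this sketch, not documented behaviour.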