msmu._tools._dea.StatTest

NullDistribution dataclass

NullDistribution(method, null_distribution)

Data class to store the null distribution from permutation tests.

Attributes:
    method (str): The statistical method used.
    null_distribution (np.ndarray): 2D array of null test statistics (shape: [n_permutations, n_features]).

add_permutation_result

add_permutation_result(other)

Add (stack) a new permutation result to the null distribution.

Parameters

other : StatResult
    A StatResult object containing the statistic from a new permutation.

Returns

NullDistribution
    A new NullDistribution object with the updated null distribution.
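The stacking behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the library's code: `NullDistSketch` and its `add` method are hypothetical stand-ins for `NullDistribution` and `add_permutation_result`, and the sketch takes a plain statistic array rather than a `StatResult`.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class NullDistSketch:
    """Hypothetical stand-in for NullDistribution (illustration only)."""

    method: str
    null_distribution: np.ndarray  # shape: [n_permutations, n_features]

    def add(self, statistic: np.ndarray) -> "NullDistSketch":
        # Treat the new permutation's statistics as one row and append it.
        row = np.atleast_2d(np.asarray(statistic, dtype=float))
        stacked = np.vstack([self.null_distribution, row])
        # Return a new object instead of mutating the existing one.
        return NullDistSketch(self.method, stacked)


null = NullDistSketch("welch", np.empty((0, 3)))
null = null.add(np.array([0.1, -1.2, 2.0]))
null = null.add(np.array([0.5, 0.3, -0.7]))
print(null.null_distribution.shape)  # (2, 3)
```

Returning a new object on each call matches the documented return value, at the cost of copying the array once per permutation.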

PvalueCorrection

Class for multiple testing correction methods.

Methods:
    bh : Benjamini-Hochberg FDR correction.
    storey : Storey's q-value estimation with pi0 estimation.
    empirical : Permutation-based empirical FDR estimation.

bh staticmethod

bh(pvals)

Benjamini-Hochberg FDR correction with NaN handling.

Parameters

pvals : array-like
    Array of p-values (can include NaN).

Returns

qvals : np.ndarray
    Array of q-values (NaN-filled where p was NaN).
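A minimal sketch of Benjamini-Hochberg adjustment that skips NaN entries is shown below. The function name `bh_sketch` is hypothetical, and the actual implementation may differ in details such as tie handling.

```python
import numpy as np


def bh_sketch(pvals):
    """Benjamini-Hochberg q-values; NaN p-values stay NaN (illustrative)."""
    p = np.asarray(pvals, dtype=float)
    q = np.full(p.shape, np.nan)
    valid = ~np.isnan(p)
    m = int(valid.sum())
    if m == 0:
        return q
    pv = p[valid]
    order = np.argsort(pv)
    # Scale each sorted p-value by m / rank.
    ranked = pv[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    q[valid] = adjusted
    return q


qvals = bh_sketch([0.01, np.nan, 0.04, 0.03, 0.5])
print(qvals)
```

Only the non-NaN p-values count towards m, so missing features do not inflate the correction.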

empirical staticmethod

empirical(stat_obs, null_dist, two_sided=True)

Permutation-based empirical FDR estimation, with pi0 estimated by either:
- Storey's method (default)
- the permutation-statistic-based method (Equation 8)

References:
- https://academic.oup.com/bioinformatics/article/21/23/4280/194680
- https://www.pnas.org/doi/epdf/10.1073/pnas.1530509100

E[FDR] = pi0 * E[FP] / E[TP]
E[FP] = #(FP >= s) / B    (B = number of permutations)
E[TP] = #(TP >= s)
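The estimate above can be sketched as follows, evaluating the ratio at each observed statistic s. This is an illustration under stated assumptions: `empirical_fdr_sketch` is a hypothetical name, pi0 is passed in directly rather than estimated, the inputs contain no NaNs, and no monotonicity step is applied to the result.

```python
import numpy as np


def empirical_fdr_sketch(stat_obs, null_dist, pi0=1.0, two_sided=True):
    """Empirical FDR at each observed statistic (illustrative, no NaNs)."""
    obs = np.asarray(stat_obs, dtype=float)
    null = np.asarray(null_dist, dtype=float)  # [n_permutations, n_features]
    if two_sided:
        obs, null = np.abs(obs), np.abs(null)
    n_perm = null.shape[0]
    null_sorted = np.sort(null.ravel())
    obs_sorted = np.sort(obs)
    # Count values >= s with a binary search on the sorted arrays.
    n_null_ge = null_sorted.size - np.searchsorted(null_sorted, obs, side="left")
    n_obs_ge = obs_sorted.size - np.searchsorted(obs_sorted, obs, side="left")
    expected_fp = n_null_ge / n_perm  # E[FP]: null exceedances per permutation
    fdr = pi0 * expected_fp / np.maximum(n_obs_ge, 1)
    return np.clip(fdr, 0.0, 1.0)


stat_obs = np.array([3.0, 0.5, -2.0])
null_dist = np.array([[0.1, -0.4, 1.0],
                      [2.5, 0.2, -0.3]])
fdr = empirical_fdr_sketch(stat_obs, null_dist)
print(fdr)
```

Pooling the null statistics across features, as done here, is one common choice; a per-feature comparison is also possible and would give different values.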

estimate_pi0_null staticmethod

estimate_pi0_null(stat_valid, null_matrix_valid, percentile=95)

Estimate pi0 (the proportion of true null hypotheses) using the permutation-based statistic exceedance method.

Reference: https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2013.00179/full

Based on Equation (8), which compares observed and null test statistic exceedances at a given threshold:

pi0 = (1 - S/m) / (1 - S_star/m)

Parameters

stat_valid : np.ndarray
    1D array of observed test statistics (NaN-excluded).
null_matrix_valid : np.ndarray
    2D array of null test statistics (shape: [n_permutations, m_valid]), aligned with stat_valid (i.e., same features, same filtering).
percentile : float, default=95
    Percentile value used to define the threshold for exceedance comparison.

Returns

pi0 : float
    Estimated proportion of true null hypotheses (clipped to [0, 1]).
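One way to read Equation (8) is sketched below. Several choices here are assumptions rather than documented behaviour: the threshold is taken as the given percentile of the pooled absolute null statistics, S counts observed statistics above it, and S_star is the mean count of null statistics above it per permutation. The name `pi0_null_sketch` is hypothetical.

```python
import numpy as np


def pi0_null_sketch(stat_valid, null_matrix_valid, percentile=95):
    """pi0 = (1 - S/m) / (1 - S_star/m), clipped to [0, 1] (illustrative)."""
    obs = np.abs(np.asarray(stat_valid, dtype=float))
    null = np.abs(np.asarray(null_matrix_valid, dtype=float))
    m = obs.size
    # Assumption: the threshold comes from the pooled null distribution.
    threshold = np.percentile(null, percentile)
    s_obs = np.sum(obs > threshold)                    # S
    s_null = np.sum(null > threshold, axis=1).mean()   # S_star
    denom = 1.0 - s_null / m
    if denom <= 0:
        return 1.0  # fall back to the conservative estimate
    return float(np.clip((1.0 - s_obs / m) / denom, 0.0, 1.0))


stat_valid = np.array([0.1, 5.0, 6.0, 7.0])
null_matrix_valid = np.array([[0.1, 0.2, 0.3, 0.4],
                              [0.2, 0.1, 0.4, 0.3]])
pi0 = pi0_null_sketch(stat_valid, null_matrix_valid, percentile=50)
print(pi0)  # 0.5
```

In this toy input, three of four observed statistics exceed the threshold against an average of two under the null, giving (1 - 3/4) / (1 - 2/4) = 0.5.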

estimate_pi0_storey staticmethod

estimate_pi0_storey(p_values, lambdas=np.linspace(0.5, 0.95, 10))

Storey's estimator of pi0 (the proportion of true nulls) from observed p-values.

Reference: https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2013.00179/full

Based on Equation (7):

pi0 = #(pval > lambda) / ((1 - lambda) * m)

Parameters:
- p_values: array of p-values (one per feature)
- lambdas: array of lambda thresholds (typically 0.5 to 0.95)

Returns:
- pi0: estimated pi0 value
- pi0_by_lambda: array of intermediate pi0 estimates
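A sketch of the per-lambda estimate follows. How the final pi0 is chosen from the intermediate estimates is not stated here, so this sketch takes their median purely as an assumption; `pi0_storey_sketch` is a hypothetical name.

```python
import numpy as np


def pi0_storey_sketch(p_values, lambdas=np.linspace(0.5, 0.95, 10)):
    """Storey's pi0 at each lambda, plus a single summary (illustrative)."""
    p = np.asarray(p_values, dtype=float)
    p = p[~np.isnan(p)]
    m = p.size
    lambdas = np.asarray(lambdas, dtype=float)
    if m == 0:
        return 1.0, np.full(lambdas.shape, np.nan)
    # pi0(lambda) = #(p > lambda) / ((1 - lambda) * m)
    pi0_by_lambda = np.array(
        [np.sum(p > lam) / ((1.0 - lam) * m) for lam in lambdas]
    )
    # Assumption: summarise with the median, clipped to [0, 1].
    pi0 = float(np.clip(np.median(pi0_by_lambda), 0.0, 1.0))
    return pi0, pi0_by_lambda


# Evenly spread p-values mimic the all-null case, so pi0 should be near 1.
p = np.linspace(0.005, 0.995, 100)
pi0, pi0_by_lambda = pi0_storey_sketch(p)
print(pi0, pi0_by_lambda.shape)
```

Other summaries, such as a smoothed fit over lambda or the estimate at the largest lambda, are also used in practice and would slot into the same place.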

storey staticmethod

storey(p_values, lambda_=0.5, alpha=0.05, return_mask=False)

Storey (2002) q-value estimation with pi0 estimation.

Parameters

p_values : array-like
    Array of p-values (can include NaN).
lambda_ : float
    Threshold for estimating pi0 (0 < lambda < 1). Default = 0.5.
alpha : float
    FDR threshold for significance mask (only if return_mask=True).
return_mask : bool
    If True, also returns Boolean significance mask.

Returns

q_values : np.ndarray
    Array of q-values (NaN-filled where p was NaN).
rejected : Optional[np.ndarray]
    Boolean array indicating which features are significant under FDR < alpha.
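The q-value computation can be sketched as a Benjamini-Hochberg style adjustment scaled by pi0 estimated at a single lambda. This is an illustration, not the library's code; `storey_sketch` is a hypothetical name that mirrors the documented signature.

```python
import numpy as np


def storey_sketch(p_values, lambda_=0.5, alpha=0.05, return_mask=False):
    """Storey q-values with a single-lambda pi0 estimate (illustrative)."""
    p = np.asarray(p_values, dtype=float)
    q = np.full(p.shape, np.nan)
    rejected = np.zeros(p.shape, dtype=bool)
    valid = ~np.isnan(p)
    pv = p[valid]
    m = pv.size
    if m > 0:
        # pi0 = #(p > lambda) / ((1 - lambda) * m), capped at 1.
        pi0 = min(1.0, np.sum(pv > lambda_) / ((1.0 - lambda_) * m))
        order = np.argsort(pv)
        ranked = pi0 * m * pv[order] / np.arange(1, m + 1)
        # Make q-values non-decreasing in p.
        ranked = np.minimum.accumulate(ranked[::-1])[::-1]
        adjusted = np.empty(m)
        adjusted[order] = np.minimum(ranked, 1.0)
        q[valid] = adjusted
        rejected[valid] = adjusted < alpha
    return (q, rejected) if return_mask else q


q_values, mask = storey_sketch([0.001, 0.01, np.nan, 0.02, 0.9], return_mask=True)
print(q_values, mask)
```

With pi0 equal to 1 the result coincides with plain Benjamini-Hochberg; smaller pi0 values shrink every q-value by the same factor.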

StatResult dataclass

StatResult(stat_method, statistic, p_value)

Data class to store statistical test results.

Attributes:
    stat_method (str): The statistical method used.
    statistic (np.ndarray): Array of test statistics.
    p_value (np.ndarray): Array of p-values.

StatTest

Class for performing statistical tests between two groups of samples.

Attributes:
    method (str): The statistical method to use ('welch', 'student', 'wilcoxon', 'med_diff').

calc_permutation_pvalue staticmethod

calc_permutation_pvalue(stat_obs, null_dist)

Permutation-based empirical p-value calculation (two-sided).

Parameters

stat_obs : np.ndarray
    1D array of observed test statistics (one per feature).
null_dist : np.ndarray
    2D array of null test statistics (shape: [n_permutations, n_features]).

Returns

pvals : np.ndarray
    Array of empirical p-values (NaN-filled where stat_obs was NaN).
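A per-feature, two-sided empirical p-value can be sketched as below. The add-one correction, (count + 1) / (B + 1), is an assumption made here to avoid p-values of exactly zero; the library may use a different convention. `permutation_pvalue_sketch` is a hypothetical name.

```python
import numpy as np


def permutation_pvalue_sketch(stat_obs, null_dist):
    """Two-sided empirical p-value per feature (illustrative)."""
    obs = np.abs(np.asarray(stat_obs, dtype=float))
    null = np.abs(np.asarray(null_dist, dtype=float))  # [n_perm, n_features]
    n_perm = null.shape[0]
    # Count permutations whose |statistic| reaches the observed |statistic|.
    exceed = np.sum(null >= obs[np.newaxis, :], axis=0)
    # Assumption: add-one correction so that p is never exactly 0.
    pvals = (exceed + 1.0) / (n_perm + 1.0)
    pvals[np.isnan(obs)] = np.nan
    return pvals


stat_obs = np.array([2.0, np.nan, -0.1])
null_dist = np.array([[1.0, 0.5, 0.3],
                      [-2.5, 0.1, 0.0],
                      [0.2, -0.2, -0.4]])
pvals = permutation_pvalue_sketch(stat_obs, null_dist)
print(pvals)
```

Comparisons against NaN evaluate to False, so NaN entries in the null matrix simply do not count as exceedances in this sketch.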

median_diff staticmethod

median_diff(ctrl, expr)

Median difference (expr - ctrl) with NaN handling.

Parameters:

ctrl : array-like (n_samples_ctrl x n_features)
expr : array-like (n_samples_expr x n_features)

Returns:

med_diff : np.ndarray
    Median differences for each feature.
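A NaN-aware median difference can be sketched with `np.nanmedian`. The function name `median_diff_sketch` is hypothetical; a feature with no observed values in one group yields NaN (and a NumPy warning).

```python
import numpy as np


def median_diff_sketch(ctrl, expr):
    """Per-feature median(expr) - median(ctrl), ignoring NaNs (illustrative)."""
    ctrl = np.asarray(ctrl, dtype=float)
    expr = np.asarray(expr, dtype=float)
    # axis=0 reduces over samples, leaving one value per feature.
    return np.nanmedian(expr, axis=0) - np.nanmedian(ctrl, axis=0)


ctrl = np.array([[1.0, 2.0],
                 [3.0, np.nan],
                 [5.0, 4.0]])
expr = np.array([[2.0, 10.0],
                 [4.0, 12.0]])
med_diff = median_diff_sketch(ctrl, expr)
print(med_diff)  # [0. 8.]
```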

student staticmethod

student(ctrl, expr)

Student's t-test with NaN handling (equal variance assumed). Implemented manually instead of using scipy's test function, for performance reasons.

Parameters:

ctrl : array-like (n_samples_ctrl x n_features)
expr : array-like (n_samples_expr x n_features)

Returns:

t_val : np.ndarray
    T-statistics for each feature.
pval : np.ndarray
    Two-tailed p-values.
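A vectorised pooled-variance t-test over all features can be sketched as follows. This is an illustration under assumptions: the sign convention is taken as expr minus ctrl (matching `median_diff`), and `scipy.stats.t.sf` is used only for the tail probability. `student_sketch` is a hypothetical name.

```python
import numpy as np
from scipy import stats


def student_sketch(ctrl, expr):
    """Equal-variance t-test per feature, ignoring NaNs (illustrative)."""
    ctrl = np.asarray(ctrl, dtype=float)
    expr = np.asarray(expr, dtype=float)
    # Per-feature counts of observed (non-NaN) values.
    n1 = np.sum(~np.isnan(ctrl), axis=0)
    n2 = np.sum(~np.isnan(expr), axis=0)
    m1, m2 = np.nanmean(ctrl, axis=0), np.nanmean(expr, axis=0)
    v1 = np.nanvar(ctrl, axis=0, ddof=1)
    v2 = np.nanvar(expr, axis=0, ddof=1)
    df = n1 + n2 - 2
    with np.errstate(divide="ignore", invalid="ignore"):
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
        t_val = (m2 - m1) / np.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    # Two-tailed p-value from the t distribution.
    pval = 2.0 * stats.t.sf(np.abs(t_val), df)
    return t_val, pval


rng = np.random.default_rng(0)
ctrl = rng.normal(0.0, 1.0, size=(5, 3))
expr = rng.normal(1.0, 1.0, size=(6, 3))
ctrl[0, 1] = np.nan
t_val, pval = student_sketch(ctrl, expr)
print(t_val, pval)
```

Computing all features in one pass avoids a Python-level loop over columns, which is the usual reason to hand-roll this test for wide matrices.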

welch staticmethod

welch(ctrl, expr)

Welch's t-test with NaN handling (manual implementation). Implemented manually instead of using scipy's test function, for performance reasons.

Parameters:

ctrl : array-like (n_samples_ctrl x n_features)
expr : array-like (n_samples_expr x n_features)

Returns:

t_val : np.ndarray
    T-statistics for each feature.
pval : np.ndarray
    Two-tailed p-values.
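A vectorised Welch test with the Welch-Satterthwaite degrees of freedom can be sketched as below. As an assumption, the statistic is signed as expr minus ctrl, and `scipy.stats.t.sf` supplies only the tail probability. `welch_sketch` is a hypothetical name.

```python
import numpy as np
from scipy import stats


def welch_sketch(ctrl, expr):
    """Unequal-variance t-test per feature, ignoring NaNs (illustrative)."""
    ctrl = np.asarray(ctrl, dtype=float)
    expr = np.asarray(expr, dtype=float)
    n1 = np.sum(~np.isnan(ctrl), axis=0)
    n2 = np.sum(~np.isnan(expr), axis=0)
    m1, m2 = np.nanmean(ctrl, axis=0), np.nanmean(expr, axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Squared standard errors of the two group means.
        se1 = np.nanvar(ctrl, axis=0, ddof=1) / n1
        se2 = np.nanvar(expr, axis=0, ddof=1) / n2
        t_val = (m2 - m1) / np.sqrt(se1 + se2)
        # Welch-Satterthwaite approximation of the degrees of freedom.
        df = (se1 + se2) ** 2 / (se1**2 / (n1 - 1) + se2**2 / (n2 - 1))
    pval = 2.0 * stats.t.sf(np.abs(t_val), df)
    return t_val, pval


rng = np.random.default_rng(1)
ctrl = rng.normal(0.0, 1.0, size=(5, 3))
expr = rng.normal(1.0, 2.0, size=(7, 3))
expr[2, 0] = np.nan
t_val, pval = welch_sketch(ctrl, expr)
print(t_val, pval)
```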

wilcoxon_rank_sum staticmethod

wilcoxon_rank_sum(ctrl, expr)

Wilcoxon rank-sum test (Mann-Whitney U test) with NaN handling. Uses scipy's ranksums function, which handles NaNs internally.

Parameters:

ctrl : array-like (n_samples_ctrl x n_features)
expr : array-like (n_samples_expr x n_features)

Returns:

stat : np.ndarray
    Test statistics for each feature.
pval : np.ndarray
    Two-tailed p-values.
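A sketch of the per-feature rank-sum test using `scipy.stats.ranksums` follows. It assumes a SciPy version in which `ranksums` accepts `axis` and `nan_policy`, and it passes expr first so that a positive statistic means expr ranks higher than ctrl; both choices are assumptions. `ranksum_sketch` is a hypothetical name.

```python
import numpy as np
from scipy import stats


def ranksum_sketch(ctrl, expr):
    """Wilcoxon rank-sum test per feature, omitting NaNs (illustrative)."""
    ctrl = np.asarray(ctrl, dtype=float)
    expr = np.asarray(expr, dtype=float)
    # axis=0 tests each feature column; NaNs are dropped per column.
    res = stats.ranksums(expr, ctrl, axis=0, nan_policy="omit")
    return np.asarray(res.statistic), np.asarray(res.pvalue)


rng = np.random.default_rng(2)
ctrl = rng.normal(0.0, 1.0, size=(6, 3))
expr = rng.normal(1.0, 1.0, size=(6, 3))
ctrl[1, 2] = np.nan
stat, pval = ranksum_sketch(ctrl, expr)
print(stat, pval)
```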

StatTestReusult dataclass

StatTestReusult(statistic, ctrl, expr=None, features=None, median_ctrl=None, median_expr=None, pct_ctrl=None, pct_expr=None, log2_fc=None, p_value=None, q_value=None)

Data class to store and convert statistical test results to a DataFrame.

Attributes:
    stat_method (str): The statistical method used.
    statistic (str): The statistic computed.
    ctrl (str | None): Control group label.
    expr (str | None): Experimental group label.
    features (pd.Index | np.ndarray | None): Feature identifiers.
    median_ctrl (np.ndarray | None): Median values for control group.
    median_expr (np.ndarray | None): Median values for experimental group.
    pct_ctrl (np.ndarray | None): Percentage of non-missing values in control group.
    pct_expr (np.ndarray | None): Percentage of non-missing values in experimental group.
    log2_fc (np.ndarray | None): Log2 fold changes between groups.
    p_value (np.ndarray | None): P-values from statistical tests.
    q_value (np.ndarray | None): Adjusted q-values for multiple testing.

to_df

to_df()

Convert the statistical test results to a pandas DataFrame.

Returns:
    pd.DataFrame: DataFrame containing the statistical test results.
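The conversion can be pictured with a reduced, hypothetical dataclass. `ResultSketch` carries only a few of the documented fields, and dropping fields that are None is an assumption of this sketch, not documented behaviour.

```python
from dataclasses import dataclass, fields
from typing import Optional

import numpy as np
import pandas as pd


@dataclass
class ResultSketch:
    """Hypothetical, reduced stand-in for the result dataclass."""

    ctrl: str
    expr: Optional[str] = None
    features: Optional[np.ndarray] = None
    log2_fc: Optional[np.ndarray] = None
    p_value: Optional[np.ndarray] = None
    q_value: Optional[np.ndarray] = None

    def to_df(self) -> pd.DataFrame:
        # Assumption: per-feature fields left as None are skipped.
        columns = {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if f.name not in ("ctrl", "expr") and getattr(self, f.name) is not None
        }
        df = pd.DataFrame(columns)
        # Group labels are scalars, so they broadcast to every row.
        df.insert(0, "ctrl", self.ctrl)
        if self.expr is not None:
            df.insert(1, "expr", self.expr)
        return df


res = ResultSketch(
    ctrl="control",
    expr="treated",
    features=np.array(["P1", "P2"]),
    log2_fc=np.array([1.5, -0.2]),
    p_value=np.array([0.01, 0.6]),
)
df = res.to_df()
print(df)
```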