msmu._tools._dea.StatTest
NullDistribution
dataclass
NullDistribution(method, null_distribution)
Data class to store the null distribution from permutation tests.
Attributes:
- method (str): The statistical method used.
- null_distribution (np.ndarray): 2D array of null test statistics (shape: [n_permutations, n_features]).
add_permutation_result
add_permutation_result(other)
Add (stack) a new permutation result to the null distribution.
Parameters:
other : StatResult
    A StatResult object containing the statistic from a new permutation.
Returns:
NullDistribution
    A new NullDistribution object with the updated null distribution.
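A minimal sketch of how the stacking could work, assuming `add_permutation_result` appends `other.statistic` as one new row with `np.vstack` and returns a fresh object. The `StatResult` and `NullDistribution` classes below are simplified stand-ins with the documented fields, not the library's definitions.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class StatResult:
    # Simplified stand-in with the documented fields.
    stat_method: str
    statistic: np.ndarray
    p_value: np.ndarray


@dataclass
class NullDistribution:
    method: str
    null_distribution: np.ndarray  # shape: (n_permutations, n_features)

    def add_permutation_result(self, other: StatResult) -> "NullDistribution":
        # Assumption: each permutation contributes exactly one row of statistics.
        row = np.atleast_2d(np.asarray(other.statistic, dtype=float))
        stacked = np.vstack([self.null_distribution, row])
        # Return a new object rather than mutating self, as the docstring describes.
        return NullDistribution(method=self.method, null_distribution=stacked)


# Usage: start from an empty (0, n_features) array and stack two permutations.
null = NullDistribution("welch", np.empty((0, 3)))
for values in ([0.1, -0.5, 2.0], [1.2, 0.3, -0.7]):
    perm = StatResult("welch", np.array(values), np.full(3, np.nan))
    null = null.add_permutation_result(perm)
print(null.null_distribution.shape)  # (2, 3)
```

Starting from a zero-row array keeps the loop uniform: the first permutation is stacked the same way as every later one.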
PvalueCorrection
Class for multiple-testing correction methods.
Methods:
- bh : Benjamini-Hochberg FDR correction.
- storey : Storey's q-value estimation with pi0 estimation.
- empirical : Permutation-based empirical FDR estimation.
bh
staticmethod
bh(pvals)
Benjamini-Hochberg FDR correction with NaN handling.
Parameters
pvals : array-like
    Array of p-values (can include NaN).
Returns
qvals : np.ndarray
    Array of q-values (NaN-filled where p was NaN).
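A minimal NumPy sketch of Benjamini-Hochberg adjustment with NaN handling, assuming NaNs are excluded from the count m and passed through unchanged to the output. The function name mirrors the documented `bh`, but the body is an illustration, not the library's code.

```python
import numpy as np


def bh(pvals):
    p = np.asarray(pvals, dtype=float)
    q = np.full(p.shape, np.nan)          # NaN inputs stay NaN
    valid = ~np.isnan(p)
    pv = p[valid]
    m = pv.size                           # m counts non-NaN p-values only
    if m == 0:
        return q
    order = np.argsort(pv)
    # Step-up adjustment: q_(i) = min_{k >= i} p_(k) * m / k
    ranked = pv[order] * m / np.arange(1, m + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q_valid = np.empty(m)
    q_valid[order] = np.minimum(ranked, 1.0)
    q[valid] = q_valid
    return q


print(bh([0.01, 0.04, np.nan, 0.03]))  # [0.03 0.04  nan 0.04]
```

The reversed cumulative minimum enforces monotonicity, so a larger p-value never receives a smaller q-value.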
empirical
staticmethod
empirical(stat_obs, null_dist, two_sided=True)
Permutation-based empirical FDR estimation, with pi0 estimated by either:
- Storey's method (default), or
- the permutation-statistic-based method (Equation 8).
References:
- https://academic.oup.com/bioinformatics/article/21/23/4280/194680
- https://www.pnas.org/doi/epdf/10.1073/pnas.1530509100
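The estimated FDR at threshold s is
$$E[\mathrm{FDR}](s) = \pi_0 \cdot \frac{E[\mathrm{FP}](s)}{E[\mathrm{TP}](s)}, \qquad E[\mathrm{FP}](s) = \frac{\#\{\text{null statistics} \ge s\}}{B}, \qquad E[\mathrm{TP}](s) = \#\{\text{observed statistics} \ge s\},$$
where B is the number of permutations.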
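A vectorized sketch of this estimator, assuming that (1) two-sided tests compare absolute values, (2) null statistics are pooled across all features and permutations, and (3) each observed statistic serves as its own threshold s. For simplicity, `pi0` is passed in as an argument; the name `empirical_fdr` and that argument are hypothetical, since the documented method estimates pi0 internally.

```python
import numpy as np


def empirical_fdr(stat_obs, null_dist, pi0=1.0, two_sided=True):
    obs = np.asarray(stat_obs, dtype=float)    # (n_features,)
    null = np.asarray(null_dist, dtype=float)  # (B, n_features)
    if two_sided:
        obs, null = np.abs(obs), np.abs(null)
    B = null.shape[0]
    fdr = np.full(obs.shape, np.nan)
    valid = ~np.isnan(obs)
    s = obs[valid]                                 # one threshold per observed statistic
    obs_sorted = np.sort(s)
    null_sorted = np.sort(null[~np.isnan(null)])   # pooled over features and permutations
    # Count of values >= s in a sorted array = size - searchsorted(side="left")
    n_null = null_sorted.size - np.searchsorted(null_sorted, s, side="left")
    n_obs = obs_sorted.size - np.searchsorted(obs_sorted, s, side="left")
    e_fp = n_null / B      # E[FP](s)
    e_tp = n_obs           # E[TP](s); always >= 1 because s itself is counted
    fdr[valid] = np.minimum(pi0 * e_fp / e_tp, 1.0)
    return fdr


print(empirical_fdr([3.0, 0.5, -2.0], [[0.1, -0.2, 0.3], [2.5, 0.4, -0.6]]))
```

Sorting once and using `np.searchsorted` avoids an O(n_features x B x n_features) comparison matrix.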
estimate_pi0_null
staticmethod
estimate_pi0_null(stat_valid, null_matrix_valid, percentile=95)
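Estimate pi0 (the proportion of true null hypotheses) with a permutation-based statistic-exceedance method, following Equation (8) of https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2013.00179/full. The numbers of observed and null test statistics that exceed a given threshold are compared:
$$\pi_0 = \frac{1 - S/m}{1 - S^{*}/m},$$
where S is the number of observed statistics exceeding the threshold, S^{*} is the corresponding count for the null statistics, and m is the number of features.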
Parameters
stat_valid : np.ndarray
    1D array of observed test statistics (NaN-excluded).
null_matrix_valid : np.ndarray
    2D array of null test statistics (shape: [n_permutations, m_valid]), aligned with stat_valid (i.e., same features, same filtering).
percentile : float, default=95
    Percentile value used to define the threshold for exceedance comparison.
Returns
pi0 : float Estimated proportion of true null hypotheses (clipped to [0, 1]).
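A sketch of Equation (8) as stated above. The threshold and S^{*} definitions are assumptions: here the threshold is the `percentile`-th percentile of the pooled null statistics, S^{*} is the average number of null exceedances per permutation, and statistics are compared as given (no absolute value).

```python
import numpy as np


def estimate_pi0_null(stat_valid, null_matrix_valid, percentile=95):
    stat = np.asarray(stat_valid, dtype=float)           # (m,)
    null = np.asarray(null_matrix_valid, dtype=float)    # (n_permutations, m)
    m = stat.size
    # Assumption: threshold taken from the pooled null statistics.
    t = np.percentile(null, percentile)
    s_obs = np.sum(stat > t)                       # S
    s_null = np.mean(np.sum(null > t, axis=1))     # S*: mean exceedances per permutation
    denom = 1.0 - s_null / m
    if denom <= 0:
        return 1.0                                 # degenerate case: fall back to pi0 = 1
    pi0 = (1.0 - s_obs / m) / denom
    return float(np.clip(pi0, 0.0, 1.0))


null = np.tile(np.arange(20.0), (5, 1))                      # 5 permutations, 20 features
stat = np.concatenate([np.arange(15.0), np.full(5, 100.0)])  # 5 clear signals
print(estimate_pi0_null(stat, null))  # (1 - 5/20) / (1 - 1/20) ~ 0.789
```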
estimate_pi0_storey
staticmethod
estimate_pi0_storey(p_values, lambdas=np.linspace(0.5, 0.95, 10))
Storey's estimator of pi0 (the proportion of true nulls) from observed p-values, based on Equation (7) of https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2013.00179/full:
$$\hat{\pi}_0(\lambda) = \frac{\#\{p_j > \lambda\}}{(1 - \lambda)\, m},$$
where m is the number of p-values.
Parameters:
- p_values: array of p-values (one per feature)
- lambdas: array of lambda thresholds (typically 0.5 to 0.95)
Returns:
- pi0: estimated pi0 value
- pi0_by_lambda: array of intermediate pi0 estimates
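A sketch of the per-lambda estimates from Equation (7). How the documented function collapses `pi0_by_lambda` into a single `pi0` is not stated; this sketch takes the minimum capped at 1, which is a simplifying assumption (smoothing over lambda is a common alternative).

```python
import numpy as np


def estimate_pi0_storey(p_values, lambdas=np.linspace(0.5, 0.95, 10)):
    p = np.asarray(p_values, dtype=float)
    p = p[~np.isnan(p)]
    m = p.size
    lambdas = np.asarray(lambdas, dtype=float)
    # Equation (7) for each lambda: #{p > lambda} / ((1 - lambda) * m)
    pi0_by_lambda = np.array([np.sum(p > lam) / ((1.0 - lam) * m) for lam in lambdas])
    # Assumption: summarize with the minimum, capped at 1.
    pi0 = float(min(np.min(pi0_by_lambda), 1.0))
    return pi0, pi0_by_lambda


# 50 strong signals (p = 0) plus 100 evenly spread null p-values.
p = np.concatenate([np.zeros(50), np.linspace(0.005, 0.995, 100)])
print(estimate_pi0_storey(p, lambdas=[0.5])[0])  # 50 / (0.5 * 150) ~ 0.667
```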
storey
staticmethod
storey(p_values, lambda_=0.5, alpha=0.05, return_mask=False)
Storey (2002) q-value estimation with pi0 estimation.
Parameters
p_values : array-like
    Array of p-values (can include NaN).
lambda_ : float
    Threshold for estimating pi0 (0 < lambda_ < 1). Default = 0.5.
alpha : float
    FDR threshold for the significance mask (used only if return_mask=True).
return_mask : bool
    If True, also return a Boolean significance mask.
Returns
q_values : np.ndarray
    Array of q-values (NaN-filled where p was NaN).
rejected : Optional[np.ndarray]
    Boolean array indicating which features are significant at FDR < alpha; returned only when return_mask=True.
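A minimal sketch of Storey q-values, assuming pi0 is estimated at the single `lambda_` (capped at 1) and then multiplied into a Benjamini-Hochberg-style step-up adjustment, with NaNs excluded and passed through. It is an illustration under those assumptions, not the library's implementation.

```python
import numpy as np


def storey(p_values, lambda_=0.5, alpha=0.05, return_mask=False):
    p = np.asarray(p_values, dtype=float)
    q = np.full(p.shape, np.nan)
    rejected = np.zeros(p.shape, dtype=bool)
    valid = ~np.isnan(p)
    pv = p[valid]
    m = pv.size
    if m > 0:
        # pi0 at a single lambda (Equation 7), capped at 1
        pi0 = min(np.sum(pv > lambda_) / ((1.0 - lambda_) * m), 1.0)
        order = np.argsort(pv)
        # pi0-scaled BH step-up: q_(i) = min_{k >= i} pi0 * m * p_(k) / k
        ranked = pi0 * m * pv[order] / np.arange(1, m + 1)
        ranked = np.minimum.accumulate(ranked[::-1])[::-1]
        q_valid = np.empty(m)
        q_valid[order] = np.minimum(ranked, 1.0)
        q[valid] = q_valid
        rejected[valid] = q_valid < alpha
    return (q, rejected) if return_mask else q


q, rej = storey([0.01, 0.04, np.nan, 0.03, 0.8, 0.9], return_mask=True)
print(rej)  # only the first feature passes FDR < 0.05
```

With pi0 below 1, the q-values are uniformly smaller than plain BH values, which is the point of estimating pi0.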
StatResult
dataclass
StatResult(stat_method, statistic, p_value)
Data class to store statistical test results.
Attributes:
- stat_method (str): The statistical method used.
- statistic (np.ndarray): Array of test statistics.
- p_value (np.ndarray): Array of p-values.
StatTest
Class for performing statistical tests between two groups of samples. Attributes: method (str): The statistical method to use ('welch', 'student', 'wilcoxon', 'med_diff').
calc_permutation_pvalue
staticmethod
calc_permutation_pvalue(stat_obs, null_dist)
Permutation-based empirical p-value calculation (two-sided).
Parameters
stat_obs : np.ndarray
    1D array of observed test statistics (one per feature).
null_dist : np.ndarray
    2D array of null test statistics (shape: [n_permutations, n_features]).
Returns
pvals : np.ndarray
    Array of empirical p-values (NaN-filled where stat_obs was NaN).
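A sketch of a two-sided, per-feature empirical p-value. It assumes each feature is compared only against its own column of `null_dist`, and that the common +1 correction, (count + 1) / (B + 1), is applied so p-values are never exactly zero; the documented function may count differently.

```python
import numpy as np


def calc_permutation_pvalue(stat_obs, null_dist):
    obs = np.abs(np.asarray(stat_obs, dtype=float))     # (n_features,)
    null = np.abs(np.asarray(null_dist, dtype=float))   # (B, n_features)
    B = null.shape[0]
    # Per-feature count of permutations at least as extreme as the observed value
    exceed = np.sum(null >= obs[np.newaxis, :], axis=0)
    pvals = (exceed + 1.0) / (B + 1.0)
    pvals[np.isnan(obs)] = np.nan                       # NaN where stat_obs was NaN
    return pvals


null = np.array([[1.0, 0.5, 0.0], [-3.0, 0.05, 1.0], [0.5, -0.2, 2.0]])
print(calc_permutation_pvalue([2.0, -0.1, np.nan], null))  # [0.5  0.75  nan]
```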
median_diff
staticmethod
median_diff(ctrl, expr)
Median difference (expr - ctrl) with NaN handling.
Parameters:
ctrl : array-like (n_samples_ctrl x n_features)
expr : array-like (n_samples_expr x n_features)
Returns:
med_diff : np.ndarray
    Median differences for each feature.
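A minimal sketch, assuming "NaN handling" means ignoring missing values with `np.nanmedian` and returning NaN for features where a group has no observed values.

```python
import warnings

import numpy as np


def median_diff(ctrl, expr):
    x = np.asarray(ctrl, dtype=float)  # (n_samples_ctrl, n_features)
    y = np.asarray(expr, dtype=float)  # (n_samples_expr, n_features)
    with warnings.catch_warnings():
        # np.nanmedian warns on all-NaN columns and returns NaN for them.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmedian(y, axis=0) - np.nanmedian(x, axis=0)


ctrl = np.array([[1.0, np.nan], [3.0, np.nan], [2.0, np.nan]])
expr = np.array([[5.0, 1.0], [np.nan, 2.0]])
print(median_diff(ctrl, expr))  # [ 3. nan]
```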
student
staticmethod
student(ctrl, expr)
Student's t-test with NaN handling (equal variances assumed). Implemented manually rather than with scipy, for speed.
Parameters:
ctrl : array-like (n_samples_ctrl x n_features)
expr : array-like (n_samples_expr x n_features)
Returns:
t_val : np.ndarray
    T-statistics for each feature.
pval : np.ndarray
    Two-tailed p-values.
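A vectorized sketch of a pooled-variance t-test with NaN handling. It assumes NaNs are dropped per feature, the statistic is signed as expr minus ctrl (matching `median_diff`), and p-values come from `scipy.stats.t.sf`; only the statistic itself is computed by hand, for all features at once.

```python
import warnings

import numpy as np
from scipy import stats


def student(ctrl, expr):
    x = np.asarray(ctrl, dtype=float)  # (n_samples_ctrl, n_features)
    y = np.asarray(expr, dtype=float)  # (n_samples_expr, n_features)
    n1 = np.sum(~np.isnan(x), axis=0)
    n2 = np.sum(~np.isnan(y), axis=0)
    with warnings.catch_warnings(), np.errstate(divide="ignore", invalid="ignore"):
        # Features with too few values yield NaN instead of raising.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        m1, m2 = np.nanmean(x, axis=0), np.nanmean(y, axis=0)
        v1, v2 = np.nanvar(x, axis=0, ddof=1), np.nanvar(y, axis=0, ddof=1)
        df = n1 + n2 - 2
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df  # pooled variance
        t_val = (m2 - m1) / np.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    pval = 2.0 * stats.t.sf(np.abs(t_val), df)
    return t_val, pval
```

On complete data this should agree with `scipy.stats.ttest_ind(expr, ctrl, axis=0)`; the manual form simply handles per-feature sample sizes when values are missing.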
welch
staticmethod
welch(ctrl, expr)
Welch's t-test with NaN handling, implemented manually rather than with scipy for speed.
Parameters:
ctrl : array-like (n_samples_ctrl x n_features)
expr : array-like (n_samples_expr x n_features)
Returns:
t_val : np.ndarray
    T-statistics for each feature.
pval : np.ndarray
    Two-tailed p-values.
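A vectorized sketch of Welch's test under the same assumptions as a manual Student's test (NaNs dropped per feature, statistic signed as expr minus ctrl, p-values from `scipy.stats.t.sf`), with the Welch-Satterthwaite degrees of freedom computed per feature.

```python
import warnings

import numpy as np
from scipy import stats


def welch(ctrl, expr):
    x = np.asarray(ctrl, dtype=float)  # (n_samples_ctrl, n_features)
    y = np.asarray(expr, dtype=float)  # (n_samples_expr, n_features)
    n1 = np.sum(~np.isnan(x), axis=0)
    n2 = np.sum(~np.isnan(y), axis=0)
    with warnings.catch_warnings(), np.errstate(divide="ignore", invalid="ignore"):
        # Features with too few values yield NaN instead of raising.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        m1, m2 = np.nanmean(x, axis=0), np.nanmean(y, axis=0)
        v1, v2 = np.nanvar(x, axis=0, ddof=1), np.nanvar(y, axis=0, ddof=1)
        se1, se2 = v1 / n1, v2 / n2  # squared standard errors of each mean
        t_val = (m2 - m1) / np.sqrt(se1 + se2)
        # Welch-Satterthwaite degrees of freedom
        df = (se1 + se2) ** 2 / (se1**2 / (n1 - 1) + se2**2 / (n2 - 1))
    pval = 2.0 * stats.t.sf(np.abs(t_val), df)
    return t_val, pval
```

Unlike the pooled test, the degrees of freedom here are non-integer and differ per feature, which `stats.t.sf` accepts as an array.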
wilcoxon_rank_sum
staticmethod
wilcoxon_rank_sum(ctrl, expr)
Wilcoxon rank-sum test (Mann-Whitney U test) with NaN handling. Uses scipy's ranksums function, which handles NaNs internally.
Parameters:
ctrl : array-like (n_samples_ctrl x n_features)
expr : array-like (n_samples_expr x n_features)
Returns:
stat : np.ndarray
    Test statistics for each feature.
pval : np.ndarray
    Two-tailed p-values.
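A per-feature sketch built on `scipy.stats.ranksums`. It assumes NaNs are dropped within each feature before the call (rather than relying on scipy's `nan_policy`), that features with an empty group get NaN, and that `expr` is passed first so a positive statistic means higher values in `expr`. The loop favors clarity over speed.

```python
import numpy as np
from scipy import stats


def wilcoxon_rank_sum(ctrl, expr):
    x = np.asarray(ctrl, dtype=float)  # (n_samples_ctrl, n_features)
    y = np.asarray(expr, dtype=float)  # (n_samples_expr, n_features)
    n_features = x.shape[1]
    stat = np.full(n_features, np.nan)
    pval = np.full(n_features, np.nan)
    for j in range(n_features):
        xj = x[:, j][~np.isnan(x[:, j])]
        yj = y[:, j][~np.isnan(y[:, j])]
        if xj.size == 0 or yj.size == 0:
            continue  # leave NaN when a group has no observations
        res = stats.ranksums(yj, xj)  # expr first: positive statistic = expr ranks higher
        stat[j], pval[j] = res.statistic, res.pvalue
    return stat, pval
```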
StatTestReusult
dataclass
StatTestReusult(statistic, ctrl, expr=None, features=None, median_ctrl=None, median_expr=None, pct_ctrl=None, pct_expr=None, log2_fc=None, p_value=None, q_value=None)
Data class to store and convert statistical test results to a DataFrame.
Attributes:
- stat_method (str): The statistical method used.
- statistic (str): The statistic computed.
- ctrl (str | None): Control group label.
- expr (str | None): Experimental group label.
- features (pd.Index | np.ndarray | None): Feature identifiers.
- median_ctrl (np.ndarray | None): Median values for control group.
- median_expr (np.ndarray | None): Median values for experimental group.
- pct_ctrl (np.ndarray | None): Percentage of non-missing values in control group.
- pct_expr (np.ndarray | None): Percentage of non-missing values in experimental group.
- log2_fc (np.ndarray | None): Log2 fold changes between groups.
- p_value (np.ndarray | None): P-values from statistical tests.
- q_value (np.ndarray | None): Adjusted q-values for multiple testing.
to_df
to_df()
Convert the statistical test results to a pandas DataFrame. Returns: pd.DataFrame: DataFrame containing the statistical test results.
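A hypothetical, simplified sketch of what `to_df` could produce: one row per feature, array-valued fields as columns, and the group labels broadcast to every row. The class `StatTestResultSketch`, its reduced field set, and the column layout are assumptions for illustration, not the library's output format.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np
import pandas as pd


@dataclass
class StatTestResultSketch:
    # Hypothetical stand-in holding a subset of StatTestReusult's fields.
    stat_method: str
    ctrl: Optional[str] = None
    expr: Optional[str] = None
    features: Optional[np.ndarray] = None
    log2_fc: Optional[np.ndarray] = None
    p_value: Optional[np.ndarray] = None
    q_value: Optional[np.ndarray] = None

    def to_df(self) -> pd.DataFrame:
        # Only array-valued fields that are set become columns.
        array_fields = ("log2_fc", "p_value", "q_value")
        data = {name: getattr(self, name) for name in array_fields
                if getattr(self, name) is not None}
        df = pd.DataFrame(data, index=self.features)
        # Scalar labels are broadcast to every row.
        df["stat_method"] = self.stat_method
        df["ctrl"] = self.ctrl
        df["expr"] = self.expr
        return df


res = StatTestResultSketch("welch", "A", "B", np.array(["g1", "g2"]),
                           log2_fc=np.array([1.0, -0.5]), p_value=np.array([0.01, 0.2]))
print(res.to_df())
```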