{{ cov }} | {% endfor %}Number of Samples |
---|---|
{{ row[cov] }} | {% endfor %}{{ row["Number of Samples"] }} |
Principal Component Analysis (PCA) identifies the main sources of variation in the data. We perform PCA before and after batch effect correction to check whether batch effects dominate the first few principal components (PCs). The variance explained by a PC is the fraction of the total variation in the data that the PC captures. After correction, the variance should ideally be more evenly distributed across PCs.
Variance explained by each principal component before and after batch effect correction. A lower share of variance in the first few PCs after correction suggests that batch-driven variation was successfully removed.
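The per-PC variance explained shown in this plot can be computed as each covariance eigenvalue divided by the total variance (the trace of the covariance matrix). The following is a minimal, dependency-free sketch, assuming samples are rows and genes are columns. The function names are hypothetical, and power iteration with deflation stands in for the SVD that a real pipeline would typically obtain from a numerical library.

```python
import math
import random


def covariance_matrix(data):
    """Sample covariance matrix; rows of `data` are samples, columns are genes."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def explained_variance_ratio(data, n_components, iters=500, seed=0):
    """Fraction of total variance captured by each of the leading PCs.

    Hypothetical helper: power iteration with deflation, adequate only for
    small illustrative inputs.
    """
    cov = covariance_matrix(data)
    p = len(cov)
    total = sum(cov[i][i] for i in range(p))  # trace = total variance
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_components):
        # Random positive start vector, normalized to unit length.
        v = [rng.random() + 0.1 for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left in the deflated matrix
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T C v is the variance along the unit vector v.
        eigval = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
        ratios.append(eigval / total)
        # Deflate: remove the component just found before the next pass.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(p)] for i in range(p)]
    return ratios
```

Calling it on the expression matrix before and after correction gives the two curves being compared; a first ratio that drops after correction is the pattern the caption describes.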
- A strong association between the PCs and batch labels before correction indicates batch effects. A weaker association after correction suggests that batch effects were successfully removed.
- Even after batch correction, a strong association between datasets and PCs may persist, especially in cases where:
In these scenarios, the observed associations are expected and reflect meaningful biological or experimental differences rather than technical artifacts.
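This section does not name the test used for the PC/batch association. As an assumption, the sketch below uses the Kruskal-Wallis H statistic, a rank-based test commonly applied to check whether a continuous variable (here, one PC's scores) differs across categorical groups (here, batches). The function names are hypothetical. No tie correction is applied, and the p-value (from a chi-square distribution with one fewer degree of freedom than the number of batches) is omitted. `scipy.stats.kruskal` computes the tie-corrected statistic together with its p-value.

```python
from collections import defaultdict


def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(pc_scores, batches):
    """Kruskal-Wallis H for one PC's scores grouped by batch label.

    Hypothetical helper without tie correction; a larger H means the PC
    separates the batches more strongly.
    """
    n = len(pc_scores)
    rank_sums = defaultdict(float)
    counts = defaultdict(int)
    for rank, batch in zip(average_ranks(pc_scores), batches):
        rank_sums[batch] += rank
        counts[batch] += 1
    return 12.0 / (n * (n + 1)) * sum(
        rank_sums[b] ** 2 / counts[b] for b in counts
    ) - 3.0 * (n + 1)
```

For example, scores that fall cleanly into two batches give H = 27/7 for six samples, whereas interleaved batches give a much smaller H.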
Each result reports the test statistic, p-value, test performed, and number of samples used.
{{ association_matrix_before | safe }} {% endif %}
This metric quantifies the impact of the batch effect correction on the variability of the gene expression data. The metric is calculated as follows:
This ratio helps quantify how much variability remains after correction.
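The exact formula for this ratio is not shown here. As an assumption for illustration only, the sketch below takes the total per-gene variance after correction divided by the total before correction. The function name and this definition are hypothetical. Under this definition, a value below 1 means correction removed variability, and a value near 0 would warn that biological signal may also have been removed.

```python
from statistics import variance


def variability_ratio(expr_before, expr_after):
    """Total per-gene variance after correction divided by total before.

    HYPOTHETICAL definition; rows are samples, columns are genes, and both
    matrices must have the same shape.
    """
    def total_gene_variance(matrix):
        # zip(*matrix) iterates over columns, i.e. one gene at a time.
        return sum(variance(gene) for gene in zip(*matrix))

    return total_gene_variance(expr_after) / total_gene_variance(expr_before)
```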
The following boxplots compare gene expression across datasets (batches) before and after correction for different covariate combinations.
These plots help assess whether the correction process has successfully reduced batch-related variability without masking important biological differences.
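As a numeric companion to these boxplots, the sketch below computes per-batch quartile summaries for one gene and the spread of the per-batch medians. Boxes that line up after correction show up as a smaller spread. The function names are hypothetical. Plotting libraries may interpolate quartiles differently and usually draw whiskers at 1.5 × IQR rather than at the minimum and maximum used here.

```python
from statistics import quantiles


def batch_box_stats(values, batches):
    """Five-number summary (min, Q1, median, Q3, max) of one gene per batch.

    Hypothetical helper; assumes every batch has at least two samples.
    """
    grouped = {}
    for value, batch in zip(values, batches):
        grouped.setdefault(batch, []).append(value)
    stats = {}
    for batch, vals in grouped.items():
        q1, q2, q3 = quantiles(vals, n=4)  # three quartile cut points
        stats[batch] = (min(vals), q1, q2, q3, max(vals))
    return stats


def median_spread(values, batches):
    """Range of the per-batch medians; smaller means the boxes line up better."""
    medians = [s[2] for s in batch_box_stats(values, batches).values()]
    return max(medians) - min(medians)
```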
The Silhouette Score measures how similar each sample is to samples from its own batch compared with samples from other batches. Scores range from -1 to 1. A high score before correction indicates strong batch effects, and a lower score after correction means these effects were reduced.
- A decrease in the Silhouette Score after correction suggests successful batch effect mitigation.
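The score above follows the standard silhouette definition with batch labels used as the clusters. The sketch below is a minimal, dependency-free version. It assumes that each sample is a coordinate vector (for example, its scores on the leading PCs) and that Euclidean distance is used. The function name is hypothetical. `sklearn.metrics.silhouette_score(X, labels)` computes the same quantity.

```python
import math


def silhouette_score(points, batches):
    """Mean silhouette width of `points`, using batch labels as clusters.

    s(i) = (b - a) / max(a, b), where a is the mean distance from sample i to
    the other samples in its batch and b is the smallest mean distance from i
    to the samples of any other batch. Samples alone in their batch score 0.
    Requires at least two batches.
    """
    groups = {}
    for idx, batch in enumerate(batches):
        groups.setdefault(batch, []).append(idx)
    scores = []
    for i, p in enumerate(points):
        own = [j for j in groups[batches[i]] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for batch, members in groups.items()
            if batch != batches[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

Batches that sit in separate regions score close to 1, while interleaved batches score near or below 0. This is why a drop after correction is read as reduced batch effects.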
The Entropy of Batch Mixing (EBM) measures how well samples from different batches are intermixed. Higher entropy indicates better mixing, which suggests weaker batch effects.
- An increase in entropy after correction indicates improved mixing of batches, suggesting successful batch effect correction.
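A common way to compute this kind of entropy is to look at each sample's nearest neighbors and measure how evenly their batch labels are distributed. The sketch below uses that approach under stated assumptions: Euclidean k-nearest neighbors that exclude the sample itself, Shannon entropy with the natural log, and an average over all samples. Published variants differ in neighbor count, subsampling, and log base, so every one of these choices, and the function name, is hypothetical.

```python
import math
from collections import Counter


def entropy_of_batch_mixing(points, batches, k):
    """Mean Shannon entropy (natural log) of batch labels among each sample's
    k nearest neighbors (Euclidean distance, the sample itself excluded)."""
    total = 0.0
    for i, p in enumerate(points):
        # Indices of the k closest other samples.
        neighbors = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        counts = Counter(batches[j] for j in neighbors)
        m = len(neighbors)
        entropy = -sum((c / m) * math.log(c / m) for c in counts.values())
        total += entropy
    return total / len(points)
```

The value is 0 when every neighborhood contains a single batch. It is at most the natural log of the number of batches (or of k, if smaller), which is reached when neighborhoods are perfectly balanced.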
{{ cov }} | {% endfor %}Proportion of Mixed Samples |
---|---|
{{ row["cov" ~ loop.index] }} | {% endfor %}{{ row.proportion | percentage }} |
Confidence is very high in the batch effect correction due to a substantial proportion of mixed datasets and samples. This suggests that the correction algorithm was applied across a highly diverse set of conditions, minimizing the risk that batch effects confound the biological signals. The variability across different conditions was well-represented, leading to more reliable results.
Confidence is high in the batch effect correction due to a substantial proportion of mixed datasets and samples. This indicates that the correction algorithm was applied across a diverse range of conditions, reducing the likelihood that batch effects are confounded with biological signals. Because mixed datasets make up a large share of the cohort, the variability across conditions was well represented during correction, leading to more reliable and robust results.
Confidence is moderate in the batch effect correction. A reasonable proportion of mixed datasets and samples suggests that the correction was performed on a fairly diverse dataset. However, some batch effects may not have been fully corrected if certain covariate combinations were underrepresented. While the results are likely to be reliable, some caution is advised when interpreting the findings.
Confidence is low in the batch effect correction. Mixed datasets and samples make up only a small proportion of the cohort, which indicates that the correction may have been applied under limited conditions. This can leave the variability across conditions underrepresented, increasing the risk that batch effects still confound the biological signals. In such cases, the reliability of the corrected data may be compromised, and further validation may be necessary.
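The confidence levels above depend on the proportion of mixed samples, but this section does not define what makes a sample "mixed". As an assumption for illustration only, the sketch below counts a sample as mixed when its covariate combination appears in more than one batch. The function name and this definition are hypothetical, and no confidence thresholds are implied.

```python
from collections import defaultdict


def proportion_mixed(batches, covariate_combos):
    """Share of samples whose covariate combination occurs in two or more batches.

    HYPOTHETICAL definition of a "mixed" sample; `covariate_combos` holds one
    hashable tuple of covariate values per sample.
    """
    batches_per_combo = defaultdict(set)
    for batch, combo in zip(batches, covariate_combos):
        batches_per_combo[combo].add(batch)
    mixed = sum(1 for combo in covariate_combos if len(batches_per_combo[combo]) > 1)
    return mixed / len(covariate_combos)
```

For example, if ("tumor", "M") appears in batches A and B but the other combinations each appear in only one batch, two of four samples are mixed and the proportion is 0.5.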