AI Summary
This work proposes LaCoGSEA, a novel framework that integrates deep autoencoders with gene set enrichment analysis (GSEA) to address the limitations of traditional pathway enrichment methods in unsupervised settings. Conventional approaches rely on predefined phenotypic labels and are thus ill-suited for label-free scenarios, while existing unsupervised methods often assume linearity and fail to explicitly model gene-pathway relationships. LaCoGSEA overcomes these issues by leveraging an autoencoder to capture the nonlinear manifold of transcriptomic data and generating label-free gene rankings based on global correlations between genes and latent variables. These rankings drive a GSEA-like enrichment statistic without requiring phenotype annotations. Evaluated on cancer subtype clustering tasks, LaCoGSEA significantly outperforms current unsupervised baselines, recovers more high-confidence biological pathways, and demonstrates robust performance across varying data scales and experimental conditions, establishing a new state of the art in unsupervised pathway enrichment analysis.
Abstract
Motivation: Pathway enrichment analysis is widely used to interpret gene expression data. Standard approaches, such as GSEA, rely on predefined phenotypic labels and pairwise comparisons, which limits their applicability in unsupervised settings. Existing unsupervised extensions, including single-sample methods, provide pathway-level summaries but primarily capture linear relationships and do not explicitly model gene-pathway associations. More recently, deep learning models have been explored to capture non-linear transcriptomic structure. However, their interpretation has typically relied on generic explainable AI (XAI) techniques designed for feature-level attribution. As these methods are not designed for pathway-level interpretation in unsupervised transcriptomic analyses, their effectiveness in this setting remains limited.

Results: To bridge this gap, we introduce LaCoGSEA (Latent Correlation GSEA), an unsupervised framework that integrates deep representation learning with robust pathway statistics. LaCoGSEA employs an autoencoder to capture non-linear manifolds and proposes a global gene-latent correlation metric as a proxy for differential expression, generating dense gene rankings without prior labels. We demonstrate that LaCoGSEA offers three key advantages: (i) it achieves improved clustering performance in distinguishing cancer subtypes compared to existing unsupervised baselines; (ii) it recovers a broader range of biologically meaningful pathways at higher ranks compared with linear dimensionality reduction and gradient-based XAI methods; and (iii) it maintains high robustness and consistency across varying experimental protocols and dataset sizes. Overall, LaCoGSEA provides state-of-the-art performance in unsupervised pathway enrichment analysis.

Availability and implementation: https://github.com/willyzzz/LaCoGSEA
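To make the idea of ranking genes by their correlation with latent variables more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a latent matrix `Z` is already available; here it is simulated rather than produced by a trained autoencoder. It uses plain Pearson correlation per latent dimension, and it scores a gene set with the classic weighted running-sum statistic from GSEA. The abstract does not specify how LaCoGSEA aggregates correlations across latent dimensions or which exact enrichment statistic it uses, so the function names, the toy data, and those choices are all assumptions made for illustration.

```python
import numpy as np

def gene_latent_correlation(X, Z):
    """Pearson correlation of every gene (column of X) with every
    latent dimension (column of Z). Returns (n_genes, n_latent)."""
    Xc = X - X.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    Xs = Xc / (np.linalg.norm(Xc, axis=0) + 1e-12)
    Zs = Zc / (np.linalg.norm(Zc, axis=0) + 1e-12)
    return Xs.T @ Zs

def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted running-sum enrichment score over a ranked gene list.
    ranked_genes and scores must be in the same (descending) order."""
    in_set = np.isin(ranked_genes, list(gene_set))
    n_miss = len(ranked_genes) - in_set.sum()
    weights = (np.abs(scores) ** p) * in_set
    if weights.sum() == 0 or n_miss == 0:
        return 0.0
    hit = np.cumsum(weights) / weights.sum()   # step up at set members
    miss = np.cumsum(~in_set) / n_miss         # step down elsewhere
    running = hit - miss
    return float(running[np.argmax(np.abs(running))])

# Toy data (hypothetical): 60 samples, 200 genes, one hidden factor h
# that drives genes 0..19 only.
rng = np.random.default_rng(0)
n_samples, n_genes = 60, 200
h = rng.normal(size=n_samples)
X = rng.normal(size=(n_samples, n_genes))
X[:, :20] += 2.0 * h[:, None]

# Stand-in for an encoder output: a non-linear function of h plus an
# unrelated noise dimension. A real pipeline would use a trained model.
Z = np.column_stack([np.tanh(h), rng.normal(size=n_samples)])

corr = gene_latent_correlation(X, Z)

# Label-free ranking for latent dimension 0: most correlated genes first.
order = np.argsort(-corr[:, 0])
ranked_scores = corr[order, 0]

es_signal = enrichment_score(order, ranked_scores, set(range(20)))
es_null = enrichment_score(order, ranked_scores, set(range(100, 120)))
print(f"ES (driven gene set): {es_signal:.2f}")
print(f"ES (unrelated gene set): {es_null:.2f}")
```

In this toy setting the gene set driven by the hidden factor concentrates at the top of the ranking and receives a much larger enrichment score than the unrelated set, without any phenotype labels being used. Significance testing, for example by permutation, is omitted here.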