Adversarial Domain Adaptation Enables Knowledge Transfer Across Heterogeneous RNA-Seq Datasets

📅 2026-03-09

🤖 AI Summary

This study addresses limited cross-dataset knowledge transfer in RNA-seq analysis. Heterogeneous preprocessing pipelines and inconsistent phenotype labels make it hard to reuse one dataset for another, which hinders the generalization of deep learning models in low-data regimes. The authors apply adversarial domain adaptation to transcriptomics, proposing a deep learning framework that jointly optimizes classification and domain alignment objectives. The method transfers knowledge from a large source dataset to a small target dataset in both supervised and unsupervised settings, and is evaluated on three large-scale datasets (TCGA, ARCHS4, GTEx). By learning a domain-invariant latent space with tailored regularization, the approach consistently outperforms non-adaptive baselines in cancer and tissue type classification, particularly in data-scarce scenarios.
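
As a rough illustration of what "jointly optimizes classification and domain alignment objectives" can mean, the formula below writes out the standard adversarial (DANN-style) saddle-point objective. This is an assumed textbook formulation, not necessarily the paper's exact loss. The symbols are introduced here for illustration only: an encoder f with parameters \theta_f, a phenotype classifier c with parameters \theta_c, a domain discriminator d with parameters \theta_d, a trade-off weight \lambda, and a cross-entropy loss \ell.

```latex
\min_{\theta_f,\,\theta_c}\;\max_{\theta_d}\;
  \mathcal{L}_{\mathrm{cls}}(\theta_f,\theta_c)
  \;-\;\lambda\,\mathcal{L}_{\mathrm{dom}}(\theta_f,\theta_d),
\qquad
\mathcal{L}_{\mathrm{cls}} = \frac{1}{n_{\ell}}\sum_{i=1}^{n_{\ell}}
  \ell\bigl(c(f(x_i)),\,y_i\bigr),
\quad
\mathcal{L}_{\mathrm{dom}} = \frac{1}{n_s+n_t}\sum_{i=1}^{n_s+n_t}
  \ell\bigl(d(f(x_i)),\,d_i\bigr)
```

Here d_i is 0 for source samples and 1 for target samples, and n_\ell counts the labeled samples. In the unsupervised setting these would be source samples only. In the supervised setting, labeled target samples could presumably be added to the classification term. The discriminator minimizes \mathcal{L}_{\mathrm{dom}} while the encoder is pushed to increase it, which is what drives the latent space toward domain invariance.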

📝 Abstract

Accurate phenotype prediction from RNA sequencing (RNA-seq) data is essential for diagnosis, biomarker discovery, and personalized medicine. Deep learning models have demonstrated strong potential to outperform classical machine learning approaches, but their performance relies on large, well-annotated datasets. In transcriptomics, such datasets are frequently limited, leading to overfitting and poor generalization. Knowledge transfer from larger, more general datasets can alleviate this issue. However, transferring information across RNA-seq datasets remains challenging due to heterogeneous preprocessing pipelines and differences in target phenotypes. In this study, we propose a deep learning-based domain adaptation framework that enables effective knowledge transfer from a large general dataset to a smaller one for cancer type classification. The method learns a domain-invariant latent space by jointly optimizing classification and domain alignment objectives. To ensure stable training and robustness in data-scarce scenarios, the framework is trained adversarially with appropriate regularization. Both supervised and unsupervised variants are explored, leveraging labeled or unlabeled target samples, respectively. The framework is evaluated on three large-scale transcriptomic datasets (TCGA, ARCHS4, GTEx) to assess its ability to transfer knowledge across cohorts. Experimental results demonstrate consistent improvements in cancer and tissue type classification accuracy compared to non-adaptive baselines, particularly in low-data scenarios. Overall, this work highlights domain adaptation as a powerful strategy for data-efficient knowledge transfer in transcriptomics, enabling robust phenotype prediction under constrained data conditions.
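
To make the idea of adversarial training toward a domain-invariant latent space more concrete, here is a minimal NumPy sketch of one common way to implement it: a gradient-reversal update in the style of DANN. It is not the authors' implementation. The tiny tanh encoder, the two logistic heads, the `grl_lambda` ramp, the synthetic "expression" data, and all names and hyperparameters are assumptions made for illustration. The sketch covers only the unsupervised variant, where target samples carry no phenotype labels.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def bce(p, y, eps=1e-12):
    """Mean binary cross-entropy between probabilities p and 0/1 labels y."""
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

def head_grads(z, w, y):
    """Logistic head on latent z: returns loss, grad w.r.t. w, grad w.r.t. z."""
    p = sigmoid(z @ w)
    d_logit = (p - y) / len(y)  # derivative of mean BCE w.r.t. the logits
    return bce(p, y), z.T @ d_logit, np.outer(d_logit, w)

def grl_lambda(progress, gamma=10.0):
    """Assumed ramp for the adversarial weight: 0 at progress=0, near 1 at 1."""
    return 2.0 / (1.0 + np.exp(-gamma * progress)) - 1.0

def train_step(params, xs, ys, xt, lam, lr=0.1):
    """One joint update: classification on source, domain alignment on both."""
    W, wc, wd = params["W"], params["wc"], params["wd"]
    n_src = len(xs)
    x_all = np.vstack([xs, xt])
    d_all = np.concatenate([np.zeros(n_src), np.ones(len(xt))])  # 0=source, 1=target
    z_all = np.tanh(x_all @ W)  # shared latent representation

    cls_loss, g_wc, g_z_cls = head_grads(z_all[:n_src], wc, ys)
    dom_loss, g_wd, g_z_dom = head_grads(z_all, wd, d_all)

    # Gradient reversal: the encoder receives the NEGATED domain gradient,
    # so it learns features the domain discriminator cannot separate.
    g_z = -lam * g_z_dom
    g_z[:n_src] += g_z_cls
    g_W = x_all.T @ (g_z * (1.0 - z_all ** 2))  # backprop through tanh

    params["W"] = W - lr * g_W
    params["wc"] = wc - lr * g_wc  # classifier minimizes classification loss
    params["wd"] = wd - lr * g_wd  # discriminator minimizes domain loss
    return cls_loss, dom_loss

# Synthetic stand-in for a large labeled source set and a small unlabeled
# target set with a shifted distribution (purely illustrative data).
rng = np.random.default_rng(0)
n_src, n_tgt, n_genes, n_latent = 200, 40, 20, 8
ys = rng.integers(0, 2, n_src).astype(float)
xs = rng.normal(size=(n_src, n_genes)) + 1.5 * ys[:, None]
xt = rng.normal(size=(n_tgt, n_genes)) + 1.5 * rng.integers(0, 2, n_tgt)[:, None] + 0.5

params = {
    "W": rng.normal(scale=0.1, size=(n_genes, n_latent)),
    "wc": np.zeros(n_latent),
    "wd": np.zeros(n_latent),
}
n_steps = 200
for step in range(n_steps):
    lam = grl_lambda(step / n_steps)
    cls_loss, dom_loss = train_step(params, xs, ys, xt, lam)
print(f"classification loss {cls_loss:.3f}, domain loss {dom_loss:.3f}")
```

Setting `lam=0` reduces the update to a plain, non-adaptive baseline in which the target data has no effect on the encoder. A supervised variant could plausibly add the labeled target samples to the classification term. The regularization the abstract mentions is not specified there, so it is omitted here.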

Problem

Research questions and friction points this paper is trying to address.

RNA-seq

domain adaptation

knowledge transfer

heterogeneous datasets

phenotype prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

adversarial domain adaptation

knowledge transfer

RNA-seq