Learning to Discover Regulatory Elements for Gene Expression Prediction

📅 2025-02-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses end-to-end prediction of gene expression from DNA sequences, with the aim of automatically identifying the key regulatory elements that drive expression. We propose a causally aware information bottleneck framework that models information compression with the Beta distribution, jointly decomposes epigenomic signals and DNA sequence inputs, and explicitly characterizes the causal activity of regulatory elements, thereby filtering out non-causal noise. By integrating deep sequence modeling with an interpretable localization mechanism, our method identifies regulatory regions precisely. On multiple benchmark datasets, our approach outperforms state-of-the-art baselines in gene expression prediction accuracy. Moreover, the identified regulatory regions show better biological consistency and functional enrichment than those detected by statistical peak-calling methods (e.g., MACS3). Our work establishes an interpretable, causally grounded computational paradigm for deciphering non-coding genomic function.


📝 Abstract
We consider the problem of predicting gene expression from DNA sequences. A key challenge of this task is finding the regulatory elements that control gene expression. Here, we introduce Seq2Exp, a Sequence to Expression network explicitly designed to discover and extract the regulatory elements that drive target gene expression, thereby improving the accuracy of gene expression prediction. Our approach captures the causal relationship between epigenomic signals, DNA sequences, and their associated regulatory elements. Specifically, we propose to decompose the epigenomic signals and the DNA sequence conditioned on the causally active regulatory elements, and to apply an information bottleneck with the Beta distribution that combines their effects while filtering out non-causal components. Our experiments demonstrate that Seq2Exp outperforms existing baselines in gene expression prediction tasks and discovers more influential regions than commonly used statistical peak-detection methods such as MACS3. The source code is released as part of the AIRS library (https://github.com/divelab/AIRS/).
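The abstract does not spell out how the Beta-distribution bottleneck works. The sketch below illustrates the general idea and is hypothetical: it is not Seq2Exp's implementation. It assumes some upstream model has already turned the DNA sequence and epigenomic signals into per-position Beta parameters (`alphas`, `betas`). Each position's feature is then gated by a mask in [0, 1], and a KL penalty toward a low-mean Beta prior pushes most positions to be filtered out. The function names and the Beta(1, 9) sparsity prior are our own illustrative choices.

```python
import math
import random


def digamma(x: float) -> float:
    """Digamma psi(x) for x > 0, via upward recurrence plus an asymptotic series."""
    if x <= 0:
        raise ValueError("digamma is only implemented for x > 0")
    result = 0.0
    # psi(x) = psi(x + 1) - 1/x; shift x up until the asymptotic series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def log_beta_fn(a: float, b: float) -> float:
    """log B(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def kl_beta(a1: float, b1: float, a2: float, b2: float) -> float:
    """Closed-form KL( Beta(a1, b1) || Beta(a2, b2) )."""
    return (log_beta_fn(a2, b2) - log_beta_fn(a1, b1)
            + (a1 - a2) * digamma(a1)
            + (b1 - b2) * digamma(b1)
            + (a2 - a1 + b2 - b1) * digamma(a1 + b1))


def beta_bottleneck(signal, alphas, betas, prior=(1.0, 9.0), sample=True, rng=None):
    """Hypothetical Beta bottleneck: gate each position and sum a KL penalty.

    signal: per-position feature values (e.g. a combined sequence/epigenomic score).
    alphas, betas: per-position Beta parameters, assumed to come from a model.
    prior: Beta prior; the illustrative (1, 9) has mean 0.1, favouring sparse masks.
    sample: draw masks from the Beta (training-style) or use the mean a / (a + b).
    Returns (gated_signal, kl_penalty).
    """
    rng = rng or random.Random(0)
    gated, kl = [], 0.0
    for x, a, b in zip(signal, alphas, betas):
        mask = rng.betavariate(a, b) if sample else a / (a + b)
        gated.append(mask * x)
        kl += kl_beta(a, b, *prior)
    return gated, kl


if __name__ == "__main__":
    feats = [2.0, 0.5, 1.0]
    gated, penalty = beta_bottleneck(feats, [8.0, 1.0, 4.0], [2.0, 9.0, 4.0], sample=False)
    print(gated, penalty)
```

In this toy call, the first position has a high-mean Beta(8, 2) mask and is mostly kept. The second matches the prior exactly, so it is mostly suppressed and adds no KL cost. Note that `random.betavariate` is not differentiable. A trainable version would need a reparameterized Beta sampler from a deep learning framework.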

Problem

Research questions and friction points this paper is trying to address.

Predicting gene expression from DNA sequences
Discovering the regulatory elements that control gene expression
Improving prediction accuracy with the Seq2Exp network

Innovation

Methods, ideas, or system contributions that make the work stand out.

Seq2Exp network
Causally active regulatory elements
Information bottleneck with the Beta distribution
Xingyu Su
Texas A&M University
Haiyang Yu
Texas A&M University
Degui Zhi
Department Chair, Professor, University of Texas Health Science Center at Houston
EHR, Imaging genetics, Population Genetics, Informatics
Shuiwang Ji
Texas A&M University