Explainable AI in Genomics: Transcription Factor Binding Site Prediction with Mixture of Experts

📅 2025-07-13
🤖 AI Summary
To address weak generalization (particularly degraded performance on out-of-distribution (OOD) data) and poor interpretability in transcription factor binding site (TFBS) prediction, this paper proposes an interpretable deep learning framework based on a Mixture of Experts (MoE). The framework integrates multiple pre-trained convolutional neural network (CNN) experts and adaptively fuses their outputs via learned weights, improving robustness on both in-distribution and OOD data. The paper also introduces ShiftSmooth, an attribution mapping method that mitigates the noise sensitivity of conventional gradient-based approaches, enabling high-resolution, stable motif localization and interpretation. Experiments show that the MoE model outperforms single-model baselines across diverse TFBS datasets, achieving an average 4.2% AUC improvement under OOD conditions. ShiftSmooth outperforms baseline attribution methods in motif detection accuracy and spatial localization consistency, supporting its practical utility for deciphering gene regulatory mechanisms.
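
The fusion step described above can be sketched as a softmax-gated weighted average of expert probabilities. This is a minimal illustration under assumptions the summary does not state: each pre-trained CNN expert is treated as already having produced a binding probability for the input sequence, and the gate is a hypothetical linear layer (`gate_weights`, `gate_bias`) applied to a hypothetical per-sequence feature vector. The paper's actual gating architecture may differ.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate(features: Sequence[float],
         gate_weights: Sequence[Sequence[float]],
         gate_bias: Sequence[float]) -> List[float]:
    """Hypothetical linear gate: one logit per expert, normalized to weights summing to 1."""
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(gate_weights, gate_bias)
    ]
    return softmax(logits)


def moe_predict(features: Sequence[float],
                expert_probs: Sequence[float],
                gate_weights: Sequence[Sequence[float]],
                gate_bias: Sequence[float]) -> float:
    """Fuse per-expert binding probabilities using input-dependent gate weights."""
    if len(expert_probs) != len(gate_bias):
        raise ValueError("need exactly one gate logit per expert")
    weights = gate(features, gate_weights, gate_bias)
    return sum(w * p for w, p in zip(weights, expert_probs))
```

Because the gate produces a convex combination, the fused probability always lies between the smallest and largest expert outputs, and an all-zero gate reduces the model to a plain average of the experts.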

📝 Abstract
Transcription Factor Binding Site (TFBS) prediction is crucial for understanding gene regulation and various biological processes. This study introduces a novel Mixture of Experts (MoE) approach for TFBS prediction that integrates multiple pre-trained Convolutional Neural Network (CNN) models, each specializing in different TFBS patterns. We evaluate our MoE model against the individual expert models on both in-distribution and out-of-distribution (OOD) datasets, using six randomly selected transcription factors (TFs) for OOD testing. Our results demonstrate that the MoE model achieves competitive or superior performance across diverse TF binding sites, particularly in OOD scenarios. An Analysis of Variance (ANOVA) test confirms that these performance differences are statistically significant. Additionally, we introduce ShiftSmooth, a novel attribution mapping technique that makes model interpretation more robust by accounting for small shifts in the input sequences. Through comprehensive explainability analysis, we show that ShiftSmooth yields better attributions for motif discovery and localization than the traditional Vanilla Gradient method. Our work presents an efficient, generalizable, and interpretable solution for TFBS prediction that may enable new discoveries in genome biology and advance our understanding of transcriptional regulation.
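
The abstract describes ShiftSmooth only as accounting for small shifts in the input sequence. One plausible reading, by analogy with SmoothGrad (which averages gradients over noise-perturbed copies of an input), is to compute a Vanilla Gradient map for several shifted copies of the one-hot sequence, shift each map back into alignment, and average them. The sketch below implements that reading and is an assumption, not the authors' algorithm: the circular shift, the shift range `max_shift`, the finite-difference gradient (standing in for autodiff), and the toy `gata_model` scorer are all hypothetical.

```python
import math
from typing import Callable, List

Seq = List[List[float]]  # L x 4 one-hot matrix, channels ordered A, C, G, T
BASES = "ACGT"


def one_hot(dna: str) -> Seq:
    return [[1.0 if b == base else 0.0 for b in BASES] for base in dna.upper()]


def roll(x: Seq, shift: int) -> Seq:
    """Circularly shift rows; a positive shift moves content toward higher positions."""
    n = len(x)
    return [list(x[(i - shift) % n]) for i in range(n)]


def vanilla_gradient(model: Callable[[Seq], float], x: Seq, eps: float = 1e-4) -> Seq:
    """Central finite-difference approximation of d model / d x (stands in for autodiff)."""
    grad = [[0.0] * 4 for _ in x]
    for i in range(len(x)):
        for j in range(4):
            up = [row[:] for row in x]
            down = [row[:] for row in x]
            up[i][j] += eps
            down[i][j] -= eps
            grad[i][j] = (model(up) - model(down)) / (2 * eps)
    return grad


def shift_smooth(model: Callable[[Seq], float], x: Seq,
                 max_shift: int = 2, eps: float = 1e-4) -> Seq:
    """Average gradient maps over shifts in [-max_shift, max_shift], realigned to x."""
    shifts = range(-max_shift, max_shift + 1)
    total = [[0.0] * 4 for _ in x]
    for s in shifts:
        # Attribute the shifted input, then undo the shift so positions match x.
        g = roll(vanilla_gradient(model, roll(x, s), eps), -s)
        for i, row in enumerate(g):
            for j, v in enumerate(row):
                total[i][j] += v
    return [[v / len(shifts) for v in row] for row in total]


def gata_model(x: Seq) -> float:
    """Hypothetical toy scorer: one 'GATA' filter, log-sum-exp pooling, sigmoid output."""
    motif = one_hot("GATA")
    k = len(motif)
    scores = [
        sum(x[p + i][j] * motif[i][j] for i in range(k) for j in range(4))
        for p in range(len(x) - k + 1)
    ]
    pooled = math.log(sum(math.exp(s) for s in scores))
    return 1.0 / (1.0 + math.exp(-(pooled - 3.0)))
```

With `max_shift=0` the procedure reduces exactly to the Vanilla Gradient map; larger shift ranges trade positional sharpness for stability, since attributions that do not move consistently with the sequence are averaged down.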
Problem

Predicting Transcription Factor Binding Sites (TFBS) accurately and interpretably
Improving model generalization to out-of-distribution TFBS data
Improving motif discovery via robust attribution mapping techniques
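
The summary quantifies OOD gains in terms of AUC (area under the ROC curve). A minimal sketch of how an in-distribution versus OOD comparison could be computed, assuming binary bound/unbound labels and one score per sequence: AUC equals the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as one half. The helper `generalization_gap` is hypothetical and simply reports in-distribution AUC minus OOD AUC.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the Mann-Whitney pairwise formulation (O(P*N), fine for small sets)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def generalization_gap(id_labels: Sequence[int], id_scores: Sequence[float],
                       ood_labels: Sequence[int], ood_scores: Sequence[float]) -> float:
    """Hypothetical helper: positive values mean performance drops out of distribution."""
    return roc_auc(id_labels, id_scores) - roc_auc(ood_labels, ood_scores)
```

For large test sets, `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently.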
Innovation

Mixture of Experts integrates multiple pre-trained CNN models
ShiftSmooth improves interpretability by accounting for small input shifts
ANOVA confirms the statistical significance of the MoE model's performance differences
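
The one-way ANOVA mentioned in the abstract tests whether mean performance differs across models. As a sketch, assuming the per-TF scores (for example, AUCs) of each model form one group, the F statistic is the between-group mean square divided by the within-group mean square; the abstract does not specify exactly how the groups were formed. The function name `one_way_anova_f` is hypothetical; in practice `scipy.stats.f_oneway` returns both the F statistic and its p-value.

```python
from typing import Sequence


def one_way_anova_f(*groups: Sequence[float]) -> float:
    """One-way ANOVA F = MS_between / MS_within for k groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need >= 2 non-empty groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    if ms_within == 0.0:
        # No within-group variance: F is infinite if the means differ, undefined otherwise.
        return float("inf") if ms_between > 0 else float("nan")
    return ms_between / ms_within
```

A large F relative to the critical value of the F distribution with (k - 1, N - k) degrees of freedom indicates that at least one model's mean performance differs from the others.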
Authors
Aakash Tripathi
Machine Learning, Moffitt Cancer Center, 12902 USF Magnolia Drive, Tampa, FL, 33612, USA
Ian E. Nielsen
Department of Electrical and Computer Engineering, Rowan University, Glassboro, NJ, 08028
Muhammad Umer
Stanford University
Ravi P. Ramachandran
Professor of Electrical and Computer Engineering, Rowan University
Ghulam Rasool
Machine Learning, Moffitt Cancer Center, 12902 USF Magnolia Drive, Tampa, FL, 33612, USA