GenePheno: Interpretable Gene Knockout-Induced Phenotype Abnormality Prediction from Gene Sequences

📅 2025-11-12
🤖 AI Summary
Existing methods struggle to predict the multiple phenotypic abnormalities induced by gene knockouts directly from gene sequences: the modality gap between sequences and phenotypes is large, gene-phenotype relationships are pleiotropic, and general-purpose predictors rely on curated genetic information as input. This paper introduces GenePheno, which the authors describe as the first interpretable multi-label framework that maps gene sequences to knockout-induced phenotype abnormalities. It combines a contrastive multi-label learning objective, an exclusive regularization, and a gene function bottleneck layer to predict phenotypes while exposing human-interpretable functional concepts. On four datasets curated by the authors, the model achieves state-of-the-art gene-centric Fmax and phenotype-centric AUC, and case studies demonstrate its ability to reveal gene functional mechanisms. The work targets sequence-to-phenotype prediction that does not depend on curated annotations as inputs.

📝 Abstract
Exploring how genetic sequences shape phenotypes is a fundamental challenge in biology and a key step toward scalable, hypothesis-driven experimentation. The task is complicated by the large modality gap between sequences and phenotypes, as well as the pleiotropic nature of gene-phenotype relationships. Existing sequence-based efforts focus on the degree to which variants of specific genes alter a limited set of phenotypes, while general methods for predicting gene knockout-induced phenotype abnormalities rely heavily on curated genetic information as input, which limits scalability and generalizability. As a result, the task of broadly predicting the presence of multiple phenotype abnormalities under gene knockout directly from gene sequences remains underexplored. We introduce GenePheno, the first interpretable multi-label prediction framework that predicts knockout-induced phenotypic abnormalities from gene sequences. GenePheno employs a contrastive multi-label learning objective that captures inter-phenotype correlations, complemented by an exclusive regularization that enforces biological consistency. It further incorporates a gene function bottleneck layer, offering human-interpretable concepts that reflect the functional mechanisms behind phenotype formation. To support progress in this area, we curate four datasets with canonical gene sequences as input and multi-label phenotypic abnormalities induced by gene knockouts as targets. Across these datasets, GenePheno achieves state-of-the-art gene-centric Fmax and phenotype-centric AUC, and case studies demonstrate its ability to reveal gene functional mechanisms.
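
The abstract reports gene-centric Fmax without defining it. The following is a minimal sketch, assuming the CAFA-style definition commonly used in multi-label function prediction: at each threshold, precision is averaged over genes with at least one predicted phenotype, recall is averaged over all annotated genes, and Fmax is the best resulting F1 across thresholds. The paper's exact evaluation protocol may differ; the function name, the threshold grid, and the toy inputs are illustrative assumptions.

```python
def gene_centric_fmax(y_true, y_score, num_thresholds=101):
    """Sketch of a CAFA-style gene-centric Fmax (assumed definition).

    y_true:  list of 0/1 label lists, one list per gene.
    y_score: list of score lists in [0, 1], same shape as y_true.
    """
    best = 0.0
    for k in range(num_thresholds):
        t = k / (num_thresholds - 1)  # evenly spaced thresholds in [0, 1]
        precisions, recalls = [], []
        for truth, scores in zip(y_true, y_score):
            n_true = sum(truth)
            if n_true == 0:
                continue  # genes without any annotated abnormality are skipped
            predicted = [s >= t for s in scores]
            n_pred = sum(predicted)
            n_hit = sum(1 for p, y in zip(predicted, truth) if p and y)
            recalls.append(n_hit / n_true)
            if n_pred > 0:
                # precision only counts genes with at least one prediction
                precisions.append(n_hit / n_pred)
        if not precisions or not recalls:
            continue
        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


# Toy example: two genes, three phenotypes, scores that separate the labels.
labels = [[1, 0, 1], [0, 1, 0]]
scores = [[0.9, 0.1, 0.8], [0.2, 0.7, 0.3]]
perfect = gene_centric_fmax(labels, scores)
```

Because the maximum is taken over thresholds, Fmax rewards a model whose scores rank phenotypes well per gene, independent of any single decision cutoff.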
Problem

Research questions and friction points this paper is trying to address.

Predicts gene knockout-induced phenotype abnormalities from genetic sequences
Addresses modality gap between gene sequences and pleiotropic phenotypes
Provides interpretable multi-label predictions with biological consistency constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Interpretable multi-label prediction from gene sequences
Contrastive multi-label learning that captures inter-phenotype correlations, with an exclusive regularization that enforces biological consistency
Gene function bottleneck layer for biological interpretability
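
To illustrate how a bottleneck layer can make predictions interpretable, here is a minimal sketch of a generic concept-bottleneck forward pass. It is not the GenePheno architecture, whose details are not given in this summary: the sequence embedding, the weight matrices, the sigmoid activations, and all function names are hypothetical. The point it shows is that phenotype outputs depend on the input only through a small set of named concept scores, so each prediction can be decomposed into per-concept contributions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def bottleneck_forward(embedding, w_concept, w_phenotype):
    """Hypothetical bottleneck: embedding -> concept scores -> phenotype scores."""
    # One score in (0, 1) per assumed gene-function concept.
    concepts = [sigmoid(z) for z in matvec(w_concept, embedding)]
    # Phenotype probabilities see the embedding only through the concepts.
    phenotypes = [sigmoid(z) for z in matvec(w_phenotype, concepts)]
    return concepts, phenotypes


def concept_contributions(concepts, w_phenotype, phenotype_index):
    """Per-concept additive contribution to one phenotype's pre-sigmoid logit."""
    row = w_phenotype[phenotype_index]
    return [w * c for w, c in zip(row, concepts)]


# Toy example: 2-dimensional embedding, 2 concepts, 1 phenotype.
embedding = [1.0, -1.0]
w_concept = [[1.0, 0.0], [0.0, 1.0]]
w_phenotype = [[2.0, 0.0]]
concepts, phenotypes = bottleneck_forward(embedding, w_concept, w_phenotype)
contributions = concept_contributions(concepts, w_phenotype, 0)
```

In this toy setup the second concept has zero weight for the phenotype, so its contribution is zero and the prediction is attributed entirely to the first concept.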
Authors

Jingquan Yan
Unknown affiliation

Yuwei Miao
PhD student, University of Texas at Arlington

Lei Yu
Quantitative Biomedical Research Center, School of Public Health, University of Texas Southwestern Medical Center

Yuzhi Guo
University of Texas at Arlington
Deep Learning, Bioinformatics

Xue Xiao
Quantitative Biomedical Research Center, School of Public Health, University of Texas Southwestern Medical Center

Lin Xu
Quantitative Biomedical Research Center, School of Public Health, University of Texas Southwestern Medical Center

Junzhou Huang
Jenkins Garrett Professor, Computer Science and Engineering, the University of Texas at Arlington
Machine Learning, Medical Image Analysis, Graph Neural Networks, Computational Toxicology