🤖 AI Summary
To address the high experimental cost and insufficient semantic modeling in gene-disease association (GDA) prediction, this paper proposes a heterogeneous graph learning framework that integrates multi-source biological data. First, a heterogeneous graph is constructed and node features are initialized using BioGPT. Second, seven semantic metapaths are defined, and a metapath-aware Transformer is introduced to capture long-range dependencies. Third, a dual-level attention aggregation mechanism, operating both within and across metapaths, jointly encodes heterogeneous structure, semantic paths, and node features. On multiple benchmark datasets, the method achieves 3.2–5.8% AUC improvements over state-of-the-art approaches, with significantly enhanced robustness. Ablation and visualization studies validate the effectiveness of semantic modeling and cross-path feature fusion. This work is the first to synergistically integrate BioGPT embeddings with metapath-guided Transformers for GDA prediction.
📝 Abstract
Discovering gene-disease associations (GDAs) is crucial for understanding disease mechanisms, yet identifying these associations remains challenging due to the time and cost of biological experiments. Computational methods are increasingly vital for efficient and scalable GDA prediction. Graph-based learning models, which leverage node features and network relationships, are commonly employed for biomolecular predictions. However, existing methods often struggle to effectively integrate node features, heterogeneous structures, and semantic information. To address these challenges, we propose the COmprehensive MEtapath-based heterogeneous graph Transformer (COMET) for predicting gene-disease associations. COMET integrates diverse datasets to construct comprehensive heterogeneous networks and initializes node features with BioGPT. We define seven metapaths and use a Transformer framework to aggregate metapath instances, capturing global context and long-distance dependencies. Through intra- and inter-metapath aggregation using attention mechanisms, COMET fuses latent vectors from multiple metapaths to enhance GDA prediction accuracy. Our method demonstrates superior robustness compared to state-of-the-art approaches. Ablation studies and visualizations validate COMET’s effectiveness, providing valuable insights for advancing human health research.
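The two-level aggregation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the scaled dot-product scoring inside a metapath, and the tanh-projection scoring across metapaths (in the style of common heterogeneous graph attention models) are all assumptions made for illustration, and the weights are random rather than learned. The Transformer encoder that produces the instance vectors is omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def intra_metapath_attention(instances, query):
    """Pool the encoded instances of ONE metapath for ONE target node.

    instances: (K, d) array, one row per metapath instance (assumed to be
               already encoded, e.g. by a Transformer layer).
    query:     (d,) hypothetical learned attention vector.
    Returns a (d,) attention-weighted summary of the instances.
    """
    d = instances.shape[1]
    alpha = softmax(instances @ query / np.sqrt(d))  # (K,) instance weights
    return alpha @ instances


def inter_metapath_attention(Z, W, b, q):
    """Fuse per-metapath node embeddings into one embedding per node.

    Z: (P, N, d) embeddings for P metapaths and N nodes.
    W: (d, h), b: (h,), q: (h,) hypothetical learned parameters.
    Returns (fused, beta): fused is (N, d), beta is (P,) metapath weights.
    """
    scores = np.tanh(Z @ W + b) @ q        # (P, N) per-node importance
    beta = softmax(scores.mean(axis=1))    # (P,) one weight per metapath
    fused = np.tensordot(beta, Z, axes=1)  # weighted sum over metapaths
    return fused, beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P, N, K, d, h = 7, 4, 5, 16, 8  # seven metapaths, as in the abstract

    # Random stand-ins for encoded metapath instances: (P, N, K, d).
    instances = rng.normal(size=(P, N, K, d))
    query = rng.normal(size=d)

    # Level 1: pool instances within each metapath, for each node.
    Z = np.stack([
        np.stack([intra_metapath_attention(instances[p, n], query)
                  for n in range(N)])
        for p in range(P)
    ])

    # Level 2: weight and combine the metapaths.
    W, b, q = rng.normal(size=(d, h)), np.zeros(h), rng.normal(size=h)
    fused, beta = inter_metapath_attention(Z, W, b, q)
    print(fused.shape, beta.round(3))
```

In this sketch each metapath receives a single weight shared by all nodes; a per-node weighting would be an equally plausible reading of the abstract, which does not specify the choice.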