ExPath: Towards Explaining Targeted Pathways for Biological Knowledge Bases

📅 2025-02-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Identifying targeted pathways in biological knowledge bases remains challenging because it relies heavily on expert curation and because wet-lab experimental data is difficult to integrate. Method: We propose ExPath, an end-to-end interpretable pathway inference framework that integrates protein language model (pLM) embeddings of amino acid sequences with PathMamba, a hybrid architecture combining a graph neural network (GNN) with Mamba, and introduces a trainable subgraph masking module (PathExplainer) for pathway localization. Contribution/Results: We establish ML-oriented biological evaluations with a new domain-specific metric. Evaluated on 301 real-world biological networks, ExPath achieves high concordance with expert annotations (Spearman’s ρ > 0.92) for critical pathway identification, and its inferred pathways remain biologically meaningful. The curated 301 bio-network dataset will be publicly released.

📝 Abstract

Biological knowledge bases provide systemically functional pathways of cells or organisms in terms of molecular interactions. However, recognizing more targeted pathways, particularly when incorporating wet-lab experimental data, remains challenging and typically requires downstream biological analyses and expertise. In this paper, we frame this challenge as a solvable graph learning and explaining task and propose a novel pathway inference framework, ExPath, that explicitly integrates experimental data, specifically amino acid sequences (AA-seqs), to classify various graphs (bio-networks) in biological databases. The links (representing pathways) that contribute more to classification can be considered targeted pathways. Technically, ExPath comprises three components: (1) a large protein language model (pLM) that encodes AA-seqs and embeds them into the graph, overcoming the limitations of traditional AA-seq processing approaches such as BLAST; (2) PathMamba, a hybrid architecture combining graph neural networks (GNNs) with state-space sequence modeling (Mamba) to capture both local interactions and global pathway-level dependencies; and (3) PathExplainer, a subgraph learning module that identifies functionally critical nodes and edges through trainable pathway masks. We also propose ML-oriented biological evaluations and a new metric. Experiments on 301 bio-networks demonstrate that pathways inferred by ExPath remain biologically meaningful. We will publicly release the curated 301 bio-network dataset soon.
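To make the "local interactions plus global dependencies" idea concrete, here is a minimal sketch in plain Python. It is not the authors' PathMamba. The toy graph, the 2-D node vectors (standing in for pLM embeddings), the degree-based node ordering, and the fixed `decay` parameter are all assumptions made for illustration. In particular, the scan below is a fixed linear state-space recurrence, whereas Mamba uses input-dependent (selective) parameters.

```python
# Toy bio-network: adjacency lists over 4 nodes (hypothetical).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
# 2-D vectors standing in for pLM embeddings of each node's AA-seq (made-up values).
x = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0], 3: [0.5, 0.5]}

def local_step(x, neighbors):
    """One round of mean-aggregation message passing (GNN-style local mixing)."""
    out = {}
    for node, feats in x.items():
        nbrs = neighbors[node]
        mean = [sum(x[n][d] for n in nbrs) / len(nbrs) for d in range(len(feats))]
        # Average the node's own features with the mean of its neighbors.
        out[node] = [0.5 * f + 0.5 * m for f, m in zip(feats, mean)]
    return out

def global_scan(sequence, decay=0.5):
    """Linear state-space recurrence: h_t = decay * h_{t-1} + (1 - decay) * x_t."""
    h = [0.0] * len(sequence[0])
    states = []
    for x_t in sequence:
        h = [decay * h_d + (1 - decay) * x_d for h_d, x_d in zip(h, x_t)]
        states.append(h)
    return states

local = local_step(x, neighbors)
# Assumed ordering: nodes sorted by degree, so the scan sees them as a sequence.
order = sorted(local, key=lambda n: len(neighbors[n]))
states = global_scan([local[n] for n in order])
graph_embedding = states[-1]  # final state summarizes the whole node sequence
print(order, graph_embedding)
```

The local step only mixes information between direct neighbors, while the scan carries a running state across the entire ordered node sequence, which is how a sequence model can expose longer-range dependencies to a downstream graph classifier.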

Problem

Research questions and friction points this paper is trying to address.

Targeted pathway recognition

Integration of experimental data

Graph learning and explaining task

Innovation

Methods, ideas, or system contributions that make the work stand out.

Protein language model embedding

Hybrid GNN-Mamba architecture

Trainable pathway mask module
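The trainable-mask idea can be illustrated with a small, self-contained sketch in the general spirit of mask-based graph explainers. It is not the authors' PathExplainer. The toy classifier, the node features, the `sparsity` penalty, and the learning rate are assumptions chosen for illustration. Each edge gets a learnable logit, and gradient ascent keeps the edges that support the classifier's prediction while a sparsity term pushes the rest toward zero.

```python
import math

# Hypothetical 1-D node features and candidate pathway links.
features = [1.0, 0.9, -0.2, 0.1]
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy classifier: one round of masked sum message passing,
# agg_i = f_i + sum_j m_ij * f_j, then logit = sum_i f_i * agg_i.
# Each edge (u, v) therefore contributes 2 * m_uv * f_u * f_v to the logit.
coeff = [2.0 * features[u] * features[v] for u, v in edges]
base = sum(f * f for f in features)

def class_logit(mask):
    return base + sum(c * m for c, m in zip(coeff, mask))

def train_mask(steps=500, lr=1.0, sparsity=0.05):
    """Maximize log p(class) - sparsity * sum(mask) over per-edge mask logits."""
    logits = [0.0] * len(edges)
    for _ in range(steps):
        mask = [sigmoid(l) for l in logits]
        p = sigmoid(class_logit(mask))
        for i, m in enumerate(mask):
            # Gradient w.r.t. the mask value, then chained through the sigmoid.
            grad_m = (1.0 - p) * coeff[i] - sparsity
            logits[i] += lr * grad_m * m * (1.0 - m)
    return [sigmoid(l) for l in logits]

mask = train_mask()
# Edges with the largest learned mask values are the candidate targeted pathways.
ranked = sorted(zip(edges, mask), key=lambda pair: pair[1], reverse=True)
for edge, weight in ranked:
    print(edge, round(weight, 3))
```

In this toy setup the edge joining the two nodes with the largest same-sign features ends up with the highest mask value, mirroring the abstract's notion that links contributing more to classification are treated as targeted pathways.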
