🤖 AI Summary
Identifying targeted pathways in biological knowledge bases remains challenging because it relies heavily on expert curation and because wet-lab experimental data are difficult to integrate. Method: The authors propose ExPath, an end-to-end interpretable pathway inference framework that combines protein language model (pLM) embeddings of amino acid sequences with PathMamba, a hybrid architecture pairing graph neural networks (GNNs) with Mamba state-space modeling, and adds a trainable subgraph masking module (PathExplainer) to localize pathways. Contribution/Results: The work introduces a machine learning–oriented biological evaluation paradigm with domain-specific metrics. Evaluated on 301 real-world biological networks, ExPath achieves high concordance with expert annotations (Spearman’s ρ > 0.92) for critical pathway identification, and its inferred pathways are biologically meaningful. The curated benchmark dataset will be publicly released.
📝 Abstract
Biological knowledge bases describe the functional pathways of cells and organisms at the system level in terms of molecular interactions. However, recognizing more targeted pathways, particularly when incorporating wet-lab experimental data, remains challenging and typically requires downstream biological analyses and expertise. In this paper, we frame this challenge as a graph learning and explanation task and propose a novel pathway inference framework, ExPath, that explicitly integrates experimental data, specifically amino acid sequences (AA-seqs), to classify the graphs (bio-networks) in biological databases. The links (representing pathways) that contribute more strongly to the classification can then be regarded as targeted pathways. Technically, ExPath comprises three components: (1) a large protein language model (pLM) that encodes AA-seqs and embeds them into the graph, overcoming obstacles faced by traditional AA-seq processing approaches such as BLAST; (2) PathMamba, a hybrid architecture combining graph neural networks (GNNs) with state-space sequence modeling (Mamba) to capture both local interactions and global pathway-level dependencies; and (3) PathExplainer, a subgraph learning module that identifies functionally critical nodes and edges through trainable pathway masks. We also propose ML-oriented biological evaluations and a new metric. Experiments on 301 bio-networks demonstrate that the pathways inferred by ExPath are biologically meaningful. We will publicly release the curated data for the 301 bio-networks soon.
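The idea of a trainable pathway mask can be illustrated with a minimal sketch. This is not the PathExplainer implementation. It assumes a toy linear graph classifier whose logit is the mean of mask-weighted edge contributions, and the names `learn_edge_mask`, `top_k_edges`, and `node_scores` are hypothetical. In a real pipeline the per-node scores would come from a trained model over pLM embeddings rather than being hand-set.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def learn_edge_mask(edges, node_scores, n_nodes, sparsity=0.05, lr=0.5, steps=300):
    """Learn one soft mask value in (0, 1) per edge by gradient descent.

    Toy stand-in classifier (an assumption, not the paper's model):
        logit = (1 / n_nodes) * sum_e mask_e * (s_u + s_v)
    Loss: -log(sigmoid(logit)) + sparsity * sum(mask),
    i.e. keep the prediction confident while using as few edges as possible.
    """
    theta = [0.0] * len(edges)  # mask logits; mask = sigmoid(theta)
    # Contribution of each edge to the graph-level logit when fully unmasked.
    contrib = [(node_scores[u] + node_scores[v]) / n_nodes for u, v in edges]
    for _ in range(steps):
        mask = [sigmoid(t) for t in theta]
        p = sigmoid(sum(m * c for m, c in zip(mask, contrib)))
        for i, (m, c) in enumerate(zip(mask, contrib)):
            dloss_dmask = -(1.0 - p) * c + sparsity
            # Chain rule through the sigmoid that parameterizes the mask.
            theta[i] -= lr * dloss_dmask * m * (1.0 - m)
    return [sigmoid(t) for t in theta]


def top_k_edges(edges, mask, k):
    """Return the k edges with the largest learned mask values."""
    ranked = sorted(zip(edges, mask), key=lambda pair: pair[1], reverse=True)
    return [edge for edge, _ in ranked[:k]]


# Hypothetical 5-node chain; nodes 0 and 1 carry most of the class evidence.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
node_scores = [2.0, 2.0, 0.1, -1.0, -1.0]
mask = learn_edge_mask(edges, node_scores, n_nodes=5)
pathway = top_k_edges(edges, mask, k=2)
print(pathway, [round(m, 3) for m in mask])
```

Edges whose mask values stay high after optimization are the ones the classifier depends on, which is the sense in which highly contributing links can be read as a targeted pathway. The sparsity weight controls how aggressively weakly contributing edges are pruned.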