🤖 AI Summary
Problem: Integrating heterogeneous multimodal data, particularly textual annotations and numerical omics signals (e.g., scRNA-seq profiles), remains a key challenge in computational biomedicine due to their disparate representations and semantic gaps. Method: We propose the Text-Numeric Graph (TNG), a novel graph structure that jointly encodes human-readable semantic descriptions and sample-specific numerical features of biological entities. Our approach enables end-to-end joint optimization of large language models (LLMs) and graph neural networks (GNNs), facilitating knowledge-guided and data-driven collaborative graph reasoning. We further introduce TOSG, the first standardized benchmark for text-omics signal graphs. Contribution/Results: On disease classification and signaling pathway inference tasks, TNG coupled with LLM-GNN achieves significant accuracy gains over baselines and provides interpretable identification of critical biological entities and mechanistic pathways, establishing a new paradigm for explainable AI-driven biomedical discovery.
📝 Abstract
In real-world scientific discovery, researchers draw on accumulated prior knowledge and imagination to select one or a few of the most promising hypotheses from large and noisy data analysis results. In this study, we introduce a new type of graph structure, the text-numeric graph (TNG), defined as a graph whose entities and associations carry both text-attributed information and numeric information. The TNG is an ideal data structure for novel scientific discovery via graph reasoning because it integrates human-understandable textual annotations or prior knowledge with numeric values that represent the observed or activation levels of graph entities or associations in different samples. Together, the textual information and numeric values determine the importance of graph entities and associations in graph reasoning for novel scientific knowledge discovery. We further propose integrating large language models (LLMs) and graph neural networks (GNNs) to analyze TNGs for graph understanding and reasoning. To demonstrate their utility, we generated text-omic (numeric) signaling graphs (TOSGs), one type of TNG, in which all graphs share the same entities, associations, and annotations but have sample-specific numeric (omic) entity values derived from single-cell RNA-seq (scRNA-seq) datasets of different diseases. We propose joint LLM-GNN models for key entity mining and signaling pathway mining on the TOSGs. The evaluation results show that joint LLM-GNN models applied to TNGs significantly improve classification accuracy and network inference. In conclusion, TNGs and joint LLM-GNN models are important approaches for scientific discovery.
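The TNG definition above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names, the `add_sample` and `node_features` helpers, and the entities `GENE_A` and `GENE_B` with their annotations and values are all hypothetical, chosen only to show how shared text annotations and sample-specific numeric values can sit on the same graph.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    name: str
    text: str  # human-readable annotation, shared across all samples


@dataclass(frozen=True)
class Association:
    source: str
    target: str
    text: str  # textual description of the relation


@dataclass
class TextNumericGraph:
    """Hypothetical container: fixed topology and text, per-sample numeric values."""
    entities: list[Entity]
    associations: list[Association]
    # sample id -> {entity name -> numeric (e.g. omic) value}
    sample_values: dict[str, dict[str, float]] = field(default_factory=dict)

    def add_sample(self, sample_id: str, values: dict[str, float]) -> None:
        # Every sample must provide a value for every entity in the graph.
        missing = {e.name for e in self.entities} - set(values)
        if missing:
            raise ValueError(f"missing values for: {sorted(missing)}")
        self.sample_values[sample_id] = {
            e.name: float(values[e.name]) for e in self.entities
        }

    def node_features(self, sample_id: str) -> list[tuple[str, str, float]]:
        # Pair each entity's text with its numeric value in one sample.
        values = self.sample_values[sample_id]
        return [(e.name, e.text, values[e.name]) for e in self.entities]


# Illustrative placeholders only; not real annotations or measurements.
tng = TextNumericGraph(
    entities=[
        Entity("GENE_A", "placeholder annotation for entity A"),
        Entity("GENE_B", "placeholder annotation for entity B"),
    ],
    associations=[Association("GENE_A", "GENE_B", "placeholder relation")],
)
tng.add_sample("sample_1", {"GENE_A": 2.5, "GENE_B": 0.0})
tng.add_sample("sample_2", {"GENE_A": 0.1, "GENE_B": 3.0})
features = tng.node_features("sample_1")
```

In this sketch the text fields would be the input to a language model encoder and the per-sample values would form the numeric node features for a GNN; how the two are actually fused in the joint LLM-GNN models is not specified in the abstract.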