AI Summary
This work proposes a retrieval-augmented generation framework that integrates biomedical knowledge graphs with large language models to address the lack of interpretable, multi-step reasoning in analyzing gene interactions and their downstream pathway effects. The approach constructs a heterogeneous knowledge graph from sources such as KEGG and WikiPathways, then employs subgraph retrieval and structured prompting to guide the language model in evidence-driven multi-hop reasoning. By incorporating pathway perturbation propagation simulations, the framework enables end-to-end interpretable inference from gene–gene interactions to resultant pathway state changes. To our knowledge, this is the first application of retrieval-augmented generation in this domain, yielding consistent, evidence-backed, and mechanistically transparent predictions across diverse biological contexts.
Abstract
Understanding mechanistic relationships among genes and their impacts on biological pathways is essential for elucidating disease mechanisms and advancing precision medicine. Despite the availability of extensive molecular interaction and pathway data in public databases, integrating heterogeneous knowledge sources and enabling interpretable multi-step reasoning across biological networks remain challenging.
We present GIP-RAG (Gene Interaction Prediction through Retrieval-Augmented Generation), a computational framework that combines biomedical knowledge graphs with large language models (LLMs) to infer and interpret gene interactions. The framework constructs a unified gene interaction knowledge graph by integrating curated data from KEGG, WikiPathways, SIGNOR, Pathway Commons, and PubChem. Given user-specified genes, a query-driven module retrieves relevant subgraphs, which are incorporated into structured prompts to guide LLM-based stepwise reasoning. This enables identification of direct and indirect regulatory relationships and generation of mechanistic explanations supported by biological evidence.
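The retrieval-and-prompting step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the edge list, the function names `retrieve_subgraph` and `build_prompt`, the hop limit, and the prompt wording are all assumptions made for illustration, and the toy edges are placeholders rather than records drawn from the cited databases.

```python
from collections import deque

# Hypothetical toy edges (source, relation, target, provenance label).
# Illustrative placeholders only; not taken from the paper or its databases.
EDGES = [
    ("EGFR", "activates", "KRAS", "KEGG"),
    ("KRAS", "activates", "BRAF", "KEGG"),
    ("BRAF", "activates", "MAP2K1", "SIGNOR"),
    ("TP53", "inhibits", "MDM2", "WikiPathways"),
]


def retrieve_subgraph(edges, query_genes, max_hops=2):
    """Collect edges within max_hops of any query gene (undirected BFS)."""
    neighbors = {}
    for edge in edges:
        src, _, dst, _ = edge
        neighbors.setdefault(src, []).append((dst, edge))
        neighbors.setdefault(dst, []).append((src, edge))

    depth = {gene: 0 for gene in query_genes}
    queue = deque(query_genes)
    kept = []
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt, edge in neighbors.get(node, []):
            if edge not in kept:
                kept.append(edge)
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return kept


def build_prompt(query_genes, sub_edges):
    """Render retrieved edges as an evidence list inside a structured prompt."""
    parts = ["Query genes: " + ", ".join(query_genes), "Evidence:"]
    parts += [f"- {s} {r} {t} [source: {p}]" for s, r, t, p in sub_edges]
    parts.append(
        "Task: reason step by step, citing only the evidence above, "
        "and state whether the query genes interact directly or indirectly."
    )
    return "\n".join(parts)


subgraph = retrieve_subgraph(EDGES, ["EGFR"], max_hops=2)
prompt = build_prompt(["EGFR"], subgraph)
print(prompt)
```

In this sketch the prompt carries the provenance label of each edge, so that any model output could in principle be traced back to a specific retrieved record; how GIP-RAG itself formats evidence is not specified in the abstract.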
Beyond pairwise interactions, GIP-RAG includes a pathway-level functional impact module that simulates propagation of gene perturbations through signaling networks and evaluates potential pathway state changes. Evaluation across diverse biological scenarios demonstrates that the framework generates consistent, interpretable, and evidence-supported insights into gene regulatory mechanisms.
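As a rough illustration of what propagating a perturbation through a signaling network can look like, the sketch below pushes a signed effect along activation and inhibition edges with a per-hop attenuation. The abstract does not describe the actual propagation rule, so the decay factor, the hop limit, the mean-based pathway score, and the names `propagate` and `pathway_score` are all assumptions, and the toy network is a placeholder.

```python
from collections import deque

# Hypothetical signed signaling edges: +1 = activation, -1 = inhibition.
# Illustrative placeholders only; not taken from the paper.
SIGNED_EDGES = {
    "EGFR": [("KRAS", +1)],
    "KRAS": [("BRAF", +1)],
    "BRAF": [("MAP2K1", +1)],
    "MAP2K1": [("DUSP6", +1)],
    "DUSP6": [("MAPK1", -1)],
}


def propagate(edges, perturbed_gene, effect=-1.0, decay=0.5, max_hops=3):
    """Breadth-first propagation of a perturbation.

    Each hop multiplies the upstream effect by the edge sign and by `decay`
    (an assumed attenuation factor). A gene keeps the first value it receives,
    which corresponds to a shortest path and prevents cycles from looping.
    """
    state = {perturbed_gene: effect}
    queue = deque([(perturbed_gene, 0)])
    while queue:
        gene, hops = queue.popleft()
        if hops == max_hops:
            continue
        for target, sign in edges.get(gene, []):
            if target in state:
                continue
            state[target] = state[gene] * sign * decay
            queue.append((target, hops + 1))
    return state


def pathway_score(state, members):
    """Mean propagated effect over pathway members; untouched genes count as 0."""
    return sum(state.get(g, 0.0) for g in members) / len(members)


# Simulate a loss-of-function perturbation (effect = -1.0) of KRAS.
state = propagate(SIGNED_EDGES, "KRAS")
print(state)
print(pathway_score(state, ["BRAF", "MAP2K1", "MAPK1"]))
```

A negative score here would be read as predicted down-regulation of the pathway and a positive score as up-regulation; passing through an inhibitory edge flips the sign, which is how a knockout upstream of an inhibitor can yield a net activating effect downstream.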
Overall, GIP-RAG provides a general and interpretable approach for integrating knowledge graphs with retrieval-augmented LLMs to support mechanistic reasoning in complex molecular systems.