AI Summary
Biological molecular mechanism reasoning requires multi-step logical inference, yet existing large language models (LLMs) often fail due to insufficient domain knowledge and logical inconsistency. To address this, we propose KG-Chain, a knowledge-enhanced long-chain-of-thought generation framework that tightly integrates biomedical knowledge graphs (KGs) with LLMs: KG-guided multi-hop reasoning path construction and pruning ensure biologically grounded inference; supervised fine-tuning and reinforcement learning jointly optimize reasoning reliability; and a novel long-chain-of-thought prompting mechanism explicitly structures extended reasoning trajectories. We further introduce PrimeKGQA, the first large-scale, expert-annotated benchmark for multi-hop molecular biology question answering. Experiments demonstrate that KG-Chain achieves significant improvements over state-of-the-art methods on deep multi-hop reasoning tasks, attaining new SOTA performance in both factual accuracy and logical consistency.
Abstract
Understanding complex biomolecular mechanisms requires multi-step reasoning across molecular interactions, signaling cascades, and metabolic pathways. While large language models (LLMs) show promise in such tasks, their application to biomolecular problems is hindered by logical inconsistencies and the lack of grounding in domain knowledge. Existing approaches often exacerbate these issues: reasoning steps may deviate from biological facts or fail to capture long mechanistic dependencies. To address these challenges, we propose a Knowledge-Augmented Long-CoT Reasoning framework that integrates LLMs with knowledge graph-based multi-hop reasoning chains. The framework constructs mechanistic chains via guided multi-hop traversal and pruning on the knowledge graph; these chains are then incorporated into supervised fine-tuning to improve factual grounding and further refined with reinforcement learning to enhance reasoning reliability and consistency. Furthermore, to overcome the shortcomings of existing benchmarks, which are often restricted in scale and scope and lack annotations for deep reasoning chains, we introduce PrimeKGQA, a comprehensive benchmark for biomolecular question answering. Experimental results on both PrimeKGQA and existing datasets show that although larger closed-source models still perform well on relatively simple tasks, our method gains a clear advantage as reasoning depth increases, achieving state-of-the-art performance on multi-hop tasks that demand traversal of structured biological knowledge. These findings highlight the effectiveness of combining structured knowledge with advanced reasoning strategies for reliable and interpretable biomolecular reasoning.
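The KG-guided multi-hop traversal and pruning described in the abstract can be illustrated with a minimal sketch. The graph below is a toy set of triples with hypothetical gene/relation names (not taken from PrimeKG), and the pruning rules (a relation whitelist, a hop limit, and cycle avoidance) are illustrative stand-ins for the paper's actual pruning criteria:

```python
from collections import deque

# Toy biomedical knowledge graph as (head, relation, tail) triples.
# Entity and relation names are hypothetical, for illustration only.
TRIPLES = [
    ("TP53", "regulates", "CDKN1A"),
    ("CDKN1A", "inhibits", "CDK2"),
    ("CDK2", "phosphorylates", "RB1"),
    ("TP53", "interacts_with", "MDM2"),
    ("MDM2", "degrades", "TP53"),
]

# Relation-type whitelist: one possible pruning signal.
ALLOWED_RELATIONS = {"regulates", "inhibits", "phosphorylates"}

def build_adjacency(triples):
    """Index triples by head entity for traversal."""
    adj = {}
    for h, r, t in triples:
        adj.setdefault(h, []).append((r, t))
    return adj

def find_paths(triples, source, target, max_hops=3):
    """Enumerate multi-hop paths from source to target via BFS,
    pruning disallowed relations, cycles, and over-long chains."""
    adj = build_adjacency(triples)
    paths = []
    queue = deque([(source, [])])  # (current entity, path of (rel, entity) hops)
    while queue:
        node, path = queue.popleft()
        if node == target and path:
            paths.append(path)
            continue
        if len(path) >= max_hops:  # depth pruning
            continue
        visited = {source} | {e for _, e in path}
        for rel, nxt in adj.get(node, []):
            if rel not in ALLOWED_RELATIONS:  # relation-type pruning
                continue
            if nxt in visited:                # cycle pruning
                continue
            queue.append((nxt, path + [(rel, nxt)]))
    return paths

if __name__ == "__main__":
    for p in find_paths(TRIPLES, "TP53", "RB1"):
        print("TP53 -> " + " -> ".join(f"{r}:{e}" for r, e in p))
```

In a pipeline like the one the abstract describes, such surviving paths would be verbalized into mechanistic reasoning chains and used as supervision for fine-tuning; the relation whitelist and hop limit here are simplified placeholders for learned or domain-informed pruning.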