Knowledge-Augmented Long-CoT Generation for Complex Biomolecular Reasoning

๐Ÿ“… 2025-11-11
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿค– AI Summary
Biological molecular mechanism reasoning requires multi-step logical inference, yet existing large language models (LLMs) often fail due to insufficient domain knowledge and logical inconsistency. To address this, we propose KG-Chain, a knowledge-enhanced long-chain-of-thought generation framework that tightly integrates biomedical knowledge graphs (KGs) with LLMs: KG-guided multi-hop reasoning path construction and pruning ensure biologically grounded inference; supervised fine-tuning and reinforcement learning jointly optimize reasoning reliability; and a novel long-chain-of-thought prompting mechanism explicitly structures extended reasoning trajectories. We further introduce PrimeKGQA, the first large-scale, expert-annotated benchmark for multi-hop molecular biology question answering. Experiments show that KG-Chain substantially outperforms prior methods on deep multi-hop reasoning tasks, setting a new state of the art in both factual accuracy and logical consistency.

๐Ÿ“ Abstract
Understanding complex biomolecular mechanisms requires multi-step reasoning across molecular interactions, signaling cascades, and metabolic pathways. While large language models (LLMs) show promise in such tasks, their application to biomolecular problems is hindered by logical inconsistencies and the lack of grounding in domain knowledge. Existing approaches often exacerbate these issues: reasoning steps may deviate from biological facts or fail to capture long mechanistic dependencies. To address these challenges, we propose a Knowledge-Augmented Long-CoT Reasoning framework that integrates LLMs with knowledge graph-based multi-hop reasoning chains. The framework constructs mechanistic chains via guided multi-hop traversal and pruning on the knowledge graph; these chains are then incorporated into supervised fine-tuning to improve factual grounding and further refined with reinforcement learning to enhance reasoning reliability and consistency. Furthermore, to overcome the shortcomings of existing benchmarks, which are often restricted in scale and scope and lack annotations for deep reasoning chains, we introduce PrimeKGQA, a comprehensive benchmark for biomolecular question answering. Experimental results on both PrimeKGQA and existing datasets show that although larger closed-source models still perform well on relatively simple tasks, our method demonstrates clear advantages as reasoning depth increases, achieving state-of-the-art performance on multi-hop tasks that demand traversal of structured biological knowledge. These findings highlight the effectiveness of combining structured knowledge with advanced reasoning strategies for reliable and interpretable biomolecular reasoning.
Problem

Research questions and friction points this paper is trying to address.

Addresses logical inconsistencies in biomolecular reasoning with language models
Integrates knowledge graphs to capture long mechanistic dependencies
Overcomes limitations of existing biomolecular reasoning benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates knowledge graphs with multi-hop reasoning chains
Uses supervised fine-tuning and reinforcement learning
Constructs mechanistic chains via guided traversal and pruning
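The guided traversal and pruning step can be sketched in plain Python. The toy knowledge graph, the relation whitelist, and the hop limit below are illustrative assumptions for exposition, not details taken from the paper:

```python
from collections import deque

# Toy biomedical KG as (head, relation, tail) triples.
# Entities and relations are illustrative, not drawn from PrimeKG.
TRIPLES = [
    ("EGFR", "activates", "RAS"),
    ("RAS", "activates", "RAF"),
    ("RAF", "activates", "MEK"),
    ("MEK", "activates", "ERK"),
    ("EGFR", "associated_with", "lung_cancer"),
]

# Relations treated as mechanistically meaningful; other edges are pruned.
ALLOWED_RELATIONS = {"activates", "inhibits"}

def build_adjacency(triples):
    adj = {}
    for h, r, t in triples:
        adj.setdefault(h, []).append((r, t))
    return adj

def multi_hop_paths(triples, source, target, max_hops=4):
    """Enumerate reasoning paths from source to target up to max_hops,
    pruning edges whose relation is not in ALLOWED_RELATIONS."""
    adj = build_adjacency(triples)
    paths = []
    queue = deque([(source, [])])  # (current node, list of (relation, node) hops)
    while queue:
        node, path = queue.popleft()
        if node == target and path:
            paths.append(path)
            continue
        if len(path) >= max_hops:
            continue
        for rel, nxt in adj.get(node, []):
            if rel not in ALLOWED_RELATIONS:
                continue  # prune biologically uninformative hops
            if nxt == source or any(nxt == n for _, n in path):
                continue  # avoid cycles
            queue.append((nxt, path + [(rel, nxt)]))
    return paths

paths = multi_hop_paths(TRIPLES, "EGFR", "ERK")
# Each path is a chain of hops such as
# [("activates", "RAS"), ("activates", "RAF"), ("activates", "MEK"), ("activates", "ERK")]
```

Each surviving path can then be verbalized into a mechanistic reasoning chain and used as grounding material for fine-tuning, per the framework described above.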
Tianwen Lyu
The Polytechnic Institute, Zhejiang University
Xiang Zhuang
Ph.D. student, Zhejiang University
Keyan Ding
ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University
Xinzhe Cao
University of Oxford
Lei Liang
Ant Group
Wei Zhao
University of Aberdeen
Qiang Zhang
ZJU-UIUC Institute, Zhejiang University
Huajun Chen
College of Computer Science and Technology, Zhejiang University