🤖 AI Summary
Large language models (LLMs) suffer from factual inconsistency in intermediate queries and inefficient search paths during multi-step retrieval, leading to reasoning bias and redundant computation.
Method: We propose a search agent framework integrating dynamic knowledge graphs with multi-objective reinforcement learning. The dynamic graph explicitly models entity relationships to enforce reasoning consistency, while a composite reward function jointly optimizes retrieval accuracy, search efficiency, and response quality—enabling fine-grained optimization of the search path.
Results: Evaluated on six multi-hop question answering benchmarks, our approach achieves state-of-the-art performance using only small-scale LMs (≤7B parameters) and limited computational resources. It significantly outperforms same-sized baselines and matches cutting-edge large models, demonstrating strong generalization and cross-environment robustness.
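To make the composite reward concrete, here is a minimal sketch of how accuracy, efficiency, and quality terms might be combined into a single scalar. All names, weights, and the specific term definitions (`w_acc`, `w_eff`, `w_qual`, the step cap) are illustrative assumptions; the paper summary above does not specify the exact formulation.

```python
def composite_reward(answer_f1: float,
                     num_search_steps: int,
                     answer_quality: float,
                     w_acc: float = 1.0,
                     w_eff: float = 0.2,
                     w_qual: float = 0.5,
                     max_steps: int = 8) -> float:
    """Combine accuracy, efficiency, and quality terms into one scalar reward.

    Hypothetical weighted-sum formulation; the actual reward design in the
    paper may differ (e.g. different terms, shaping, or normalization).
    """
    # Accuracy term: e.g. token-level F1 of the final answer, in [0, 1].
    r_acc = answer_f1
    # Efficiency term: penalize long search trajectories (capped at max_steps).
    r_eff = 1.0 - min(num_search_steps, max_steps) / max_steps
    # Quality term: e.g. a completeness/faithfulness score, in [0, 1].
    r_qual = answer_quality
    return w_acc * r_acc + w_eff * r_eff + w_qual * r_qual
```

The weighted sum lets training trade off objectives explicitly: raising `w_eff` discourages unnecessary exploration, while `w_qual` rewards comprehensive final answers, matching the goals described above.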
📝 Abstract
Multi-step agentic retrieval systems based on large language models (LLMs) have demonstrated remarkable performance in complex information search tasks. However, these systems still face significant challenges in practical applications, notably the generation of factually inconsistent intermediate queries and inefficient search trajectories, which can lead to reasoning deviations or redundant computation. To address these issues, we propose DynaSearcher, an innovative search agent enhanced by dynamic knowledge graphs and multi-reward reinforcement learning (RL). Specifically, our system leverages knowledge graphs as external structured knowledge to guide the search process by explicitly modeling entity relationships, thereby ensuring factual consistency in intermediate queries and mitigating biases from irrelevant information. Furthermore, we employ a multi-reward RL framework for fine-grained control over training objectives such as retrieval accuracy, efficiency, and response quality. This framework promotes the generation of high-quality intermediate queries and comprehensive final answers, while discouraging unnecessary exploration and minimizing information omissions and redundancy. Experimental results demonstrate that our approach achieves state-of-the-art answer accuracy on six multi-hop question answering datasets, matching frontier LLMs while using only small-scale models and limited computational resources. Furthermore, our approach demonstrates strong generalization and robustness across diverse retrieval environments and larger-scale models, highlighting its broad applicability.
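The idea of using a knowledge graph to keep intermediate queries factually grounded can be sketched as a connectivity check: entities mentioned together in a generated sub-query should be linked in the graph. Everything below (the toy entity graph, the adjacency representation, the `query_is_consistent` helper) is a hypothetical illustration, not DynaSearcher's actual KG integration, which is substantially richer.

```python
from collections import deque


def entities_connected(graph: dict[str, set[str]], source: str, target: str) -> bool:
    """BFS over an undirected entity graph stored as adjacency sets."""
    if source not in graph or target not in graph:
        return False
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nbr in graph.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return False


def query_is_consistent(query_entities: list[str], graph: dict[str, set[str]]) -> bool:
    """Accept an intermediate query only if each consecutive entity pair is linked."""
    return all(entities_connected(graph, a, b)
               for a, b in zip(query_entities, query_entities[1:]))


# Toy entity graph; edges stand in for relations such as "directed" or "starred in".
kg = {
    "Inception": {"Christopher Nolan", "Leonardo DiCaprio"},
    "Christopher Nolan": {"Inception", "Interstellar"},
    "Leonardo DiCaprio": {"Inception"},
    "Interstellar": {"Christopher Nolan"},
}
```

A query pairing entities with no path in the graph would be flagged before retrieval, which is one simple way structured knowledge can filter out factually inconsistent sub-queries and the biased retrievals they trigger.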