🤖 AI Summary
Large language models (LLMs) suffer from factual inconsistency in intermediate queries and inefficient search paths during multi-step retrieval, leading to reasoning bias and redundant computation.
Method: We propose a search agent framework integrating dynamic knowledge graphs with multi-objective reinforcement learning. The dynamic graph explicitly models entity relationships to enforce reasoning consistency, while a composite reward function jointly optimizes retrieval accuracy, search efficiency, and response quality—enabling fine-grained optimization of the search path.
Results: Evaluated on six multi-hop question answering benchmarks, our approach achieves state-of-the-art performance using only small-scale LMs (≤7B parameters) and limited computational resources. It significantly outperforms same-sized baselines and matches cutting-edge large models, demonstrating strong generalization and cross-environment robustness.
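To make the composite reward concrete, here is a minimal sketch of how accuracy, efficiency, and quality terms might be combined into a single scalar. All names, weights, and the specific term definitions (`w_acc`, `w_eff`, `w_qual`, the step cap) are illustrative assumptions; the paper summary above does not specify the exact formulation.

```python
def composite_reward(answer_f1: float,
                     num_search_steps: int,
                     answer_quality: float,
                     w_acc: float = 1.0,
                     w_eff: float = 0.2,
                     w_qual: float = 0.5,
                     max_steps: int = 8) -> float:
    """Combine accuracy, efficiency, and quality terms into one scalar reward.

    Hypothetical weighted-sum formulation; the actual reward design in the
    paper may differ (e.g. different terms, shaping, or normalization).
    """
    # Accuracy term: e.g. token-level F1 of the final answer, in [0, 1].
    r_acc = answer_f1
    # Efficiency term: penalize long search trajectories (capped at max_steps).
    r_eff = 1.0 - min(num_search_steps, max_steps) / max_steps
    # Quality term: e.g. a completeness/faithfulness score, in [0, 1].
    r_qual = answer_quality
    return w_acc * r_acc + w_eff * r_eff + w_qual * r_qual
```

The weighted sum lets training trade off objectives explicitly: raising `w_eff` discourages unnecessary exploration, while `w_qual` rewards comprehensive final answers, matching the goals described above.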
📝 Abstract
Multi-step agentic retrieval systems based on large language models (LLMs) have demonstrated remarkable performance in complex information search tasks. However, these systems still face significant challenges in practical applications, notably the generation of factually inconsistent intermediate queries and inefficient search trajectories, which can lead to reasoning deviations or redundant computation. To address these issues, we propose DynaSearcher, an innovative search agent enhanced by dynamic knowledge graphs and multi-reward reinforcement learning (RL). Specifically, our system leverages knowledge graphs as external structured knowledge to guide the search process by explicitly modeling entity relationships, thereby ensuring factual consistency in intermediate queries and mitigating biases from irrelevant information. Furthermore, we employ a multi-reward RL framework for fine-grained control over training objectives such as retrieval accuracy, efficiency, and response quality. This framework promotes the generation of high-quality intermediate queries and comprehensive final answers, while discouraging unnecessary exploration and minimizing information omissions and redundancy. Experimental results demonstrate that our approach achieves state-of-the-art answer accuracy on six multi-hop question answering datasets, matching frontier LLMs while using only small-scale models and limited computational resources. Furthermore, our approach demonstrates strong generalization and robustness across diverse retrieval environments and larger-scale models, highlighting its broad applicability.
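The idea of using a knowledge graph to keep intermediate queries factually grounded can be sketched as a connectivity check: entities mentioned together in a generated sub-query should be linked in the graph. Everything below (the toy entity graph, the adjacency representation, the `query_is_consistent` helper) is a hypothetical illustration, not DynaSearcher's actual KG integration, which is substantially richer.

```python
from collections import deque


def entities_connected(graph: dict[str, set[str]], source: str, target: str) -> bool:
    """BFS over an undirected entity graph stored as adjacency sets."""
    if source not in graph or target not in graph:
        return False
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nbr in graph.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return False


def query_is_consistent(query_entities: list[str], graph: dict[str, set[str]]) -> bool:
    """Accept an intermediate query only if each consecutive entity pair is linked."""
    return all(entities_connected(graph, a, b)
               for a, b in zip(query_entities, query_entities[1:]))


# Toy entity graph; edges stand in for relations such as "directed" or "starred in".
kg = {
    "Inception": {"Christopher Nolan", "Leonardo DiCaprio"},
    "Christopher Nolan": {"Inception", "Interstellar"},
    "Leonardo DiCaprio": {"Inception"},
    "Interstellar": {"Christopher Nolan"},
}
```

A query pairing entities with no path in the graph would be flagged before retrieval, which is one simple way structured knowledge can filter out factually inconsistent sub-queries and the biased retrievals they trigger.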