🤖 AI Summary
This work addresses factual inconsistency in large language models during knowledge-intensive question answering, stemming from hallucination and insufficient coverage of long-tail knowledge. To mitigate these issues, the authors propose an adaptive multi-agent trajectory alignment framework that coordinates six specialized agents to perform structured actions. The approach formulates collaboration among agents and external tools as a trajectory preference alignment problem, enabling dynamic knowledge integration and interpretable reasoning. It introduces intra-trajectory preference learning to prioritize critical agents and devises a dependency-aware direct preference optimization mechanism that explicitly models inter-agent tool-calling dependencies. Experimental results demonstrate that the method significantly outperforms existing approaches across five knowledge-intensive QA benchmarks while substantially reducing token consumption during inference.
📝 Abstract
Despite substantial advances in large language models (LLMs), generating factually consistent responses for knowledge-intensive question answering remains challenging. These difficulties are primarily due to hallucinations and the limitations of LLMs in bridging long-tail knowledge gaps. To address this, we propose AMATA, an Adaptive Multi-Agent Trajectory Alignment framework that dynamically integrates external knowledge to improve response interpretability and factual grounding. Our architecture leverages six specialized agents that collaboratively perform structured actions for complex question reasoning. We formalize multi-agent collaboration with external tools as a trajectory preference alignment problem, incorporating question-aware agent customization and inter-agent preference harmonization. AMATA introduces two principal innovations: (1) Intra-Trajectory Preference Learning, which learns objective-oriented preferences to prioritize critical agents, and (2) Inter-Agent Dependency Learning, which captures cross-agent tool dependencies through a novel dependency-aware direct preference optimization technique. Empirical results show that AMATA consistently outperforms baseline approaches, knowledge-augmented frameworks, and LLM-based trajectory systems on five established knowledge-intensive QA benchmarks. Further analysis demonstrates the efficiency of our method in reducing token consumption.