BELLE: A Bi-Level Multi-Agent Reasoning Framework for Multi-Hop Question Answering

📅 2025-05-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In multi-hop question answering, existing LLM-based methods overlook question-type heterogeneity, which leads to a mismatch between reasoning paths and task requirements. To address this, we propose a question-type-driven two-level multi-agent framework: (1) at the first level, a multi-role deliberation process generates execution plans that compose operators according to the question type; (2) at the second level, a "fast-slow deliberation" mechanism checks whether changes in the reasoning chain are logically sound and consistent over time. The approach treats method selection as a choice among operators and combines role specialization with dual-speed deliberation to enable interpretable, self-correcting multi-hop reasoning. It draws on prompt engineering, chain-of-thought prompting, iterative reasoning, and cross-speed verification. Extensive experiments demonstrate significant improvements over strong baselines across multiple benchmarks; notably, in complex scenarios, it achieves a 32.7% gain in performance per unit computational cost, combining high accuracy with better cost-efficiency.

📝 Abstract

Multi-hop question answering (QA) involves finding multiple relevant passages and performing step-by-step reasoning to answer complex questions. Previous work on multi-hop QA applies specific methods based on large language models (LLMs) from different modeling perspectives, regardless of question type. In this paper, we first conduct an in-depth analysis of public multi-hop QA benchmarks, dividing the questions into four types and evaluating five types of cutting-edge methods for multi-hop QA: Chain-of-Thought (CoT), Single-step, Iterative-step, Sub-step, and Adaptive-step. We find that different types of multi-hop questions vary in their sensitivity to different types of methods. We therefore propose a Bi-levEL muLti-agEnt reasoning (BELLE) framework that addresses multi-hop QA by focusing on the correspondence between question types and methods, where each type of method is regarded as an "operator" obtained by prompting LLMs differently. At the first level of BELLE, multiple agents debate to obtain an execution plan of combined "operators" that addresses the multi-hop QA task comprehensively. At the second level, in addition to the basic debate roles of affirmative debater, negative debater, and judge, we further leverage fast and slow debaters to monitor whether changes in viewpoints are reasonable. Extensive experiments demonstrate that BELLE significantly outperforms strong baselines on various datasets. Additionally, in terms of model consumption, BELLE is more cost-effective than single models in more complex multi-hop QA scenarios.
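
The two levels described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The operator names come from the abstract; everything else is an assumption made for illustration: the debaters and the judge are plain functions standing in for LLM calls, and the rules used by `fast_check` and `slow_check` are hypothetical stand-ins for the fast and slow debaters.

```python
from typing import Callable, List, Tuple

# The five method types named in the abstract, each treated as an "operator".
OPERATORS = ("CoT", "Single-step", "Iterative-step", "Sub-step", "Adaptive-step")

Plan = Tuple[str, ...]
# Hypothetical role signatures; a real system would wrap LLM prompts here.
Debater = Callable[[str, List[Plan]], Plan]
Judge = Callable[[str, Plan, Plan], Plan]


def validate(plan: Plan) -> Plan:
    """Reject empty plans and plans that use an unknown operator."""
    unknown = [op for op in plan if op not in OPERATORS]
    if not plan or unknown:
        raise ValueError(f"invalid plan: {plan!r}")
    return plan


def fast_check(previous: Plan, candidate: Plan) -> bool:
    """Assumed fast monitor: looks only at the latest change and accepts
    it if the candidate keeps at least one operator of the previous plan."""
    return bool(set(previous) & set(candidate))


def slow_check(history: List[Plan], candidate: Plan) -> bool:
    """Assumed slow monitor: looks at the whole history and rejects a
    return to a plan that was already abandoned."""
    return candidate not in history[:-1]


def debate(question: str, affirmative: Debater, negative: Debater,
           judge: Judge, max_rounds: int = 3) -> Plan:
    """First level: debate over operator plans. Second level: the two
    monitors decide whether a change of viewpoint is accepted."""
    history: List[Plan] = [validate(affirmative(question, []))]
    for _ in range(max_rounds):
        pro = validate(affirmative(question, history))
        con = validate(negative(question, history))
        candidate = validate(judge(question, pro, con))
        current = history[-1]
        if candidate == current:
            break  # the viewpoint is stable, so the debate stops
        if fast_check(current, candidate) and slow_check(history, candidate):
            history.append(candidate)
        # otherwise the change is treated as unreasonable and the plan stays
    return history[-1]


# Toy roles, used only to make the sketch runnable.
def affirmative(question: str, history: List[Plan]) -> Plan:
    return ("CoT", "Iterative-step")


def negative(question: str, history: List[Plan]) -> Plan:
    return ("Sub-step", "Iterative-step")


def judge(question: str, pro: Plan, con: Plan) -> Plan:
    # Toy rule: side with the negative debater for long questions.
    return con if len(question.split()) > 8 else pro


long_q = "Which film directed by the author of the novel adapted in 1999 won an award?"
print(debate(long_q, affirmative, negative, judge))
print(debate("Who wrote Hamlet?", affirmative, negative, judge))
```

In this sketch the plan changes only when both monitors accept the change, which is one simple way to read "monitor whether changes in viewpoints are reasonable"; the paper's actual criteria are not given in the abstract.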

Problem

Research questions and friction points this paper is trying to address.

Classifies multi-hop questions into four types for targeted reasoning

Proposes BELLE framework to match question types with optimal methods

Uses multi-agent debate to dynamically select reasoning strategies

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bi-level multi-agent framework for QA

Question-method correspondence optimization

Fast-slow debaters enhance reasoning

🔎 Similar Papers

Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models