Stay Focused: Problem Drift in Multi-Agent Debate

📅 2025-02-26
🤖 AI Summary
“Problem drift” is an issue in multi-agent debate in which discussions progressively move away from the initial problem over multiple turns, limiting effectiveness on knowledge and reasoning tasks. Method: We define problem drift and quantify it across ten tasks (three generative, three knowledge, three reasoning, and one instruction-following). We propose DRIFTJudge, an LLM-as-a-judge method that detects problem drift at test time, and DRIFTPolicy, a method that mitigates it. Contribution/Results: A human study with eight experts finds that the most common issues in drifting discussions are a lack of progress (35% of cases), low-quality feedback (26% of cases), and a lack of clarity (25% of cases). DRIFTPolicy mitigates 31% of problem drift cases. Our work is a first step toward understanding this limitation of multi-agent debate and points to ways of improving its effectiveness.

📝 Abstract
Multi-agent debate - multiple instances of large language models discussing problems in turn-based interaction - has shown promise for solving knowledge and reasoning tasks. However, these methods show limitations, particularly when scaling them to longer reasoning chains. In this study, we unveil a new issue of multi-agent debate: discussions drift away from the initial problem over multiple turns. We define this phenomenon as problem drift and quantify its presence across ten tasks (i.e., three generative, three knowledge, three reasoning, and one instruction-following task). To identify the reasons for this issue, we perform a human study with eight experts on discussions suffering from problem drift, who find the most common issues are a lack of progress (35% of cases), low-quality feedback (26% of cases), and a lack of clarity (25% of cases). To systematically address the issue of problem drift, we propose DRIFTJudge, a method based on LLM-as-a-judge, to detect problem drift at test-time. We further propose DRIFTPolicy, a method to mitigate 31% of problem drift cases. Our study can be seen as a first step to understanding a key limitation of multi-agent debate, highlighting pathways for improving their effectiveness in the future.
Problem

Research questions and friction points this paper is trying to address.

Identifies problem drift in multi-agent debate.
Proposes DRIFTJudge to detect problem drift.
Introduces DRIFTPolicy to reduce problem drift.
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-as-a-judge for drift detection
DRIFTJudge identifies problem drift
DRIFTPolicy mitigates 31% of problem drift cases
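
To make the idea of LLM-as-a-judge drift detection at test time more concrete, here is a minimal sketch. It is not the paper's DRIFTJudge implementation: the prompt wording, the function names `first_drifting_turn` and `toy_judge`, and the YES/NO answer format are all assumptions made for illustration. The judge is passed in as a plain callable so that any model client could be substituted; the toy judge below is a keyword check standing in for a real model call.

```python
from typing import Callable, List, Optional

# Hypothetical prompt wording; the paper's actual judge prompt is not given here.
JUDGE_PROMPT = (
    "Original problem:\n{problem}\n\n"
    "Latest debate turn:\n{turn}\n\n"
    "Does this turn still address the original problem? Answer YES or NO."
)


def first_drifting_turn(
    problem: str,
    turns: List[str],
    judge: Callable[[str], str],
) -> Optional[int]:
    """Return the index of the first turn the judge flags as off-problem, else None."""
    for i, turn in enumerate(turns):
        verdict = judge(JUDGE_PROMPT.format(problem=problem, turn=turn))
        # Any answer starting with "NO" is treated as a drift flag (an assumption).
        if verdict.strip().upper().startswith("NO"):
            return i
    return None


def toy_judge(prompt: str) -> str:
    """Stand-in for a model call: answers YES only if the turn mentions 'prime'."""
    turn = prompt.split("Latest debate turn:\n", 1)[1].split("\n\nDoes this turn", 1)[0]
    return "YES" if "prime" in turn.lower() else "NO"


if __name__ == "__main__":
    debate = [
        "17 is prime because no integer from 2 to 4 divides it.",
        "Checking the divisors again confirms it is prime.",
        "Let us discuss the history of number theory instead.",
    ]
    print(first_drifting_turn("Is 17 prime?", debate, toy_judge))
```

In practice, `toy_judge` would be replaced by a function that sends the prompt to a language model and returns its text reply; stopping at the first flagged turn is one possible design choice, and a real detector might instead score every turn.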