🤖 AI Summary
Multi-agent debate (MAD) remains underexplored for multimodal vision-language reasoning tasks.
Method: This paper proposes WISE, the first MAD framework extended to multimodal settings, featuring a dual-role collaboration mechanism between Solvers and Reflectors, a two-stage debate protocol, and a weighted natural-language feedback structure. It employs heterogeneous multi-model ensembling and enhances the Dawid-Skene algorithm for interpretable, cross-round result aggregation.
Contribution/Results: On multiple vision-language reasoning benchmarks, WISE outperforms state-of-the-art MAD methods by 2–7% in accuracy, significantly improving both reasoning robustness and interpretability. By unifying multimodal perception with structured agent-level deliberation, WISE establishes a novel paradigm for multimodal multi-agent collaborative reasoning.
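To make the dual-role, two-stage protocol concrete, here is a minimal sketch of such a Solver/Reflector debate loop. All names and the feedback format are hypothetical illustrations, not WISE's actual implementation: Solvers propose answers conditioned on prior feedback, Reflectors assign numeric weights plus natural-language critiques, and the answer with the highest total weight in the final round wins.

```python
# Hypothetical sketch of a two-stage Solver/Reflector debate round.
# `solvers` and `reflectors` are stand-ins for LLM agents; WISE's real
# weighting scheme and feedback structure are richer than shown here.
from collections import defaultdict

def debate(problem, solvers, reflectors, rounds=2):
    feedback = ""
    scores = defaultdict(float)
    for _ in range(rounds):
        # Stage 1: each Solver proposes an answer given prior feedback.
        proposals = [solve(problem, feedback) for solve in solvers]
        # Stage 2: each Reflector weights every proposal and critiques it.
        scores = defaultdict(float)
        critiques = []
        for reflect in reflectors:
            for answer in proposals:
                weight, note = reflect(problem, answer)
                scores[answer] += weight
                critiques.append(note)
        # Critiques become the natural-language feedback for the next round.
        feedback = " ".join(critiques)
    # Return the answer with the highest total Reflector weight.
    return max(scores, key=scores.get)
```

In the full framework this per-round weighted vote is not the final step; the cross-round aggregation is handled by the modified Dawid-Skene post-processing described in the abstract below.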
📝 Abstract
Recent large language models (LLMs) are trained on diverse corpora and tasks, leading them to develop complementary strengths. Multi-agent debate (MAD) has emerged as a popular way to leverage these strengths for robust reasoning, though it has mostly been applied to language-only tasks, leaving its efficacy on multimodal problems underexplored. In this paper, we study MAD for solving vision-and-language reasoning problems. Our setup enables generalizing the debate protocol with heterogeneous experts that possess single- and multi-modal capabilities. To this end, we present Weighted Iterative Society-of-Experts (WISE), a generalized and modular MAD framework that partitions the agents into Solvers, which generate solutions, and Reflectors, which verify correctness, assign weights, and provide natural language feedback. To aggregate the agents' solutions across debate rounds while accounting for variance in their responses and the feedback weights, we present a modified Dawid-Skene algorithm for post-processing that integrates our two-stage debate model. We evaluate WISE on SMART-840, VisualPuzzles, EvoChart-QA, and a new SMART-840++ dataset with programmatically generated problem instances of controlled difficulty. Our results show that WISE consistently improves accuracy by 2-7% over the state-of-the-art MAD setups and aggregation methods across diverse multimodal tasks and LLM configurations.
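For readers unfamiliar with the aggregation baseline, the sketch below implements the classic Dawid-Skene EM algorithm, which treats each agent as a noisy annotator with a learned confusion matrix and infers posterior answer probabilities per problem. This is the unmodified textbook version; WISE's variant additionally integrates the two-stage debate structure and Reflector weights, which are not reproduced here.

```python
# Classic (unmodified) Dawid-Skene EM for aggregating noisy categorical
# labels -- the base algorithm that WISE's aggregation builds on.
import numpy as np

def dawid_skene(labels, n_classes, n_iters=50):
    """labels: (n_items, n_annotators) array of class indices.
    Returns (n_items, n_classes) posterior class probabilities."""
    n_items, n_annot = labels.shape
    # Initialize item posteriors from annotator vote proportions.
    T = np.zeros((n_items, n_classes))
    for i in range(n_items):
        for a in range(n_annot):
            T[i, labels[i, a]] += 1.0
    T /= T.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        # M-step: class priors and per-annotator confusion matrices.
        priors = T.mean(axis=0)
        conf = np.zeros((n_annot, n_classes, n_classes))
        for a in range(n_annot):
            for i in range(n_items):
                # conf[a, k, l]: P(annotator a says l | true class k)
                conf[a, :, labels[i, a]] += T[i]
            conf[a] /= conf[a].sum(axis=1, keepdims=True) + 1e-12
        # E-step: recompute posteriors from the likelihood of each
        # annotator's observed label under every candidate true class.
        logT = np.tile(np.log(priors + 1e-12), (n_items, 1))
        for a in range(n_annot):
            logT += np.log(conf[a][:, labels[:, a]].T + 1e-12)
        T = np.exp(logT - logT.max(axis=1, keepdims=True))
        T /= T.sum(axis=1, keepdims=True)
    return T
```

Because the confusion matrices are explicit, this family of methods yields the kind of interpretable aggregation the summary highlights: one can inspect which agents the model has learned to trust on which answer classes.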