MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration

📅 2025-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address factual inconsistency in long-form generation, this paper proposes MAMM-Refine, a multi-agent, multi-model iterative refinement framework that decouples critique and correction into reranking-driven subtasks. Its key innovation is to formulate factual correction as sentence-level reranking, combining heterogeneous large language models (e.g., Llama, Qwen, Phi) to jointly perform error detection, identification of unfaithful sentences, and faithful reconstruction in a closed-loop refinement process. Evaluated on three abstractive summarization datasets and a long-form question-answering task, MAMM-Refine achieves an average 12.7% improvement in FaithD score over strong baselines, including single-model and existing multi-agent approaches, demonstrating both effectiveness and cross-task generalizability.
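The reranking step described above can be sketched in miniature: several agents each vote for the candidate revision they judge most faithful, and the candidate with the most votes is kept. The agents below are hypothetical stand-ins (simple functions returning an index), not the paper's actual LLM prompts or scoring setup; this is a minimal illustration of vote-based reranking, assuming a fixed candidate list.

```python
from collections import Counter

def rerank_with_agents(candidates, agents):
    """Each agent votes for the index of the candidate it judges most
    faithful; the index with the most votes wins (ties broken by the
    order in which votes were first cast)."""
    votes = Counter()
    for agent in agents:
        votes[agent(candidates)] += 1
    # Counter.most_common preserves insertion order among equal counts
    return votes.most_common(1)[0][0]

# Hypothetical stand-ins for heterogeneous LLM judges: each takes the
# candidate list and returns the index of its preferred revision.
agent_a = lambda cands: 0
agent_b = lambda cands: 1
agent_c = lambda cands: 1

candidates = ["revision A", "revision B"]
best = candidates[rerank_with_agents(candidates, [agent_a, agent_b, agent_c])]
print(best)  # revision B
```

In the paper's framing, choosing among discrete candidates like this (rather than free-form generation) is what lets multiple agents' judgments be aggregated directly.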

📝 Abstract
Multi-agent collaboration among models has shown promise in reasoning tasks but is underexplored in long-form generation tasks like summarization and question-answering. We extend multi-agent multi-model reasoning to generation, specifically to improving faithfulness through refinement, i.e., revising model-generated outputs to remove factual inconsistencies. We investigate how iterative collaboration among multiple instances and types of large language models (LLMs) enhances subtasks in the refinement process, such as error detection, critiquing unfaithful sentences, and making corrections based on critiques. We design intrinsic evaluations for each subtask, with our findings indicating that both multi-agent (multiple instances) and multi-model (diverse LLM types) approaches benefit error detection and critiquing. Additionally, reframing critiquing and refinement as reranking rather than generation tasks improves multi-agent performance. We consolidate these insights into a final "recipe" called Multi-Agent Multi-Model Refinement (MAMM-Refine), where multi-agent and multi-model collaboration significantly boosts performance on three summarization datasets as well as on long-form question answering, demonstrating the effectiveness and generalizability of our recipe.
Problem

Research questions and friction points this paper is trying to address.

Improving faithfulness in long-form generation tasks
Using multi-agent collaboration for error detection and correction
Enhancing summarization and question-answering through iterative refinement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent collaboration improves generation faithfulness.
Iterative LLM collaboration enhances error detection and critiquing.
Reframing refinement as reranking boosts multi-agent performance.