MALLM: Multi-Agent Large Language Models Framework

📅 2025-09-15
📈 Citations: 0
✹ Influential: 0
đŸ€– AI Summary
Existing Multi-Agent Debate (MAD) frameworks suffer from three key limitations: overemphasis on tool invocation, absence of built-in evaluation capabilities, and insufficient configurability of core components—including agent roles, response generation strategies, discussion paradigms, and decision protocols. To address these gaps, we propose a modular, configurable multi-agent large language model framework that enables fine-grained modeling and systematic evaluation of debate mechanisms. The framework supports over 144 composable configurations—spanning flexible role definitions, diverse response generators, discussion paradigms (e.g., relay, broadcast), and heterogeneous decision protocols—all controlled via standardized YAML specifications for reproducible experimentation. It integrates Hugging Face dataset loading, memory propagation across turns, and an end-to-end evaluation pipeline. Empirical results demonstrate substantially improved research efficiency and reproducibility in studying collective intelligence for complex reasoning tasks.

📝 Abstract
Multi-agent debate (MAD) has demonstrated the ability to augment collective intelligence by scaling test-time compute and leveraging expertise. Current frameworks for multi-agent debate are often designed towards tool use, lack integrated evaluation, or provide limited configurability of agent personas, response generators, discussion paradigms, and decision protocols. We introduce MALLM (Multi-Agent Large Language Models), an open-source framework that enables systematic analysis of MAD components. MALLM offers more than 144 unique configurations of MAD, including (1) agent personas (e.g., Expert, Personality), (2) response generators (e.g., Critical, Reasoning), (3) discussion paradigms (e.g., Memory, Relay), and (4) decision protocols (e.g., Voting, Consensus). MALLM uses simple configuration files to define a debate. Furthermore, MALLM can load any textual Huggingface dataset (e.g., MMLU-Pro, WinoGrande) and provides an evaluation pipeline for easy comparison of MAD configurations. MALLM is tailored towards researchers and provides a window into the heart of multi-agent debate, facilitating the understanding of its components and their interplay.
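The abstract notes that MALLM defines a debate through simple configuration files (the AI summary above calls them YAML specifications). A minimal sketch of what such a configuration might look like, assuming the component names from the abstract; the field names here are illustrative, not MALLM's actual schema:

```yaml
# Hypothetical MALLM debate configuration.
# Field names are assumptions for illustration, not the real schema.
dataset: winogrande            # any textual Hugging Face dataset
agents:
  - persona: Expert            # agent persona (e.g., Expert, Personality)
    response_generator: Critical
  - persona: Personality
    response_generator: Reasoning
discussion_paradigm: Relay     # e.g., Memory, Relay
decision_protocol: Voting      # e.g., Voting, Consensus
turns: 3
```

Keeping the debate definition in a declarative file like this is what makes experiments reproducible: the same specification can be rerun or swapped component-by-component for comparison.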
Problem

Research questions and friction points this paper is trying to address.

Addressing limited configurability in multi-agent debate frameworks
Providing systematic analysis of multi-agent debate components
Enabling easy comparison of multi-agent debate configurations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-source framework for systematic MAD analysis
More than 144 unique multi-agent debate configurations
Integrated evaluation pipeline with Hugging Face datasets
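The "more than 144 configurations" arise because a debate is one choice per component axis (persona, response generator, discussion paradigm, decision protocol), so the configuration space is the Cartesian product of the axes. A small sketch of that counting, using only the example component names from the abstract (the real MALLM component lists are longer, which is how the count reaches 144+):

```python
from itertools import product

# Illustrative component axes; names mirror the examples in the abstract,
# but the full lists shipped with MALLM differ (these are assumptions).
personas = ["Expert", "Personality"]
generators = ["Critical", "Reasoning"]
paradigms = ["Memory", "Relay"]
protocols = ["Voting", "Consensus"]

# One choice per axis => the configuration space is the Cartesian product.
configs = list(product(personas, generators, paradigms, protocols))
print(len(configs))  # 2 * 2 * 2 * 2 = 16 with these toy lists
for persona, generator, paradigm, protocol in configs[:2]:
    print(persona, generator, paradigm, protocol)
```

With four axes of realistic sizes (say 4 × 3 × 4 × 3) the product already exceeds 144, which matches the order of magnitude the paper reports.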