🤖 AI Summary
Existing LLM evaluation methods suffer from inconsistency, bias, and reliance on opaque automated metrics. To address these issues, we propose an interpretable, adversarial multi-agent evaluation framework: multiple LLM agents assume “advocate” roles and engage in structured debates under a judge-jury mechanism, enabling dynamic assessment through iterative argumentation and adjudication. Our key contributions include: (1) introducing the first evaluation paradigm wherein LLMs serve as *debate-capable advocates*; (2) designing a theory-driven probabilistic error attenuation model to quantify and mitigate evaluation bias; and (3) integrating role-based prompting, formal debate protocols, and self-supervised feedback. Experiments demonstrate that our multi-advocate architecture significantly reduces evaluation error, enhances robustness, and improves cross-task consistency, establishing a new benchmark for trustworthy LLM evaluation.
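To make the debate protocol concrete, the sketch below outlines one possible advocate/judge loop: each advocate defends a candidate answer over a fixed number of rounds, then a judge reads the full transcript and adjudicates. This is an illustrative sketch only; the `query_llm` helper, the prompts, the round count, and the single-judge adjudication are assumptions standing in for the paper's role-based prompting, formal debate protocol, and judge-jury mechanism.

```python
# Minimal sketch of an advocate / judge evaluation loop (illustrative only).
# query_llm, the prompts, and the round count are hypothetical placeholders.
from dataclasses import dataclass, field

@dataclass
class Advocate:
    name: str
    answer: str                              # candidate LLM output this agent defends
    transcript: list = field(default_factory=list)

def query_llm(system_prompt: str, user_prompt: str) -> str:
    """Placeholder for a call to any chat-completion API."""
    raise NotImplementedError

def run_debate(task: str, advocates: list[Advocate], n_rounds: int = 2) -> str:
    """Run a fixed number of structured debate rounds, then adjudicate."""
    for round_idx in range(n_rounds):
        for adv in advocates:
            # Each advocate sees the opposing arguments so far and responds.
            opposing = "\n".join(
                arg for other in advocates if other is not adv for arg in other.transcript
            )
            argument = query_llm(
                system_prompt=f"You are {adv.name}, an advocate defending an answer.",
                user_prompt=(
                    f"Task: {task}\nYour answer: {adv.answer}\n"
                    f"Opposing arguments so far:\n{opposing}\n"
                    f"Round {round_idx + 1}: defend your answer and rebut the others."
                ),
            )
            adv.transcript.append(argument)

    # A judge (or jury of judges) reads the transcript and picks a winner.
    full_transcript = "\n\n".join(
        f"{adv.name} (answer: {adv.answer}):\n" + "\n".join(adv.transcript)
        for adv in advocates
    )
    return query_llm(
        system_prompt="You are an impartial judge.",
        user_prompt=(
            f"Task: {task}\nDebate transcript:\n{full_transcript}\n"
            "Which answer is best supported? Reply with the advocate's name."
        ),
    )
```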
📝 Abstract
This paper explores optimal architectures for evaluating the outputs of large language models (LLMs) using LLMs themselves. We propose a novel framework that casts LLMs as advocates within an ensemble of interacting agents, allowing them to defend their answers and reach conclusions through a judge-and-jury system. This approach offers a more dynamic and comprehensive evaluation process than traditional human assessment or automated metrics. We discuss the motivation behind this framework, its key components, and its comparative advantages. We also present a probabilistic model to quantify the error reduction achieved by iterative advocate systems. Finally, we outline experiments to validate the effectiveness of multi-advocate architectures and discuss future research directions.
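As a toy illustration of why aggregating multiple advocates can reduce evaluation error, the snippet below computes the classic majority-vote (Condorcet-style) error probability for n independent evaluators, each wrong with probability p. The independence assumption and simple majority aggregation are simplifications for exposition; the paper's probabilistic model may differ.

```python
# Toy illustration of error attenuation under majority vote, assuming n
# independent evaluators that each err with probability p. This is a standard
# Condorcet-style calculation, not necessarily the paper's exact model.
from math import comb

def majority_error(n: int, p: float) -> float:
    """P(a strict majority of n independent evaluators is wrong), per-evaluator error p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1, n + 1))

if __name__ == "__main__":
    for n in (1, 3, 5, 7):                   # odd n avoids ties
        print(f"n={n}: aggregate error = {majority_error(n, 0.2):.4f}")
    # With p = 0.2, the aggregate error falls from 0.20 (n=1) to about 0.033 (n=7):
    # adding advocates attenuates evaluation error as long as their errors are
    # independent and each evaluator is better than chance.
```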