MArgE: Meshing Argumentative Evidence from Multiple Large Language Models for Justifiable Claim Verification

πŸ“… 2025-08-04
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the limited credibility and verifiability of simply aggregating outputs from multiple large language models (LLMs), this paper proposes MArgE, a framework grounded in computational argumentation. MArgE organizes the heterogeneous evidence generated by diverse LLMs into structured, traceable argument trees, enabling explainable claim verification. At its core, MArgE uses a variant of Argumentative LLMs (ArgLLMs), i.e., LLMs driven by argumentation frameworks and semantics, to construct, combine, and evaluate arguments drawn from multiple models, yielding an inspectable pathway from individual arguments to the final verdict. Experiments with three open-source LLMs (4B to 8B parameters) and GPT-4o-mini show that MArgE can significantly outperform single-model baselines, existing ArgLLMs, and prior unstructured multi-LLM debate approaches. By formalizing multi-model collaborative reasoning through auditable, argument-based semantics, MArgE offers a principled paradigm for more trustworthy ensemble inference.
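The summary hinges on two ingredients: per-model evidence organized as an argument tree, and argumentation semantics that turn the tree into a verdict. Below is a minimal Python sketch of that idea, assuming a QBAF-style tree scored with DF-QuAD gradual semantics as used in the ArgLLM line of work; the class, function names, and confidence values are illustrative assumptions, not MArgE's exact formulation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Argument:
    text: str                       # natural-language argument extracted from an LLM
    base_score: float               # generating LLM's confidence in [0, 1]
    supporters: List["Argument"] = field(default_factory=list)
    attackers: List["Argument"] = field(default_factory=list)

def aggregate(strengths: List[float]) -> float:
    """Probabilistic-sum aggregation of child strengths: 1 - prod(1 - s_i)."""
    agg = 0.0
    for s in strengths:
        agg = agg + s - agg * s
    return agg

def strength(arg: Argument) -> float:
    """DF-QuAD-style combination of the base score with attacker/supporter strengths."""
    v_att = aggregate([strength(a) for a in arg.attackers])
    v_sup = aggregate([strength(s) for s in arg.supporters])
    if v_att >= v_sup:
        return arg.base_score * (1.0 - (v_att - v_sup))
    return arg.base_score + (1.0 - arg.base_score) * (v_sup - v_att)

# Toy tree: a claim supported by one LLM's argument and attacked by another's.
claim = Argument(
    text="The claim under verification",
    base_score=0.5,                 # assumed neutral prior on the claim itself
    supporters=[Argument("Evidence offered by LLM A", 0.8)],
    attackers=[Argument("Counter-evidence offered by LLM B", 0.4)],
)
print(f"final claim strength: {strength(claim):.2f}")  # 0.70 here
```

A final strength above 0.5 reads as the evidence favoring the claim; because every supporter and attacker is an explicit node, the verdict can be traced back to the individual LLM arguments that produced it.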

πŸ“ Abstract
Leveraging outputs from multiple large language models (LLMs) is emerging as a method for harnessing their power across a wide range of tasks while mitigating their capacity for making errors, e.g., hallucinations. However, current approaches to combining insights from multiple LLMs often involve unstructured interactions (e.g., free debate), resulting in model generations that are not faithfully justifiable. In this work, we introduce MArgE, a novel framework to provide formal structure to the evidence from each LLM, in the form of a tree of extracted arguments, for the task of claim verification. We use a variant of Argumentative LLMs (ArgLLMs), i.e. LLMs driven by frameworks and semantics from the field of computational argumentation, to construct structured argument trees for given claims. This process creates an inspectable pathway from the initial arguments to the final claim verification decisions, providing a faithful justification thereof. We show experimentally that MArgE can significantly outperform single LLMs, including three open-source models (4B to 8B parameters), GPT-4o-mini and existing ArgLLMs, as well as prior methods for unstructured multi-LLM debates. We thus demonstrate the advantages of incorporating formal, argumentative reasoning mechanisms when combining multiple LLM outputs.
Problem

Research questions and friction points this paper is trying to address.

Structuring evidence from multiple LLMs for claim verification
Mitigating errors in LLM outputs through argumentative reasoning
Improving justifiability of claim verification decisions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structured argument trees from multiple LLMs (see the meshing sketch after this list)
Argumentative LLMs for claim verification
Formal reasoning to justify LLM outputs
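The meshing of evidence from several models can be pictured as attaching every model's extracted pro and con arguments to one shared claim node before scoring it. The hypothetical sketch below reuses the Argument class and strength function from the earlier sketch; the stance labels, the neutral 0.5 prior, and the example evidence are assumptions for illustration, not the paper's exact procedure.

```python
# Hypothetical meshing step; Argument and strength are defined in the earlier sketch.
def mesh_evidence(claim_text, per_model_arguments):
    """per_model_arguments: (stance, text, confidence) triples pooled across LLMs,
    with stance in {"pro", "con"}."""
    root = Argument(text=claim_text, base_score=0.5)   # assumed neutral prior on the claim
    for stance, text, confidence in per_model_arguments:
        child = Argument(text=text, base_score=confidence)
        (root.supporters if stance == "pro" else root.attackers).append(child)
    return root

evidence = [
    ("pro", "LLM A: source X reports the event", 0.7),
    ("pro", "LLM B: the date matches official records", 0.6),
    ("con", "LLM C: source Y gives a conflicting figure", 0.5),
]
meshed = mesh_evidence("The claim under verification", evidence)
print(f"meshed claim strength: {strength(meshed):.2f}")  # 0.69 here
```

Because the models contribute to one shared structure rather than debating in free text, each contribution is a named node whose influence on the final score is fixed by the semantics, which is what keeps the verdict auditable.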
πŸ”Ž Similar Papers
No similar papers found.
Ming Pok Ng
Department of Computing, Imperial College London, UK
Junqi Jiang
PhD Candidate, Imperial College London
Trustworthy AI, Explainable AI, Interpretability
Gabriel Freedman
PhD Candidate, Imperial College London
argumentation, uncertainty, implicit knowledge
Antonio Rago
Department of Computing, Imperial College London, UK; Department of Informatics, King’s College London, UK
Francesca Toni
Imperial College London
Artificial Intelligence