The Argument is the Explanation: Structured Argumentation for Trust in Agents

📅 2025-10-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing AI explanation methods rely on mechanistic transparency, failing to provide verifiable and auditable reasoning grounds—thereby hindering trustworthy deployment. Method: We propose a trustworthiness-oriented explanation framework grounded in structured argumentation, modeling support/attack relations via Bipolar Assumption-Based Argumentation (Bipolar ABA). This enables hallucination detection and test-time iterative refinement without model retraining. Integrating a What-If multi-agent risk assessment module, we automatically construct argument graphs from LLM-generated text and perform relation classification and formal verification using the Bipolar ABA Python package. Contribution: Our approach achieves 94.44 macro-F1 (+5.7) on the AAEC dataset and 0.81 macro-F1 (+0.07) on the Argumentative MicroTexts task. Crucially, it establishes verifiable argument chains as a foundational infrastructure for trustworthy AI explanation—the first such formulation in the literature.

📝 Abstract
Humans are black boxes -- we cannot observe their neural processes, yet society functions by evaluating verifiable arguments. AI explainability should follow this principle: stakeholders need verifiable reasoning chains, not mechanistic transparency. We propose using structured argumentation to provide a level of explanation and verification neither interpretability nor LLM-generated explanation is able to offer. Our pipeline achieves state-of-the-art 94.44 macro F1 on the AAEC published train/test split (5.7 points above prior work) and 0.81 macro F1 (~0.07 above previous published results with comparable data setups) for Argumentative MicroTexts relation classification, converting LLM text into argument graphs and enabling verification at each inferential step. We demonstrate this idea on multi-agent risk assessment using the Structured What-If Technique, where specialized agents collaborate transparently to carry out risk assessment otherwise achieved by humans alone. Using Bipolar Assumption-Based Argumentation, we capture support/attack relationships, thereby enabling automatic hallucination detection via fact nodes attacking arguments. We also provide a verification mechanism that enables iterative refinement through test-time feedback without retraining. For easy deployment, we provide a Docker container for the fine-tuned AMT model, and the rest of the code with the Bipolar ABA Python package on GitHub.
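The hallucination-detection idea in the abstract (fact nodes attacking argument nodes in a bipolar graph) can be sketched in a few lines. This is a hypothetical minimal model, not the paper's Bipolar ABA package API: nodes are arguments or verified fact nodes, edges are "support" or "attack" relations, and a claim attacked by a fact node without any fact-backed support is flagged as a potential hallucination.

```python
# Hypothetical sketch of a bipolar argument graph for hallucination flagging.
# All class and method names here are illustrative assumptions, not the
# authors' Bipolar ABA Python package.
from collections import defaultdict


class ArgumentGraph:
    def __init__(self):
        self.facts = set()                 # nodes grounded in verified facts
        self.attackers = defaultdict(set)  # target -> set of attacking nodes
        self.supporters = defaultdict(set) # target -> set of supporting nodes

    def add_node(self, name, is_fact=False):
        if is_fact:
            self.facts.add(name)

    def add_attack(self, attacker, target):
        self.attackers[target].add(attacker)

    def add_support(self, supporter, target):
        self.supporters[target].add(supporter)

    def potential_hallucinations(self):
        """Claims attacked by a fact node and lacking fact-backed support."""
        return [
            target
            for target, attackers in self.attackers.items()
            if attackers & self.facts
            and not (self.supporters[target] & self.facts)
        ]


g = ArgumentGraph()
g.add_node("f1", is_fact=True)   # a verified fact extracted from sources
g.add_node("claim_a")            # an LLM-generated claim
g.add_attack("f1", "claim_a")    # the fact contradicts the claim
print(g.potential_hallucinations())  # ['claim_a']
```

In the paper's pipeline, the relation classifier would populate the attack/support edges from LLM-generated text; the flagged claims could then drive test-time iterative refinement without retraining.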
Problem

Research questions and friction points this paper is trying to address.

Providing verifiable reasoning chains for AI explainability
Converting LLM text into structured argument graphs
Enabling transparent multi-agent risk assessment collaboration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using structured argumentation for verifiable reasoning chains
Converting LLM text into verifiable argument graphs
Enabling automatic hallucination detection via argument relationships
Ege Cakar
Department of Engineering, University of Cambridge, Cambridge, United Kingdom; Department of Physics, Harvard University, Cambridge, Massachusetts, USA
Per Ola Kristensson
Professor of Interactive Systems Engineering, Department of Engineering, University of Cambridge
Human-Computer Interaction · Intelligent Interactive Systems · Speech and Language Processing · Virtual and Augmented Reality