🤖 AI Summary
Problem: Existing AI explanation methods rely on mechanistic transparency and fail to provide verifiable, auditable grounds for reasoning, which hinders trustworthy deployment.
Method: We propose a trustworthiness-oriented explanation framework grounded in structured argumentation, modeling support/attack relations via Bipolar Assumption-Based Argumentation (Bipolar ABA). This enables hallucination detection and iterative test-time refinement without model retraining. We automatically construct argument graphs from LLM-generated text, perform relation classification, and carry out formal verification with the Bipolar ABA Python package, integrating a multi-agent risk assessment module based on the Structured What-If Technique.
Contribution: Our approach achieves 94.44 macro F1 (+5.7 points) on the AAEC dataset and 0.81 macro F1 (+0.07) on Argumentative MicroTexts relation classification. Crucially, it establishes verifiable argument chains as foundational infrastructure for trustworthy AI explanation, the first such formulation in the literature.
📝 Abstract
Humans are black boxes -- we cannot observe their neural processes, yet society functions by evaluating verifiable arguments. AI explainability should follow this principle: stakeholders need verifiable reasoning chains, not mechanistic transparency. We propose using structured argumentation to provide a level of explanation and verification that neither interpretability methods nor LLM-generated explanations can offer. Our pipeline converts LLM text into argument graphs and enables verification at each inferential step, achieving a state-of-the-art 94.44 macro F1 on the published AAEC train/test split (5.7 points above prior work) and 0.81 macro F1 on Argumentative MicroTexts relation classification ($\sim$0.07 above previous published results with comparable data setups). We demonstrate this idea on multi-agent risk assessment using the Structured What-If Technique, where specialized agents collaborate transparently to carry out risk assessment otherwise performed by humans alone. Using Bipolar Assumption-Based Argumentation, we capture support/attack relationships, enabling automatic hallucination detection via fact nodes attacking arguments. We also provide a verification mechanism that enables iterative refinement through test-time feedback without retraining. For easy deployment, we provide a Docker container for the fine-tuned AMT model, and the rest of the code with the Bipolar ABA Python package on GitHub.
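The hallucination-detection idea can be illustrated with a minimal sketch: arguments and verified facts are nodes in a bipolar graph with support and attack edges, and any claim attacked by a verified fact node is flagged. This is an illustrative toy, not the actual Bipolar ABA package API; all class and node names below are assumptions.

```python
# Toy bipolar argument graph (illustrative only, not the Bipolar ABA API).
# Nodes are string labels; edges are "support" or "attack" pairs.
# A claim attacked by at least one verified fact node is flagged as a
# possible hallucination.
from dataclasses import dataclass, field

@dataclass
class ArgumentGraph:
    facts: set = field(default_factory=set)       # verified fact nodes
    supports: list = field(default_factory=list)  # (src, dst) support edges
    attacks: list = field(default_factory=list)   # (src, dst) attack edges

    def add_fact(self, name):
        self.facts.add(name)

    def add_support(self, src, dst):
        self.supports.append((src, dst))

    def add_attack(self, src, dst):
        self.attacks.append((src, dst))

    def hallucinations(self):
        # Claims whose attackers include a verified fact node.
        return sorted({dst for src, dst in self.attacks if src in self.facts})

# Hypothetical risk-assessment fragment:
g = ArgumentGraph()
g.add_fact("fact:valve_rated_10bar")
g.add_support("arg:install_relief_valve", "claim:system_safe")
g.add_attack("fact:valve_rated_10bar", "claim:valve_handles_50bar")
print(g.hallucinations())  # → ['claim:valve_handles_50bar']
```

In the paper's pipeline this check is performed by formal verification over Bipolar ABA frameworks rather than a plain edge scan, but the principle is the same: facts that attack a generated claim mark it for test-time refinement.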