Don't Just Translate, Agitate: Using Large Language Models as Devil's Advocates for AI Explanations

📅 2025-04-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current XAI research often uses LLMs to translate feature-attribution outputs directly into natural language. This improves readability, but it risks reinforcing the illusion of interpretability and fostering unwarranted user trust. This position paper proposes a shift: turning LLMs from passive translators into constructive "devil's advocates" that actively interrogate XAI explanations by generating alternative interpretations, exposing potential data biases, and surfacing model uncertainty and failure boundaries. Methodologically, the authors envision combining adversarial prompting, structured parsing of XAI outputs, multi-perspective explanation generation, and explicit communication of uncertainty. They argue that such a framework can reduce users' overreliance on AI explanations, sharpen their sensitivity to model limitations, and lay a foundation for more trustworthy human-AI collaboration through critical engagement with generated explanations.

📝 Abstract
This position paper highlights a growing trend in Explainable AI (XAI) research where Large Language Models (LLMs) are used to translate outputs from explainability techniques, such as feature-attribution weights, into a natural language explanation. While this approach may improve accessibility or readability for users, recent findings suggest that translating into human-like explanations does not necessarily enhance user understanding and may instead lead to overreliance on AI systems. When LLMs summarize XAI outputs without surfacing model limitations, uncertainties, or inconsistencies, they risk reinforcing the illusion of interpretability rather than fostering meaningful transparency. We argue that, instead of merely translating XAI outputs, LLMs should serve as constructive agitators, or devil's advocates, whose role is to actively interrogate AI explanations by presenting alternative interpretations, potential biases, training data limitations, and cases where the model's reasoning may break down. In this role, LLMs can help users engage critically with AI systems and their generated explanations, with the potential to reduce overreliance caused by misinterpreted or specious explanations.
Problem

Research questions and friction points this paper is trying to address.

LLMs risk reinforcing interpretability illusions in XAI
Current LLM translations may increase AI overreliance
Need LLMs as devil's advocates to critique explanations
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs as devil's advocates for XAI
Interrogate AI explanations critically
Reduce overreliance on AI systems