🤖 AI Summary
This work challenges the practical value of chain-of-thought (CoT) reasoning for human-centered explainability in multi-LLM collaborative agent pipelines. Contrary to the prevailing paradigm equating CoT with “explainability” by default, the study designs a perception-task-guided agentic pipeline requiring minimal human intervention. It employs mixed-method evaluation—quantitative (task success rate, reasoning consistency) and qualitative (user cognitive tracing, explanation utility interviews)—to empirically demonstrate, for the first time, that CoT frequently produces “explanations without explanatory power”: they neither improve output quality nor enhance user understanding or goal attainment. The core contribution is twofold: (1) a critical challenge to the dominant assumption that CoT inherently confers explainability; and (2) the proposal of a novel evaluation framework centered on *goal achievement* and *user action support*, shifting explainable AI from formalistic justification toward outcome-oriented efficacy.
📝 Abstract
Agentic pipelines present novel challenges and opportunities for human-centered explainability. The HCXAI community is still grappling with how best to make the inner workings of LLMs transparent in actionable ways. Agentic pipelines consist of multiple LLMs working in cooperation with minimal human control. In this research paper, we present early findings from an agentic pipeline implementation of a perceptive task guidance system. Through quantitative and qualitative analysis, we analyze how Chain-of-Thought (CoT) reasoning, a common vehicle for explainability in LLMs, operates within agentic pipelines. We demonstrate that CoT reasoning alone does not lead to better outputs, nor does it offer explainability, as it tends to produce explanations without explainability, in that they do not improve the ability of end users to better understand systems or achieve their goals.