AI Summary
This study addresses two gaps: existing explainable artificial intelligence (XAI) approaches often fail to improve users' objective performance, and there is little empirical evidence on how conversational XAI affects prediction accuracy, model understanding, and error identification. The authors propose a controlled between-subjects experimental design built on an intrinsically interpretable prediction model, in which users can outperform the model by identifying and compensating for its systematic errors, and they compare conversational assistance against question-answering (Q&A) assistance. Preliminary results (N=42) show that participants in both treatments significantly outperformed the model, with no significant performance difference between the two assistance types and only modest engagement overall. The work contributes an experimental paradigm in which users can surpass model performance; its findings inform refinements for the planned full study, including stronger engagement interventions and investigation of the mechanisms behind the improved predictions.
Abstract
Explainable AI (XAI) techniques aim to provide insights into predictive models and enhance user performance, yet they often fall short of these expectations. Conversational XAI assistants promise to overcome such limitations, but empirical evidence on their impact on objective performance measures remains limited. We propose an experimental design for evaluating explanation assistance through prediction accuracy, model understanding, and error identification. Using an explainable-by-design prediction model, we create conditions where users can outperform the model by identifying and compensating for systematic errors. We compare conversational assistance against Q&A-based assistance to assess which better supports users in working with model explanations. Preliminary results from testing our experimental design show that participants (N=42) in both treatments significantly outperformed the model, but they reveal no performance differences between assistance types and only modest engagement overall. These findings inform refinements for our planned full study, including enhanced engagement interventions and investigation of the mechanisms driving improved predictions.