🤖 AI Summary
This study addresses the limited verifiability of natural language explanations generated by large language models, particularly in programming education, where claims often lack grounding in reliable evidence. To bridge this gap, the work introduces “illocutionary macro-planning,” a design principle derived from Achinstein’s illocutionary theory of explanation, and instantiates it as chain-of-illocution (CoI) prompting. CoI expands a user query into implicit explanatory questions that drive retrieval in retrieval-augmented generation (RAG) over programming textbooks, and the resulting explanations are scored with source adherence metrics. Evaluated on 90 Stack Overflow questions across six LLMs, the approach improves source adherence by up to 63%, although absolute adherence remains moderate and the gains are weak or non-significant for some models. A user study indicates that these gains do not harm satisfaction, relevance, or perceived correctness.
📝 Abstract
Natural language explanations produced by large language models (LLMs) are often persuasive, but not necessarily scrutable: users cannot easily verify whether the claims in an explanation are supported by evidence. In XAI, this motivates a focus on faithfulness and traceability, i.e., the extent to which an explanation's claims can be grounded in, and traced back to, an explicit source. We study these desiderata in retrieval-augmented generation (RAG) for programming education, where textbooks provide authoritative evidence. We benchmark six LLMs on 90 Stack Overflow questions grounded in three programming textbooks and quantify source faithfulness via source adherence metrics. We find that non-RAG models have a median source adherence of 0%, while baseline RAG systems still exhibit low median adherence (22–40%, depending on the model). Motivated by Achinstein's illocutionary theory of explanation, we introduce illocutionary macro-planning as a descriptive design principle for source-faithful explanations and instantiate it with chain-of-illocution prompting (CoI), which expands a query into implicit explanatory questions that drive retrieval. Across models, CoI yields statistically significant gains (up to 63%) in source adherence, although absolute adherence remains moderate and the gains are weak or non-significant for some models. A user study with 165 retained participants (220 recruited) indicates that these gains do not harm satisfaction, relevance, or perceived correctness.
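To make the pipeline described above concrete, the following is a minimal illustrative sketch of the two ideas the abstract names: retrieval driven by a set of explanatory questions, and a source adherence score for the resulting explanation. It is not the authors' implementation. The function names (`retrieve`, `source_adherence`), the token-overlap scoring, and the 0.5 support threshold are all assumptions chosen for illustration. In the paper the explanatory questions are produced by prompting an LLM; here they are simply passed in as a list.

```python
import re


def tokens(text):
    """Lowercased set of alphanumeric word tokens (a toy stand-in for a real tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(questions, passages, k=2):
    """Rank passages by their best token overlap with ANY explanatory question.

    Assumption: token overlap replaces whatever retriever the paper uses.
    """
    scores = {}
    for question in questions:
        q_tokens = tokens(question)
        for i, passage in enumerate(passages):
            overlap = len(q_tokens & tokens(passage))
            scores[i] = max(scores.get(i, 0), overlap)
    # Highest overlap first; ties broken by original passage order.
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return [passages[i] for i in ranked[:k] if scores[i] > 0]


def source_adherence(sentences, evidence, threshold=0.5):
    """Fraction of explanation sentences whose tokens are mostly covered by one passage.

    Assumption: this overlap ratio is a hypothetical proxy, not the paper's metric.
    """
    if not sentences:
        return 0.0
    supported = 0
    for sentence in sentences:
        s_tokens = tokens(sentence)
        if s_tokens and any(
            len(s_tokens & tokens(passage)) / len(s_tokens) >= threshold
            for passage in evidence
        ):
            supported += 1
    return supported / len(sentences)


if __name__ == "__main__":
    textbook = [
        "A list is a mutable sequence in Python.",
        "Tuples are immutable sequences.",
        "A dictionary maps keys to values.",
    ]
    # Hypothetical implicit questions expanded from "Why can I change a list?"
    questions = ["Why can a list be changed?", "What is a mutable sequence?"]
    evidence = retrieve(questions, textbook, k=1)
    explanation = ["A list is a mutable sequence.", "Lists were invented in 1950."]
    print(evidence)
    print(source_adherence(explanation, evidence))
```

In this toy run, only the first explanation sentence is covered by the retrieved passage, so the score reflects one supported sentence out of two. A real system would replace the overlap heuristics with a proper retriever and the paper's own adherence metrics.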