🤖 AI Summary
This study addresses the inefficiency of self-explanations—such as chain-of-thought rationales—generated by large language models in multi-step question answering, where explanations are often excessively verbose. The work formalizes the trade-off between explanatory sufficiency and conciseness through the lens of the information bottleneck principle, treating self-explanations as compressed representations that retain only task-relevant information. By introducing a length-constrained mechanism, the authors systematically evaluate how varying degrees of compression affect answer accuracy on the ARC Challenge dataset. A cross-lingual evaluation framework spanning English and Persian is developed to assess the generality of this trade-off. Results demonstrate that moderate compression substantially reduces explanation length without compromising performance, whereas excessive compression degrades accuracy, thereby establishing both theoretical and empirical foundations for efficient, interpretable reasoning.
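The information bottleneck view described above is commonly expressed as an objective that compresses the input while preserving answer-relevant information. The formulation below is the standard one from the information bottleneck literature, not necessarily the paper's exact notation; the symbols $X$ for the question, $Z$ for the explanation, and $Y$ for the answer are assumptions:

$$\min_{p(z \mid x)} \; I(X;Z) \;-\; \beta\, I(Z;Y),$$

where $I(\cdot\,;\cdot)$ denotes mutual information and $\beta > 0$ trades compression of $X$ (shorter explanations, lower $I(X;Z)$) against sufficiency for predicting $Y$ (higher $I(Z;Y)$). Under this reading, moderate compression removes redundant wording without losing answer-relevant content, while excessive compression begins to discard information needed to justify the answer.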
📝 Abstract
Large Language Models increasingly rely on self-explanations, such as chain-of-thought reasoning, to improve performance on multi-step question answering. While these explanations enhance accuracy, they are often verbose and costly to generate, raising the question of how much explanation is truly necessary. In this paper, we examine the trade-off between sufficiency, defined as the ability of an explanation to justify the correct answer, and conciseness, defined as the reduction in explanation length. Building on the information bottleneck principle, we conceptualize explanations as compressed representations that retain only the information essential for producing correct answers. To operationalize this view, we introduce an evaluation pipeline that constrains explanation length and assesses sufficiency using multiple language models on the ARC Challenge dataset. To broaden the scope, we conduct experiments both in English, using the original dataset, and in Persian, a resource-limited language, via translation. Our experiments show that more concise explanations often remain sufficient, preserving accuracy while substantially reducing explanation length, whereas excessive compression leads to performance degradation.
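The sufficiency-versus-conciseness trade-off in the abstract can be sketched in miniature. The snippet below is an illustrative toy, not the paper's pipeline: `truncate_to_budget` stands in for the length constraint, and `sufficiency` is a deliberately simple keyword check standing in for a model-based judgment of whether the compressed explanation still justifies the answer.

```python
def truncate_to_budget(explanation: str, max_words: int) -> str:
    """Compress an explanation by keeping at most `max_words` words."""
    words = explanation.split()
    return " ".join(words[:max_words])


def sufficiency(explanation: str, needed: set[str]) -> bool:
    """Toy sufficiency check: the explanation must still mention the
    facts needed to justify the answer (here, a keyword set). The real
    evaluation would use a language model as the judge."""
    return needed.issubset(set(explanation.lower().split()))


explanation = ("the moon reflects sunlight so it appears bright at night "
               "even though it produces no light of its own")
needed = {"moon", "reflects", "sunlight"}

# Decreasing word budgets correspond to increasingly aggressive compression.
for budget in (16, 8, 3):
    compressed = truncate_to_budget(explanation, budget)
    print(budget, sufficiency(compressed, needed))
# → 16 True
# → 8 True
# → 3 False
```

The pattern mirrors the paper's finding: the 8-word budget still retains the justifying facts, while the 3-word budget compresses past the point of sufficiency.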