🤖 AI Summary
This study systematically investigates, for the first time, the faithfulness of chain-of-thought (CoT) explanations in large audio-language models (LALMs)—i.e., whether CoT reasoning genuinely reflects the model’s internal decision process. To assess faithfulness under controlled perturbations, we design four intervention types—semantic-preserving rewriting, filler-token injection, premature answer insertion, and erroneous guidance—and evaluate them on two complex multimodal benchmarks: SAKURA and MMAR. Results demonstrate that LALMs exhibit significantly higher reasoning-path-to-decision consistency than text-only large language models, maintaining robust faithfulness across all interventions (average faithfulness: 86.3%). This work reveals an audio-modality-induced enhancement effect on interpretability and introduces the first CoT faithfulness evaluation framework tailored to LALMs. It provides critical empirical evidence and methodological foundations for developing trustworthy multimodal AI systems.
📝 Abstract
Faithfulness measures whether chain-of-thought (CoT) explanations accurately reflect a model's decision process and can therefore serve as reliable explanations. Prior work has shown that CoTs from text-based LLMs are often unfaithful. This question has not been explored for large audio-language models (LALMs), where faithfulness is critical for safety-sensitive applications. Reasoning in LALMs is also more challenging, as models must first extract relevant clues from audio before reasoning over them. In this paper, we investigate the faithfulness of CoTs produced by several LALMs by applying targeted interventions, including paraphrasing, filler token injection, early answering, and introducing mistakes, on two challenging reasoning datasets: SAKURA and MMAR. Across these interventions, datasets, and tasks, our experiments suggest that LALMs generally produce CoTs that are faithful to their underlying decision processes.
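The four interventions can be sketched as simple transformations of a generated CoT string before the model produces its final answer. The sketch below is illustrative only: all function names, the filler token, and the example corruption are assumptions for exposition, not the paper's actual implementation (real paraphrasing, for instance, would typically use an LLM rewriter rather than a word swap).

```python
# Hedged sketch of the four CoT interventions; names and details are
# illustrative assumptions, not the paper's implementation.

def paraphrase(cot: str) -> str:
    """Semantic-preserving rewrite (stubbed here with a trivial synonym swap)."""
    return cot.replace("Therefore", "Hence")

def inject_filler(cot: str, n: int = 10) -> str:
    """Replace the reasoning content with uninformative filler tokens."""
    return " ".join(["..."] * n)

def early_answer(cot: str, fraction: float = 0.5) -> str:
    """Truncate the CoT so the model must answer from a partial chain."""
    steps = cot.split(". ")
    keep = max(1, int(len(steps) * fraction))
    return ". ".join(steps[:keep])

def introduce_mistake(cot: str) -> str:
    """Corrupt a factual clue to test whether the answer tracks the chain."""
    return cot.replace("a dog barking", "a cat meowing")

cot = ("The clip contains a dog barking. Dogs bark when alert. "
      "Therefore the sound source is a dog.")
print(early_answer(cot))   # keeps only the first half of the reasoning steps
```

If the model's final answer stays unchanged under filler injection or truncation, the CoT was likely not load-bearing; if answers flip under introduced mistakes, the model is plausibly conditioning on its stated reasoning, which is the behavior the faithfulness tests probe.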