🤖 AI Summary
This work identifies a fundamental flaw in Grad-ECLIP: its explanation route is essentially equivalent to the existing attention-based route, yet it fails to faithfully reflect the behavior of the original model. We propose Attention-ECLIP, an equivalent alternative with simpler computation, and set out two core principles for model interpretability: faithfulness and consistency. Through formal derivation, attention analysis, and comparative experiments, we show that Grad-ECLIP is not a novel method and yields unreliable explanations. Attention-ECLIP reproduces Grad-ECLIP's results with simpler computation, which confirms that the intermediate-feature-based route is an equivalent variant of the attention-based route.
📝 Abstract
Grad-ECLIP, published at ICML 2024, represents a new technical route for Transformer interpretation, one based on intermediate features. This paper first shows that this route is not novel. Building on the existing attention-based route, we develop Attention-ECLIP, which is completely equivalent to Grad-ECLIP but simpler to compute. Through both formal derivation and experimental validation, we show that the intermediate-feature-based route represented by Grad-ECLIP is an equivalent variant of the attention-based route. The paper then shows that Grad-ECLIP is flawed: the interpretation results it produces are not those of the original model, and they are misaligned with the model's performance. We analyze the causes of these flaws and propose, or rather explicitly emphasize, two fundamental principles that model interpretation should follow to avoid similar errors.