Metonymy in vision models undermines attention-based interpretability

📅 2026-05-07
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work identifies and quantifies a pervasive phenomenon in modern vision transformers, termed “visual metonymy,” in which local part representations encode information about the whole object. This undermines the locality assumption behind attention-based interpretability methods. As an upper bound, the authors use a two-stage approach that architecturally separates part feature extraction from object-level context modeling, which prevents this intra-object leakage by design. Experiments show that this disentangled extraction substantially reduces object-level information in part representations and improves attribute-driven part discovery. This supports the importance of disentangled architectural design for model interpretability.
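The contrast between leaky and disentangled part features can be illustrated with a toy sketch. This is an assumption-laden illustration, not the paper's architecture. In a one-stage model, the token at a part's location is produced by self-attention over every patch of the object, so editing another part changes it. In a two-stage design, each part is first encoded from its own region, and any object-level context model runs afterwards. The function names `one_stage` and `two_stage` are hypothetical. Two further simplifications are assumptions: single-head dot-product attention without learned projections, and mean pooling as a stand-in part encoder.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def one_stage(patches):
    """Toy single-head self-attention (no learned projections) over all patches.

    Every output token is a softmax-weighted mix of *all* patches, so the
    token at one part's location can carry information from other parts.
    """
    dim = len(patches[0])
    outputs = []
    for q in patches:
        scores = [sum(a * b for a, b in zip(q, k)) for k in patches]
        weights = softmax(scores)
        outputs.append([sum(w * v[d] for w, v in zip(weights, patches)) for d in range(dim)])
    return outputs


def two_stage(patches, part_ids):
    """Stage 1 only: encode each part from its own patches (mean pooling here).

    A hypothetical stage-2 object-level model would consume these features
    afterwards, so it cannot write information back into them.
    """
    groups = {}
    for patch, pid in zip(patches, part_ids):
        groups.setdefault(pid, []).append(patch)
    return {pid: [sum(col) / len(ps) for col in zip(*ps)] for pid, ps in groups.items()}


part_ids = ["head", "wing"]
original = [[1.0, 0.0], [0.0, 1.0]]
edited = [[1.0, 0.0], [0.0, 3.0]]  # only the wing patch changes

print(one_stage(original)[0] == one_stage(edited)[0])  # False: head token picks up the wing edit
print(two_stage(original, part_ids)["head"] == two_stage(edited, part_ids)["head"])  # True
```

In the one-stage toy, the head token differs after the wing edit because attention weights on the wing patch are nonzero. Stage 1 of the two-stage toy never sees pixels outside a part's region, so leakage is ruled out structurally rather than reduced empirically. This matches the description of the two-stage approach as preventing leakage by design.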
📝 Abstract
Part-based reasoning is a classical strategy for making a computer vision model focus directly on the object parts that are relevant to the downstream task. In deep learning, it also serves to improve by-design interpretability, often through part-centric attention mechanisms applied on top of a latent image representation provided by a standard, black-box model. This approach rests on a locality assumption: the latent representation of an object part encodes primarily information about the corresponding image region. In this work, we test this basic assumption by measuring intra-object leakage in vision models using part-based attribute annotations. Through a comprehensive experimental evaluation, we show that modern pretrained vision transformers violate the locality assumption and exhibit strong intra-object leakage, in which each part encodes information from the whole object. This visual metonymy compromises the faithfulness of attention-based interpretable-by-design methods for part-based reasoning, ultimately rendering them uninterpretable. In addition, we establish an upper bound using a two-stage approach that prevents leakage by design. We then show that this inherently disentangled feature extraction improves attribute-driven part discovery on a variety of tasks, confirming the practical impact of intra-object leakage. Our results uncover a neglected issue affecting the interpretability of part-based representations, such as those in concept bottleneck models (CBMs) relying on part-centric concepts, and highlight two-stage approaches as a promising way to mitigate it.
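One way to quantify intra-object leakage with part-based attribute annotations is a probe. Predict an attribute annotated on one part (for example, wing color) from the representation of a different part (for example, the head), then compare the probe's accuracy to chance. Under the locality assumption the gap should be near zero. The sketch assumes a nearest-centroid probe, a fixed train/test split, and majority-class chance. The paper's actual protocol may differ. `leakage_score`, `nearest_centroid_accuracy`, and the toy data are hypothetical.

```python
def nearest_centroid_accuracy(train, test):
    """Fit one centroid per label on `train` and return accuracy on `test`.

    Both arguments are lists of (feature_vector, label) pairs.
    """
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def predict(x):
        # Label of the closest centroid (squared Euclidean distance);
        # on ties, min() keeps the first label seen during training.
        return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

    return sum(predict(x) == y for x, y in test) / len(test)


def leakage_score(part_feats, other_part_labels, split):
    """Probe accuracy for another part's attribute minus majority-class chance."""
    data = list(zip(part_feats, other_part_labels))
    train, test = data[:split], data[split:]
    labels = [y for _, y in test]
    chance = max(labels.count(y) for y in set(labels)) / len(labels)
    return nearest_centroid_accuracy(train, test) - chance


# Toy data: 8 birds; wing color alternates, head shape follows a different pattern.
wing_color = ["red", "blue"] * 4
head_shape = ["a", "a", "b", "b"] * 2

# Dim 0 encodes the head's own attribute; dim 1 carries any wing-color signal.
local = [[1.0 if s == "a" else -1.0, 0.0] for s in head_shape]
leaky = [[1.0 if s == "a" else -1.0, 1.0 if c == "red" else -1.0]
         for s, c in zip(head_shape, wing_color)]

print(leakage_score(local, wing_color, split=4))  # 0.0: head features say nothing about wings
print(leakage_score(leaky, wing_color, split=4))  # 0.5: probe is 50 points above chance
```

A score near zero is what the locality assumption predicts. A clearly positive score on held-out data indicates that one part's representation carries information about other parts of the object. A real measurement would use backbone features and a properly validated probe rather than this toy split.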
Problem

Research questions and friction points this paper is trying to address.

metonymy
intra-object leakage
attention-based interpretability
part-based reasoning
vision transformers
Innovation

Methods, ideas, or system contributions that make the work stand out.

intra-object leakage
visual metonymy
part-based reasoning
attention interpretability
two-stage disentanglement