🤖 AI Summary
This work addresses the lack of systematic evaluation of sparsity-driven interpretability methods in current vision-language models (VLMs). Focusing on concept bottleneck models, it introduces a novel metric called "clarity" to quantify the trade-off between downstream task performance and the sparsity and precision of concept representations. The study establishes the first interpretability evaluation framework grounded in real human-annotated concept labels. By integrating VLMs with attribute predictors and employing $\ell_1$, $\ell_0$, and Bernoulli-based sparsification strategies, the experiments reveal that, at comparable performance levels, different sparsification approaches exhibit significant differences in the balance between flexibility and interpretability. These findings underscore the critical influence of modeling choices on the quality of learned concept representations.
📄 Abstract
The widespread adoption of Vision-Language Models (VLMs) across fields has amplified concerns about model interpretability. Distressingly, these models are often treated as black boxes, with limited or non-existent investigation of their decision-making process. Despite numerous post- and ante-hoc interpretability methods, systematic and objective evaluation of the learned representations remains limited, particularly for sparsity-aware methods that are increasingly considered to "induce interpretability". In this work, we focus on Concept Bottleneck Models and investigate how different modeling decisions affect the emerging representations. We introduce the notion of clarity, a measure capturing the interplay between downstream performance and the sparsity and precision of the concept representation, and propose an interpretability assessment framework using datasets with ground-truth concept annotations. We consider both VLM- and attribute predictor-based CBMs, and three different sparsity-inducing strategies: per-example $\ell_1$, $\ell_0$, and Bernoulli-based formulations. Our experiments reveal a critical trade-off between flexibility and interpretability, under which a given method can exhibit markedly different behaviors even at comparable performance levels. The code will be made publicly available upon publication.
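To make the three sparsity-inducing strategies concrete, the following is a minimal numpy sketch of the penalties they correspond to on a vector of concept activations. It is illustrative only, not the paper's implementation: the function names and the Bernoulli-gate parameterization (independent gates with sigmoid probabilities, whose expected $\ell_0$ norm is the sum of gate probabilities) are assumptions for exposition.

```python
import numpy as np

def l1_penalty(c):
    # L1 norm of the concept activations; a differentiable
    # surrogate that shrinks activations toward zero.
    return np.abs(c).sum()

def l0_count(c, eps=1e-8):
    # L0 "norm": the number of effectively non-zero concepts.
    # Non-differentiable, hence typically relaxed in practice.
    return int((np.abs(c) > eps).sum())

def bernoulli_gates(c, logits, rng):
    # Sample binary gates z ~ Bernoulli(sigmoid(logits)) and mask
    # the concepts; the expected L0 norm of the gated vector is
    # simply the sum of the gate probabilities.
    p = 1.0 / (1.0 + np.exp(-logits))
    z = rng.random(p.shape) < p
    return c * z, p.sum()

c = np.array([0.9, 0.0, -0.3, 0.0, 0.5])
rng = np.random.default_rng(0)
masked, expected_l0 = bernoulli_gates(c, np.zeros_like(c), rng)
```

With all gate logits at zero, each gate fires with probability 0.5, so the expected number of active concepts for the five-dimensional vector above is 2.5, independent of the sampled mask.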