Leakage and Interpretability in Concept-Based Models

📅 2025-04-18

📈 Citations: 0

✨ Influential: 0

career value

247K/year

🤖 AI Summary

Concept bottleneck models (CBMs) suffer from pervasive information leakage—specifically, concept–task leakage (CTL), where intermediate concepts inadvertently encode task-relevant information, and inter-concept leakage (ICL), where redundant concepts exhibit spurious correlations—undermining interpretability and intervention robustness. Method: We propose the first information-theoretic, dual-metric framework to quantify CTL and ICL, formally defining and empirically validating both phenomena. We further derive theoretical links between leakage and intervention robustness, yielding actionable modeling guidelines for leakage mitigation. Contribution/Results: Our analysis reveals that CTL and ICL are prevalent across mainstream CBMs and persist irrespective of hyperparameter choices. Experiments demonstrate that our framework significantly outperforms existing methods in predicting model responses to concept interventions. It establishes a novel evaluation paradigm for trustworthy, interpretable AI and provides practical, theory-grounded optimization strategies for building robust CBMs.

Technology Category

Application Category

📝 Abstract

Concept Bottleneck Models aim to improve interpretability by predicting high-level intermediate concepts, representing a promising approach for deployment in high-risk scenarios. However, they are known to suffer from information leakage, whereby models exploit unintended information encoded within the learned concepts. We introduce an information-theoretic framework to rigorously characterise and quantify leakage, and define two complementary measures: the concepts-task leakage (CTL) and interconcept leakage (ICL) scores. We show that these measures are strongly predictive of model behaviour under interventions and outperform existing alternatives in robustness and reliability. Using this framework, we identify the primary causes of leakage and provide strong evidence that Concept Embedding Models exhibit substantial leakage regardless of the hyperparameters choice. Finally, we propose practical guidelines for designing concept-based models to reduce leakage and ensure interpretability.

Problem

Research questions and friction points this paper is trying to address.

Quantify information leakage in Concept Bottleneck Models

Identify causes of leakage in Concept Embedding Models

Propose guidelines to reduce leakage and ensure interpretability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Information-theoretic framework for leakage quantification

CTL and ICL scores predict model behavior

Guidelines to reduce leakage in concept models

🔎 Similar Papers

CAT: Concept-level backdoor ATtacks for Concept Bottleneck Models