An Analysis of Concept Bottleneck Models: Measuring, Understanding, and Mitigating the Impact of Noisy Annotations

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Concept Bottleneck Models (CBMs) rely on human-annotated concepts, yet noisy concept labels jointly degrade predictive accuracy, interpretability, and intervention efficacy, and the mechanisms underlying this degradation remain poorly understood. Method: the authors introduce a "noise sensitivity" metric to identify fragile concept subsets and propose a two-stage robust framework: Sharpness-Aware Minimization (SAM) during training to improve optimization stability, and entropy-guided active correction of high-uncertainty concepts at inference. They theoretically establish a strong correlation between prediction entropy and concept fragility, substantiated by concept-level intervention analysis and interpretability evaluation. Results: experiments show substantial improvements in both classification accuracy and concept fidelity under label noise; correcting only the identified fragile concepts recovers over 90% of the lost performance, validating the framework's efficiency and robustness.

📝 Abstract
Concept bottleneck models (CBMs) ensure interpretability by decomposing predictions into human-interpretable concepts. Yet the concept annotations that enable this transparency are often noisy, and the impact of such corruption is not well understood. In this study, we present the first systematic study of noise in CBMs and show that even moderate corruption simultaneously impairs prediction performance, interpretability, and intervention effectiveness. Our analysis identifies a susceptible subset of concepts whose accuracy declines far more than the average gap between noisy and clean supervision and whose corruption accounts for most of the performance loss. To mitigate this vulnerability, we propose a two-stage framework. During training, sharpness-aware minimization stabilizes the learning of noise-sensitive concepts. During inference, where clean labels are unavailable, we rank concepts by predictive entropy and correct only the most uncertain ones, using uncertainty as a proxy for susceptibility. Theoretical analysis and extensive ablations elucidate why sharpness-aware training confers robustness and why uncertainty reliably identifies susceptible concepts, providing a principled basis that preserves both interpretability and resilience in the presence of noise.
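The inference-time stage described above — ranking concepts by predictive entropy and correcting only the most uncertain ones — can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function names and the example probabilities are hypothetical.

```python
import numpy as np

def binary_entropy(p):
    """Shannon entropy (in nats) of Bernoulli concept predictions."""
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def most_uncertain_concepts(concept_probs, k):
    """Return indices of the k concepts with the highest predictive entropy.

    Under the paper's framework, these high-uncertainty concepts are the
    candidates selected for correction at inference time, with entropy
    serving as a proxy for susceptibility to annotation noise.
    """
    h = binary_entropy(np.asarray(concept_probs, dtype=float))
    return np.argsort(-h)[:k]  # descending entropy, top-k

# Hypothetical concept probabilities from a trained concept predictor:
# probabilities near 0.5 carry maximal entropy, i.e. maximal uncertainty.
probs = [0.50, 0.90, 0.55, 0.99]
idx = most_uncertain_concepts(probs, k=2)
```

Here concepts 0 and 2 (probabilities closest to 0.5) would be flagged for correction, while confident predictions like 0.99 are left untouched.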
Problem

Research questions and friction points this paper is trying to address.

Measure impact of noisy annotations on CBMs' performance and interpretability
Identify susceptible concepts most affected by annotation noise
Propose framework to mitigate noise impact using sharpness-aware training and uncertainty correction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sharpness-aware minimization stabilizes noise-sensitive concepts
Predictive entropy ranks concepts for correction
Two-stage framework enhances interpretability and resilience
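The training-stage idea above — sharpness-aware minimization biasing optimization toward flat minima — follows the standard two-step SAM update: ascend to a nearby high-loss point within a small L2 ball, then apply the gradient computed there to the original weights. A minimal sketch on a toy quadratic loss (not the paper's code; `rho` and the loss are illustrative assumptions):

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization (SAM) step on parameters w.

    Step 1: perturb w in the gradient (ascent) direction, scaled to
            radius rho, reaching a nearby sharper point.
    Step 2: update the *original* w using the gradient at that point,
            which penalizes sharp minima.
    """
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent perturbation
    g_sharp = grad_fn(w + eps)                   # gradient at perturbed point
    return w - lr * g_sharp

# Toy example: minimize L(w) = ||w||^2 / 2, whose gradient is w itself.
grad = lambda w: w
w = np.array([3.0, -2.0])
for _ in range(100):
    w = sam_step(w, grad)
```

In the paper's setting the same update would be applied to the concept-predictor loss, stabilizing the learning of noise-sensitive concepts.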
Seonghwan Park
POSTECH
Jueun Mun
POSTECH
Donghyun Oh
POSTECH
Namhoon Lee
POSTECH
machine learning