🤖 AI Summary
Concept Bottleneck Models (CBMs) rely on human-annotated concepts, yet noisy concept labels jointly degrade predictive accuracy, interpretability, and intervention efficacy, and the mechanisms underlying this degradation remain poorly understood.
Method: We introduce the first "noise sensitivity" metric to identify fragile concept subsets and propose a two-stage robust framework: Sharpness-Aware Minimization (SAM) during training to enhance optimization stability, and entropy-guided active correction of high-uncertainty concepts at inference. We theoretically establish a strong correlation between prediction entropy and concept fragility, substantiated by concept-level intervention analysis and interpretability evaluation.
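The two stages can be sketched in miniature. The following is a hypothetical illustration, not the paper's implementation: a single SAM update step on a toy quadratic loss, where `rho`, `lr`, and the loss itself are assumptions chosen only to show the ascend-then-descend structure.

```python
import numpy as np

# Toy loss L(w) = ||w - 1||^2 and its gradient; stand-ins for the CBM
# concept-prediction loss, which the summary does not specify.
def loss(w):
    return float(np.sum((w - 1.0) ** 2))

def grad(w):
    return 2.0 * (w - 1.0)

def sam_step(w, rho=0.05, lr=0.1):
    g = grad(w)
    # Step 1: ascend to an approximate worst-case point within an
    # L2 ball of radius rho around the current weights.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Step 2: descend using the gradient evaluated at the perturbed
    # weights, which penalizes sharp minima.
    return w - lr * grad(w + eps)

w = np.array([3.0, -2.0])
for _ in range(100):
    w = sam_step(w)
# w ends up close to the flat minimum at (1, 1).
```

The key design point is that the descent gradient is taken at `w + eps` rather than `w`, so the optimizer prefers regions where the loss stays low under small weight perturbations, which is the stability property invoked for noise-sensitive concepts.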
Results: Experiments demonstrate substantial improvements in both classification accuracy and concept fidelity under label noise; correcting only the identified fragile concepts recovers over 90% of the performance loss, validating the framework’s efficiency and robustness.
📝 Abstract
Concept bottleneck models (CBMs) ensure interpretability by decomposing predictions into human-interpretable concepts. Yet the concept annotations that enable this transparency are often noisy, and the impact of such corruption is not well understood. In this study, we present the first systematic study of label noise in CBMs and show that even moderate corruption simultaneously impairs prediction performance, interpretability, and intervention effectiveness. Our analysis identifies a susceptible subset of concepts whose accuracy declines far more than the average gap between noisy and clean supervision and whose corruption accounts for most of the performance loss. To mitigate this vulnerability, we propose a two-stage framework. During training, sharpness-aware minimization stabilizes the learning of noise-sensitive concepts. During inference, where clean labels are unavailable, we rank concepts by predictive entropy and correct only the most uncertain ones, using uncertainty as a proxy for susceptibility. Theoretical analysis and extensive ablations elucidate why sharpness-aware training confers robustness and why uncertainty reliably identifies susceptible concepts, providing a principled basis that preserves both interpretability and resilience in the presence of noise.
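The inference-time step, ranking concepts by predictive entropy and correcting only the top-k most uncertain, can be sketched as follows. This is a minimal illustration under assumed names; the budget `k` and the toy probabilities are not from the paper.

```python
import numpy as np

def binary_entropy(p):
    """Entropy of a Bernoulli concept prediction, in nats."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def most_uncertain_concepts(probs, k):
    """Indices of the k concepts with the highest predictive entropy,
    i.e. the candidates selected for active correction."""
    ent = binary_entropy(np.asarray(probs, dtype=float))
    return np.argsort(-ent)[:k]

# Toy predicted concept probabilities for one input: values near 0.5
# carry the most uncertainty and are flagged for intervention.
probs = [0.97, 0.51, 0.08, 0.45, 0.99]
flagged = most_uncertain_concepts(probs, k=2)
# flagged -> indices 1 and 3, the near-0.5 predictions.
```

In a full pipeline the flagged concept predictions would be replaced by human-provided values before the label predictor runs, so the intervention budget is spent only where uncertainty, the proxy for susceptibility, is highest.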