🤖 AI Summary
Existing concept bottleneck models are confined to a single semantic level, limiting their ability to emulate the human multi-granularity cognitive process and thereby constraining both interpretability and representational capacity. This work proposes the first hierarchical concept bottleneck model that operates without requiring relational concept annotations. By employing a dual-branch classification head architecture and a gradient-driven visual consistency loss, the model jointly optimizes prediction accuracy and concept-based explanations across multiple levels of abstraction. The approach achieves semantic alignment between concepts and predictions under end-to-end training. Experiments demonstrate that the model outperforms current sparse concept bottleneck models in classification performance on standard benchmarks, while human evaluations confirm that its generated explanations are more hierarchical, accurate, and comprehensible.
📝 Abstract
Concept Bottleneck Models (CBMs) introduce interpretability to black-box deep learning models by predicting labels through human-understandable concepts. However, unlike humans, who identify objects at different levels of abstraction using both general and specific features, existing CBMs operate at a single semantic level in both concept and label space. We propose HIL-CBM, a Hierarchical Interpretable Label-Free Concept Bottleneck Model that extends CBMs into a hierarchical framework, enhancing interpretability by more closely mirroring the human cognitive process. HIL-CBM enables classification and explanation across multiple semantic levels without requiring relational concept annotations, and it aligns the abstraction level of concept-based explanations with that of model predictions, progressing from abstract to concrete. This is achieved by (i) introducing a gradient-based visual consistency loss that encourages abstraction layers to focus on similar spatial regions, and (ii) training dual classification heads, each operating on feature concepts at a different abstraction level. Experiments on benchmark datasets demonstrate that HIL-CBM outperforms state-of-the-art sparse CBMs in classification accuracy. Human evaluations further show that HIL-CBM provides more interpretable and accurate explanations, while maintaining a hierarchical and label-free approach to feature concepts.
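To make the two ingredients above concrete, the following is a minimal NumPy sketch, not the paper's implementation: it pairs dual classification heads over shared features with a consistency loss that penalizes disagreement between the spatial maps of the two abstraction levels. All function names and shapes here are illustrative assumptions, and a simple activation-based saliency stands in for the paper's gradient-based (Grad-CAM-style) visual saliency:

```python
import numpy as np

def spatial_saliency(feature_map):
    """Collapse a (C, H, W) feature map to a normalized spatial map.

    Illustrative proxy: channel-wise mean activation. The paper's loss
    is gradient-based; a real implementation would use gradients of the
    prediction w.r.t. the feature map instead of raw activations.
    """
    sal = feature_map.mean(axis=0)
    return sal / (np.linalg.norm(sal) + 1e-8)

def visual_consistency_loss(coarse_feat, fine_feat):
    """1 - cosine similarity between the two levels' saliency maps.

    Zero when both abstraction levels attend to identical regions.
    """
    s1 = spatial_saliency(coarse_feat).ravel()
    s2 = spatial_saliency(fine_feat).ravel()
    return 1.0 - float(s1 @ s2)

def dual_head_predict(shared_feat, W_coarse, W_fine):
    """Global-average-pool shared features, then apply two linear heads:
    one over coarse (abstract) classes, one over fine (concrete) classes."""
    pooled = shared_feat.mean(axis=(1, 2))  # (C,)
    return W_coarse @ pooled, W_fine @ pooled
```

In this sketch, the total training objective would simply add `visual_consistency_loss` (weighted by a hyperparameter) to the two heads' classification losses, so that coarse and fine predictions stay grounded in the same image regions.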