🤖 AI Summary
To address the opacity of concept-to-label mappings in Concept Bottleneck Models (CBMs) and the persistent trade-off between interpretability and predictive performance, this paper proposes a binary distillation-based method that enhances CBMs with interpretable decision trees. Specifically, it introduces Fast Interpretable Greedy Sum-Trees (FIGS) into the CBM framework for the first time, distilling the opaque black-box concept-to-label mapping into a compact, human-readable binary tree. This tree enables adaptive, test-time intervention on salient concepts. Experiments across four benchmark datasets demonstrate that the method preserves high predictive accuracy while achieving high-fidelity attribution of concept interactions. Moreover, it significantly improves model performance under minimal human intervention—e.g., editing just a few concept values—thereby effectively supporting human-in-the-loop decision-making in real-world applications.
📝 Abstract
Concept bottleneck models~(CBMs) aim to improve model interpretability by predicting human-level ``concepts'' in a bottleneck within a deep learning model architecture. However, how the predicted concepts are used in predicting the target either remains black-box or is simplified to maintain interpretability at the cost of prediction performance. We propose to use Fast Interpretable Greedy Sum-Trees~(FIGS) to perform Binary Distillation~(BD). This new method, called FIGS-BD, distills a binary-augmented concept-to-target portion of the CBM into an interpretable tree-based model, while mimicking the competitive prediction performance of the CBM teacher. FIGS-BD can be used in downstream tasks to explain and decompose CBM predictions into interpretable binary-concept-interaction attributions and to guide adaptive test-time intervention. Across $4$ datasets, we demonstrate that adaptive test-time intervention identifies key concepts that significantly improve performance in realistic human-in-the-loop settings that allow only a limited number of concept interventions.
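The distillation-then-intervention idea described above can be illustrated with a minimal, self-contained sketch. This is not the paper's implementation: a plain CART decision tree (scikit-learn) stands in for FIGS, a logistic regression stands in for the CBM's opaque concept-to-target head, and the binary concepts and labels are synthetic. The sketch shows the two steps the summary describes: fitting a shallow, human-readable tree to mimic the teacher's predictions, then editing a salient concept at test time and re-predicting.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic binary concept vectors (stand-ins for CBM-predicted concepts).
C = rng.integers(0, 2, size=(2000, 5))
# Ground-truth labels from a simple concept interaction: (c0 AND c1) OR c2.
y = ((C[:, 0] & C[:, 1]) | C[:, 2]).astype(int)

# "Teacher": an opaque concept-to-target head. Logistic regression is an
# illustrative assumption here, standing in for the CBM's black-box predictor.
teacher = LogisticRegression().fit(C, y)
teacher_preds = teacher.predict(C)

# "Student": a compact binary tree distilled to mimic the teacher's
# predictions (a CART tree standing in for FIGS).
student = DecisionTreeClassifier(max_depth=3, random_state=0)
student.fit(C, teacher_preds)

# Fidelity: how often the interpretable student reproduces the teacher.
fidelity = (student.predict(C) == teacher_preds).mean()
print(f"distillation fidelity: {fidelity:.3f}")

# Test-time intervention sketch: flip the most salient concept of one
# sample (salience read off the tree's feature importances) and re-predict.
x = C[:1].copy()
salient = int(np.argmax(student.feature_importances_))
x[0, salient] = 1 - x[0, salient]
print("prediction after intervening on concept", salient, "->",
      student.predict(x)[0])
```

Because the student is a shallow tree over binary concepts, each prediction decomposes into a short, readable path of concept tests, which is what makes attribution and targeted human edits to a few concept values tractable.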