Enhancing CBMs Through Binary Distillation with Applications to Test-Time Intervention

📅 2025-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the opacity of concept-to-label mappings in Concept Bottleneck Models (CBMs) and the persistent trade-off between interpretability and predictive performance, this paper proposes a binary distillation-based method that enhances CBMs with interpretable decision trees. Specifically, it introduces Fast Interpretable Greedy Sum-Trees (FIGS) into the CBM framework for the first time, distilling the opaque black-box concept-to-label mapping into a compact, human-readable binary tree. This tree enables adaptive, test-time intervention on salient concepts. Experiments across four benchmark datasets demonstrate that the method preserves high predictive accuracy while achieving high-fidelity attribution of concept interactions. Moreover, it significantly improves model performance under minimal human intervention—e.g., editing just a few concept values—thereby effectively supporting human-in-the-loop decision-making in real-world applications.

📝 Abstract
Concept bottleneck models (CBMs) aim to improve model interpretability by predicting human-level "concepts" in a bottleneck within a deep learning model architecture. However, how the predicted concepts are used in predicting the target either remains black-box or is simplified to maintain interpretability at the cost of prediction performance. We propose to use Fast Interpretable Greedy Sum-Trees (FIGS) to obtain Binary Distillation (BD). This new method, called FIGS-BD, distills a binary-augmented concept-to-target portion of the CBM into an interpretable tree-based model, while mimicking the competitive prediction performance of the CBM teacher. FIGS-BD can be used in downstream tasks to explain and decompose CBM predictions into interpretable binary-concept-interaction attributions and to guide adaptive test-time intervention. Across 4 datasets, we demonstrate that adaptive test-time intervention identifies key concepts that significantly improve performance in realistic human-in-the-loop settings that allow for limited concept interventions.
Problem

Research questions and friction points this paper is trying to address.

Improve interpretability of concept bottleneck models (CBM)
Maintain prediction performance while enhancing interpretability
Enable adaptive test-time intervention using interpretable binary concepts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Fast Interpretable Greedy Sum-Trees (FIGS)
Implements Binary Distillation (BD) for interpretability
Enables adaptive test-time intervention via binary concepts
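The distillation idea above can be sketched in a few lines. Note this is a minimal illustration with synthetic data, not the authors' implementation: a single shallow `sklearn` decision tree stands in for FIGS, a hand-coded AND rule stands in for the CBM's black-box concept-to-label head, and all variable names are hypothetical.

```python
# Sketch of binary distillation of a CBM's concept-to-label head into an
# interpretable tree. A single sklearn tree substitutes for FIGS here; the
# paper's FIGS-BD uses Fast Interpretable Greedy Sum-Trees instead.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the CBM's predicted concept probabilities.
n, k = 500, 6
concept_probs = rng.random((n, k))

# Stand-in "black-box" concept-to-label head: label 1 iff concepts 0 AND 1 fire.
teacher_labels = ((concept_probs[:, 0] > 0.5) & (concept_probs[:, 1] > 0.5)).astype(int)

# Binary augmentation: threshold concept probabilities into binary concepts.
binary_concepts = (concept_probs > 0.5).astype(int)

# Distill the teacher's predictions into a shallow, human-readable tree.
student = DecisionTreeClassifier(max_depth=3, random_state=0)
student.fit(binary_concepts, teacher_labels)
fidelity = student.score(binary_concepts, teacher_labels)  # agreement with teacher

# Adaptive test-time intervention: a human edits one salient binary concept
# and the interpretable student's prediction updates accordingly.
x = np.array([[1, 0, 1, 1, 0, 0]])
before = int(student.predict(x)[0])
x[0, 1] = 1  # human corrects concept 1 after inspecting the tree
after = int(student.predict(x)[0])
print(f"fidelity={fidelity:.2f}, prediction {before} -> {after} after intervention")
```

Because the student is a small tree over binary concepts, its splits can be read directly to see which concept interactions drive each prediction, which is what makes targeted, limited-budget intervention practical.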
Matthew Shen
Department of Statistics, Columbia University
Aliyah R. Hsu
Department of EECS, UC Berkeley
Abhineet Agarwal
Statistics PhD, University of California, Berkeley
Large Language Models · AI Explainability · Causal Inference · Bandits
Bin Yu
Department of Statistics, EECS, Center for Computational Biology, UC Berkeley