Deferring Concept Bottleneck Models: Learning to Defer Interventions to Inaccurate Experts

📅 2025-03-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing concept bottleneck models (CBMs) assume that humans can always identify when an intervention is needed and provide correct interventions, overlooking the labor cost and error-proneness of human involvement and limiting practical deployment. Drawing on learning to defer (L2D), this work proposes Deferring CBMs (DCBMs), the first CBM framework that lets the model itself learn when human intervention is required. DCBMs are modeled as a composition of deferring systems and trained with a consistent L2D loss derived for this setting; because they retain a CBM architecture, they can also explain, at the concept level, why deferral occurs on the final task. Evaluated on multiple benchmark datasets, DCBMs achieve high predictive performance and interpretability, at the cost of deferring to humans more often. By unifying conceptual reasoning with adaptive deferral, DCBMs advance CBMs toward real-world applicability.

📝 Abstract
Concept Bottleneck Models (CBMs) are machine learning models that improve interpretability by grounding their predictions on human-understandable concepts, allowing for targeted interventions in their decision-making process. However, when intervened on, CBMs assume the availability of humans who can identify the need to intervene and always provide correct interventions. Both assumptions are unrealistic and impractical given labor costs and human error-proneness. In contrast, Learning to Defer (L2D) extends supervised learning by allowing machine learning models to identify cases where a human is more likely to be correct than the model, leading to deferring systems with improved performance. In this work, we take inspiration from L2D and propose Deferring CBMs (DCBMs), a novel framework that allows CBMs to learn when an intervention is needed. To this end, we model DCBMs as a composition of deferring systems and derive a consistent L2D loss to train them. Moreover, by relying on a CBM architecture, DCBMs can explain why deferral occurs on the final task. Our results show that DCBMs achieve high predictive performance and interpretability, at the cost of deferring more to humans.
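
To make the "composition of deferring systems" concrete, below is a minimal sketch of what a deferring concept bottleneck might look like in PyTorch. All module and variable names (DeferringCBM, defer_head, human_concepts, and so on) are illustrative assumptions, not the authors' implementation; the paper's actual deferral mechanism and loss may differ.

```python
# Minimal sketch of a deferring concept bottleneck (illustrative, not the paper's code).
import torch
import torch.nn as nn

class DeferringCBM(nn.Module):
    def __init__(self, backbone: nn.Module, n_features: int,
                 n_concepts: int, n_classes: int):
        super().__init__()
        self.backbone = backbone                                 # x -> features
        self.concept_head = nn.Linear(n_features, n_concepts)    # features -> concept logits
        # Per-concept defer scores: should the model ask a human about this concept?
        self.defer_head = nn.Linear(n_features, n_concepts)
        self.task_head = nn.Linear(n_concepts, n_classes)        # concepts -> label logits

    def forward(self, x, human_concepts=None):
        h = self.backbone(x)
        concept_logits = self.concept_head(h)
        defer_logits = self.defer_head(h)
        concepts = torch.sigmoid(concept_logits)
        if human_concepts is not None:
            # At inference, replace deferred concepts with the (possibly noisy)
            # human answers; a hard threshold is used here for simplicity.
            defer_mask = (defer_logits > 0).float()
            concepts = defer_mask * human_concepts + (1 - defer_mask) * concepts
        return self.task_head(concepts), concept_logits, defer_logits
```

Because every deferral decision is tied to a named concept at the bottleneck, the model can report which concepts it deferred on, which is what enables the concept-level explanation of why deferral occurs.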
Problem

Research questions and friction points this paper is trying to address.

Improving interpretability of machine learning models using human-understandable concepts.
Addressing impractical assumptions of human intervention availability and accuracy in CBMs.
Proposing a framework for CBMs to learn when to defer to human experts.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deferring CBMs (DCBMs) learn when an intervention is needed, rather than assuming humans always intervene correctly.
Models DCBMs as a composition of deferring systems built on a CBM architecture.
Derives a consistent L2D loss to train DCBMs (a generic surrogate is sketched below).
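
For intuition about the training signal, here is a standard consistent L2D surrogate in the spirit of Mozannar and Sontag (2020), where deferral is scored as an extra "class" in a cross-entropy loss. The paper derives its own consistent loss for the DCBM setting; this sketch only illustrates the generic L2D idea it builds on, and the function name and shapes are assumptions.

```python
# Generic consistent L2D surrogate (illustrative; not the paper's exact loss).
import torch
import torch.nn.functional as F

def l2d_surrogate_loss(logits, targets, human_preds):
    """logits: (B, C+1), where the last column scores the 'defer' option.
    targets: (B,) true labels; human_preds: (B,) the human's predictions."""
    log_probs = F.log_softmax(logits, dim=-1)
    defer_col = logits.shape[-1] - 1
    # Reward the model for predicting the true label itself...
    model_term = -log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    # ...and for choosing to defer whenever the human happens to be correct.
    human_correct = (human_preds == targets).float()
    defer_term = -human_correct * log_probs[:, defer_col]
    return (model_term + defer_term).mean()
```

The second term is active only on examples where the human is right, so minimizing the loss pushes the model to defer exactly where the human is more likely to be correct than the model.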