🤖 AI Summary
This work addresses key limitations of Concept Bottleneck Models (CBMs): insufficient modeling of concept–concept (C–C) and concept–task (C→Y) dependencies, concept leakage, and suboptimal trade-offs between task performance and interpretability. To this end, we propose CREAM, a novel framework that explicitly integrates expert-defined directed or bidirected concept relationships into the model architecture while blocking undesired information flow, for example between mutually exclusive concepts. CREAM further adds a black-box side-channel that is regularized so that task predictions stay grounded in the relevant concepts, keeping concept importance high without sacrificing task accuracy. Evaluated on multiple benchmarks, CREAM achieves task accuracy comparable to state-of-the-art black-box models, improves concept prediction accuracy by 12.3%, accelerates intervention response by 2.1×, and substantially mitigates concept leakage.
📝 Abstract
In this paper, we propose $\textbf{C}$oncept $\textbf{REA}$soning $\textbf{M}$odels (CREAM), a novel family of Concept Bottleneck Models (CBMs) that: (i) explicitly encodes concept-concept (${\texttt{C-C}}$) and concept-task (${\texttt{C$\rightarrow$Y}}$) relationships to enforce a desired model reasoning; and (ii) uses a regularized side-channel to achieve competitive task performance while keeping concept importance high. Specifically, CREAM architecturally embeds (bi)directed concept-concept and concept-to-task relationships specified by a human expert, while severing undesired information flows (e.g., to handle mutually exclusive concepts). Moreover, CREAM integrates a black-box side-channel that is regularized to encourage task predictions to be grounded in the relevant concepts, so that the side-channel is used only when necessary to enhance performance. Our experiments show that: (i) CREAM mainly relies on concepts while achieving task performance on par with black-box models; and (ii) the embedded ${\texttt{C-C}}$ and ${\texttt{C$\rightarrow$Y}}$ relationships ease model interventions and mitigate concept leakage.
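
To make the idea of architecturally embedded relationships concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the expert-specified C-C and C→Y relationships can be expressed as binary masks over linear weights, and it stands in for the side-channel regularizer with a simple `side_keep` switch that turns the side-channel off. The names `cc_mask`, `cy_mask`, `masked_forward`, `predict`, and `side_keep`, as well as the three-concept, two-task setup, are hypothetical.

```python
import numpy as np

def masked_forward(x, weights, mask):
    """Linear map whose weights are zeroed wherever the mask forbids a link."""
    return x @ (weights * mask)

# Hypothetical graph: cc_mask[i, j] = 1 means concept i may inform concept j.
cc_mask = np.array([
    [1, 1, 0],  # concept 0 informs itself and concept 1
    [0, 1, 0],  # concept 1 informs only itself
    [0, 0, 1],  # concept 2 is isolated from the others (severed flow)
], dtype=float)

# cy_mask[i, k] = 1 means concept i may inform task output k.
cy_mask = np.array([
    [1, 0],
    [1, 0],
    [0, 1],
], dtype=float)

rng = np.random.default_rng(0)
W_cc = rng.normal(size=(3, 3))  # stand-in for learned C-C weights
W_cy = rng.normal(size=(3, 2))  # stand-in for learned C->Y weights

def predict(concept_inputs, side_logits, side_keep=1.0):
    """Task logits from masked concept reasoning plus an optional side-channel.

    side_keep=0.0 removes the side-channel entirely (assumed stand-in for
    whatever regularizer limits reliance on it).
    """
    concepts = masked_forward(concept_inputs, W_cc, cc_mask)
    task_from_concepts = masked_forward(concepts, W_cy, cy_mask)
    return task_from_concepts + side_keep * side_logits
```

Under these masks, task output 0 depends only on concepts 0 and 1, and task output 1 only on concept 2, so intervening on one concept cannot silently change an unrelated prediction. This is the property the abstract attributes to severing undesired information flows.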