Concepts' Information Bottleneck Models

📅 2026-02-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the unreliability of concept–prediction associations in Concept Bottleneck Models (CBMs), which is often caused by concept leakage and accompanied by degraded accuracy. The authors propose a theoretically grounded, architecture-agnostic information-bottleneck regularization method that learns minimal sufficient concept representations by minimizing the mutual information $I(X;C)$ between inputs and concepts while preserving the mutual information $I(C;Y)$ between concepts and labels, without modifying the model architecture or requiring additional supervision. By combining a variational objective with entropy-based proxy constraints, the approach fits directly into standard CBM training pipelines. Information-plane analysis confirms that training follows the intended compression dynamics. Evaluated across six CBM variants and three benchmark datasets, the method consistently improves prediction accuracy, mitigates concept leakage, and stabilizes concept interventions.
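To make the entropy-based proxy concrete, here is a minimal numpy sketch of one plausible surrogate: penalizing the mean Bernoulli entropy of soft concept predictions, which pushes them toward hard 0/1 values and limits how much extra input information soft activations can leak. The function names and the toy probabilities are hypothetical illustrations, not the authors' code; the paper's exact surrogate may differ.

```python
import numpy as np

def binary_entropy(p, eps=1e-8):
    # per-concept Bernoulli entropy H(c|x); low entropy = near-hard concepts
    p = np.clip(p, eps, 1 - eps)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def entropy_proxy_penalty(concept_probs):
    # mean conditional entropy over batch and concepts; penalizing this limits
    # the input-specific "shading" a soft concept activation can carry
    return float(np.mean(binary_entropy(concept_probs)))

hard = np.array([[0.99, 0.01], [0.02, 0.98]])  # near-binary concepts
soft = np.array([[0.60, 0.45], [0.55, 0.40]])  # leaky, input-dependent shading
assert entropy_proxy_penalty(hard) < entropy_proxy_penalty(soft)
```

In training, this penalty would be added (with a small weight) to the usual concept and label losses, so confident concepts are preferred without changing the architecture.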

📝 Abstract
Concept Bottleneck Models (CBMs) aim to deliver interpretable predictions by routing decisions through a human-understandable concept layer, yet they often suffer reduced accuracy and concept leakage that undermines faithfulness. We introduce an explicit Information Bottleneck regularizer on the concept layer that penalizes $I(X;C)$ while preserving task-relevant information in $I(C;Y)$, encouraging minimal-sufficient concept representations. We derive two practical variants (a variational objective and an entropy-based surrogate) and integrate them into standard CBM training without architectural changes or additional supervision. Evaluated across six CBM families and three benchmarks, the IB-regularized models consistently outperform their vanilla counterparts. Information-plane analyses further corroborate the intended behavior. These results indicate that enforcing a minimal-sufficient concept bottleneck improves both predictive performance and the reliability of concept-level interventions. The proposed regularizer offers a theoretically grounded, architecture-agnostic path to more faithful and intervenable CBMs, resolving prior evaluation inconsistencies by aligning training protocols and demonstrating robust gains across model families and datasets.
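The variational variant can be sketched in a few lines: with a Gaussian concept encoder, the KL divergence to a standard normal prior is a standard variational upper bound on the compression term $I(X;C)$, added to the concept-prediction loss with a weight $\beta$. This is a generic variational-IB sketch under assumed Gaussian encoders; the shapes, $\beta$, and function names are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_gauss_std_normal(mu, logvar):
    # KL( N(mu, diag(sigma^2)) || N(0, I) ): variational upper bound on I(X;C)
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def bce(p, t, eps=1e-8):
    # binary cross-entropy against ground-truth concept labels (the I(C;Y) side)
    return -np.mean(t * np.log(p + eps) + (1 - t) * np.log(1 - p + eps))

# toy batch: 4 samples, 3 concepts; encoder outputs stand in for a real network
mu = rng.normal(size=(4, 3))
logvar = 0.1 * rng.normal(size=(4, 3))
c_true = rng.integers(0, 2, size=(4, 3)).astype(float)

# reparameterized stochastic concept code -> sigmoid -> concept probabilities
z = mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)
c_prob = 1.0 / (1.0 + np.exp(-z))

beta = 0.1  # compression weight; trades off minimality vs. sufficiency
loss = bce(c_prob, c_true) + beta * np.mean(kl_gauss_std_normal(mu, logvar))
```

Because the KL term is nonnegative and added to an unchanged CBM loss, the regularizer drops into existing training loops without any architectural modification, as the abstract claims.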
Problem

Research questions and friction points this paper is trying to address.

Concept Bottleneck Models
concept leakage
interpretability
faithfulness
information bottleneck
Innovation

Methods, ideas, or system contributions that make the work stand out.

Information Bottleneck
Concept Bottleneck Models
Interpretability
Minimal-Sufficient Representation
Faithful AI