🤖 AI Summary
Existing automatic concept discovery methods often decouple learned concepts from the model's true decision-making mechanism, undermining explanatory fidelity. To address this, we propose FACE (Faithful Automatic Concept Extraction), a faithful concept learning framework grounded in non-negative matrix factorization (NMF). Our approach jointly incorporates KL-divergence regularization and classifier supervision to explicitly align low-level features with high-level semantic concepts, thereby ensuring conceptual consistency with model predictions. This design provides theoretical guarantees on local linear faithfulness, i.e., the extent to which concept-based approximations preserve the original model's behavior in local neighborhoods. Extensive experiments on ImageNet, COCO, and CelebA demonstrate that our method significantly outperforms state-of-the-art approaches on two core metrics: concept faithfulness (agreement with the model's own predictions) and concept sparsity (compactness of the concept basis). By bridging the gap between representation learning and model interpretation, our framework enhances both the faithfulness and reliability of deep neural network explanations.
📝 Abstract
Interpreting deep neural networks through concept-based explanations offers a bridge between low-level features and high-level human-understandable semantics. However, existing automatic concept discovery methods often fail to align these extracted concepts with the model's true decision-making process, thereby compromising explanation faithfulness. In this work, we propose FACE (Faithful Automatic Concept Extraction), a novel framework that augments Non-negative Matrix Factorization (NMF) with a Kullback-Leibler (KL) divergence regularization term to ensure alignment between the model's original and concept-based predictions. Unlike prior methods that operate solely on encoder activations, FACE incorporates classifier supervision during concept learning, enforcing predictive consistency and enabling faithful explanations. We provide theoretical guarantees showing that minimizing the KL divergence bounds the deviation in predictive distributions, thereby promoting faithful local linearity in the learned concept space. Systematic evaluations on ImageNet, COCO, and CelebA datasets demonstrate that FACE outperforms existing methods across faithfulness and sparsity metrics.
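The core idea above can be sketched as an NMF objective augmented with a KL term that ties the factorized activations back to the classifier. The sketch below is illustrative, not the paper's implementation: the function names, the linear classifier head (`clf_W`, `clf_b`), the KL direction, the weight `lam`, and the projected-gradient solver are all assumptions made for a minimal, self-contained example. Non-negative activations `A` (e.g. post-ReLU features) are factorized as `A ≈ U @ V`, while `lam * KL(p || q)` penalizes deviation between the model's original predictions `p` and the predictions `q` obtained from the concept-based reconstruction.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def face_nmf(A, clf_W, clf_b, k, lam=1.0, lr=1e-2, steps=500, seed=0):
    """Illustrative KL-regularized NMF (hypothetical names/solver).

    Minimizes ||A - U V||_F^2 + lam * KL(p || q), where
    p = softmax(A @ clf_W + clf_b) are the model's original predictions
    and q = softmax(U V @ clf_W + clf_b) the concept-based ones,
    using projected gradient descent to keep U, V non-negative.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    U = rng.random((n, k))
    V = rng.random((k, d))
    p = softmax(A @ clf_W + clf_b)  # fixed target distribution
    for _ in range(steps):
        R = U @ V
        q = softmax(R @ clf_W + clf_b)
        # gradient of the reconstruction term w.r.t. R
        G = 2.0 * (R - A)
        # d KL(p||q) / d logits = q - p; chain through the classifier weights
        G += lam * (q - p) @ clf_W.T
        gU = G @ V.T
        gV = U.T @ G
        U = np.maximum(U - lr * gU, 0.0)  # projected step: clip to >= 0
        V = np.maximum(V - lr * gV, 0.0)
    return U, V
```

Rows of `V` play the role of concept directions and rows of `U` give per-sample concept coefficients; the KL term is what distinguishes this from plain NMF on encoder activations, pulling the factorization toward reconstructions the classifier treats the same way it treats the originals.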