Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts

📅 2025-04-24

📈 Citations: 0

✨ Influential: 0

career value

177K/year

🤖 AI Summary

This paper identifies a “leakage poisoning” flaw in existing concept models (CMs) under out-of-distribution (OOD) conditions: CMs erroneously exploit spurious feature leakage, rendering human-defined concept interventions ineffective. To address this, we propose MixCEM—a novel architecture integrating concept embeddings, a gated mixture mechanism, and a distribution-aware leakage suppression module. MixCEM adaptively leverages features across ID/OOD regimes: it safely incorporates auxiliary information only under in-distribution conditions while dynamically blocking leakage under OOD conditions. The model supports end-to-end differentiable intervention training. Extensive multi-task experiments demonstrate that concept interventions yield an average accuracy gain of 12.3% on both ID and OOD samples—substantially outperforming strong baselines. Crucially, performance gains remain robust even with incomplete concept annotations. To our knowledge, this is the first work achieving distributionally consistent and effective concept intervention.

Technology Category

Application Category

📝 Abstract

In this paper, we investigate how concept-based models (CMs) respond to out-of-distribution (OOD) inputs. CMs are interpretable neural architectures that first predict a set of high-level concepts (e.g., stripes, black) and then predict a task label from those concepts. In particular, we study the impact of concept interventions (i.e., operations where a human expert corrects a CM's mispredicted concepts at test time) on CMs' task predictions when inputs are OOD. Our analysis reveals a weakness in current state-of-the-art CMs, which we term leakage poisoning, that prevents them from properly improving their accuracy when intervened on for OOD inputs. To address this, we introduce MixCEM, a new CM that learns to dynamically exploit leaked information missing from its concepts only when this information is in-distribution. Our results across tasks with and without complete sets of concept annotations demonstrate that MixCEMs outperform strong baselines by significantly improving their accuracy for both in-distribution and OOD samples in the presence and absence of concept interventions.

Problem

Research questions and friction points this paper is trying to address.

Studying concept interventions' impact on OOD inputs

Addressing leakage poisoning in concept-based models

Improving accuracy for both ID and OOD samples

Innovation

Methods, ideas, or system contributions that make the work stand out.

Concept-based models with dynamic leakage exploitation

MixCEM improves accuracy for OOD samples

Human expert interventions correct mispredicted concepts

🔎 Similar Papers

No similar papers found.