$\alpha$-TCAV: A Unified Framework for Testing with Concept Activation Vectors

📅 2026-05-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the statistical instability of Concept Activation Vectors (CAVs) and the high variance of TCAV scores, which we trace to the discontinuous indicator function in the standard TCAV score. We derive the distributions of major CAV classes, including PatternCAV, FastCAV, and ridge-regression-based CAVs, and propose α-TCAV, a unified framework that replaces the indicator function with a parameterized smooth function, yielding a probabilistic formulation that subsumes both TCAV and Multi-TCAV. The α parameter can be tuned either to imitate Multi-TCAV at substantially lower computational cost or to obtain a calibrated, Bayes-optimal probabilistic measure of a concept's influence. The analysis also yields a practical recommendation: allocate the full sampling budget to a single CAV rather than splitting it across several.

📝 Abstract

Concept Activation Vectors (CAVs) are a fundamental tool for concept-based explainability in deep learning, yet their practical utility is limited by statistical instability. We analyze the stochastic nature of CAVs and the Testing with CAVs (TCAV) method, deriving the distributions of major CAV classes including PatternCAV, FastCAV, and ridge regression-based CAVs. We then identify a fundamental flaw in the standard TCAV score: its reliance on a discontinuous indicator function induces non-decaying variance in critical regimes. To address this, we introduce $\alpha$-TCAV, a generalized framework that replaces the indicator with a parameterized smooth function, yielding a unified probabilistic formulation that subsumes both TCAV and Multi-TCAV. We characterize the induced distributions of sensitivity scores and different TCAV variants, showing that established state-of-the-art choices lack theoretical justification. We provide principled guidance on tuning the parameter in $\alpha$-TCAV -- either to imitate Multi-TCAV at substantially lower computational cost, or to obtain a calibrated Bayes-optimal probabilistic measure of a concept's influence. Finally, our analysis yields practical recommendations that challenge established routines: most notably, allocating the full sampling budget to a single CAV rather than splitting it across several.
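To make the indicator-versus-smooth distinction concrete, the following is a minimal sketch, not the paper's formulation. The abstract does not state the functional form of the smooth replacement, so the logistic sigmoid used here, the way `alpha` enters it as a sharpness factor, and the function names `tcav_score` and `smooth_tcav_score` are all assumptions made for illustration. The sketch only shows why a hard sign test reacts sharply to small perturbations of near-zero sensitivities, while a smooth surrogate does not.

```python
import numpy as np


def tcav_score(sensitivities):
    """Standard TCAV score: fraction of inputs whose directional
    derivative along the CAV is strictly positive (hard indicator)."""
    s = np.asarray(sensitivities, dtype=float)
    return float(np.mean(s > 0))


def smooth_tcav_score(sensitivities, alpha):
    """Hypothetical smooth surrogate (an assumption, not the paper's
    definition): the indicator 1[s > 0] is replaced by sigmoid(alpha * s).
    Uses the identity sigmoid(x) = 0.5 * (1 + tanh(x / 2)) for stability."""
    s = np.asarray(sensitivities, dtype=float)
    return float(np.mean(0.5 * (1.0 + np.tanh(0.5 * alpha * s))))


# Toy sensitivities clustered around zero (illustrative values only).
s = np.array([1e-3, -1e-3, 2e-3, -2e-3])
# A small shift, standing in for the randomness of a re-sampled CAV.
s_shifted = s + 3e-3

hard_before, hard_after = tcav_score(s), tcav_score(s_shifted)
soft_before = smooth_tcav_score(s, alpha=1.0)
soft_after = smooth_tcav_score(s_shifted, alpha=1.0)

print("hard indicator:", hard_before, "->", hard_after)
print("smooth, alpha=1:", soft_before, "->", soft_after)
```

In this toy setup the hard score flips every borderline sign under the shift, whereas the smooth score barely moves. Under the sigmoid assumption, a very large `alpha` makes the surrogate behave like the hard indicator again, which is one way a single parameter could interpolate between regimes; whether α plays this exact role in α-TCAV is not stated in the abstract.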

Problem

Research questions and friction points this paper is trying to address.

Concept Activation Vectors

TCAV

statistical instability

explainability

deep learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

α-TCAV

Concept Activation Vectors

TCAV