Quantifying Structure in CLIP Embeddings: A Statistical Framework for Concept Interpretation

📅 2025-06-16

🤖 AI Summary

Problem: Existing concept-based interpretability methods for CLIP lack statistical rigor, which makes it hard to validate that identified concepts are genuine and to compare methods fairly. Method: The authors propose a hypothesis testing framework that quantifies rotation-sensitive structures in the CLIP embedding space, followed by a post-hoc concept decomposition applied once such structures are detected. Treating the presence of structure as a testable statistical hypothesis is meant to ensure that identified concepts correspond to robust, reproducible patterns rather than method-specific artifacts. Contribution/Results: The framework comes with theoretical guarantees on concept validity and achieves lower reconstruction error than other techniques. On a popular spurious correlation dataset, removing spurious background concepts increases worst-group accuracy by 22.6%. The decomposition balances reconstruction accuracy with concept interpretability.

📝 Abstract

Concept-based approaches, which aim to identify human-understandable concepts within a model's internal representations, are a promising method for interpreting embeddings from deep neural network models, such as CLIP. While these approaches help explain model behavior, current methods lack statistical rigor, making it challenging to validate identified concepts and compare different techniques. To address this challenge, we introduce a hypothesis testing framework that quantifies rotation-sensitive structures within the CLIP embedding space. Once such structures are identified, we propose a post-hoc concept decomposition method. Unlike existing approaches, it offers theoretical guarantees that discovered concepts represent robust, reproducible patterns (rather than method-specific artifacts) and outperforms other techniques in terms of reconstruction error. Empirically, we demonstrate that our concept-based decomposition algorithm effectively balances reconstruction accuracy with concept interpretability and helps mitigate spurious cues in data. Applied to a popular spurious correlation dataset, our method yields a 22.6% increase in worst-group accuracy after removing spurious background concepts.
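To make the idea of testing for rotation-sensitive structure concrete, here is a minimal sketch of one generic way such a test could be set up. It is not the authors' procedure, which this summary does not specify. The assumptions are: the null hypothesis is that the embedding distribution is unchanged by orthogonal transformations, the test statistic is the mean excess kurtosis of the coordinates (a hypothetical choice), and the null distribution is obtained by applying random orthogonal matrices to the same data. All function names are illustrative.

```python
import numpy as np


def random_orthogonal(dim, rng):
    """Draw a Haar-distributed orthogonal matrix via QR with a sign correction."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))


def axis_kurtosis(x):
    """Mean excess kurtosis of the standardized coordinates (assumed statistic)."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    return float(np.mean(z ** 4) - 3.0)


def rotation_sensitivity_pvalue(x, n_rotations=200, seed=0):
    """Monte Carlo randomization p-value for H0: the data are rotation-invariant."""
    rng = np.random.default_rng(seed)
    observed = axis_kurtosis(x)
    dim = x.shape[1]
    null = np.array(
        [axis_kurtosis(x @ random_orthogonal(dim, rng)) for _ in range(n_rotations)]
    )
    # The +1 terms count the observed statistic as one draw from the null.
    return (1 + np.sum(null >= observed)) / (1 + n_rotations)


# Toy data with axis-aligned heavy tails, so the coordinate basis is "special".
toy_rng = np.random.default_rng(1)
structured = toy_rng.laplace(size=(2000, 8))
p_structured = rotation_sensitivity_pvalue(structured)
```

A small p-value indicates that the statistic in the original basis is unusually large compared with randomly rotated copies, which is evidence that the basis carries structure. Real CLIP embeddings would replace the toy Laplace data.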

Problem

Research questions and friction points this paper is trying to address.

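Current concept-based methods lack statistical rigor when validating concepts identified in CLIP embeddings

A robust framework is needed to quantify structure in the embedding space and to compare techniques

Concept decompositions should improve interpretability and help mitigate spurious cues in data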

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hypothesis testing framework for CLIP embeddings

Post-hoc concept decomposition with guarantees

Increases worst-group accuracy by 22.6% on a spurious correlation dataset by removing spurious background concepts
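The two ingredients behind the last point, removing a concept from embeddings and measuring worst-group accuracy, can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the concept is assumed to be a single direction vector, removal is done by orthogonal projection (a simple stand-in for the authors' decomposition), and `remove_concept` and `worst_group_accuracy` are hypothetical helper names. Worst-group accuracy is taken to be the minimum accuracy over groups.

```python
import numpy as np


def remove_concept(embeddings, concept):
    """Project embeddings onto the subspace orthogonal to one concept direction."""
    unit = concept / np.linalg.norm(concept)
    return embeddings - np.outer(embeddings @ unit, unit)


def worst_group_accuracy(y_true, y_pred, groups):
    """Return the lowest per-group accuracy across all groups."""
    correct = y_true == y_pred
    return min(float(correct[groups == g].mean()) for g in np.unique(groups))


# Toy example: four 3-dimensional embeddings and an assumed "background" direction.
embeddings = np.array(
    [[1.0, 2.0, 0.5], [0.0, 1.0, 1.0], [2.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
)
background = np.array([0.0, 0.0, 2.0])
cleaned = remove_concept(embeddings, background)

y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])
groups = np.array([0, 0, 1, 1])
wga = worst_group_accuracy(y_true, y_pred, groups)
```

In practice a classifier would be evaluated on the cleaned embeddings, and the worst-group accuracy before and after removal would be compared.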
