Quantifying Structure in CLIP Embeddings: A Statistical Framework for Concept Interpretation

📅 2025-06-16
🤖 AI Summary
Problem: Existing concept-based CLIP interpretability methods lack statistical rigor, which hinders validation of discovered concepts and fair comparison across methods. Method: We propose a hypothesis testing framework that quantifies rotation-sensitive semantic structure in the embedding space, followed by a post-hoc concept decomposition once such structure is detected. Framing concept discovery as a testable statistical hypothesis ensures that identified concepts correspond to robust, reproducible semantic patterns rather than optimization-induced artifacts. Contribution/Results: The framework provides theoretical guarantees on concept validity and enables principled evaluation. On a popular spurious correlation dataset, removing spurious background concepts improves worst-group accuracy by 22.6%, and reconstruction error is lower than that of competing techniques. The method balances reconstruction accuracy with concept interpretability.

📝 Abstract
Concept-based approaches, which aim to identify human-understandable concepts within a model's internal representations, are a promising method for interpreting embeddings from deep neural network models, such as CLIP. While these approaches help explain model behavior, current methods lack statistical rigor, making it challenging to validate identified concepts and compare different techniques. To address this challenge, we introduce a hypothesis testing framework that quantifies rotation-sensitive structures within the CLIP embedding space. Once such structures are identified, we propose a post-hoc concept decomposition method. Unlike existing approaches, it offers theoretical guarantees that discovered concepts represent robust, reproducible patterns (rather than method-specific artifacts) and outperforms other techniques in terms of reconstruction error. Empirically, we demonstrate that our concept-based decomposition algorithm effectively balances reconstruction accuracy with concept interpretability and helps mitigate spurious cues in data. Applied to a popular spurious correlation dataset, our method yields a 22.6% increase in worst-group accuracy after removing spurious background concepts.
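
The idea of testing for rotation-sensitive structure can be illustrated with a small Monte Carlo sketch. This is not the paper's test: the statistic (mean absolute excess kurtosis per coordinate), the function names, and the use of random orthogonal rotations as the null are all assumptions chosen for illustration. The intuition is that an isotropic Gaussian cloud looks the same under any rotation, whereas embeddings with privileged directions do not.

```python
import numpy as np


def random_rotation(dim, rng):
    """Draw a random orthogonal matrix (QR of a Gaussian matrix, sign-corrected)."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))


def axis_kurtosis(x):
    """Hypothetical statistic: mean absolute excess kurtosis over coordinates."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    return np.abs((z ** 4).mean(axis=0) - 3.0).mean()


def rotation_sensitivity_test(x, n_rotations=200, seed=0):
    """Compare the statistic in the given basis against randomly rotated copies.

    Returns the observed statistic and a Monte Carlo p-value. A small p-value
    suggests the given basis is special, i.e. the structure is rotation-sensitive.
    """
    rng = np.random.default_rng(seed)
    observed = axis_kurtosis(x)
    null = np.array([
        axis_kurtosis(x @ random_rotation(x.shape[1], rng))
        for _ in range(n_rotations)
    ])
    p_value = (1 + np.sum(null >= observed)) / (n_rotations + 1)
    return observed, p_value
```

In this sketch `x` would be an `(n_samples, dim)` array of embeddings, for example after whitening and dimensionality reduction. The choice of statistic matters: any quantity that is invariant to rotation would make the test vacuous.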

Problem

Research questions and friction points this paper is trying to address.

Lack of statistical rigor in validating concepts found in CLIP embeddings
Need for a robust framework to quantify structure in the embedding space
Need to improve interpretability and reduce the effect of spurious cues in data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hypothesis testing framework for CLIP embeddings
Post-hoc concept decomposition with guarantees
Improves worst-group accuracy by 22.6% on a spurious correlation dataset after removing spurious background concepts
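
As a rough illustration of what removing spurious concepts and measuring worst-group accuracy can look like in code, here is a minimal sketch. It assumes concepts are represented as direction vectors in the embedding space and that removal means orthogonal projection; the source does not specify the paper's removal procedure, and the function names below are hypothetical.

```python
import numpy as np


def remove_concepts(embeddings, concept_dirs):
    """Project embeddings onto the orthogonal complement of the concept directions.

    embeddings:   (n_samples, dim) array
    concept_dirs: (n_concepts, dim) array, assumed linearly independent
    """
    # Orthonormal basis for the span of the concept directions, shape (dim, n_concepts).
    basis, _ = np.linalg.qr(concept_dirs.T)
    return embeddings - (embeddings @ basis) @ basis.T


def worst_group_accuracy(y_true, y_pred, groups):
    """Minimum per-group accuracy, with groups given as one label per sample."""
    return min(
        np.mean(y_pred[groups == g] == y_true[groups == g])
        for g in np.unique(groups)
    )
```

A typical use would be to fit a classifier on the cleaned embeddings and compare `worst_group_accuracy` before and after removal, with groups defined by label and background combinations.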