Quantifying Interpretability in CLIP Models with Concept Consistency

📅 2025-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of quantitative, interpretable analysis of textual concept consistency across attention heads in CLIP-style models. The authors define the Concept Consistency Score (CCS), a metric that evaluates how faithfully each attention head aligns with a specific textual concept; concept labels are assigned via in-context learning with ChatGPT and validated with an LLM-as-a-judge approach. Through soft-pruning and cross-model experiments on OpenAI CLIP and OpenCLIP models, they show that high-CCS heads are critical for robustness and generalization: removing them degrades performance significantly more than removing random or low-CCS heads, and CCS correlates with out-of-domain detection, concept-specific reasoning, and video-language understanding. These results position CCS as a transferable interpretability measure across architectures and tasks for the mechanistic analysis of vision-language models.

📝 Abstract
CLIP is one of the most popular foundational models and is heavily used for many vision-language tasks. However, little is known about the inner workings of CLIP. While recent work has proposed decomposition-based interpretability methods for identifying textual descriptions of attention heads in CLIP, the implications of conceptual consistency in these text labels on interpretability and model performance have not been explored. To bridge this gap, we study the conceptual consistency of text descriptions for attention heads in CLIP-like models. We conduct extensive experiments on six different models from OpenAI and OpenCLIP, which vary by size, type of pre-training data, and patch size. We propose the Concept Consistency Score (CCS), a novel interpretability metric that measures how consistently individual attention heads in CLIP models align with specific concepts. To assign concept labels to heads, we use in-context learning with ChatGPT, guided by a few manually curated examples, and validate these labels using an LLM-as-a-judge approach. Our soft-pruning experiments reveal that high-CCS heads are critical for preserving model performance: pruning them leads to a significantly larger performance drop than pruning random or low-CCS heads. Notably, we find that high-CCS heads capture essential concepts and play a key role in out-of-domain detection, concept-specific reasoning, and video-language understanding. These results position CCS as a powerful interpretability metric for analyzing CLIP-like models.
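The abstract does not give the exact formula for CCS, but the idea of scoring how consistently a head's text descriptions align with an assigned concept can be sketched as below. The function names, the use of a simple fraction, and the substring-matching stand-in for the paper's LLM-as-a-judge step are all illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a Concept Consistency Score (CCS) computation.
# Assumes each attention head comes with a list of text descriptions and
# an assigned concept label; `judge` stands in for the paper's
# LLM-as-a-judge validation step.

def concept_consistency_score(descriptions, concept, judge):
    """Fraction of a head's text descriptions that the judge deems
    consistent with the head's assigned concept label."""
    if not descriptions:
        return 0.0
    consistent = sum(1 for d in descriptions if judge(d, concept))
    return consistent / len(descriptions)

# Toy stand-in judge: a substring match instead of an LLM call.
def toy_judge(description, concept):
    return concept.lower() in description.lower()

head_descriptions = ["a photo of a dog", "golden retriever", "a red car"]
score = concept_consistency_score(head_descriptions, "dog", toy_judge)
print(score)  # 1 of 3 descriptions literally mentions "dog"
```

With a real LLM judge, "golden retriever" would likely also count as consistent with the concept "dog"; the toy substring judge here only illustrates the aggregation, not the judgment quality.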
Problem

Research questions and friction points this paper is trying to address.

Explores conceptual consistency in CLIP model attention heads.
Proposes Concept Consistency Score (CCS) for interpretability analysis.
Identifies critical role of high CCS heads in model performance.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Concept Consistency Score (CCS)
Uses ChatGPT for concept label assignment
Validates labels with LLM-as-a-judge approach
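The soft-pruning experiments referenced above ablate selected attention heads and measure the resulting performance drop. A minimal sketch of head ablation, assuming per-head outputs are available as a tensor before being summed into the residual stream (the shapes and function names here are illustrative, not CLIP's actual API):

```python
import numpy as np

# Hypothetical soft-pruning sketch: ablate chosen attention heads by
# zeroing their per-head contributions. In a real CLIP model this would
# be done with forward hooks on the attention layers.

def soft_prune_heads(head_outputs, heads_to_prune):
    """head_outputs: array of shape (num_heads, seq_len, d_head).
    Returns a copy with the selected heads' contributions zeroed."""
    pruned = head_outputs.copy()
    pruned[list(heads_to_prune)] = 0.0
    return pruned

rng = np.random.default_rng(0)
outs = rng.standard_normal((12, 4, 8))   # e.g. one layer with 12 heads
ablated = soft_prune_heads(outs, {3, 7})  # prune heads 3 and 7
```

Comparing downstream accuracy after pruning high-CCS heads versus random or low-CCS heads is what, per the abstract, reveals the outsized importance of high-CCS heads.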