CUBIC: Concept Embeddings for Unsupervised Bias Identification using VLMs

📅 2025-05-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Deep vision models are vulnerable to latent biases induced by spurious correlations in data. Existing concept-based approaches rely on costly human-annotated bias-relevant concepts, limiting scalability and generalizability. This paper proposes the first unsupervised concept-based bias detection framework—requiring no predefined bias candidates, manual concept annotations, or failure-case samples. Leveraging the joint image-text embedding space of vision-language models (e.g., CLIP), our method quantifies the influence of latent semantic concepts (e.g., background, texture, style) on classification decisions by measuring alignment between superclass-label-induced representation shifts and linear probe decision-boundary normal vectors. Evaluated across multiple benchmarks, our approach significantly outperforms supervised and weakly supervised baselines, accurately uncovering previously unknown bias-inducing concepts. It enables interpretable, generalizable, and fully automated identification of latent biases without human supervision.

📝 Abstract
Deep vision models often rely on biases learned from spurious correlations in datasets. To identify these biases, methods that interpret high-level, human-understandable concepts are more effective than those relying primarily on low-level features like heatmaps. A major challenge for these concept-based methods is the lack of image annotations indicating potentially bias-inducing concepts, since creating such annotations requires detailed labeling for each dataset and concept, which is highly labor-intensive. We present CUBIC (Concept embeddings for Unsupervised Bias IdentifiCation), a novel method that automatically discovers interpretable concepts that may bias classifier behavior. Unlike existing approaches, CUBIC does not rely on predefined bias candidates or examples of model failures tied to specific biases, as such information is not always available. Instead, it leverages the image-text latent space and linear classifier probes to examine how the latent representation of a superclass label (shared by all instances in the dataset) is influenced by the presence of a given concept. By measuring these shifts against the normal vector to the classifier's decision boundary, CUBIC identifies concepts that significantly influence model predictions. Our experiments demonstrate that CUBIC effectively uncovers previously unknown biases using Vision-Language Models (VLMs), without requiring samples on which the classifier underperforms or prior knowledge of potential biases.
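The core scoring idea described above (measuring how a concept shifts a superclass label's representation along the linear probe's decision-boundary normal) can be sketched as a cosine-similarity computation. This is a minimal illustration, not the paper's exact formula: the function name `concept_bias_score` is hypothetical, and plain random vectors stand in for real CLIP text embeddings.

```python
import math
import random

def concept_bias_score(base_emb, concept_emb, probe_normal):
    """Cosine similarity between the concept-induced representation shift
    (concept_emb - base_emb) and the probe's decision-boundary normal.
    Hypothetical helper illustrating the idea; CUBIC's exact scoring
    formula may differ."""
    shift = [c - b for b, c in zip(base_emb, concept_emb)]
    dot = sum(s * n for s, n in zip(shift, probe_normal))
    norm_s = math.sqrt(sum(s * s for s in shift))
    norm_n = math.sqrt(sum(n * n for n in probe_normal))
    return dot / (norm_s * norm_n)

# Random vectors stand in for VLM text embeddings of, e.g.,
# "a photo of a bird" vs. "a photo of a bird on water"; in practice
# these would come from a CLIP-style text encoder, and probe_normal
# would be the weight vector of a linear probe on the classifier.
random.seed(0)
d = 512
base = [random.gauss(0, 1) for _ in range(d)]
with_concept = [random.gauss(0, 1) for _ in range(d)]
probe_w = [random.gauss(0, 1) for _ in range(d)]

score = concept_bias_score(base, with_concept, probe_w)
# |score| close to 1 means the concept moves the representation along the
# decision-boundary normal, flagging it as potentially bias-inducing.
```

Ranking candidate concepts by this alignment score is what lets the method surface bias-inducing concepts without failure-case samples or predefined bias candidates.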
Problem

Research questions and friction points this paper is trying to address.

Identifies biases in deep vision models without manual annotations
Discovers interpretable bias-inducing concepts using VLMs
Measures concept influence on classifier decisions in an unsupervised manner
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses image-text latent space for bias discovery
Employs linear classifier probes to measure concept influence
Leverages VLMs without predefined bias candidates
D. Méndez
Dept. of Computer Science and Artificial Intelligence, DaSCI Institute, University of Granada, Granada, Spain
G. Bontempo
Dept. of Engineering "Enzo Ferrari", University of Modena and Reggio Emilia, Modena, Italy
Elisa Ficarra
Dept. of Engineering "Enzo Ferrari", University of Modena and Reggio Emilia, Modena, Italy
Roberto Confalonieri
University of Padova, Department of Mathematics
Neurosymbolic AI · Explainable AI
Natalia Díaz-Rodríguez
Dept. of Computer Science and Artificial Intelligence, DaSCI Institute, University of Granada, Granada, Spain