Finding Culture-Sensitive Neurons in Vision-Language Models

📅 2025-10-28

🤖 AI Summary

This study investigates whether culture-sensitive neurons exist in vision-language models (VLMs) and how they contribute to culturally diverse visual question answering on the CVQA benchmark. Method: We identify candidate culture-sensitive neurons with several selection methods, including a new margin-based selector, Contrastive Activation Selection (CAS), which scores neurons by contrasting their activations on culturally contrastive inputs. We then test the selected neurons causally by deactivating them. Contribution/Results: Across three VLMs and 25 cultural groups, ablating these neurons degrades performance disproportionately on questions about the corresponding culture (average drop: 12.7%), with negligible cross-cultural interference (<0.8%), indicating high functional specificity. CAS outperforms conventional probability- and entropy-based selection methods, and the identified neurons cluster in specific decoder layers. This work provides systematic evidence of culture-specific representations inside VLMs, with implications for interpretability and culturally fair AI research.
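The specificity claim above compares the accuracy drop on the culture whose neurons were ablated with the drop on all other cultures. The sketch below shows one way to compute that comparison. The function name `ablation_effects` and the use of a plain mean over the other cultures are assumptions for illustration, not the paper's exact metric.

```python
from typing import Dict


def ablation_effects(base_acc: Dict[str, float],
                     ablated_acc: Dict[str, float],
                     target: str) -> Dict[str, float]:
    """Hypothetical helper: contrast the accuracy drop on `target` with the mean
    drop on every other culture after deactivating `target`'s selected neurons.

    Both dicts map culture name -> accuracy in [0, 1].
    """
    # Per-culture accuracy drop caused by the ablation (positive = worse).
    drops = {c: base_acc[c] - ablated_acc[c] for c in base_acc}
    others = [d for c, d in drops.items() if c != target]
    mean_other = sum(others) / len(others) if others else 0.0
    return {
        "target_drop": drops[target],
        "mean_other_drop": mean_other,
        # Positive values mean the ablation hurts the target culture more.
        "specificity": drops[target] - mean_other,
    }


# Illustrative numbers only (not results from the paper).
before = {"A": 0.60, "B": 0.55, "C": 0.70}
after = {"A": 0.45, "B": 0.54, "C": 0.69}
print(ablation_effects(before, after, "A"))
```

Multiplying the returned values by 100 expresses them in percentage points, which is the unit a summary like the one above would typically report.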

📝 Abstract

Despite their impressive performance, vision-language models (VLMs) still struggle on culturally situated inputs. To understand how VLMs process culturally grounded information, we study the presence of culture-sensitive neurons, i.e., neurons whose activations show preferential sensitivity to inputs associated with particular cultural contexts. We examine whether such neurons are important for culturally diverse visual question answering and where they are located. Using the CVQA benchmark, we identify culture-selective neurons and perform causal tests by deactivating the neurons flagged by different identification methods. Experiments on three VLMs across 25 cultural groups demonstrate the existence of neurons whose ablation disproportionately harms performance on questions about the corresponding cultures, while having minimal effects on others. Moreover, we propose a new margin-based selector, Contrastive Activation Selection (CAS), and show that it outperforms existing probability- and entropy-based methods in identifying culture-sensitive neurons. Finally, our layer-wise analyses reveal that such neurons tend to cluster in certain decoder layers. Overall, our findings shed new light on the internal organization of multimodal representations.
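The abstract describes CAS only as a margin-based selector, so its exact formula is not given here. The sketch below assumes one plausible reading: a neuron's margin for a target culture is its mean activation on that culture minus its highest mean activation on any other culture, and the top-k neurons with a positive margin are selected. For contrast, `entropy_select` is a generic activation-probability-entropy baseline (low entropy of firing rates across cultures means more culture-specific). Both functions and the `Activations` layout are hypothetical illustrations, not the authors' code or the paper's exact baselines.

```python
import math
from typing import Dict, List

# Hypothetical layout: culture -> list of inputs -> activation of each neuron.
Activations = Dict[str, List[List[float]]]


def mean_activation(acts: Activations) -> Dict[str, List[float]]:
    """Per-culture mean activation of every neuron."""
    means = {}
    for culture, rows in acts.items():
        n = len(rows[0])
        means[culture] = [sum(r[j] for r in rows) / len(rows) for j in range(n)]
    return means


def cas_select(acts: Activations, target: str, k: int) -> List[int]:
    """Assumed margin-based selector: rank neurons by how far their mean
    activation on `target` exceeds the strongest competing culture."""
    means = mean_activation(acts)
    others = [c for c in means if c != target]
    if not others:
        raise ValueError("need at least two cultures to compute a margin")
    n = len(means[target])
    margins = [means[target][j] - max(means[c][j] for c in others) for j in range(n)]
    ranked = sorted(range(n), key=lambda j: margins[j], reverse=True)
    # Keep only neurons that actually prefer the target culture.
    return [j for j in ranked[:k] if margins[j] > 0]


def entropy_select(acts: Activations, target: str, k: int) -> List[int]:
    """Generic entropy-style baseline: prefer neurons whose firing probability
    is concentrated on few cultures and is highest on `target`."""
    cultures = list(acts)
    n = len(acts[target][0])
    scored = []
    for j in range(n):
        # Fraction of each culture's inputs on which neuron j fires (activation > 0).
        probs = {c: sum(r[j] > 0 for r in acts[c]) / len(acts[c]) for c in cultures}
        total = sum(probs.values())
        if total == 0:
            continue
        dist = [p / total for p in probs.values()]
        entropy = -sum(p * math.log(p) for p in dist if p > 0)
        if all(probs[target] > p for c, p in probs.items() if c != target):
            scored.append((entropy, j))
    scored.sort()  # lowest entropy (most culture-specific) first
    return [j for _, j in scored[:k]]
```

One design difference this makes visible: the margin uses activation magnitudes and explicitly penalizes the closest competing culture, whereas the entropy baseline only sees whether a neuron fires, so two neurons with very different activation strengths can receive the same score.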

Problem

Research questions and friction points this paper is trying to address.

Identifying culture-sensitive neurons in vision-language models

Evaluating neuron importance for culturally diverse visual question answering

Developing methods to locate culture-sensitive neurons in model layers
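Evaluating a neuron's importance here means deactivating it and measuring how the model's output changes. The toy sketch below shows that causal test on a small stack of ReLU layers, with neurons addressed as hypothetical `(layer, index)` pairs and forced to zero during the forward pass. In a real VLM the same masking would typically be applied inside the decoder MLPs, for example through a PyTorch forward hook (`register_forward_hook`); this framework-free version only illustrates the idea and is not the paper's implementation.

```python
from typing import AbstractSet, List, Tuple

Neuron = Tuple[int, int]  # hypothetical address: (layer index, neuron index)


def forward(x: List[float],
            weights: List[List[List[float]]],
            readout: List[float],
            ablate: AbstractSet[Neuron] = frozenset()) -> float:
    """Toy stack of ReLU layers; neurons listed in `ablate` are forced to zero."""
    h = x
    for layer, w in enumerate(weights):
        pre = [sum(wi * hi for wi, hi in zip(row, h)) for row in w]
        # Deactivation: zero the selected neurons, apply ReLU to the rest.
        h = [0.0 if (layer, j) in ablate else max(0.0, v) for j, v in enumerate(pre)]
    return sum(r * v for r, v in zip(readout, h))


def ablation_delta(x: List[float],
                   weights: List[List[List[float]]],
                   readout: List[float],
                   neurons: AbstractSet[Neuron]) -> float:
    """Output change caused by deactivating `neurons` (positive = score dropped)."""
    return forward(x, weights, readout) - forward(x, weights, readout, ablate=neurons)


# Two toy layers: identity, then a single neuron summing both inputs.
w = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
print(forward([1.0, 2.0], w, [2.0]))                   # full model
print(ablation_delta([1.0, 2.0], w, [2.0], {(0, 0)}))  # effect of one neuron
```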

Innovation

Methods, ideas, or system contributions that make the work stand out.

Identified culture-sensitive neurons in vision-language models

Proposed Contrastive Activation Selection for neuron identification

Found that culture-sensitive neurons cluster in specific decoder layers
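A layer-wise analysis like the one described can start from a simple histogram of where the selected neurons sit. The sketch below assumes neurons are addressed as hypothetical `(layer, index)` pairs and counts them per decoder layer; layers with the highest counts are the candidate clusters. It is an illustration of the counting step, not the paper's analysis code.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def layer_histogram(neurons: Iterable[Tuple[int, int]], num_layers: int) -> Dict[int, int]:
    """Number of selected neurons in each layer, including empty layers."""
    counts = Counter(layer for layer, _ in neurons)
    return {layer: counts[layer] for layer in range(num_layers)}


def top_layers(hist: Dict[int, int], m: int) -> List[int]:
    """The m layers holding the most selected neurons (ties -> lower layer first)."""
    return sorted(hist, key=lambda layer: (-hist[layer], layer))[:m]


# Illustrative selection only (not data from the paper).
selected = [(2, 5), (2, 7), (3, 1), (2, 9), (0, 4)]
hist = layer_histogram(selected, num_layers=4)
print(hist, top_layers(hist, 2))
```

Raw counts favor wide layers, so when layer sizes differ it may be more informative to divide each count by the number of neurons in that layer.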