Evaluating Contrast Localizer for Identifying Causal Units in Social & Mathematical Tasks in Language Models

📅 2025-07-31
🤖 AI Summary
This study tests whether contrast localizers identify causally relevant units for Theory of Mind (ToM) and mathematical reasoning in large language models (LLMs) and vision-language models (VLMs). Method: The authors adapt a neuroscience-inspired contrast localizer, selecting top-activated units with contrastive stimulus sets and assessing their causal role through targeted ablations, with low-activation and randomly selected units as comparisons. Contribution/Results: Low-activation units sometimes produce larger performance drops than highly activated ones, and units selected by the mathematical localizer often impair ToM performance more than units selected by the ToM localizer. Across 11 LLMs and 5 VLMs (3B to 90B parameters), these results call into question the causal relevance of contrast-based localizers and point to the need for broader stimulus sets that capture task-specific units more accurately.

📝 Abstract
This work adapts a neuroscientific contrast localizer to pinpoint causally relevant units for Theory of Mind (ToM) and mathematical reasoning tasks in large language models (LLMs) and vision-language models (VLMs). Across 11 LLMs and 5 VLMs ranging in size from 3B to 90B parameters, we localize top-activated units using contrastive stimulus sets and assess their causal role via targeted ablations. We compare the effect of lesioning functionally selected units against that of lesioning low-activation and randomly selected units, measuring downstream accuracy on established ToM and mathematical benchmarks. Contrary to expectations, low-activation units sometimes produced larger performance drops than highly activated ones, and units derived from the mathematical localizer often impaired ToM performance more than those from the ToM localizer. These findings call into question the causal relevance of contrast-based localizers and highlight the need for broader stimulus sets to more accurately capture task-specific units.
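The localization step described in the abstract can be illustrated with a minimal sketch. It assumes that per-unit activations have already been extracted for a target stimulus set and a control stimulus set, and that units are ranked by a Welch t-statistic; the abstract does not state the exact statistic or selection threshold, so the functions `welch_t` and `select_units`, the `fraction` parameter, and the reading of "low-activation" as "lowest-ranked" are all hypothetical choices made for illustration.

```python
import numpy as np


def welch_t(target, control):
    """Per-unit Welch t-statistic for target vs. control activations.

    Both inputs have shape (n_stimuli, n_units). The choice of statistic
    is an assumption; the source only says units are 'top-activated'.
    """
    mean_diff = target.mean(axis=0) - control.mean(axis=0)
    var_t = target.var(axis=0, ddof=1) / target.shape[0]
    var_c = control.var(axis=0, ddof=1) / control.shape[0]
    # Small constant guards against division by zero for constant units.
    return mean_diff / np.sqrt(var_t + var_c + 1e-12)


def select_units(t_values, fraction, mode, rng=None):
    """Return indices of the units to lesion.

    mode='top'    -> units with the highest contrast (functionally selected)
    mode='low'    -> units with the lowest contrast (assumed reading of
                     'low-activation' units)
    mode='random' -> a uniformly random set of the same size
    """
    k = max(1, int(round(fraction * t_values.size)))
    order = np.argsort(t_values)  # ascending order of contrast
    if mode == "top":
        return order[-k:]
    if mode == "low":
        return order[:k]
    if mode == "random":
        rng = rng if rng is not None else np.random.default_rng(0)
        return rng.choice(t_values.size, size=k, replace=False)
    raise ValueError(f"unknown mode: {mode!r}")
```

Using the same `fraction` for all three modes keeps the number of lesioned units equal across conditions, so differences in accuracy drop reflect which units were removed rather than how many.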

Problem

Research questions and friction points this paper is trying to address.

Tests whether a neuroscientific contrast localizer identifies causally relevant units for ToM and mathematical reasoning
Evaluates causal relevance across 11 LLMs and 5 VLMs (3B to 90B parameters)
Questions whether contrast-based localizers capture task-specific units

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapts a neuroscientific contrast localizer to LLMs and VLMs
Localizes top-activated units using contrastive stimulus sets
Assesses their causal role via targeted ablations, compared against low-activation and randomly selected units
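
The ablation comparison listed above can be sketched on a toy network. This is not the paper's setup: the two-layer NumPy model, the choice to lesion by zeroing hidden activations, and the helper names `forward`, `accuracy`, and `ablation_drop` are assumptions made only to show how an accuracy drop is measured for a given set of units.

```python
import numpy as np


def forward(x, w_in, w_out, ablated=None):
    """Toy two-layer network; `ablated` lists hidden units to zero out."""
    hidden = np.maximum(x @ w_in, 0.0)  # ReLU hidden layer
    if ablated is not None and len(ablated) > 0:
        hidden = hidden.copy()
        hidden[:, ablated] = 0.0  # lesion: silence the selected units
    return hidden @ w_out


def accuracy(logits, labels):
    """Fraction of examples whose argmax prediction matches the label."""
    return float((logits.argmax(axis=1) == labels).mean())


def ablation_drop(x, labels, w_in, w_out, units):
    """Baseline accuracy minus accuracy after lesioning `units`."""
    base = accuracy(forward(x, w_in, w_out), labels)
    lesioned = accuracy(forward(x, w_in, w_out, ablated=units), labels)
    return base - lesioned
```

Calling `ablation_drop` once each with the top-ranked, lowest-ranked, and random unit sets gives the three quantities the study compares; a localizer that captures causally relevant units would be expected to produce the largest drop for the top-ranked set.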