Evaluating Contrast Localizer for Identifying Causal Units in Social & Mathematical Tasks in Language Models

📅 2025-07-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study tests whether contrastive localization identifies units that are causally relevant to Theory of Mind (ToM) and mathematical reasoning in large language models (LLMs) and vision-language models (VLMs). Method: The authors adapt a neuroscience-inspired contrast localizer, selecting top-activated units with contrastive stimulus sets and then ablating them, and compare the resulting accuracy drop against ablations of low-activation and randomly selected units. Contribution/Results: Across 11 LLMs and 5 VLMs (3B to 90B parameters), low-activation units sometimes produce larger performance drops than highly activated ones, and units selected by the mathematical localizer often impair ToM performance more than units selected by the ToM localizer. These results call into question the causal relevance of contrast-based localizers and point to the need for broader stimulus sets that capture task-specific units more accurately.

📝 Abstract

This work adapts a neuroscientific contrast localizer to pinpoint causally relevant units for Theory of Mind (ToM) and mathematical reasoning tasks in large language models (LLMs) and vision-language models (VLMs). Across 11 LLMs and 5 VLMs ranging in size from 3B to 90B parameters, we localize top-activated units using contrastive stimulus sets and assess their causal role via targeted ablations. We compare the effect of lesioning functionally selected units against lesioning low-activation and randomly selected units, measuring downstream accuracy on established ToM and mathematical benchmarks. Contrary to expectations, low-activation units sometimes produced larger performance drops than the highly activated ones, and units derived from the mathematical localizer often impaired ToM performance more than those from the ToM localizer. These findings call into question the causal relevance of contrast-based localizers and highlight the need for broader stimulus sets that more accurately capture task-specific units.
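The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which contrast statistic is used, so the Welch t-statistic here is an assumption, as are the function names `welch_t` and `select_units`, the reading of "low-activation" as the lowest-ranked units under that statistic, and the synthetic activations, which stand in for real model activations.

```python
import numpy as np

def welch_t(acts_task: np.ndarray, acts_control: np.ndarray) -> np.ndarray:
    """Per-unit Welch t-statistic; inputs have shape (n_stimuli, n_units)."""
    mean_t, mean_c = acts_task.mean(axis=0), acts_control.mean(axis=0)
    se_t = acts_task.var(axis=0, ddof=1) / acts_task.shape[0]
    se_c = acts_control.var(axis=0, ddof=1) / acts_control.shape[0]
    # Small constant guards against division by zero for constant units.
    return (mean_t - mean_c) / np.sqrt(se_t + se_c + 1e-12)

def select_units(t_values: np.ndarray, k: int, rng: np.random.Generator) -> dict:
    """Return the k highest-ranked, k lowest-ranked, and k random unit indices."""
    order = np.argsort(t_values)  # ascending by contrast statistic
    return {
        "top": order[-k:][::-1],
        "low": order[:k],
        "random": rng.choice(t_values.size, size=k, replace=False),
    }

# Synthetic stand-in data (assumption): 40 stimuli per condition, 200 units.
rng = np.random.default_rng(0)
n_stimuli, n_units, k = 40, 200, 10
task = rng.normal(size=(n_stimuli, n_units))
control = rng.normal(size=(n_stimuli, n_units))
task[:, :k] += 3.0  # plant task selectivity in the first k units

groups = select_units(welch_t(task, control), k, rng)
print(sorted(groups["top"].tolist()))
```

The three groups correspond to the three ablation conditions the abstract compares; with real models, the activations would come from the task and control stimulus sets rather than from a random generator.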

Problem

Research questions and friction points this paper is trying to address.

Adapts a neuroscientific contrast localizer to identify causal units

Evaluates causal relevance across 11 LLMs and 5 VLMs for reasoning tasks

Questions effectiveness of contrast-based localizers for task-specific units

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapts a neuroscientific contrast-localizer technique

Localizes top-activated units via contrastive stimuli

Assesses causal role through targeted ablation tests
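As a rough illustration of a targeted ablation test, the sketch below zeroes out a set of units in a toy hidden layer and measures the resulting accuracy drop. Everything here is an assumption made for illustration: the linear readout, the zero-ablation choice, the helper names `ablate` and `accuracy`, and the synthetic data. It is not the evaluated models or benchmarks.

```python
import numpy as np

def ablate(hidden: np.ndarray, units: np.ndarray) -> np.ndarray:
    """Return a copy of the activations with the given units set to zero."""
    lesioned = hidden.copy()
    lesioned[:, units] = 0.0
    return lesioned

def accuracy(hidden: np.ndarray, readout: np.ndarray, labels: np.ndarray) -> float:
    """Binary accuracy of a linear readout applied to the activations."""
    preds = (hidden @ readout > 0).astype(int)
    return float((preds == labels).mean())

# Toy setup (assumption): only the first k units drive the readout.
rng = np.random.default_rng(0)
n_examples, n_units, k = 500, 100, 5
hidden = rng.normal(size=(n_examples, n_units))
readout = np.zeros(n_units)
readout[:k] = 1.0
labels = (hidden @ readout > 0).astype(int)

baseline = accuracy(hidden, readout, labels)
causal_units = np.arange(k)
random_units = rng.choice(np.arange(k, n_units), size=k, replace=False)

drop_causal = baseline - accuracy(ablate(hidden, causal_units), readout, labels)
drop_random = baseline - accuracy(ablate(hidden, random_units), readout, labels)
print(f"drop (causal units): {drop_causal:.2f}, drop (random units): {drop_random:.2f}")
```

In this toy case the causally relevant units are known by construction, so the ablation result can be checked directly. A localizer is validated the same way: the units it selects should produce a larger drop than random or low-ranked units, which is the comparison the paper reports does not always hold.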
