Understanding Task Representations in Neural Networks via Bayesian Ablation

📅 2025-05-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
The sub-symbolic semantics of neural networks make it difficult to causally attribute and interpret their implicit task representations. To address this, the paper proposes a probabilistic attribution framework based on Bayesian ablation, combining Bayesian inference with information-theoretic tools to quantify the causal contribution of individual representational units to task performance. Methodologically, it defines and estimates three computable representation metrics (distributedness, manifold complexity, and polysemanticity), grounded in mutual information, effective dimensionality, and probabilistic sensitivity, respectively, to characterize the mapping between representation structure and task semantics. Experiments on multi-task models show that these metrics correlate strongly with generalization performance while improving both representation interpretability and attribution reliability over existing approaches.

📝 Abstract
Neural networks are powerful tools for cognitive modeling due to their flexibility and emergent properties. However, interpreting their learned representations remains challenging due to their sub-symbolic semantics. In this work, we introduce a novel probabilistic framework for interpreting latent task representations in neural networks. Inspired by Bayesian inference, our approach defines a distribution over representational units to infer their causal contributions to task performance. Using ideas from information theory, we propose a suite of tools and metrics to illuminate key model properties, including representational distributedness, manifold complexity, and polysemanticity.
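The core ablation idea can be sketched roughly as follows; the Bernoulli mask distribution, the toy scoring function, and the difference-in-means contribution estimator below are illustrative assumptions, not the paper's exact procedure:

```python
# Hedged sketch of ablation-based attribution: sample binary masks over
# hidden units, score the task under each mask, and estimate a unit's
# causal contribution as the gap in expected score between samples where
# the unit is active and samples where it is ablated.
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained network: task score depends linearly on
# 4 hidden units, with units 0 and 2 mattering most.
true_weights = np.array([2.0, 0.1, 1.5, 0.0])

def task_score(mask):
    """Task performance when only the unmasked units are active."""
    return float(true_weights @ mask)

n_units, n_samples = 4, 2000
masks = rng.binomial(1, 0.5, size=(n_samples, n_units))
scores = np.array([task_score(m) for m in masks])

# Contribution of unit i: E[score | unit i on] - E[score | unit i off].
contrib = np.array([
    scores[masks[:, i] == 1].mean() - scores[masks[:, i] == 0].mean()
    for i in range(n_units)
])
print(np.round(contrib, 2))
```

With enough mask samples the estimates recover the ground-truth weights, ranking units 0 and 2 as the main causal contributors.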
Problem

Research questions and friction points this paper is trying to address.

Interpreting latent task representations in neural networks
Inferring causal contributions of representational units
Measuring model properties like distributedness and complexity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian ablation for causal contribution inference
Information theory tools for model analysis
Metrics for distributedness and polysemanticity
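As a rough illustration of one such metric, an effective-dimensionality estimate can be computed as the participation ratio of the activation covariance spectrum; this proxy is an assumption here, and the paper's own manifold-complexity estimator may differ:

```python
# Hedged illustration of an effective-dimensionality metric: the
# participation ratio (sum of eigenvalues squared over sum of squared
# eigenvalues) of the activation covariance matrix.
import numpy as np

def participation_ratio(acts):
    """Effective number of dimensions used by a (samples, units) matrix."""
    cov = np.cov(acts, rowvar=False)
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return float(eig.sum() ** 2 / (eig ** 2).sum())

rng = np.random.default_rng(1)
low_d = rng.normal(size=(500, 1)) @ rng.normal(size=(1, 10))  # rank-1 activations
high_d = rng.normal(size=(500, 10))                           # isotropic activations
print(participation_ratio(low_d), participation_ratio(high_d))
```

The rank-1 activations score near 1, while the isotropic activations score near the full unit count of 10, matching the intuition of a low- versus high-complexity representation manifold.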
Andrew Nam
AI Lab, Princeton University
Declan Campbell
Graduate Student, Princeton Neuroscience Institute
Thomas Griffiths
Department of Psychology, Princeton University
Jonathan Cohen
Princeton Neuroscience Institute, Princeton University
Sarah-Jane Leslie
Class of 1943 Professor, Philosophy & Statistics and Machine Learning, Princeton University
Cognitive science