🤖 AI Summary
Neural networks’ sub-symbolic semantics make causal attribution and interpretation of their implicit task representations difficult. To address this, we propose a probabilistic attribution framework based on Bayesian ablation, which is, to our knowledge, the first to combine Bayesian inference with information theory for quantifying the causal contribution of individual neural units to task performance. Methodologically, we define and estimate three computable representation metrics: representational distributedness, manifold complexity, and polysemanticity, grounded respectively in mutual information, effective dimensionality, and probabilistic sensitivity, to systematically characterize the mapping between representation structure and task semantics. Experiments on multi-task models show that these metrics correlate strongly with generalization performance while improving both representation interpretability and attribution reliability over existing approaches.
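To make the Bayesian-ablation idea concrete, here is a minimal Monte Carlo sketch, not the paper's implementation: a Bernoulli prior is placed over ablation masks, and each unit's causal contribution is estimated as the difference in expected task performance between samples where the unit is kept and samples where it is ablated. The names `bayesian_ablation` and `evaluate` are illustrative assumptions.

```python
import numpy as np

def bayesian_ablation(evaluate, n_units, n_samples=2000, keep_prob=0.5, seed=0):
    """Monte Carlo estimate of each unit's causal contribution to task
    performance under a Bernoulli(keep_prob) prior over ablation masks.

    evaluate : callable taking a 0/1 mask of shape (n_units,) and returning
               a scalar task performance (higher is better), with masked-out
               units ablated (e.g. zeroed).
    Returns c[i] = E[perf | unit i kept] - E[perf | unit i ablated].
    """
    rng = np.random.default_rng(seed)
    kept = rng.random((n_samples, n_units)) < keep_prob      # sampled ablation masks
    perf = np.array([evaluate(mask.astype(float)) for mask in kept])
    return np.array([perf[kept[:, i]].mean() - perf[~kept[:, i]].mean()
                     for i in range(n_units)])

# Toy check: performance is a weighted sum of surviving units, so each
# unit's estimated contribution should recover its weight.
weights = np.array([2.0, 0.1, 0.5])
print(bayesian_ablation(lambda m: float(m @ weights), n_units=3))
# -> approximately [2.0, 0.1, 0.5]
```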
📝 Abstract
Neural networks are powerful tools for cognitive modeling thanks to their flexibility and emergent properties. However, their sub-symbolic semantics make the learned representations difficult to interpret. In this work, we introduce a novel probabilistic framework for interpreting latent task representations in neural networks. Inspired by Bayesian inference, our approach defines a distribution over representational units and uses it to infer their causal contributions to task performance. Drawing on information theory, we propose a suite of tools and metrics that illuminate key model properties, including representational distributedness, manifold complexity, and polysemanticity.
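The abstract does not spell out how the metrics are computed, but manifold complexity is commonly proxied by the effective dimensionality of the activation covariance, for which the participation ratio of the eigenvalue spectrum is a standard estimator. The sketch below assumes that reading; `effective_dimensionality` is a hypothetical helper, not the authors' code.

```python
import numpy as np

def effective_dimensionality(activations):
    """Participation ratio (sum lambda)^2 / sum(lambda^2) of the activation
    covariance eigenvalues: close to 1 when variance concentrates in a single
    direction, up to n_units when it is spread evenly.

    activations : array of shape (n_samples, n_units).
    """
    eig = np.linalg.eigvalsh(np.cov(activations, rowvar=False))
    eig = np.clip(eig, 0.0, None)  # numerical guard against tiny negative eigenvalues
    return eig.sum() ** 2 / np.square(eig).sum()

rng = np.random.default_rng(0)
flat = rng.normal(size=(5000, 50))                              # isotropic: ED near 50
low_rank = rng.normal(size=(5000, 2)) @ rng.normal(size=(2, 50))  # rank-2: ED near 2
print(effective_dimensionality(flat), effective_dimensionality(low_rank))
```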