Who Does What in Deep Learning? Multidimensional Game-Theoretic Attribution of Function of Neural Units

📅 2025-06-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing interpretability methods (e.g., SHAP) struggle to quantify fine-grained, per-dimension contributions of neural units to high-dimensional outputs (e.g., text, images, speech). To address this, we propose Multiperturbation Shapley-value Analysis (MSA), a model-agnostic, game-theoretic attribution framework that systematically ablates combinations of neural units to compute a Shapley value for each unit across every output dimension, yielding "Shapley Modes": output-aligned, high-resolution contribution maps. MSA is the first method to enable scalable, cross-scale, multi-dimensional attribution for outputs with thousands of dimensions. Applied to MLPs, GANs, and the 56B-parameter Mixtral-8x7B MoE model, it reveals novel phenomena: regularization concentrates computation in a few hub units; large language models host linguistically specialized experts; and GANs exhibit an inverted pixel-generation hierarchy. MSA thus provides a scalable, high-fidelity functional-analysis tool for model interpretation, editing, and compression.

📝 Abstract
Neural networks now generate text, images, and speech with billions of parameters, producing a need to know how each neural unit contributes to these high-dimensional outputs. Existing explainable-AI methods, such as SHAP, attribute importance to inputs, but cannot quantify the contributions of neural units across thousands of output pixels, tokens, or logits. Here we close that gap with Multiperturbation Shapley-value Analysis (MSA), a model-agnostic game-theoretic framework. By systematically lesioning combinations of units, MSA yields Shapley Modes, unit-wise contribution maps that share the exact dimensionality of the model's output. We apply MSA across scales, from multi-layer perceptrons to the 56-billion-parameter Mixtral-8x7B and Generative Adversarial Networks (GANs). The approach demonstrates how regularisation concentrates computation in a few hubs, exposes language-specific experts inside the LLM, and reveals an inverted pixel-generation hierarchy in GANs. Together, these results showcase MSA as a powerful approach for interpreting, editing, and compressing deep neural networks.
Problem

Research questions and friction points this paper is trying to address.

Quantify neural units' contributions to high-dimensional outputs
Attribute function of neural units in deep learning models
Interpret and compress deep neural networks effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Game-theoretic Multiperturbation Shapley-value Analysis (MSA)
Model-agnostic unit-wise contribution mapping
Systematic lesioning for neural unit attribution
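The core idea behind the contributions above, lesioning subsets of units and aggregating marginal effects into per-output-dimension Shapley values, can be sketched as a Monte Carlo permutation estimate. This is a minimal illustration, not the authors' implementation: the `shapley_modes` function and the `model_fn(mask)` masking interface are hypothetical stand-ins for whatever lesioning hook a real model exposes.

```python
import numpy as np

def shapley_modes(model_fn, n_units, n_perms=50, seed=0):
    """Monte Carlo estimate of per-unit Shapley values for every
    output dimension ("Shapley Modes").

    model_fn(mask) -> output vector; mask[i]=True keeps unit i active,
    mask[i]=False lesions it. Hypothetical interface for illustration.
    """
    rng = np.random.default_rng(seed)
    out_dim = model_fn(np.ones(n_units, dtype=bool)).shape[0]
    phi = np.zeros((n_units, out_dim))
    for _ in range(n_perms):
        perm = rng.permutation(n_units)
        mask = np.zeros(n_units, dtype=bool)
        prev = model_fn(mask)  # baseline: all units lesioned
        for u in perm:
            # Marginal contribution of unit u given the units added so far
            mask[u] = True
            cur = model_fn(mask)
            phi[u] += cur - prev
            prev = cur
    return phi / n_perms  # shape: (n_units, output_dim)

# Toy additive model: output dimension j sums the weights of active units
w = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])
model = lambda mask: w[mask].sum(axis=0)
phi = shapley_modes(model, n_units=3)
```

For this additive toy model the estimate recovers the weight matrix exactly (each unit's marginal contribution is the same in every permutation); for real networks the interactions between units are what the sampled permutations average over.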