🤖 AI Summary
Existing interpretability methods (e.g., SHAP) struggle to quantify fine-grained, per-dimension contributions of neural units to high-dimensional outputs (e.g., text, images, speech). To address this, we propose Multiperturbation Shapley-value Analysis (MSA), a model-agnostic game-theoretic attribution framework that systematically ablates combinations of neural units to compute Shapley values for each unit across every output dimension—yielding “Shapley Modes”: output-aligned, high-resolution contribution maps. MSA is the first method enabling scalable, cross-scale, multi-dimensional attribution for outputs with thousands of dimensions. Applied to MLPs, GANs, and the 56B-parameter Mixtral-8x7B MoE model, it reveals novel phenomena: regularisation concentrates computation in a few hub units; large language models host linguistically specialised experts; and GANs exhibit an inverted pixel-generation hierarchy. MSA provides a scalable, high-fidelity functional analysis tool for model interpretation, editing, and compression.
📝 Abstract
Neural networks now generate text, images, and speech with billions of parameters, creating a need to understand how each neural unit contributes to these high-dimensional outputs. Existing explainable-AI methods, such as SHAP, attribute importance to inputs, but cannot quantify the contributions of neural units across thousands of output pixels, tokens, or logits. Here we close that gap with Multiperturbation Shapley-value Analysis (MSA), a model-agnostic game-theoretic framework. By systematically lesioning combinations of units, MSA yields Shapley Modes, unit-wise contribution maps that share the exact dimensionality of the model's output. We apply MSA across scales, from multi-layer perceptrons to Generative Adversarial Networks (GANs) and the 56-billion-parameter Mixtral-8x7B. The approach demonstrates how regularisation concentrates computation in a few hubs, exposes language-specific experts inside the LLM, and reveals an inverted pixel-generation hierarchy in GANs. Together, these results showcase MSA as a powerful approach for interpreting, editing, and compressing deep neural networks.
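The core idea of lesioning unit subsets and averaging marginal contributions per output dimension can be sketched with a Monte Carlo (permutation-sampling) Shapley estimator. This is a minimal illustration, not the paper's implementation: the toy additive "model" and all names (`toy_model`, `shapley_modes`, `n_units`) are assumptions introduced here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_out = 4, 3
# Each unit's fixed effect on each output dimension (toy ground truth).
W = rng.normal(size=(n_units, n_out))

def toy_model(mask):
    """Output when only units with mask==1 are active; others are lesioned."""
    return mask @ W  # shape (n_out,)

def shapley_modes(model, n_units, n_out, n_perms=500):
    """Estimate phi[i, d]: Shapley contribution of unit i to output dim d.

    Samples random unit orderings; a unit's marginal contribution is the
    change in the full output vector when that unit is un-lesioned.
    """
    phi = np.zeros((n_units, n_out))
    for _ in range(n_perms):
        order = rng.permutation(n_units)
        mask = np.zeros(n_units)
        prev = model(mask)           # all units lesioned
        for i in order:
            mask[i] = 1.0            # restore unit i
            cur = model(mask)
            phi[i] += cur - prev     # per-dimension marginal contribution
            prev = cur
    return phi / n_perms

phi = shapley_modes(toy_model, n_units, n_out)
```

Because the toy model is purely additive, every permutation yields the same marginal contribution, so `phi` recovers `W` exactly; for real networks with interacting units, the estimate converges only as the number of sampled perturbations grows.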