🤖 AI Summary
Existing interpretability tools struggle to analyze how the sign combinations of gating and input signals in gated activation functions—such as SwiGLU—affect neuron behavior. To address this gap, this work proposes GLUScope, the first open-source tool enabling systematic analysis of the four sign-based configurations (positive/negative gate × positive/negative input) in GLU neurons within Transformer models. By integrating activation sign decomposition, text example extraction, and interactive visualization, GLUScope reveals distinct semantic roles associated with each configuration. Empirical results demonstrate that these sign combinations correspond to markedly different linguistic functions, offering novel insights into the inner workings of gating mechanisms. The tool and its accompanying demo platform have been publicly released to foster further community research.
📝 Abstract
We present GLUScope, an open-source tool for analyzing neurons in Transformer-based language models, intended for interpretability researchers. We focus on more recent models than previous tools do; specifically, we consider gated activation functions such as SwiGLU. These introduce a new challenge: understanding positive activations is not enough. Instead, both the gate activation and the input activation of a neuron can be positive or negative, leading to four possible sign combinations that in some cases have quite different functionalities. Accordingly, for any neuron, our tool shows text examples for each of the four sign combinations and indicates how often each combination occurs. We describe examples of how our tool can lead to novel insights. A demo is available at https://sjgerstner.github.io/gluscope.
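To make the four sign combinations concrete, here is a minimal NumPy sketch of a SwiGLU feed-forward layer and of how each neuron can be bucketed by the signs of its gate and input (up-projection) pre-activations. The weight shapes, variable names, and random inputs are illustrative assumptions, not GLUScope's actual implementation:

```python
import numpy as np

def silu(x):
    # SiLU / Swish: x * sigmoid(x), the activation used on the gate in SwiGLU
    return x / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32  # toy dimensions, chosen for illustration

# Hypothetical gate and up ("in") projection weights of one SwiGLU layer
W_gate = rng.normal(size=(d_model, d_ff))
W_up = rng.normal(size=(d_model, d_ff))

x = rng.normal(size=(d_model,))  # a single token's residual-stream vector
gate = x @ W_gate                # gate pre-activation per neuron
inp = x @ W_up                   # input (up-projection) activation per neuron
act = silu(gate) * inp           # SwiGLU neuron activations

# Bucket each neuron by the signs of (gate, input):
# "++", "+-", "-+", "--" correspond to the four configurations
labels = np.char.add(
    np.where(gate >= 0, "+", "-"),
    np.where(inp >= 0, "+", "-"),
)
counts = {c: int((labels == c).sum()) for c in ("++", "+-", "-+", "--")}
print(counts)
```

Running this over many tokens, rather than one random vector, would yield the per-combination frequency statistics that the tool reports alongside text examples.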