From Data Statistics to Feature Geometry: How Correlations Shape Superposition

📅 2026-03-10
📈 Citations: 0
Influential citations: 0
📄 PDF
🤖 AI Summary
This work challenges the classical assumption in overcomplete representation theory that features are sparse and uncorrelated, an assumption that fails to account for the complex geometric structures observed in real language models, where features are correlated. To systematically investigate how such correlations shape representational geometry, the authors construct a controlled Bag-of-Words Superposition (BOWS) setting, combining geometric analysis with weight decay regularization and ReLU nonlinearities. Their findings show that feature correlations do not merely introduce noise: through constructive interference in superposition, they actively drive the emergence of key phenomena seen in real models, such as semantic clustering and cyclic structures. These results support correlation-driven superposition as a more accurate and expressive framework than the traditional sparse, uncorrelated picture.
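To make the data regime concrete, here is a minimal sketch of correlated binary bag-of-words data of the kind BOWS targets. The cluster-based generator, its rates, and all variable names are illustrative assumptions, not the paper's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_samples, n_clusters = 64, 10_000, 8

# Assumed correlation structure: features grouped into clusters that
# tend to fire together (hypothetical; the paper's generator may differ).
cluster_of = rng.integers(0, n_clusters, size=n_features)

active = rng.random((n_samples, n_clusters)) < 0.2       # sparse cluster activity
member = rng.random((n_samples, n_features)) < 0.8       # per-feature gating
X = (active[:, cluster_of] & member).astype(np.float32)  # correlated binary BoW

# Empirical co-activation matrix: block structure along cluster membership,
# unlike the near-diagonal matrix an uncorrelated sparse model would assume.
coact = (X.T @ X) / n_samples
```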

📝 Abstract
A central idea in mechanistic interpretability is that neural networks represent more features than they have dimensions, arranging them in superposition to form an over-complete basis. This framing has been influential, motivating dictionary learning approaches such as sparse autoencoders. However, superposition has mostly been studied in idealized settings where features are sparse and uncorrelated. In these settings, superposition is typically understood as introducing interference that must be minimized geometrically and filtered out by non-linearities such as ReLUs, yielding local structures like regular polytopes. We show that this account is incomplete for realistic data by introducing Bag-of-Words Superposition (BOWS), a controlled setting to encode binary bag-of-words representations of internet text in superposition. Using BOWS, we find that when features are correlated, interference can be constructive rather than just noise to be filtered out. This is achieved by arranging features according to their co-activation patterns, making interference between active features constructive, while still using ReLUs to avoid false positives. We show that this kind of arrangement is more prevalent in models trained with weight decay and naturally gives rise to semantic clusters and cyclical structures which have been observed in real language models yet were not explained by the standard picture of superposition. Code for this paper can be found at https://github.com/LucasPrietoAl/correlations-feature-geometry.
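As an illustration of the toy setup the abstract builds on, the sketch below trains a tied-weight superposition model with a ReLU readout and weight decay on correlated binary features, then probes the learned feature geometry. The architecture, sampler, and hyperparameters are assumptions in the spirit of standard toy models of superposition, not the paper's exact implementation:

```python
import torch

torch.manual_seed(0)
n, d, n_clusters = 64, 16, 8            # n features squeezed into d dimensions
cluster_of = torch.randint(0, n_clusters, (n,))

def sample_batch(batch_size=256):
    # A cluster fires sparsely; its member features then co-activate w.h.p.
    active = torch.rand(batch_size, n_clusters) < 0.2
    member = torch.rand(batch_size, n) < 0.8
    return (active[:, cluster_of] & member).float()

W = torch.nn.Parameter(0.1 * torch.randn(n, d))   # one direction per feature
b = torch.nn.Parameter(torch.zeros(n))
opt = torch.optim.AdamW([W, b], lr=1e-3, weight_decay=1e-2)

for step in range(5_000):
    x = sample_batch()
    x_hat = torch.relu(x @ W @ W.T + b)   # tied weights; ReLU filters interference
    loss = ((x_hat - x) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Probe the geometry: cosine similarity between learned feature directions.
Wn = torch.nn.functional.normalize(W.detach(), dim=1)
sims = Wn @ Wn.T
same = cluster_of[:, None] == cluster_of[None, :]
off_diag = ~torch.eye(n, dtype=torch.bool)
print("within-cluster mean cosine: ", sims[same & off_diag].mean().item())
print("between-cluster mean cosine:", sims[~same].mean().item())
```

If correlations drive constructive rather than destructive interference, the within-cluster mean similarity should come out higher than the between-cluster mean, i.e. co-activating features point in similar directions instead of being spread apart to minimize overlap, mirroring the semantic clusters the paper reports.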
Problem

Research questions and friction points this paper is trying to address.

superposition
feature correlation
neural network interpretability
semantic structure
over-complete representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

superposition
feature correlation
constructive interference
semantic clustering
ReLU geometry