Bilinear autoencoders find interpretable manifolds

📅 2026-05-09

🤖 AI Summary

Linear sparse autoencoders cannot directly model the multi-dimensional, nonlinear manifolds that many interpretable concepts occupy in neural networks, so this structure is usually recovered only through post hoc analysis. This work proposes bilinear autoencoders, which use quadratic latents to decompose activations into low-rank quadratic forms that compose linearly in weight space and admit input-independent geometric analysis. The resulting latent space is nonlinear yet mathematically tractable, and the authors argue that it challenges the standard linear representation hypothesis. Experiments and visualisations on language models indicate that multi-dimensional geometries are highly prevalent and that composite latents systematically improve reconstruction error. Autoencoders trained with different geometric priors recover the same input subspace even though their dictionary entries differ, and the discovered manifolds can be explored in an interactive online visualizer for Qwen 3.5.

📝 Abstract

Sparse autoencoders have become a standard tool for uncovering interpretable latent representations in neural networks. Yet salient concepts often span manifolds that current linear methods cannot capture without post hoc analysis. This paper uses quadratic latents to close this gap: we implement these with bilinear autoencoders, which decompose activations into low-rank quadratic forms, compose linearly in weight space, and admit input-independent geometric analysis. This qualitative difference in what concepts quadratic latents can detect challenges the standard linear representation hypothesis. Our experiments and visualisations show that multi-dimensional geometries are highly prevalent and that composite latents capture them well, systematically improving reconstruction error in language models. Furthermore, we show that autoencoders with varying geometric priors recover the same input subspace despite their dictionary entries being distinct. Practically, these models serve as an unsupervised tool for manifold discovery, which we demonstrate through an interactive online visualizer for Qwen 3.5. This is a step toward nonlinear but mathematically tractable latent representations whose composition is expressive and interpretable by design.
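To make "low-rank quadratic forms" and "compose linearly in weight space" concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the abstract does not give the architecture, so the parameterisation below (each latent as a sum of products of two linear projections) and the names `bilinear_latent`, `quadratic_matrix`, and `combine` are assumptions chosen for illustration. The sketch shows that such a latent equals x^T Q x for a symmetric matrix Q built only from the weights, and that a weighted sum of latents corresponds to the same weighted sum of their Q matrices.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def bilinear_latent(x, left, right):
    """Assumed rank-r quadratic latent: sum_i (left[i] . x) * (right[i] . x)."""
    return sum(dot(l, x) * dot(r, x) for l, r in zip(left, right))


def quadratic_matrix(left, right):
    """Symmetric Q such that x^T Q x equals bilinear_latent(x, left, right).

    Q depends only on the weights, so it can be inspected without any inputs.
    """
    d = len(left[0])
    q = [[0.0] * d for _ in range(d)]
    for l, r in zip(left, right):
        for i in range(d):
            for j in range(d):
                # Symmetrised outer product of the two projection vectors.
                q[i][j] += 0.5 * (l[i] * r[j] + r[i] * l[j])
    return q


def quadratic_form(x, q):
    """Evaluate x^T Q x."""
    n = len(x)
    return sum(x[i] * q[i][j] * x[j] for i in range(n) for j in range(n))


def combine(qa, qb, a, b):
    """Weight-space composition: the matrix a * Qa + b * Qb."""
    return [[a * u + b * v for u, v in zip(ra, rb)] for ra, rb in zip(qa, qb)]


# Hypothetical 2-D example: a rank-2 latent that measures squared radius,
# which no single linear latent can compute.
radius_left = [[1.0, 0.0], [0.0, 1.0]]
radius_right = [[1.0, 0.0], [0.0, 1.0]]
# A rank-1 latent that measures the product of the two coordinates.
cross_left = [[1.0, 0.0]]
cross_right = [[0.0, 1.0]]

x = [3.0, 4.0]
q_radius = quadratic_matrix(radius_left, radius_right)
q_cross = quadratic_matrix(cross_left, cross_right)

z_radius = bilinear_latent(x, radius_left, radius_right)
z_cross = bilinear_latent(x, cross_left, cross_right)

# A composite latent can be evaluated from a single combined matrix.
composite = quadratic_form(x, combine(q_radius, q_cross, 2.0, -1.0))
print(z_radius, z_cross, composite)
```

Because each Q is a fixed matrix, its structure (for example its eigenvectors) can be studied without running any data through the model, which is one way to read the abstract's "input-independent geometric analysis".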

Problem

Research questions and friction points this paper is trying to address.

interpretable representations

manifold discovery

nonlinear latents

autoencoders

geometric structure

Innovation

Methods, ideas, or system contributions that make the work stand out.

bilinear autoencoders

quadratic latents

interpretable manifolds