🤖 AI Summary
Existing sparse autoencoders (SAEs) predominantly employ shallow architectures and implicitly rely on the quasi-orthogonality assumption, limiting their ability to extract strongly correlated features and thereby hindering the interpretability of neural representations. This work identifies and analyzes this limitation using MNIST as a benchmark. To address it, we propose the Matching Pursuit-based Multi-Iterative SAE (MP-SAE), the first SAE framework to explicitly integrate matching pursuit. MP-SAE performs residual-guided, hierarchical iterative optimization, eliminating dependence on quasi-orthogonality. Crucially, each atom selection step monotonically reduces the reconstruction error, enabling interpretable and provably convergent reconstruction of correlated features. Experiments demonstrate that MP-SAE significantly outperforms conventional SAEs in both reconstruction fidelity and feature disentanglement. By decoupling sparsity from orthogonality constraints, MP-SAE establishes a novel paradigm for robust, sparse modeling of neural representations.
📝 Abstract
Sparse autoencoders (SAEs) have recently become central tools for interpretability, leveraging dictionary learning principles to extract sparse, interpretable features from neural representations whose underlying structure is typically unknown. Evaluating SAEs in a controlled setting on MNIST reveals that current shallow architectures implicitly rely on a quasi-orthogonality assumption, which limits their ability to extract correlated features. To move beyond this, we introduce a multi-iteration SAE obtained by unrolling Matching Pursuit (MP-SAE), enabling residual-guided extraction of the correlated features that arise in hierarchical settings such as handwritten digit generation, while guaranteeing monotonic improvement of the reconstruction as more atoms are selected.
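The residual-guided iteration the abstract unrolls is classical Matching Pursuit: at each step, select the dictionary atom most correlated with the current residual, subtract its projection, and repeat. A minimal NumPy sketch below illustrates the monotonic error decrease the paper guarantees; the dictionary `D` stands in for an SAE decoder, and all names and shapes are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def matching_pursuit(x, D, n_iters=5):
    """Greedy Matching Pursuit.

    x: (d,) signal to reconstruct.
    D: (d, k) dictionary with unit-norm columns (may be correlated).
    Returns the sparse coefficients and the residual norm after each step.
    """
    residual = x.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    errors = []
    for _ in range(n_iters):
        corr = D.T @ residual                    # correlation of each atom with the residual
        j = int(np.argmax(np.abs(corr)))         # best-matching atom
        coeffs[j] += corr[j]                     # accumulate its coefficient
        residual -= corr[j] * D[:, j]            # subtract the projection onto atom j
        errors.append(float(np.linalg.norm(residual)))
    return coeffs, errors

# Demo: a random unit-norm dictionary whose atoms are not orthogonal.
rng = np.random.default_rng(0)
D = rng.standard_normal((16, 64))
D /= np.linalg.norm(D, axis=0)
z_true = rng.standard_normal(64) * (rng.random(64) < 0.1)  # sparse code
x = D @ z_true
coeffs, errors = matching_pursuit(x, D, n_iters=8)
```

Because each atom has unit norm, subtracting the projection shrinks the residual by the Pythagorean identity, so `errors` is non-increasing regardless of how correlated the atoms are; this is the property that lets MP-style selection drop the quasi-orthogonality assumption.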