Sparsification and Reconstruction from the Perspective of Representation Geometry

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how sparse coding organizes the representational structure of language model activation vectors and uncovers its intrinsic links to feature disentanglement and reconstruction fidelity. Method: We propose SAEMA to empirically validate representational hierarchy; formally define local and global representations; establish a causal relationship between their separability and reconstruction quality; and reinterpret sparsity principles from a geometric perspective. Technically, we integrate rank analysis of symmetric positive semi-definite matrices, modal tensor decomposition, noise-robustness evaluation, optimization-driven representation intervention, and joint modeling of sparse coding and feature merging. Contributions/Results: Empirical results demonstrate that sparse coding not only enhances feature discriminability but also introduces orthogonal redundant dimensions; crucially, representation separability—rather than sparsity alone—is the decisive factor governing reconstruction performance. These findings provide both theoretical foundations and empirical evidence for representation disentanglement and tool design in interpretable AI.

📝 Abstract
Sparse Autoencoders (SAEs) have emerged as a predominant tool in mechanistic interpretability, aiming to identify interpretable monosemantic features. However, how does sparse encoding organize the representations of activation vectors from language models? What is the relationship between this organizational paradigm and feature disentanglement as well as reconstruction performance? To address these questions, we propose SAEMA, which validates the stratified structure of the representation by observing how the rank of the symmetric positive semi-definite (SSPD) matrix, corresponding to the modal tensor unfolded from the latent tensor, varies with the level of noise added to the residual stream. To systematically investigate how sparse encoding alters representational structures, we define local and global representations, demonstrating that they amplify inter-feature distinctions by merging similar semantic features and introducing additional dimensionality. Furthermore, we intervene on the global representation from an optimization perspective, demonstrating a significant causal relationship between its separability and reconstruction performance. This study explains the principles of sparsity from the perspective of representational geometry, demonstrates the impact of changes in representational structure on reconstruction performance, and emphasizes the necessity of understanding representations and incorporating representational constraints, providing empirical references for developing new interpretable tools and improving SAEs. The code is available at https://github.com/wenjie1835/SAERepGeo.
Problem

Research questions and friction points this paper is trying to address.

How sparse encoding organizes language model activation representations
Relationship between sparse encoding, feature disentanglement, and reconstruction performance
Impact of representational structure changes on reconstruction performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

SAEMA analyzes SSPD matrix rank variability
Defines local and global representation structures
Intervenes global representation for separability optimization
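The first innovation above hinges on a simple linear-algebra fact: unfolding a tensor along one mode gives a matrix X, and the Gram matrix X Xᵀ is symmetric positive semi-definite with rank equal to rank(X), so its rank tracks how many effective dimensions the representation occupies as noise is added. The paper's SAEMA implementation is not reproduced here; the following is a minimal, hypothetical NumPy sketch (the function name `sspd_rank`, the toy low-rank tensor, and the noise scale are illustrative assumptions, not the authors' code).

```python
import numpy as np

def sspd_rank(tensor: np.ndarray, mode: int = 0, tol: float = 1e-6) -> int:
    """Rank of the SSPD (Gram) matrix of a mode unfolding.

    Unfold `tensor` along `mode` into a matrix X, then measure the rank
    of X @ X.T, which is symmetric positive semi-definite by construction
    and has the same rank as X. Illustrative helper, not the paper's SAEMA.
    """
    X = np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)
    gram = X @ X.T
    return int(np.linalg.matrix_rank(gram, tol=tol))

rng = np.random.default_rng(0)
# Toy "latent" tensor whose activations live on a 3-dimensional subspace.
U = rng.normal(size=(16, 3))
V = rng.normal(size=(3, 8 * 10))
low_rank = (U @ V).reshape(16, 8, 10)

print(sspd_rank(low_rank))   # 3: the clean low-rank structure is visible
# Adding noise (analogous to perturbing the residual stream) fills the
# remaining dimensions, so the observed rank grows.
noisy = low_rank + 0.5 * rng.normal(size=low_rank.shape)
print(sspd_rank(noisy))
```

Watching this rank vary with the noise level is the kind of observable the SAEMA analysis builds on: a representation with genuine stratified structure keeps a low effective rank until the noise overwhelms it.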
Wenjie Sun
Shenzhen Institute of Advanced Technology, CAS
Bingzhe Wu
PKU Math-PKU CS-Tencent AI Lab-Shenzhen University
Trustworthy AI
Zhile Yang
Shenzhen Institute of Advanced Technology, CAS
Chengke Wu
Shenzhen Institute of Advanced Technology, CAS