🤖 AI Summary
Prior work assumes continuous latent spaces in 3D generative models, yet interpretability and theoretical foundations for 3D representation learning remain underexplored.
Method: We pioneer the application of sparse autoencoders (SAEs) to analyze the latent space of a state-of-the-art 3D reconstruction VAE trained on 53K Objaverse models. Integrating SAEs, feature ablation, and phase-transition point statistics, we empirically characterize latent feature distributions.
Contributions/Results: (1) We propose a novel paradigm—“discrete state space + phase-transition-driven activation”—explaining counterintuitive phenomena including positional encoding bias, sigmoidal reconstruction loss decay, and bimodal phase-transition point distributions. (2) We identify and model a superpositional interference reallocation mechanism that enhances feature saliency. (3) We establish the first interpretable dynamical framework for 3D feature learning. Our findings lay new theoretical and practical foundations for interpretable AI and dictionary learning in 3D representation.
📝 Abstract
Sparse Autoencoders (SAEs) are a powerful dictionary learning technique for decomposing neural network activations, translating hidden states into human-interpretable concepts of high semantic value without external intervention or guidance. However, this technique has rarely been applied outside the textual domain, limiting theoretical exploration of feature decomposition. We present the **first application of SAEs to the 3D domain**, analyzing the features used by a state-of-the-art 3D reconstruction VAE applied to 53k 3D models from the Objaverse dataset. We observe that the network encodes discrete rather than continuous features, leading to our key finding: **such models approximate a discrete state space, driven by phase-like transitions from feature activations**. Through this state-transition framework, we address three otherwise unintuitive behaviors: the reconstruction model's inclination toward positional-encoding representations, the sigmoidal behavior of reconstruction loss under feature ablation, and the bimodality in the distribution of phase-transition points. This final observation suggests the model **redistributes the interference caused by superposition to prioritize the saliency of different features**. Our work not only compiles and explains unexpected phenomena regarding feature decomposition, but also provides a framework for explaining the model's feature-learning dynamics. The code and dataset of encoded 3D objects will be available on release.
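To make the dictionary-learning setup concrete, here is a minimal sketch of an SAE forward pass: activations are encoded into an overcomplete set of sparse features (via a ReLU encoder with an L1 penalty) and then linearly reconstructed from the learned dictionary. The dimensions, initialization, and penalty weight below are illustrative assumptions, not the architecture used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 16, 64  # hypothetical: activation dim d, overcomplete dictionary size m > d

W_enc = rng.normal(scale=0.1, size=(d, m))  # encoder weights
b_enc = np.zeros(m)                         # encoder bias
W_dec = rng.normal(scale=0.1, size=(m, d))  # decoder dictionary (rows = features)

def sae_forward(x, l1_coeff=1e-3):
    """Encode activations into sparse features and reconstruct them."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU -> nonnegative, sparse feature activations
    x_hat = f @ W_dec                       # reconstruction as a sparse sum of dictionary rows
    # Training objective: reconstruction error plus an L1 sparsity penalty on features
    loss = np.mean((x - x_hat) ** 2) + l1_coeff * np.abs(f).mean()
    return f, x_hat, loss

# A batch of 8 hypothetical latent activations
x = rng.normal(size=(8, d))
f, x_hat, loss = sae_forward(x)
```

Feature ablation, as used in the paper's analysis, then amounts to zeroing a column of `f` and measuring the resulting change in reconstruction loss.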