🤖 AI Summary
Existing autoencoder latent spaces lack equivariance to semantic-preserving geometric transformations such as scaling and rotation, leading to unnecessarily complex latent distributions and suboptimal generative performance. To address this, we propose EQ-VAE, a simple equivariance regularization for autoencoder latent spaces that requires no architectural modifications and supports both continuous and discrete autoencoders. EQ-VAE enforces equivariance directly in the training objective, simplifying the latent distribution without compromising reconstruction fidelity. Empirically, fine-tuning a pre-trained SD-VAE with EQ-VAE for only five epochs yields a 7× training speedup on DiT-XL/2 and consistently improves generation quality across state-of-the-art generative models, including DiT, SiT, REPA, and MaskGIT, while preserving high-fidelity reconstruction. The core contribution is a regularization that induces semantically consistent, geometrically structured latent representations, offering a versatile enhancement for latent generative models.
📝 Abstract
Latent generative models have emerged as a leading approach for high-quality image synthesis. These models rely on an autoencoder to compress images into a latent space, followed by a generative model to learn the latent distribution. We identify that existing autoencoders lack equivariance to semantic-preserving transformations like scaling and rotation, resulting in complex latent spaces that hinder generative performance. To address this, we propose EQ-VAE, a simple regularization approach that enforces equivariance in the latent space, reducing its complexity without degrading reconstruction quality. By finetuning pre-trained autoencoders with EQ-VAE, we enhance the performance of several state-of-the-art generative models, including DiT, SiT, REPA, and MaskGIT, achieving a 7× speedup on DiT-XL/2 with only five epochs of SD-VAE fine-tuning. EQ-VAE is compatible with both continuous and discrete autoencoders, thus offering a versatile enhancement for a wide range of latent generative models. Project page and code: https://eq-vae.github.io/.
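The core idea of the regularization can be sketched in a few lines: penalize the mismatch between transforming an image's latent and encoding the transformed image. The snippet below is a minimal illustration of this equivariance penalty, not the paper's implementation; `encode`, `rotate90`, and the toy encoders are stand-ins chosen so the example runs without a trained model.

```python
import numpy as np

def rotate90(x):
    """Rotate an (H, W, C) array by 90 degrees -- a semantic-preserving transform."""
    return np.rot90(x, axes=(0, 1))

def eq_penalty(encode, image, transform=rotate90):
    """Illustrative EQ-VAE-style penalty: ||transform(encode(x)) - encode(transform(x))||^2.

    Driving this term toward zero encourages the encoder to commute with the
    transform, i.e. to be equivariant to it.
    """
    return float(np.mean((transform(encode(image)) - encode(transform(image))) ** 2))

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8, 3))

# An elementwise encoder commutes with rotation, so the penalty is zero.
equivariant_encode = lambda x: np.tanh(0.5 * x)
print(eq_penalty(equivariant_encode, img))  # 0.0

# A position-dependent encoder does not commute with rotation: penalty > 0.
biased_encode = lambda x: x + np.arange(8).reshape(8, 1, 1)
print(eq_penalty(biased_encode, img))
```

In a real setup the latent is spatially downsampled relative to the image, so the transformation is applied at the latent resolution, and the penalty would be added to the usual reconstruction (and, for continuous autoencoders, KL) objective with a weighting coefficient; those details are omitted here for brevity.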