🤖 AI Summary
Existing autoencoder latent spaces lack equivariance to semantic-preserving geometric transformations such as scaling and rotation, leading to unnecessarily complex latent distributions and suboptimal generative performance. To address this, we propose EQ-VAE, a simple equivariance regularization for autoencoder latent spaces that requires no architectural modifications and supports both continuous and discrete autoencoders. EQ-VAE enforces equivariance directly in the training objective, simplifying the latent distribution without compromising reconstruction fidelity. Empirically, fine-tuning a pre-trained SD-VAE with EQ-VAE for only five epochs yields a 7× training speedup on DiT-XL/2 and consistently improves generation quality across state-of-the-art generative models, including DiT, SiT, REPA, and MaskGIT, while preserving high-fidelity reconstruction. The core contribution is a regularization that induces semantically consistent, geometrically structured latent representations, offering a versatile enhancement for latent generative models.
📝 Abstract
Latent generative models have emerged as a leading approach for high-quality image synthesis. These models rely on an autoencoder to compress images into a latent space, followed by a generative model to learn the latent distribution. We identify that existing autoencoders lack equivariance to semantic-preserving transformations like scaling and rotation, resulting in complex latent spaces that hinder generative performance. To address this, we propose EQ-VAE, a simple regularization approach that enforces equivariance in the latent space, reducing its complexity without degrading reconstruction quality. By finetuning pre-trained autoencoders with EQ-VAE, we enhance the performance of several state-of-the-art generative models, including DiT, SiT, REPA, and MaskGIT, achieving a 7× speedup on DiT-XL/2 with only five epochs of SD-VAE fine-tuning. EQ-VAE is compatible with both continuous and discrete autoencoders, thus offering a versatile enhancement for a wide range of latent generative models. Project page and code: https://eq-vae.github.io/.
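The core idea of the regularization can be sketched in a few lines: penalize the mismatch between transforming an image's latent and encoding the transformed image. The snippet below is a minimal illustration of this equivariance penalty, not the paper's implementation; `encode`, `rotate90`, and the toy encoders are stand-ins chosen so the example runs without a trained model.

```python
import numpy as np

def rotate90(x):
    """Rotate an (H, W, C) array by 90 degrees -- a semantic-preserving transform."""
    return np.rot90(x, axes=(0, 1))

def eq_penalty(encode, image, transform=rotate90):
    """Illustrative EQ-VAE-style penalty: ||transform(encode(x)) - encode(transform(x))||^2.

    Driving this term toward zero encourages the encoder to commute with the
    transform, i.e. to be equivariant to it.
    """
    return float(np.mean((transform(encode(image)) - encode(transform(image))) ** 2))

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8, 3))

# An elementwise encoder commutes with rotation, so the penalty is zero.
equivariant_encode = lambda x: np.tanh(0.5 * x)
print(eq_penalty(equivariant_encode, img))  # 0.0

# A position-dependent encoder does not commute with rotation: penalty > 0.
biased_encode = lambda x: x + np.arange(8).reshape(8, 1, 1)
print(eq_penalty(biased_encode, img))
```

In a real setup the latent is spatially downsampled relative to the image, so the transformation is applied at the latent resolution, and the penalty would be added to the usual reconstruction (and, for continuous autoencoders, KL) objective with a weighting coefficient; those details are omitted here for brevity.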