Enhancing Variational Autoencoders with Smooth Robust Latent Encoding

📅 2025-04-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of simultaneously achieving high generation fidelity and robustness in the variational autoencoders (VAEs) that underpin diffusion-based generative models. We propose Smooth Robust Latent VAE (SRL-VAE), which applies smoothed adversarial perturbations in the latent space together with a latent-space smoothness regularization, demonstrating for the first time that adversarial training can concurrently enhance both reconstruction fidelity and robustness. To avoid the fidelity degradation typically incurred by standard adversarial training, SRL-VAE adds an originality-preserving representation constraint and is applied as lightweight post-training fine-tuning. Experiments show significant improvements in PSNR and SSIM on image reconstruction and text-guided editing tasks. Moreover, SRL-VAE exhibits strong robustness against Nightshade attacks and diverse image-editing adversarial perturbations, with negligible computational overhead. The core innovation is a latent-space smooth robust modeling paradigm that jointly optimizes generation quality, semantic consistency, and adversarial resilience.

📝 Abstract
Variational Autoencoders (VAEs) have played a key role in scaling up diffusion-based generative models, as in Stable Diffusion, yet questions regarding their robustness remain largely underexplored. Although adversarial training is an established technique for enhancing robustness in predictive models, it has been overlooked for generative models due to concerns that the inherent trade-off between performance and robustness would degrade fidelity. In this work, we challenge this presumption, introducing Smooth Robust Latent VAE (SRL-VAE), a novel adversarial training framework that boosts both generation quality and robustness. In contrast to conventional adversarial training, which focuses on robustness only, our approach smooths the latent space via adversarial perturbations, promoting more generalizable representations while regularizing against the original representation to sustain fidelity. Applied as a post-training step on pre-trained VAEs, SRL-VAE improves image robustness and fidelity with minimal computational overhead. Experiments show that SRL-VAE improves both generation quality, in image reconstruction and text-guided image editing, and robustness, against Nightshade attacks and image-editing attacks. These results establish a new paradigm, showing that adversarial training, once thought to be detrimental to generative models, can instead enhance both fidelity and robustness.
Problem

Research questions and friction points this paper is trying to address.

Enhancing VAE robustness without degrading fidelity
Improving generative quality and adversarial robustness simultaneously
Smoothing the latent space for better generalization while preserving original fidelity
Innovation

Methods, ideas, or system contributions that make the work stand out.

SRL-VAE enhances VAEs with adversarial training
Smooths latent space via adversarial perturbations
Improves image robustness and fidelity jointly
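The core training loop described above (encode, perturb the latent adversarially, then optimize reconstruction from the perturbed latent plus a smoothness/originality term) can be sketched on a toy linear autoencoder. This is only an illustrative NumPy sketch, not the paper's implementation: the matrices `E` and `D`, the FGSM-style sign perturbation, and the specific loss weighting `lam` are all simplifying assumptions standing in for a pre-trained VAE encoder/decoder and the paper's actual regularizers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "encoder" E and "decoder" D (hypothetical stand-ins for a
# pre-trained VAE; SRL-VAE fine-tunes a real VAE post-training).
d_x, d_z = 8, 3
E = rng.normal(size=(d_z, d_x)) * 0.3
D = rng.normal(size=(d_x, d_z)) * 0.3

def srl_step(x, E, D, eps=0.1, lam=0.5, lr=1e-2):
    """One SRL-VAE-style update (sketch):
    1. Encode x to latent z.
    2. Take an FGSM-style step in latent space that increases the
       reconstruction loss (the smoothed adversarial perturbation).
    3. Minimize reconstruction from the perturbed latent plus a
       smoothness/originality term keeping the perturbed output close
       to the original reconstruction.
    Only the decoder D is updated, mimicking lightweight fine-tuning.
    """
    z = E @ x
    # Gradient of 0.5*||D z - x||^2 w.r.t. z for a linear decoder.
    g = D.T @ (D @ z - x)
    z_adv = z + eps * np.sign(g)            # adversarial latent
    x_hat, x_adv = D @ z, D @ z_adv
    r = x_adv - x                            # adversarial recon residual
    s = x_adv - x_hat                        # smoothness residual
    loss = 0.5 * float(r @ r) + lam * 0.5 * float(s @ s)
    # Gradient of the loss w.r.t. D (latents treated as constants).
    grad_D = np.outer(r, z_adv) + lam * np.outer(s, z_adv - z)
    return D - lr * grad_D, loss

# Fine-tune the toy decoder on one sample and track the loss.
x = rng.normal(size=d_x)
D_t, losses = D.copy(), []
for _ in range(200):
    D_t, l = srl_step(x, E, D_t)
    losses.append(l)
```

The sketch captures the paper's key structural idea: the adversarial step acts on the latent code rather than the input pixels, and the second loss term anchors the perturbed output to the original reconstruction so robustness does not come at the cost of fidelity.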