VIVAT: Virtuous Improving VAE Training through Artifact Mitigation

📅 2025-06-09
🤖 AI Summary
VAEs commonly suffer from color shifts, grid artifacts, blurriness, and corner/droplet distortions during training, severely degrading reconstruction and generative fidelity. This paper introduces the first fine-grained taxonomy of VAE artifacts, systematically analyzing their root causes. We propose a lightweight, architecture-agnostic co-optimization framework—requiring no network modifications—that integrates dynamic loss weighting, adaptive padding, and Spatially Conditional Normalization. The method is plug-and-play, preserves the simplicity of KL-regularized VAEs, and enables end-to-end artifact mitigation. Extensive experiments on multiple benchmarks demonstrate significant improvements in PSNR (+1.2–2.8 dB) and SSIM (+0.02–0.05). Notably, CLIP Score rises consistently in text-to-image generation, confirming that artifact suppression enhances downstream task performance through positive transfer.
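The dynamic loss weighting mentioned above can be illustrated with a minimal NumPy sketch of a weighted KL-VAE objective. This is not the paper's implementation: the L1 reconstruction term, the specific weight values, and the function name `kl_vae_loss` are all illustrative assumptions.

```python
import numpy as np

def kl_vae_loss(x, x_hat, mu, logvar, w_rec=1.0, w_kl=1e-6):
    """Weighted KL-VAE objective: reconstruction + KL to N(0, I).

    w_rec and w_kl are the adjustable loss weights the summary refers
    to; the default values here are illustrative, not the paper's.
    """
    # L1 reconstruction error (an assumed choice of reconstruction loss)
    rec = np.abs(x - x_hat).mean()
    # Closed-form KL divergence between N(mu, exp(logvar)) and N(0, 1)
    kl = 0.5 * np.mean(np.exp(logvar) + mu**2 - 1.0 - logvar)
    return w_rec * rec + w_kl * kl
```

In practice such weights would be scheduled during training (e.g. annealing `w_kl`), which is one lightweight way to trade off reconstruction sharpness against latent regularization without touching the architecture.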

📝 Abstract
Variational Autoencoders (VAEs) remain a cornerstone of generative computer vision, yet their training is often plagued by artifacts that degrade reconstruction and generation quality. This paper introduces VIVAT, a systematic approach to mitigating common artifacts in KL-VAE training without requiring radical architectural changes. We present a detailed taxonomy of five prevalent artifacts - color shift, grid patterns, blur, corner and droplet artifacts - and analyze their root causes. Through straightforward modifications, including adjustments to loss weights, padding strategies, and the integration of Spatially Conditional Normalization, we demonstrate significant improvements in VAE performance. Our method achieves state-of-the-art results in image reconstruction metrics (PSNR and SSIM) across multiple benchmarks and enhances text-to-image generation quality, as evidenced by superior CLIP scores. By preserving the simplicity of the KL-VAE framework while addressing its practical challenges, VIVAT offers actionable insights for researchers and practitioners aiming to optimize VAE training.
Problem

Research questions and friction points this paper is trying to address.

Mitigates artifacts in KL-VAE training without major architectural changes
Addresses five common VAE artifacts: color shift, grid patterns, blur, and corner and droplet distortions
Improves VAE performance in reconstruction and text-to-image generation metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adjusts loss weights and padding strategies
Integrates Spatially Conditional Normalization
Improves VAE training without architectural changes
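The Spatially Conditional Normalization mentioned above can be sketched as a SPADE-style layer: features are normalized per channel, then modulated by a per-pixel scale and shift predicted from a condition map. This is a minimal NumPy sketch under that assumption, not the paper's implementation; the linear modulation maps `gamma_w`/`beta_w` are hypothetical stand-ins for learned convolutions.

```python
import numpy as np

def spatially_conditional_norm(x, cond, gamma_w, beta_w, eps=1e-5):
    """Normalize a feature map x of shape (C, H, W) per channel, then
    modulate it with pixel-wise scale/shift derived from a condition
    map cond of shape (Cc, H, W).

    gamma_w, beta_w: (C, Cc) linear maps applied at every pixel; in a
    real layer these would be learned (typically small conv nets).
    """
    # Per-channel normalization (instance-norm style)
    mu = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    x_norm = (x - mu) / np.sqrt(var + eps)
    # Pixel-wise affine parameters predicted from the condition map
    gamma = np.einsum('ck,khw->chw', gamma_w, cond)
    beta = np.einsum('ck,khw->chw', beta_w, cond)
    return (1.0 + gamma) * x_norm + beta
```

Because the modulation varies spatially, such a layer can counteract position-dependent artifacts (e.g. corner distortions) that a plain, spatially uniform normalization cannot.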
Lev Novitskiy
Sber AI, Moscow, Russia; National University of Science and Technology “MISIS”, Moscow, Russia
Viacheslav Vasilev
MIPT
Maria Kovaleva
Sber AI
V.Ya. Arkhipkin
Sber AI, Moscow, Russia
Denis Dimitrov
Head of Kandinsky Lab