Adversarial Error Correction for Visual Autoregressive Generation

📅 2026-05-23

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses the susceptibility of visual autoregressive (VAR) models to cascading errors, which often distort the generated images. The authors propose AID-VAR, a framework that adds a lightweight guidance injector and a discriminator to enable active error correction across scales. It combines adversarial error diagnosis with non-invasive rectification of the feature manifold, all without modifying the pretrained VAR backbone. The method is trained with a frozen-backbone fine-tuning strategy and evaluated with a new multi-scale consistency metric, the Inter-Scale Consistency Score (ISCS). Experiments show that AID-VAR consistently improves performance across various VAR backbones. For instance, AID-VAR-d20 achieves a 16% FID improvement with only a 3% increase in parameters, yielding images with sharper details and more stable structures.
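The frozen-backbone idea can be illustrated with a toy in plain Python. This is not AID-VAR's code. `FrozenBackboneLayer`, `GuidanceInjector`, the low-rank residual form, the zero-initialized `gate`, and the 64-wide layer are all hypothetical choices made for illustration. The sketch shows two properties the summary relies on: the pretrained weights are never updated, and the adapter adds only a small fraction of extra parameters.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product on nested lists."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


@dataclass(frozen=True)
class FrozenBackboneLayer:
    """Hypothetical stand-in for one pretrained VAR block.

    frozen=True blocks reassigning `weight`, playing the role of disabled
    gradients (the nested lists themselves remain mutable in Python).
    """
    weight: Matrix

    def __call__(self, x: Vector) -> Vector:
        return matvec(self.weight, x)

    def num_params(self) -> int:
        return sum(len(row) for row in self.weight)


@dataclass
class GuidanceInjector:
    """Hypothetical low-rank residual adapter: h + gate * up @ (down @ h)."""
    down: Matrix       # rank x dim projection
    up: Matrix         # dim x rank projection
    gate: float = 0.0  # zero init: the adapted model starts identical to the backbone

    def __call__(self, h: Vector) -> Vector:
        delta = matvec(self.up, matvec(self.down, h))
        return [hi + self.gate * di for hi, di in zip(h, delta)]

    def num_params(self) -> int:
        return sum(len(r) for r in self.down) + sum(len(r) for r in self.up) + 1


def adapted_forward(layer: FrozenBackboneLayer, injector: GuidanceInjector, x: Vector) -> Vector:
    """Backbone output refined by the injector; only the injector would be trained."""
    return injector(layer(x))


def parameter_overhead(layer: FrozenBackboneLayer, injector: GuidanceInjector) -> float:
    """Ratio of adapter parameters to (frozen) backbone parameters."""
    return injector.num_params() / layer.num_params()


if __name__ == "__main__":
    dim, rank = 64, 1
    backbone = FrozenBackboneLayer([[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)])
    injector = GuidanceInjector(
        down=[[0.01] * dim for _ in range(rank)],
        up=[[0.01] * rank for _ in range(dim)],
    )
    x = [float(i) for i in range(dim)]
    assert adapted_forward(backbone, injector, x) == backbone(x)  # gate = 0: output unchanged
    print(f"overhead = {parameter_overhead(backbone, injector):.2%}")
```

With rank 1 on a 64-wide layer, the injector adds 129 parameters to the backbone's 4,096, an overhead of about 3.15%. The overhead grows with the ratio of rank to width. The closeness to the paper's 3% figure is incidental and says nothing about how AID-VAR's injector is actually sized.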

📝 Abstract

Visual Autoregressive (VAR) models have emerged as a powerful paradigm for image synthesis by performing hierarchical next-scale prediction. However, VAR models are inherently prone to cascading error propagation, where subtle coarse-scale mispredictions are amplified across the hierarchy, ultimately distorting the final synthesis. To mitigate this, we propose AID-VAR, a plug-and-play framework that enhances pre-trained VARs through Adversarially Injected Diagnosis. Instead of standard passive generation, AID-VAR introduces a proactive error-correction mechanism inspired by the adversarial feedback in GANs. We deploy a discriminator to diagnose fidelity gaps at each scale transition, coupled with a lightweight guidance injector. This module operates as a non-invasive adapter that refines the feature manifold of a frozen VAR backbone, steering the generation toward the distribution of real images without destabilizing the pre-trained latent space. Furthermore, to rigorously evaluate this cross-scale progression, we introduce the Inter-Scale Consistency Score (ISCS), a novel metric that quantifies the fidelity and structural alignment between consecutive resolution scales. Experimental results across various backbones demonstrate that AID-VAR delivers sharper textural details and fewer structural distortions with negligible overhead. For instance, AID-VAR-d20 achieves a 16% improvement in FID with only a 3% increase in parameters. These results establish AID-VAR as a highly efficient and scalable pathway for upgrading large-scale VAR generators, enhancing global coherence and local detail without altering training data, base architectures, or sampling schedules. Code is available at https://github.com/bijiw515/AID-VAR.
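The abstract describes ISCS only as quantifying fidelity and structural alignment between consecutive resolution scales; it does not give a formula. As a hypothetical stand-in, the sketch below upsamples each coarser map to the next scale's size with nearest-neighbour indexing and averages the cosine similarity over all consecutive pairs. The function names and the use of cosine similarity are assumptions for illustration, not the paper's definition of ISCS.

```python
import math
from typing import List, Sequence

Grid = List[List[float]]


def upsample_nearest(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Nearest-neighbour resize; works for non-integer scale ratios (e.g. 3 -> 5)."""
    in_h, in_w = len(grid), len(grid[0])
    return [[grid[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def flatten(grid: Grid) -> List[float]:
    return [v for row in grid for v in row]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; two all-zero maps count as perfectly consistent."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0 if na == nb else 0.0
    return dot / (na * nb)


def inter_scale_consistency(scales: List[Grid]) -> float:
    """Hypothetical ISCS-like score: mean similarity over consecutive scale pairs."""
    if len(scales) < 2:
        raise ValueError("need at least two scales")
    scores = []
    for coarse, fine in zip(scales, scales[1:]):
        up = upsample_nearest(coarse, len(fine), len(fine[0]))
        scores.append(cosine(flatten(up), flatten(fine)))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    coarse = [[1.0, 0.0], [0.0, 1.0]]
    consistent = upsample_nearest(coarse, 4, 4)  # refines the coarse layout
    contradicting = upsample_nearest([[0.0, 1.0], [1.0, 0.0]], 4, 4)  # flips it
    print(inter_scale_consistency([coarse, consistent]))
    print(inter_scale_consistency([coarse, contradicting]))
```

A finer map that preserves the coarse layout scores close to 1.0, while one whose layout contradicts its predecessor scores near 0. This stand-in captures only structural alignment and ignores the fidelity component that the abstract also attributes to ISCS.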

Problem

Research questions and friction points this paper is trying to address.

cascading error propagation

Visual Autoregressive generation

image synthesis

fidelity degradation

hierarchical prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial Error Correction

Visual Autoregressive Generation

Plug-and-Play Framework