AlignVAR: Towards Globally Consistent Visual Autoregression for Image Super-Resolution

📅 2026-02-28
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses structural fragmentation in visual autoregressive models for image super-resolution. The fragmentation has two sources: locality-biased attention, and residual-only supervision that lets errors accumulate across scales. Both compromise global consistency. To mitigate these issues, the authors propose the AlignVAR framework, which integrates Spatial Consistency Autoregression (SCA) and a Hierarchical Consistency Constraint (HCC). SCA applies an adaptive mask that reweights attention toward structurally correlated regions, reducing local bias and strengthening long-range dependencies. HCC augments residual learning with full reconstruction supervision at each scale, which stabilizes the coarse-to-fine generation process. The authors report improved structural coherence and perceptual quality over existing generative methods, with over 10× faster inference and nearly 50% fewer parameters than leading diffusion-based approaches.
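To make the idea of mask-reweighted attention concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the function name `mask_reweighted_attention`, the cosine-similarity mask over a hypothetical `feats` input, and the multiply-then-renormalize rule are all assumptions chosen for illustration. The summary only states that an adaptive mask reweights attention toward structurally correlated regions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a, b):
    """Cosine similarity between two vectors (small constant avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)

def mask_reweighted_attention(queries, keys, values, feats, eps=1e-6):
    """Toy attention whose rows are rescaled by a similarity mask, then renormalized.

    ASSUMPTION: the mask is the cosine similarity of per-token structural
    features `feats`, mapped to [0, 1]. The real adaptive mask may differ.
    """
    d = len(queries[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        # Standard scaled dot-product logits and attention weights.
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        base = softmax(logits)
        # Assumed structural mask for query token i against every key token.
        mask = [0.5 * (1.0 + cosine(feats[i], f)) for f in feats]
        # Reweight, then renormalize so each row still sums to 1.
        scaled = [w * (m + eps) for w, m in zip(base, mask)]
        z = sum(scaled)
        row = [s / z for s in scaled]
        weights.append(row)
        outputs.append(
            [sum(w * v[c] for w, v in zip(row, values)) for c in range(len(values[0]))]
        )
    return outputs, weights
```

In this sketch, a distant token with similar structural features keeps its attention weight, while a dissimilar neighbour is suppressed. That is one way to shift attention away from purely local context.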

📝 Abstract
Visual autoregressive (VAR) models have recently emerged as a promising alternative for image generation, offering stable training, non-iterative inference, and high-fidelity synthesis through next-scale prediction. This encourages the exploration of VAR for image super-resolution (ISR), yet its application remains underexplored and faces two critical challenges: locality-biased attention, which fragments spatial structures, and residual-only supervision, which accumulates errors across scales. Both severely compromise the global consistency of reconstructed images. To address these issues, we propose AlignVAR, a globally consistent visual autoregressive framework tailored for ISR, featuring two key components: (1) Spatial Consistency Autoregression (SCA), which applies an adaptive mask to reweight attention toward structurally correlated regions, thereby mitigating excessive locality and enhancing long-range dependencies; and (2) Hierarchical Consistency Constraint (HCC), which augments residual learning with full reconstruction supervision at each scale, exposing accumulated deviations early and stabilizing the coarse-to-fine refinement process. Extensive experiments demonstrate that AlignVAR consistently enhances structural coherence and perceptual fidelity over existing generative methods, while delivering over 10x faster inference with nearly 50% fewer parameters than leading diffusion-based approaches, establishing a new paradigm for efficient ISR.
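The effect of adding full reconstruction supervision at each scale can be illustrated with a toy loss. The sketch below is an assumption-laden simplification, not the paper's objective: it uses 1-D signals, nearest-neighbour upsampling, mean squared error, and a hypothetical weight `lam`. The abstract does not specify the loss form or how the two terms are balanced.

```python
def upsample(x, size):
    """Nearest-neighbour upsampling of a 1-D list to `size` entries."""
    n = len(x)
    return [x[i * n // size] for i in range(size)]

def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def hierarchical_loss(pred_residuals, target_residuals, lam=1.0):
    """Per-scale residual loss plus a full-reconstruction loss at every scale.

    ASSUMPTION: the reconstruction at scale k is the sum of all residuals up
    to k, upsampled to the finest resolution; `lam` weights the full term.
    """
    full = len(pred_residuals[-1])
    pred_acc = [0.0] * full
    target_acc = [0.0] * full
    total = 0.0
    for p, t in zip(pred_residuals, target_residuals):
        # Residual-only supervision: compare this scale's residual in isolation.
        residual_term = mse(p, t)
        # Accumulate the coarse-to-fine reconstruction up to this scale.
        pred_acc = [a + u for a, u in zip(pred_acc, upsample(p, full))]
        target_acc = [a + u for a, u in zip(target_acc, upsample(t, full))]
        # Full supervision: compare the accumulated reconstruction itself.
        full_term = mse(pred_acc, target_acc)
        total += residual_term + lam * full_term
    return total

# A single error at the coarsest scale, with perfect residuals afterwards.
target = [[1.0], [0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
pred = [[2.0], [0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
residual_only = hierarchical_loss(pred, target, lam=0.0)  # 1.0
with_full = hierarchical_loss(pred, target, lam=1.0)      # 4.0
```

With `lam=0.0` the coarse error is penalized once, at the scale where it occurs. With `lam=1.0` the same error is penalized again at every later scale because it persists in the accumulated reconstruction. This is the sense in which full supervision exposes accumulated deviations.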
Problem

Research questions and friction points this paper is trying to address.

visual autoregression
image super-resolution
global consistency
locality-biased attention
residual-only supervision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Visual Autoregression
Image Super-Resolution
Spatial Consistency
Hierarchical Supervision
Global Coherence
Authors
Cencen Liu
University of Electronic Science and Technology of China
Dongyang Zhang
University of Electronic Science and Technology of China
Image restoration, super-resolution
Wen Yin
University of Electronic Science and Technology of China
Jielei Wang
University of Electronic Science and Technology of China, Ubiquitous Intelligence and Trusted Services Key Laboratory of Sichuan Province
Tianyu Li
University of Electronic Science and Technology of China
Ji Guo
University of Electronic Science and Technology of China
Wenbo Jiang
University of Electronic Science and Technology of China
AI security, Backdoor attack
Guoqing Wang
University of Electronic Science and Technology of China
Computer Vision, Machine Learning, Pattern Recognition, Intelligent System
Guoming Lu
University of Electronic Science and Technology of China, Ubiquitous Intelligence and Trusted Services Key Laboratory of Sichuan Province