Dynamic Mixture-of-Experts for Visual Autoregressive Model

📅 2025-10-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Visual autoregressive models (VARs) suffer from significant computational redundancy in multi-scale image generation due to repeated invocation of full-parameter Transformers across scales. Method: We propose a dynamic Mixture-of-Experts (MoE) routing mechanism with a scale-aware thresholding strategy, which adaptively activates expert subnetworks based on token-level semantic complexity and current spatial resolution—requiring no additional training and enabling fine-grained computation allocation. Contribution/Results: Our approach is the first to jointly model dynamic sparsification and multi-scale generation while preserving the standard Transformer backbone. It substantially alleviates redundancy during high-resolution stages. Experiments on standard image generation benchmarks show a 20% reduction in FLOPs and an 11% speedup in inference latency, while maintaining image quality—measured by FID and LPIPS—comparable to dense baseline models. This achieves an effective trade-off between computational efficiency and generative fidelity.
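The routing rule described above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the gating matrix `gate_w`, the threshold schedule in `scale_aware_threshold`, and the argmax fallback are all assumptions; the paper only states that experts are activated per token based on semantic complexity and the current scale, with no additional training.

```python
import numpy as np

def scale_aware_threshold(base_tau, scale_idx, num_scales):
    # Hypothetical schedule (assumes num_scales > 1): relax the routing
    # threshold at coarse scales and tighten it at fine scales, so fewer
    # experts fire where tokens are most numerous.
    return base_tau * (1.0 + scale_idx / (num_scales - 1))

def route_tokens(token_feats, gate_w, scale_idx, num_scales, base_tau=0.2):
    # Threshold routing sketch: each token activates only the experts
    # whose gate probability exceeds a scale-dependent threshold.
    # gate_w is an assumed (d_model, n_experts) gating matrix.
    logits = token_feats @ gate_w                        # (n_tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)           # softmax over experts
    tau = scale_aware_threshold(base_tau, scale_idx, num_scales)
    mask = probs >= tau                                  # sparse expert mask
    # Guarantee at least one active expert per token (fallback to argmax).
    mask[np.arange(mask.shape[0]), probs.argmax(axis=-1)] = True
    return mask, probs
```

Because the threshold rises with `scale_idx`, high-resolution stages (which contain most tokens) activate fewer experts per token, which is where the paper locates the redundancy.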

📝 Abstract
Visual Autoregressive Models (VAR) offer efficient, high-quality image generation but suffer from computational redundancy due to repeated Transformer calls at increasing resolutions. We introduce a dynamic Mixture-of-Experts router integrated into VAR. The new architecture trades compute for quality through scale-aware thresholding, a strategy that balances expert selection based on token complexity and resolution without requiring additional training. As a result, we achieve 20% fewer FLOPs and 11% faster inference while matching the image quality of the dense baseline.
Problem

Research questions and friction points this paper is trying to address.

How to reduce computational redundancy in Visual Autoregressive Models
How to apply dynamic Mixture-of-Experts routing to image generation without retraining
How to balance expert selection across token complexity and spatial resolution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Mixture-of-Experts router integrated into VAR
Scale-aware thresholding balances expert selection automatically
Reduces FLOPs by 20% while maintaining image quality
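A back-of-envelope calculation shows how gating experts mainly at fine scales can yield a FLOP reduction of the reported magnitude. The scale schedule and the active-expert fraction below are illustrative assumptions, not the paper's measured numbers:

```python
# Token counts grow quadratically with scale in VAR-style coarse-to-fine
# generation (illustrative side lengths, not the paper's exact schedule).
scales = [1, 2, 3, 4, 5, 6, 8, 10, 13, 16]     # token map side lengths
tokens = [s * s for s in scales]
dense = sum(tokens)                             # 1 cost unit per token, all experts

# Assumed routing behaviour: all experts at coarse scales, ~80% of expert
# compute at fine scales, where the paper locates the redundancy.
frac = [1.0 if s <= 4 else 0.8 for s in scales]
sparse = sum(t * f for t, f in zip(tokens, frac))
print(f"FLOP reduction: {1 - sparse / dense:.0%}")
```

Under these assumptions the fine scales dominate total cost, so trimming expert compute only there already approaches the ~20% reduction reported, while the cheap coarse scales stay dense.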