A Training-Free Style-aligned Image Generation with Scale-wise Autoregressive Model

📅 2025-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large-scale text-to-image (T2I) models, particularly diffusion-based ones, often produce images whose styles do not match across a generated set, and they are slow at inference. To address both problems, this paper proposes a training-free style alignment method built on a scale-wise autoregressive model, which generates an image coarse to fine, one scale at a time, rather than by iterative diffusion denoising. The method has three components: (1) initial feature replacement, which keeps background appearance consistent across images; (2) pivotal feature interpolation, which aligns object placement; and (3) dynamic style injection, which reinforces style consistency using a schedule function. Because it needs no fine-tuning or additional training, the method keeps the speed of the underlying model while preserving each image's content details. Experiments show generation quality comparable to competing approaches, significantly better style alignment, and inference more than six times faster than the fastest competing model.

📝 Abstract
We present a training-free style-aligned image generation method that leverages a scale-wise autoregressive model. While large-scale text-to-image (T2I) models, particularly diffusion-based methods, have demonstrated impressive generation quality, they often suffer from style misalignment across generated image sets and slow inference speeds, limiting their practical usability. To address these issues, we propose three key components: initial feature replacement to ensure consistent background appearance, pivotal feature interpolation to align object placement, and dynamic style injection, which reinforces style consistency using a schedule function. Unlike previous methods requiring fine-tuning or additional training, our approach maintains fast inference while preserving individual content details. Extensive experiments show that our method achieves generation quality comparable to competing approaches, significantly improves style alignment, and delivers inference speeds over six times faster than the fastest model.
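The abstract names the three components but not how they are implemented. As a rough illustration only, the Python sketch below models each component as a blend of per-scale feature vectors between a reference image and a target image during coarse-to-fine generation. Everything concrete here is an assumption, not the paper's method: the flattened `Features` representation, the hypothetical helpers `lerp`, `style_weight`, `align_scale`, and `align_all`, applying replacement only at the coarsest scale, the single `pivot_scale` with blend factor `pivot_alpha`, and the linearly decaying schedule. The real model may apply these operations inside attention layers rather than by blending features directly, and it feeds each aligned scale back into the prediction of the next scale, which this toy omits.

```python
from typing import List, Sequence

# Hypothetical representation: each scale's features flattened into a list of floats.
Features = List[float]


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> Features:
    """Element-wise linear interpolation: (1 - t) * a + t * b."""
    if len(a) != len(b):
        raise ValueError("feature maps must have the same size")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def style_weight(k: int, num_scales: int, w_max: float = 1.0) -> float:
    """Assumed schedule: weight w_max at the coarsest scale, decaying linearly to 0 at the finest."""
    if num_scales < 2:
        return w_max
    return w_max * (1.0 - k / (num_scales - 1))


def align_scale(ref: Features, tgt: Features, k: int, num_scales: int,
                pivot_scale: int, pivot_alpha: float = 0.5) -> Features:
    """Apply the three components, as modeled in this sketch, to the features of scale k."""
    if k == 0:
        # Initial feature replacement (assumed): reuse the reference's coarsest features.
        return list(ref)
    out = list(tgt)
    if k == pivot_scale:
        # Pivotal feature interpolation (assumed): blend target toward reference at one scale.
        out = lerp(out, ref, pivot_alpha)
    # Dynamic style injection (assumed): pull toward the reference with a scale-dependent weight.
    return lerp(out, ref, style_weight(k, num_scales))


def align_all(ref_scales: List[Features], tgt_scales: List[Features],
              pivot_scale: int) -> List[Features]:
    """Align every scale of the target to the reference, coarsest scale first."""
    n = len(ref_scales)
    return [align_scale(r, t, k, n, pivot_scale)
            for k, (r, t) in enumerate(zip(ref_scales, tgt_scales))]


# Toy usage: reference features are all 1.0, target features all 0.0, over 5 scales.
for k, feats in enumerate(align_all([[1.0, 1.0]] * 5, [[0.0, 0.0]] * 5, pivot_scale=2)):
    print(k, feats)
# 0 [1.0, 1.0]
# 1 [0.75, 0.75]
# 2 [0.75, 0.75]
# 3 [0.25, 0.25]
# 4 [0.0, 0.0]
```

A decaying schedule is one plausible design choice, on the assumption that coarse scales mostly determine global structure while finer scales add detail: early scales follow the reference closely, and the finest scales keep the target's own content. The schedule function used in the paper may have a different shape.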
Problem

Addresses style misalignment across generated image sets
Speeds up slow inference in text-to-image models
Achieves style consistency without fine-tuning or additional training
Innovation

Training-free style-aligned image generation method
Scale-wise autoregressive model for fast inference
Dynamic style injection with a schedule function
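Why a scale-wise autoregressive model is fast can be seen by counting sequential forward passes. The sketch below assumes an illustrative VAR-style schedule in which the token map grows through side lengths 1, 2, 3, 4, 5, 6, 8, 10, 13, 16; the backbone used in the paper may use a different schedule. Each scale is predicted in one pass with all of its tokens in parallel, whereas a raster-scan next-token model over the final 16×16 map needs one pass per token. The constant `SCALES` and the helper names are hypothetical.

```python
# Assumed VAR-style scale schedule (side lengths of the token maps), coarse to fine.
SCALES = (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)


def scale_wise_steps(scales):
    """One forward pass per scale; all tokens of a scale are predicted in parallel."""
    return len(scales)


def raster_steps(side):
    """Next-token autoregression over a side x side map: one forward pass per token."""
    return side * side


def total_tokens(scales):
    """Tokens produced across all scales (each scale is an s x s map)."""
    return sum(s * s for s in scales)


print(scale_wise_steps(SCALES))   # 10
print(raster_steps(SCALES[-1]))   # 256
print(total_tokens(SCALES))       # 680
```

The count ignores per-pass cost (later scales process many tokens at once), so it shows why fewer sequential steps are needed but does not predict wall-clock speed. The six-fold speedup in the abstract is a measured comparison against other models, not something derived from this arithmetic.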