Training-Free Style-Personalization via Scale-wise Autoregressive Model

📅 2025-07-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenges of personalized style control and content-style disentanglement in image stylization without additional training. The authors propose an inference-time stylization method built on a pretrained scale-wise autoregressive model, featuring a three-branch prompt-guidance design (content, style, and generation) that explicitly decouples content and style during inference. A step-wise and attention-wise intervention analysis reveals that early-to-middle generation stages play the dominant role in structuring content and encoding style; guided by these findings, the authors introduce key-stage attention sharing and adaptive query sharing. Fine-grained collaborative control is achieved via step-level and attention-level interventions, joint prompt-feature injection, and query similarity fusion. Experiments demonstrate that the approach matches fine-tuning methods in style fidelity and prompt alignment, achieves significantly faster inference, and exhibits strong cross-style generalization and deployment flexibility.

📝 Abstract
We present a training-free framework for style-personalized image generation that controls content and style information during inference using a scale-wise autoregressive model. Our method employs a three-path design--content, style, and generation--each guided by a corresponding text prompt, enabling flexible and efficient control over image semantics without any additional training. A central contribution of this work is a step-wise and attention-wise intervention analysis. Through systematic prompt and feature injection, we find that early-to-middle generation steps play a pivotal role in shaping both content and style, and that query features predominantly encode content-specific information. Guided by these insights, we introduce two targeted mechanisms: Key Stage Attention Sharing, which aligns content and style during the semantically critical steps, and Adaptive Query Sharing, which reinforces content semantics in later steps through similarity-aware query blending. Extensive experiments demonstrate that our method achieves competitive style fidelity and prompt fidelity compared to fine-tuned baselines, while offering faster inference and greater deployment flexibility.
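The Adaptive Query Sharing mechanism described above can be sketched as a similarity-aware blend of generation-branch and content-branch query features. This is a minimal illustrative sketch, not the paper's exact formulation: the function name, the per-token cosine weighting, and the clipping rule are assumptions.

```python
import numpy as np

def adaptive_query_sharing(q_gen, q_content, eps=1e-8):
    """Blend generation-branch queries toward content-branch queries,
    weighted by per-token cosine similarity (illustrative sketch only;
    the blending rule used in the paper may differ).

    q_gen, q_content: (num_tokens, dim) query feature matrices.
    """
    # Per-token cosine similarity between the two query sets.
    num = (q_gen * q_content).sum(axis=-1)
    den = np.linalg.norm(q_gen, axis=-1) * np.linalg.norm(q_content, axis=-1) + eps
    sim = num / den                        # (num_tokens,)

    # Higher similarity -> stronger pull toward the content queries,
    # reinforcing content semantics where the branches already agree.
    w = np.clip(sim, 0.0, 1.0)[:, None]    # (num_tokens, 1)
    return (1.0 - w) * q_gen + w * q_content
```

Under this sketch, tokens whose generation queries already align with the content branch are pulled fully toward it, while dissimilar (or anti-correlated) tokens are left untouched.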
Problem

Research questions and friction points this paper is trying to address.

Training-free style-personalized image generation control
Step-wise intervention analysis for content and style shaping
Key mechanisms for competitive fidelity without fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free style-personalization via autoregressive model
Key Stage Attention Sharing for content-style alignment
Adaptive Query Sharing for content semantics reinforcement
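The Key Stage Attention Sharing idea in the list above can be illustrated as self-attention that, during a designated set of semantically critical steps, concatenates the style branch's keys and values into the generation branch. The step gating, the concatenation rule, and all names here are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_sharing(q_gen, k_gen, v_gen, k_style, v_style,
                           step, key_steps):
    """Generation-branch attention with key-stage sharing (sketch).

    During the 'key' steps, the style branch's keys/values are appended
    so generation queries can attend to style features; in all other
    steps, plain self-attention is used.
    """
    if step in key_steps:
        k = np.concatenate([k_gen, k_style], axis=0)
        v = np.concatenate([v_gen, v_style], axis=0)
    else:
        k, v = k_gen, v_gen
    d = q_gen.shape[-1]
    attn = softmax(q_gen @ k.T / np.sqrt(d))  # (n_gen, n_keys)
    return attn @ v                            # (n_gen, dim)
```

In this sketch the output shape is unchanged by sharing; only the attention pool grows during key steps, which is what lets style information flow into the generation branch without any parameter updates.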
Kyoungmin Lee — DGIST, Republic of Korea
Jihun Park — DGIST, Republic of Korea
Jongmin Gim — DGIST, Republic of Korea
Wonhyeok Choi — DGIST, Republic of Korea
Kyumin Hwang — DGIST, Republic of Korea
Jaeyeul Kim — DGIST, Republic of Korea
Sunghoon Im — EECS, DGIST
Computer Vision · Deep Learning · Robot Vision