Personalized Text-to-Image Generation with Auto-Regressive Models

📅 2025-04-17
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work investigates whether autoregressive models can serve as alternatives to the dominant diffusion-based approaches for personalized text-to-image generation. It proposes a two-stage training strategy: first, text embeddings are optimized to represent the personalized subject; second, the Transformer layers are fine-tuned for joint cross-modal modeling. The method builds on an autoregressive Transformer that processes text and image tokens in a single unified sequence, and it learns a dedicated embedding for each personalized concept. On standard benchmarks, it matches state-of-the-art diffusion-based methods in subject fidelity and prompt adherence, marking the first systematic validation of autoregressive models for personalized generation. The core contribution is to challenge the diffusion-dominated paradigm by showing that autoregressive models can achieve high-fidelity subject modeling and flexible generalization to new scenes, opening a new pathway for text-to-image synthesis.
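To make the idea of one sequence holding both text and image tokens concrete, here is a minimal sketch in Python. It assumes images have already been converted to discrete tokens; the token ids, the `BOI`/`EOI` markers, the `<sks>` placeholder, and the choice to compute the loss only on image positions are hypothetical illustrations, not details taken from the paper.

```python
# Hypothetical ids for illustration; real models define their own text
# vocabulary, image tokenizer, and special tokens.
BOI, EOI = 9001, 9002  # assumed begin-of-image / end-of-image markers

def build_sequence(text_ids, image_ids):
    """Place prompt tokens and discrete image tokens in one flat sequence."""
    return list(text_ids) + [BOI] + list(image_ids) + [EOI]

def next_token_pairs(seq, num_text):
    """Return (context, target, in_loss) triples for next-token prediction.

    Each target is predicted from every token before it (causal order).
    Only targets after the BOI marker (image tokens and EOI) count toward
    the loss here; this masking choice is an assumption.
    """
    pairs = []
    for i in range(1, len(seq)):
        in_loss = i > num_text  # index num_text holds BOI
        pairs.append((seq[:i], seq[i], in_loss))
    return pairs

text = [11, 12, 13]  # e.g. ids for "a <sks> dog", with <sks> a placeholder
image = [501, 502]   # two discrete image-codebook tokens
seq = build_sequence(text, image)
targets = [tgt for _, tgt, used in next_token_pairs(seq, len(text)) if used]
print(seq)      # [11, 12, 13, 9001, 501, 502, 9002]
print(targets)  # [501, 502, 9002]
```

Because text and image tokens share one causal sequence, the same next-token objective covers both modalities; only the loss mask decides which positions are trained.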

๐Ÿ“ Abstract
Personalized image synthesis has emerged as a pivotal application in text-to-image generation, enabling the creation of images featuring specific subjects in diverse contexts. While diffusion models have dominated this domain, auto-regressive models, with their unified architecture for text and image modeling, remain underexplored for personalized image generation. This paper investigates the potential of optimizing auto-regressive models for personalized image synthesis, leveraging their inherent multimodal capabilities to perform this task. We propose a two-stage training strategy that combines optimization of text embeddings and fine-tuning of transformer layers. Our experiments on the auto-regressive model demonstrate that this method achieves comparable subject fidelity and prompt following to the leading diffusion-based personalization methods. The results highlight the effectiveness of auto-regressive models in personalized image generation, offering a new direction for future research in this area.
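The two-stage schedule can be illustrated with a toy example. This is a minimal sketch under explicit assumptions: a 2x2 matrix `W` stands in for the transformer layers, a 2-vector `e` stands in for the new concept-token embedding, and a squared error to a hypothetical target `t` replaces the model's real generation loss. Stage 1 updates only `e` with `W` frozen; stage 2 freezes `e` and updates only `W`. Whether the paper keeps the embedding trainable during stage 2 is not stated here, so that split is also an assumption.

```python
# Toy two-stage personalization schedule (hypothetical stand-ins throughout):
# W plays the role of the transformer layers, e the new concept-token
# embedding, and squared error to t replaces the real training loss.

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def loss_and_residual(W, e, t):
    r = [y - ti for y, ti in zip(matvec(W, e), t)]
    return sum(ri * ri for ri in r), r

def stage1_embedding_only(W, e, t, lr=0.1, steps=200):
    """Gradient descent on the embedding e; W is frozen (never modified)."""
    e = list(e)
    for _ in range(steps):
        _, r = loss_and_residual(W, e, t)
        # dL/de = 2 * W^T r
        grad = [2 * sum(W[i][j] * r[i] for i in range(len(W)))
                for j in range(len(e))]
        e = [ej - lr * gj for ej, gj in zip(e, grad)]
    return e

def stage2_finetune_model(W, e, t, lr=0.1, steps=200):
    """Gradient descent on W; the embedding learned in stage 1 stays fixed."""
    W = [list(row) for row in W]
    for _ in range(steps):
        _, r = loss_and_residual(W, e, t)
        # dL/dW[i][j] = 2 * r[i] * e[j]
        W = [[W[i][j] - lr * 2 * r[i] * e[j] for j in range(len(e))]
             for i in range(len(W))]
    return W

W0 = [[1.0, 0.0], [0.0, 0.0]]  # rank-deficient "frozen model"
e0 = [0.0, 0.5]                # fresh embedding for a placeholder token
t = [1.0, 1.0]                 # hypothetical target features

e1 = stage1_embedding_only(W0, e0, t)
W1 = stage2_finetune_model(W0, e1, t)
print(round(loss_and_residual(W0, e0, t)[0], 4))  # 2.0
print(round(loss_and_residual(W0, e1, t)[0], 4))  # 1.0
print(round(loss_and_residual(W1, e1, t)[0], 4))  # 0.0
```

Because `W0` is rank-deficient in this toy, embedding-only optimization stalls at a loss of 1.0 and the model-updating stage is needed to reach the target; this is a property of the constructed example, not a result reported by the paper.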
Problem

Research questions and challenges this paper addresses.

Exploring auto-regressive models for personalized image generation
Optimizing text embeddings and transformer layers for synthesis
Comparing performance with diffusion-based personalization methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Auto-regressive models for personalized image synthesis
Two-stage training: text-embedding optimization followed by transformer fine-tuning
Unified multimodal modeling for subject fidelity