Personalized Text-to-Image Generation with Auto-Regressive Models

📅 2025-04-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work investigates whether autoregressive models are a feasible alternative to the dominant diffusion-based approaches for personalized text-to-image generation. It proposes a two-stage training strategy: first optimizing text embeddings to align with the personalized subject, then fine-tuning the Transformer layers for joint text-image modeling. The method builds on an autoregressive Transformer that processes text and image tokens uniformly and adds a dedicated mechanism for learning personalized concept embeddings. In the reported experiments, it achieves subject fidelity and prompt following comparable to leading diffusion-based personalization methods. The core contribution is to challenge the diffusion-dominated paradigm by showing that autoregressive models can model a subject with high fidelity and place it in new scenes, which opens another direction for personalized text-to-image synthesis.
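The two-stage schedule can be pictured as a choice of which parameters are trainable in each stage. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `Param` class, the `configure_stage` helper, and all parameter names are hypothetical, and freezing the embedding during stage 2 is an assumption the summary does not confirm.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Stand-in for a model parameter; only tracks whether it is trained."""
    name: str
    trainable: bool = False


def configure_stage(params, stage, placeholder="text_embed.<new-subject>"):
    """Mark which parameters are updated in the given stage; return their names."""
    if stage not in (1, 2):
        raise ValueError("stage must be 1 or 2")
    for p in params:
        if stage == 1:
            # Stage 1: optimize only the embedding of the new subject token.
            p.trainable = p.name == placeholder
        else:
            # Stage 2 (assumption): freeze embeddings, fine-tune transformer layers.
            p.trainable = p.name.startswith("layers.")
    return [p.name for p in params if p.trainable]


# Hypothetical parameter names for a small unified text+image transformer.
params = [
    Param("text_embed.<new-subject>"),
    Param("text_embed.vocab"),
    Param("layers.0.attn"),
    Param("layers.0.mlp"),
    Param("image_head"),
]

stage1 = configure_stage(params, stage=1)
stage2 = configure_stage(params, stage=2)
print(stage1)  # ['text_embed.<new-subject>']
print(stage2)  # ['layers.0.attn', 'layers.0.mlp']
```

In a real training loop the same selection would typically be expressed by toggling gradient tracking on the corresponding tensors before building the optimizer for each stage.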

📝 Abstract

Personalized image synthesis has emerged as a pivotal application in text-to-image generation, enabling the creation of images featuring specific subjects in diverse contexts. While diffusion models have dominated this domain, auto-regressive models, with their unified architecture for text and image modeling, remain underexplored for personalized image generation. This paper investigates the potential of optimizing auto-regressive models for personalized image synthesis, leveraging their inherent multimodal capabilities to perform this task. We propose a two-stage training strategy that combines optimization of text embeddings and fine-tuning of transformer layers. Our experiments on the auto-regressive model demonstrate that this method achieves comparable subject fidelity and prompt following to the leading diffusion-based personalization methods. The results highlight the effectiveness of auto-regressive models in personalized image generation, offering a new direction for future research in this area.
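For readers unfamiliar with the "unified architecture" mentioned above, the standard autoregressive formulation over a single token sequence is sketched below. This is the generic textbook objective, not a formula taken from the paper: the symbols $x_t$, $T$, and $\theta$ are introduced here for illustration, and whether the loss is computed over all positions or only over image tokens is left unspecified.

```latex
% x_{1:T}: text tokens followed by image tokens, treated as one sequence
p_\theta(x_{1:T}) = \prod_{t=1}^{T} p_\theta\left(x_t \mid x_{<t}\right),
\qquad
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)
```

Under this view, personalization amounts to adjusting parts of $\theta$ (a new token embedding, then the transformer layers) so that sequences depicting the subject become more likely.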

Problem

Research questions and friction points this paper is trying to address.

Exploring auto-regressive models for personalized image generation

Optimizing text embeddings and transformer layers for synthesis

Comparing performance with diffusion-based personalization methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

Auto-regressive models for personalized image synthesis

Two-stage training strategy: text-embedding optimization followed by transformer-layer fine-tuning

Leveraging the model's inherent multimodal capabilities to preserve subject fidelity
