🤖 AI Summary
Existing personalized text-to-image generation methods rely on full fine-tuning or adapter-based approaches, suffering from high parameter overhead, overfitting, and catastrophic forgetting. This paper proposes a lightweight, tuning-free concept injection framework that embeds subject- and style-specific concepts into a unified autoregressive model without updating any pretrained parameters. The approach features: (1) layerwise multimodal in-context learning with all backbone parameters frozen; (2) context token anchoring and distribution-preserving regularization to ensure semantic consistency; and (3) high-fidelity personalized generation using only 0.05% of the model's parameters as trainable. Evaluated on subject-driven generation and style transfer tasks, the method matches Proxy-Tuning in performance while significantly improving computational and memory efficiency. Moreover, it enables zero-shot user-specific style transfer, demonstrating strong generalization without task-specific adaptation.
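The core idea of point (1), learnable per-layer context tokens injected into a frozen backbone, can be sketched in PyTorch. This is a minimal illustration under assumed names and shapes (`LayerwiseContextInjector`, `n_ctx`, a toy two-layer encoder), not the paper's actual implementation:

```python
import torch
import torch.nn as nn

class LayerwiseContextInjector(nn.Module):
    """Sketch: per-layer learnable context tokens prepended to the hidden
    states of a frozen backbone; only the context tokens are trained."""

    def __init__(self, backbone_layers, d_model, n_ctx=4):
        super().__init__()
        self.layers = backbone_layers
        for p in self.layers.parameters():  # keep all pretrained weights frozen
            p.requires_grad_(False)
        # the only trainable parameters: a small set of context tokens per layer
        self.ctx = nn.ParameterList(
            nn.Parameter(torch.randn(n_ctx, d_model) * 0.02)
            for _ in backbone_layers
        )

    def forward(self, h):  # h: (batch, seq, d_model)
        b = h.size(0)
        for layer, ctx in zip(self.layers, self.ctx):
            ctx_b = ctx.unsqueeze(0).expand(b, -1, -1)
            h = layer(torch.cat([ctx_b, h], dim=1))  # inject context tokens
            h = h[:, ctx.size(0):]                   # drop them after the layer
        return h

# toy "frozen backbone": two small transformer encoder layers
layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True)
    for _ in range(2)
)
model = LayerwiseContextInjector(layers, d_model=16)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total}")  # tiny fraction is trainable
out = model(torch.randn(2, 5, 16))
print(out.shape)  # same shape as the input: (2, 5, 16)
```

Because the backbone's outputs for inputs without context tokens are untouched, pretrained behavior is preserved; the anchoring and distribution-preserving regularizers described in point (2) would be added as extra loss terms during training.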
📝 Abstract
Unified autoregressive (AR) models excel at multimodal understanding and generation, but their potential for customized image generation remains underexplored. Existing customized generation methods rely on full fine-tuning or adapters, making them costly and prone to overfitting or catastrophic forgetting. In this paper, we propose **CoAR**, a novel framework for injecting subject concepts into unified AR models while keeping all pre-trained parameters completely frozen. CoAR learns effective, subject-specific representations with only a minimal number of parameters using a Layerwise Multimodal Context Learning strategy. To address overfitting and language drift, we further introduce regularization that preserves the pre-trained distribution and anchors context tokens, improving subject fidelity and re-contextualization. Additionally, CoAR supports training-free subject customization in a user-provided style. Experiments demonstrate that CoAR achieves superior performance on both subject-driven personalization and style personalization, while delivering significant gains in computational and memory efficiency. Notably, CoAR tunes fewer than **0.05%** of the parameters while achieving competitive performance compared to recent Proxy-Tuning. Code: https://github.com/KZF-kzf/CoAR