🤖 AI Summary
Existing personalized text-to-image generation methods rely on full fine-tuning or adapter-based approaches, suffering from high parameter overhead, overfitting, and catastrophic forgetting. This paper proposes a lightweight, tuning-free concept injection framework that embeds subject- and style-specific concepts into a unified autoregressive model without updating any pretrained parameters. The approach features: (1) layerwise multimodal in-context learning with all backbone parameters frozen; (2) context token anchoring and distribution-preserving regularization to ensure semantic consistency; and (3) high-fidelity personalized generation using only 0.05% of the model's parameters as trainable. Evaluated on subject-driven generation and style transfer tasks, the method matches Proxy-Tuning in performance while significantly improving computational and memory efficiency. Moreover, it enables zero-shot user-specific style transfer, demonstrating strong generalization without task-specific adaptation.
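The core idea of point (1), learnable per-layer context tokens injected into a frozen backbone, can be sketched in PyTorch. This is a minimal illustration under assumed names and shapes (`LayerwiseContextInjector`, `n_ctx`, a toy two-layer encoder), not the paper's actual implementation:

```python
import torch
import torch.nn as nn

class LayerwiseContextInjector(nn.Module):
    """Sketch: per-layer learnable context tokens prepended to the hidden
    states of a frozen backbone; only the context tokens are trained."""

    def __init__(self, backbone_layers, d_model, n_ctx=4):
        super().__init__()
        self.layers = backbone_layers
        for p in self.layers.parameters():  # keep all pretrained weights frozen
            p.requires_grad_(False)
        # the only trainable parameters: a small set of context tokens per layer
        self.ctx = nn.ParameterList(
            nn.Parameter(torch.randn(n_ctx, d_model) * 0.02)
            for _ in backbone_layers
        )

    def forward(self, h):  # h: (batch, seq, d_model)
        b = h.size(0)
        for layer, ctx in zip(self.layers, self.ctx):
            ctx_b = ctx.unsqueeze(0).expand(b, -1, -1)
            h = layer(torch.cat([ctx_b, h], dim=1))  # inject context tokens
            h = h[:, ctx.size(0):]                   # drop them after the layer
        return h

# toy "frozen backbone": two small transformer encoder layers
layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True)
    for _ in range(2)
)
model = LayerwiseContextInjector(layers, d_model=16)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total}")  # tiny fraction is trainable
out = model(torch.randn(2, 5, 16))
print(out.shape)  # same shape as the input: (2, 5, 16)
```

Because the backbone's outputs for inputs without context tokens are untouched, pretrained behavior is preserved; the anchoring and distribution-preserving regularizers described in point (2) would be added as extra loss terms during training.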
📝 Abstract
Unified autoregressive (AR) models excel at multimodal understanding and generation, but their potential for customized image generation remains underexplored. Existing customized generation methods rely on full fine-tuning or adapters, making them costly and prone to overfitting or catastrophic forgetting. In this paper, we propose **CoAR**, a novel framework for injecting subject concepts into unified AR models while keeping all pre-trained parameters completely frozen. CoAR learns effective, subject-specific representations with only a minimal number of parameters using a Layerwise Multimodal Context Learning strategy. To address overfitting and language drift, we further introduce regularization that preserves the pre-trained distribution and anchors context tokens, improving subject fidelity and re-contextualization. Additionally, CoAR supports training-free subject customization in a user-provided style. Experiments demonstrate that CoAR achieves superior performance on both subject-driven personalization and style personalization, while delivering significant gains in computational and memory efficiency. Notably, CoAR tunes fewer than **0.05%** of the parameters while achieving competitive performance compared to recent Proxy-Tuning. Code: https://github.com/KZF-kzf/CoAR