🤖 AI Summary
Existing text-to-image personalization methods struggle to simultaneously achieve concept fidelity and semantic consistency under specific prompts and noise seeds. To address this, we propose a novel query-level fine-grained concept learning framework that jointly optimizes self-attention and cross-attention via a dual-loss mechanism, guided by Prompt-Diffusion Matching (PDM) features to explicitly model identity characteristics of novel visual concepts. Our method is architecture-agnostic—compatible with both UNet and DiT backbones—and supports end-to-end diffusion model fine-tuning. We conduct comprehensive evaluations across six state-of-the-art baselines and multiple foundational models. Results demonstrate significant improvements over existing per-query personalization approaches in generation quality, concept accuracy, and cross-prompt generalization. This advancement enables more robust and controllable generation for applications such as personalized design and product embedding.
📝 Abstract
Visual concept learning, also known as text-to-image personalization, is the process of teaching new concepts to a pretrained model. This has numerous applications, from product placement to entertainment and personalized design. Here we show that many existing methods can be substantially augmented by adding a personalization step that is (1) specific to the prompt and noise seed, and (2) guided by two loss terms based on the self- and cross-attention maps, capturing the identity of the personalized concept. Specifically, we leverage PDM features, previously designed to capture identity, and show how they can be used to improve personalized semantic similarity. We evaluate the benefit our method provides on top of six different personalization methods and several base text-to-image models (both UNet- and DiT-based). We find significant improvements even over previous per-query personalization methods.
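The abstract describes combining two attention-based loss terms, one on self-attention (identity, via PDM-style features) and one on cross-attention (prompt alignment). A minimal sketch of how such a combined per-query objective might look is below; the function names, the choice of mean squared error, and the `lambda_self`/`lambda_cross` weights are all illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def attention_map_loss(pred, target):
    # Hypothetical stand-in for an attention-based identity loss:
    # mean squared error between a generated attention map and a
    # reference map extracted from the personalized concept.
    return float(np.mean((pred - target) ** 2))

def per_query_loss(self_attn, self_attn_ref, cross_attn, cross_attn_ref,
                   lambda_self=1.0, lambda_cross=1.0):
    # Combine the two loss terms mentioned in the abstract:
    # a self-attention term (concept identity) and a
    # cross-attention term (prompt/semantic alignment),
    # weighted by assumed hyperparameters lambda_self / lambda_cross.
    return (lambda_self * attention_map_loss(self_attn, self_attn_ref)
            + lambda_cross * attention_map_loss(cross_attn, cross_attn_ref))
```

In an actual per-query setup, this objective would be minimized for a fixed prompt and noise seed, backpropagating through the diffusion model's attention layers; the sketch above only shows the shape of the combined loss.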