Gate-and-Merge: Zero-shot Compositional Personalization of Vision Language Models

📅 2026-05-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses compositional personalization of vision-language models: recognizing or describing multiple user-defined concepts jointly at test time when no training data shows those concepts together. The authors propose Gate-and-Merge, a zero-shot framework that learns a lightweight LoRA adapter and a dedicated concept token for each concept independently, leaving the base model unchanged. At inference, it merges the concept-specific LoRA updates in weight space and uses a gating mechanism, driven by textual and visual cues, to select only the relevant modules and suppress interference. Composition is further stabilized by combining only the most meaningful and mutually consistent updates, which helps preserve each concept's identity. Quantitative and qualitative analyses show consistent gains across multiple personalization tasks in both single-concept and compositional settings.

📝 Abstract

This paper tackles compositional personalization of vision-language models (VLMs). In this problem, multiple user-defined concepts must be recognized or described jointly at test time. We introduce Gate-and-Merge, a zero-shot framework that enables compositional personalization without the need for co-occurrence training. During personalization, each concept is learned independently as a lightweight LoRA adapter, paired with a concept token. The base model remains unchanged and concepts are kept disentangled. At inference, we enable composition by merging concept-specific LoRA updates directly in weight space. To suppress irrelevant activations and prevent interference, a gating mechanism is employed to estimate textual and visual cues and select only the modules that contribute to the prediction. We further stabilize composition by combining only the most meaningful and mutually consistent updates, helping preserve each concept's identity. Our quantitative and qualitative analyses show consistent gains in performance across multiple personalization tasks in both single-concept and compositional settings.
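The inference step described in the abstract (gate first, then merge the selected LoRA updates in weight space) can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine-similarity gate, the fixed `threshold`, the per-concept `concept_keys`, and the function names are all assumptions made for illustration, and plain Python lists stand in for tensors to keep the example dependency-free. The only part taken from standard LoRA practice is that each adapter contributes a low-rank update `B @ A` to a frozen base weight `W0`.

```python
import math


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gate_and_merge(W0, adapters, query, concept_keys, threshold=0.5):
    """Merge only the gated LoRA updates into a copy of the base weights.

    W0           : base weight matrix (d_out x d_in), left unmodified.
    adapters     : list of (B, A) pairs, B is d_out x r and A is r x d_in.
    query        : embedding of the current input (hypothetical gate input).
    concept_keys : one key vector per concept (hypothetical gate parameters).
    threshold    : assumed cutoff; concepts scoring below it are skipped.
    """
    # Gate: keep a concept only if the query is similar enough to its key.
    active = [
        i for i, key in enumerate(concept_keys)
        if cosine(query, key) >= threshold
    ]

    # Merge: add each selected low-rank update B @ A to a copy of W0.
    W = [row[:] for row in W0]
    for i in active:
        B, A = adapters[i]
        delta = matmul(B, A)
        for r, delta_row in enumerate(delta):
            for c, d in enumerate(delta_row):
                W[r][c] += d
    return W, active
```

For example, with two rank-1 adapters and `concept_keys = [[1.0, 0.0], [0.0, 1.0]]`, a query of `[1.0, 0.1]` passes the gate only for the first concept, so only that adapter's update is added to `W0`; a query of `[1.0, 1.0]` activates both. The abstract's further step of keeping only the most meaningful and mutually consistent updates is not modeled here.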

Problem

Research questions and friction points this paper is trying to address.

compositional personalization

vision-language models

zero-shot

concept composition

personalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

compositional personalization

zero-shot learning

LoRA merging