Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization

📅 2025-12-11
🤖 AI Summary
Existing methods rely on holistic embeddings from generic image encoders, which strongly entangle visual attributes such as identity, expression, illumination, and style, causing information leakage and inconsistent synthesis. To address this, we propose the first open-vocabulary image attribute encoder that explicitly decouples fine-grained visual concepts. We introduce a novel semantic pairing strategy for constructing attribute-annotated training data and design a joint training paradigm that optimizes both generative fidelity and contrastive disentanglement, using positive/negative attribute pairs and a dedicated contrastive disentanglement loss. Our method achieves state-of-the-art performance on open-vocabulary attribute retrieval, personalized image editing, and compositional generation. It significantly improves attribute isolation and synthesis consistency, establishing an interpretable and controllable paradigm for personalized visual concept modeling.
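The paper does not publish its loss formulation here, but the contrastive disentanglement term described above can be sketched in an InfoNCE-style form: for a given attribute, pull the query embedding toward images annotated with the same attribute value (positives) and push it away from images annotated with a different value (negatives). The function name, the shared-positive denominator, and the temperature value are assumptions for illustration, not the authors' exact loss.

```python
import numpy as np

def contrastive_disentanglement_loss(anchor, positives, negatives, temperature=0.1):
    """Hedged sketch of a contrastive disentanglement term.

    anchor:    (d,)   attribute embedding of the query image
    positives: (p, d) embeddings annotated with the same attribute value
    negatives: (n, d) embeddings annotated with a different value
    """
    def cos_sim(a, b):
        a = a / np.linalg.norm(a)
        b = b / np.linalg.norm(b, axis=-1, keepdims=True)
        return b @ a  # cosine similarity of each row of b with a

    pos = np.exp(cos_sim(anchor, positives) / temperature)  # (p,)
    neg = np.exp(cos_sim(anchor, negatives) / temperature)  # (n,)
    # SupCon-style variant: each positive competes against all other
    # positives and all negatives; average -log p(positive | anchor).
    return float(np.mean(-np.log(pos / (pos.sum() + neg.sum()))))
```

Under this sketch, an embedding aligned with its positives yields a low loss, while one aligned with its negatives yields a high loss, which is the behavior the paper's "preserve vs. suppress" supervision targets.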

📝 Abstract
Visual concept personalization aims to transfer only specific image attributes, such as identity, expression, lighting, and style, into unseen contexts. However, existing methods rely on holistic embeddings from general-purpose image encoders, which entangle multiple visual factors and make it difficult to isolate a single attribute. This often leads to information leakage and incoherent synthesis. To address this limitation, we introduce Omni-Attribute, the first open-vocabulary image attribute encoder designed to learn high-fidelity, attribute-specific representations. Our approach jointly designs the data and model: (i) we curate semantically linked image pairs annotated with positive and negative attributes to explicitly teach the encoder what to preserve or suppress; and (ii) we adopt a dual-objective training paradigm that balances generative fidelity with contrastive disentanglement. The resulting embeddings prove effective for open-vocabulary attribute retrieval, personalization, and compositional generation, achieving state-of-the-art performance across multiple benchmarks.
Problem

Research questions and friction points this paper is trying to address.

Isolates specific image attributes for transfer
Prevents attribute entanglement and information leakage
Enables open-vocabulary attribute control in generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-vocabulary encoder learns attribute-specific visual representations
Uses semantically linked image pairs with positive-negative annotations
Dual-objective training balances generative fidelity and contrastive disentanglement
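The dual-objective paradigm in the last point can be sketched as a weighted sum of a generative fidelity term (here a stand-in diffusion-style denoising MSE) and the contrastive disentanglement term. The weighting scheme and the symbol `lam` are assumptions; the paper only states that the two objectives are balanced.

```python
import numpy as np

def dual_objective_loss(pred_noise, true_noise, contrastive_term, lam=0.1):
    """Hedged sketch: total = generative fidelity + lam * disentanglement.

    pred_noise / true_noise: arrays for a stand-in denoising MSE objective.
    contrastive_term: scalar from a contrastive disentanglement loss.
    lam: hypothetical balancing weight (not specified in the summary).
    """
    fidelity = float(np.mean((pred_noise - true_noise) ** 2))
    return fidelity + lam * float(contrastive_term)
```

Raising `lam` trades reconstruction quality for stronger attribute isolation; the paper's claim is that jointly optimizing both avoids the leakage seen with purely generative training.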