🤖 AI Summary
Existing subject-driven image customization methods suffer from insufficient fine-grained, entity-level spatial control, primarily due to the lack of large-scale data that explicitly binds identity with precise positional cues. To address this, we propose PositionIC, the first framework to jointly and consistently model position and identity in multi-subject image customization. Methodologically: (1) we design a bidirectional generative pipeline to synthesize high-quality training data with aligned position-identity pairs; (2) we introduce a lightweight position modulation layer that decouples each subject's spatial embedding from its semantic representation so the two can be optimized independently; and (3) we adopt a scalable training strategy that alternates over synthetic data. Experiments demonstrate that PositionIC significantly outperforms prior methods in multi-subject localization accuracy, identity fidelity, and layout flexibility, establishing a controllable, high-fidelity unified solution for open-scenario image customization.
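The summary does not spell out how the position modulation layer works internally. As a rough illustration only, the NumPy sketch below shows one plausible FiLM-style design consistent with the description: each subject's bounding box is mapped to a separate spatial embedding that scales and shifts that subject's semantic embedding, so subjects are modulated independently. All names, shapes, and the sinusoidal box encoding are our assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 64  # hypothetical embedding width

def embed_box(box, dim=DIM):
    """Map a normalized (x1, y1, x2, y2) box to a spatial embedding using
    fixed sinusoidal features (an illustrative choice, not the paper's)."""
    freqs = np.arange(dim // 8)
    feats = [f(np.pi * c * 2.0 ** freqs) for c in box for f in (np.sin, np.cos)]
    return np.concatenate(feats)

def position_modulate(semantic, box, w_scale, w_shift):
    """FiLM-style modulation: the spatial embedding yields a per-channel
    scale and shift applied to this subject's semantic embedding, keeping
    the spatial and semantic streams decoupled."""
    spatial = embed_box(box)
    scale = 1.0 + spatial @ w_scale  # near-identity when weights are small
    shift = spatial @ w_shift
    return scale * semantic + shift

# Small random weights stand in for the learned modulation parameters.
w_scale = rng.normal(0.0, 0.02, (DIM, DIM))
w_shift = rng.normal(0.0, 0.02, (DIM, DIM))

# Two subjects with distinct boxes are modulated independently, so moving
# one subject's box leaves the other's modulated embedding untouched.
subjects = {
    "subject_a": (rng.normal(size=DIM), (0.1, 0.1, 0.4, 0.5)),
    "subject_b": (rng.normal(size=DIM), (0.5, 0.2, 0.9, 0.8)),
}
out = {name: position_modulate(sem, box, w_scale, w_shift)
       for name, (sem, box) in subjects.items()}
```

The per-subject factorization above is the key property implied by the text: because each subject carries its own spatial embedding, its placement can be changed without perturbing the semantic representations of other subjects.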
📝 Abstract
Recent subject-driven image customization has achieved significant advances in fidelity, yet fine-grained, entity-level spatial control remains elusive, hindering broader real-world application. This limitation is mainly attributed to the absence of scalable datasets that bind identity with precise positional cues. To this end, we introduce PositionIC, a unified framework that enforces position and identity consistency for multi-subject customization. We construct a scalable synthesis pipeline that employs a bidirectional generation paradigm to eliminate subject drift and maintain semantic coherence. On top of these data, we design a lightweight positional modulation layer that decouples spatial embeddings among subjects, enabling independent, accurate placement while preserving visual fidelity. Extensive experiments demonstrate that our approach achieves precise spatial control while maintaining high consistency in image customization tasks. PositionIC paves the way for controllable, high-fidelity image customization in open-world, multi-entity scenarios and will be released to foster further research.