PositionIC: Unified Position and Identity Consistency for Image Customization

📅 2025-07-18
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing subject-driven image customization methods suffer from insufficient fine-grained, entity-level spatial control, primarily due to the lack of large-scale data that explicitly binds identity with precise positional cues. To address this, we propose PositionIC, the first framework enabling consistent modeling of position and identity in multi-subject image customization. Methodologically: (1) we design a bidirectional generative pipeline to synthesize high-quality training data with aligned position-identity pairs; (2) we introduce a lightweight position modulation layer that decouples and independently optimizes spatial embeddings and semantic representations for each subject; and (3) we adopt a scalable synthetic alternating training strategy. Experiments demonstrate that PositionIC significantly outperforms prior methods in multi-subject localization accuracy, identity fidelity, and layout flexibility. It establishes a controllable, high-fidelity unified solution for open-scenario image customization.

๐Ÿ“ Abstract
Recent subject-driven image customization has achieved significant advancements in fidelity, yet fine-grained entity-level spatial control remains elusive, hindering broader real-world application. This limitation is mainly attributed to the absence of scalable datasets that bind identity with precise positional cues. To this end, we introduce PositionIC, a unified framework that enforces position and identity consistency for multi-subject customization. We construct a scalable synthesis pipeline that employs a bidirectional generation paradigm to eliminate subject drift and maintain semantic coherence. On top of these data, we design a lightweight positional modulation layer that decouples spatial embeddings among subjects, enabling independent, accurate placement while preserving visual fidelity. Extensive experiments demonstrate that our approach achieves precise spatial control while maintaining high subject consistency in image customization tasks. PositionIC paves the way for controllable, high-fidelity image customization in open-world, multi-entity scenarios and will be released to foster further research.
Problem

Research questions and friction points this paper is trying to address.

Achieve fine-grained spatial control in image customization
Maintain identity consistency in multi-subject customization
Enable precise placement without sacrificing visual fidelity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework for position and identity consistency
Bidirectional generation paradigm for semantic coherence
Lightweight positional modulation layer for spatial control
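The paper does not include code here, but the idea of a lightweight positional modulation layer that gives each subject its own decoupled spatial signal can be sketched roughly as follows. This is a hypothetical illustration only: the sinusoidal box embedding, the FiLM-style scale/shift design, and all names (`bbox_embedding`, `PositionModulation`) are assumptions, not the authors' released implementation.

```python
import numpy as np

def bbox_embedding(bbox, dim=16):
    """Sinusoidal embedding of a normalized (x1, y1, x2, y2) box.
    Hypothetical stand-in for the paper's spatial embedding; dim must
    be divisible by 8 (two sin/cos frequencies per box coordinate)."""
    bbox = np.asarray(bbox, dtype=np.float64)            # values in [0, 1]
    freqs = 2.0 ** np.arange(dim // 8)                   # frequencies per coord
    angles = np.outer(bbox, freqs).ravel()               # (4 * dim // 8,)
    return np.concatenate([np.sin(angles), np.cos(angles)])  # (dim,)

class PositionModulation:
    """FiLM-style scale/shift of each subject's tokens, driven by that
    subject's own box embedding. Each subject is modulated independently,
    so spatial signals of different subjects stay decoupled."""

    def __init__(self, feat_dim, pos_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        # Tiny linear heads mapping position embedding -> (scale, shift).
        self.w_scale = rng.normal(0.0, 0.02, (pos_dim, feat_dim))
        self.w_shift = rng.normal(0.0, 0.02, (pos_dim, feat_dim))

    def __call__(self, subject_tokens, bboxes):
        out = []
        for tokens, bbox in zip(subject_tokens, bboxes):
            pos = bbox_embedding(bbox, dim=self.w_scale.shape[0])
            scale = 1.0 + pos @ self.w_scale   # near-identity at init
            shift = pos @ self.w_shift
            out.append(tokens * scale + shift)
        return out
```

Under this reading, "lightweight" means only the two small projection heads are new parameters, and "decoupled" means the modulation for one subject never sees another subject's box.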
Junjie Hu
MeiGen AI Team, Meituan
Tianyang Han
The Hong Kong Polytechnic University (PolyU)
Kai Ma
MeiGen AI Team, Meituan
Jialin Gao
National University of Singapore
Hao Dou
Institute of Automation, Chinese Academy of Sciences
Song Yang
MeiGen AI Team, Meituan
Xianhua He
MeiGen AI Team, Meituan
Jianhui Zhang
MeiGen AI Team, Meituan
Junfeng Luo
MeiGen AI Team, Meituan
Xiaoming Wei
Meituan
Wenqiang Zhang
Shanghai Key Lab of Intelligent Information Processing, College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China; College of Intelligent Robotics and Advanced Manufacturing, Fudan University, Shanghai, China