🤖 AI Summary
Existing subject-driven image customization methods suffer from insufficient fine-grained, entity-level spatial control, primarily due to the lack of large-scale data that explicitly binds identity with precise positional cues. To address this, we propose PositionIC, the first framework to jointly and consistently model position and identity in multi-subject image customization. Methodologically: (1) we design a bidirectional generative pipeline to synthesize high-quality training data with aligned position-identity pairs; (2) we introduce a lightweight position modulation layer that decouples each subject's spatial embedding from its semantic representation so the two can be optimized independently; and (3) we adopt a scalable training strategy that alternates over synthetic data. Experiments demonstrate that PositionIC significantly outperforms prior methods in multi-subject localization accuracy, identity fidelity, and layout flexibility, establishing a controllable, high-fidelity unified solution for open-scenario image customization.
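The summary does not spell out how the position modulation layer works internally. As a rough illustration only, the NumPy sketch below shows one plausible FiLM-style design consistent with the description: each subject's bounding box is mapped to a separate spatial embedding that scales and shifts that subject's semantic embedding, so subjects are modulated independently. All names, shapes, and the sinusoidal box encoding are our assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 64  # hypothetical embedding width

def embed_box(box, dim=DIM):
    """Map a normalized (x1, y1, x2, y2) box to a spatial embedding using
    fixed sinusoidal features (an illustrative choice, not the paper's)."""
    freqs = np.arange(dim // 8)
    feats = [f(np.pi * c * 2.0 ** freqs) for c in box for f in (np.sin, np.cos)]
    return np.concatenate(feats)

def position_modulate(semantic, box, w_scale, w_shift):
    """FiLM-style modulation: the spatial embedding yields a per-channel
    scale and shift applied to this subject's semantic embedding, keeping
    the spatial and semantic streams decoupled."""
    spatial = embed_box(box)
    scale = 1.0 + spatial @ w_scale  # near-identity when weights are small
    shift = spatial @ w_shift
    return scale * semantic + shift

# Small random weights stand in for the learned modulation parameters.
w_scale = rng.normal(0.0, 0.02, (DIM, DIM))
w_shift = rng.normal(0.0, 0.02, (DIM, DIM))

# Two subjects with distinct boxes are modulated independently, so moving
# one subject's box leaves the other's modulated embedding untouched.
subjects = {
    "subject_a": (rng.normal(size=DIM), (0.1, 0.1, 0.4, 0.5)),
    "subject_b": (rng.normal(size=DIM), (0.5, 0.2, 0.9, 0.8)),
}
out = {name: position_modulate(sem, box, w_scale, w_shift)
       for name, (sem, box) in subjects.items()}
```

The per-subject factorization above is the key property implied by the text: because each subject carries its own spatial embedding, its placement can be changed without perturbing the semantic representations of other subjects.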
📝 Abstract
Recent subject-driven image customization has achieved significant advances in fidelity, yet fine-grained, entity-level spatial control remains elusive, hindering broader real-world application. This limitation is mainly attributed to the absence of scalable datasets that bind identity with precise positional cues. To this end, we introduce PositionIC, a unified framework that enforces position and identity consistency for multi-subject customization. We construct a scalable synthesis pipeline that employs a bidirectional generation paradigm to eliminate subject drift and maintain semantic coherence. On top of these data, we design a lightweight positional modulation layer that decouples spatial embeddings among subjects, enabling independent, accurate placement while preserving visual fidelity. Extensive experiments demonstrate that our approach achieves precise spatial control while maintaining high consistency in image customization tasks. PositionIC paves the way for controllable, high-fidelity image customization in open-world, multi-entity scenarios and will be released to foster further research.