🤖 AI Summary
This work addresses zero-shot, customized object insertion for seamless image editing, with no additional training required. We propose a training-free, high-fidelity editing framework. Methodologically, we introduce a positional embedding transplant mechanism and a Corner Centered Layout. Both are integrated with the progressive denoising process of FLUX.1-Fill to align the reference object's position, structure, and appearance with the target scene and to steer the noise distribution in the target region. Our key contribution is a zero-shot, identity-aware insertion framework with explicit positional guidance that preserves geometric consistency while improving appearance fidelity and localization accuracy. Experiments show that it outperforms mainstream baselines in structural consistency, appearance fidelity, and computational efficiency, which points to its practical applicability.
📝 Abstract
Localized subject-driven image editing aims to seamlessly integrate user-specified objects into target scenes. As generative models continue to scale, training becomes increasingly costly in memory and computation, which highlights the need for training-free and scalable editing frameworks. To this end, we propose PosBridge, an efficient and flexible framework for inserting custom objects. A key component of our method is positional embedding transplant, which guides the diffusion model to faithfully replicate the structural characteristics of reference objects. We also introduce the Corner Centered Layout, which concatenates the reference images and the background image into a single input to the FLUX.1-Fill model. During progressive denoising, positional embedding transplant is applied to guide the noise distribution in the target region toward that of the reference object. In this way, the Corner Centered Layout directs the FLUX.1-Fill model to synthesize identity-consistent content at the desired location. Extensive experiments demonstrate that PosBridge outperforms mainstream baselines in structural consistency, appearance fidelity, and computational efficiency, showcasing its practical value and potential for broad adoption.
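The abstract gives no implementation details, so what follows is only a minimal sketch of one plausible reading of the two components, under stated assumptions. It assumes FLUX-style image tokens carry 3-axis RoPE position ids of the form (0, row, col) on a token grid (roughly one token per 16x16 pixel patch, given an 8x VAE and 2x2 packing). It also uses a plain reference-left, background-right arrangement in place of the paper's actual Corner Centered Layout, whose exact geometry is not described here. The helpers `concat_canvas`, `layout_position_ids`, `transplant_positions`, and `fill_mask` are hypothetical names introduced for illustration. "Transplant" is modeled as overwriting the position ids of target-region tokens with the (resampled) ids of the reference object's tokens, so that RoPE treats the region to be filled as if it sat where the reference object is.

```python
import numpy as np


def concat_canvas(ref, bg):
    """Hypothetical layout: place ref (h1, w1, c) to the left of bg (h2, w2, c)
    on a zero-padded canvas. Not the paper's Corner Centered Layout."""
    h = max(ref.shape[0], bg.shape[0])
    canvas = np.zeros((h, ref.shape[1] + bg.shape[1], bg.shape[2]), dtype=bg.dtype)
    canvas[:ref.shape[0], :ref.shape[1]] = ref
    canvas[:bg.shape[0], ref.shape[1]:] = bg
    return canvas


def layout_position_ids(ref_h, ref_w, bg_h, bg_w):
    """Token-grid ids (H, W, 3) of the assumed form (0, row, col) for the same
    side-by-side layout: reference in columns [0, ref_w), background after it."""
    h, w = max(ref_h, bg_h), ref_w + bg_w
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return np.stack([np.zeros_like(rows), rows, cols], axis=-1)


def transplant_positions(ids, ref_box, target_box):
    """Copy the position ids of the reference box onto the target box.

    Boxes are (top, left, height, width) in token units. The reference ids are
    resampled to the target size with floor-based nearest-neighbour indexing.
    Returns a new array; `ids` is left unchanged.
    """
    out = ids.copy()
    rt, rl, rh, rw = ref_box
    tt, tl, th, tw = target_box
    # Map each target row/col to a source row/col inside the reference box.
    src_rows = rt + (np.arange(th) * rh) // th
    src_cols = rl + (np.arange(tw) * rw) // tw
    # np.ix_ selects the (th, tw) sub-grid; the trailing id axis is kept.
    out[tt:tt + th, tl:tl + tw] = ids[np.ix_(src_rows, src_cols)]
    return out


def fill_mask(grid_shape, target_box):
    """Token-level mask: 1 marks tokens to be generated, 0 marks kept context."""
    h, w = grid_shape[:2]
    mask = np.zeros((h, w), dtype=np.float32)
    t, l, th, tw = target_box
    mask[t:t + th, l:l + tw] = 1.0
    return mask


if __name__ == "__main__":
    # 4x4-token reference tile next to an 8x8-token background.
    ids = layout_position_ids(ref_h=4, ref_w=4, bg_h=8, bg_w=8)
    # Target: a 2x2-token region inside the background (canvas columns 7-8).
    moved = transplant_positions(ids, ref_box=(0, 0, 4, 4), target_box=(3, 7, 2, 2))
    print(moved[3, 7])  # → [0 0 0]
    print(moved[4, 8])  # → [0 2 2]
    flat_ids = moved.reshape(-1, 3)  # row-major sequence order, one id per token
    mask = fill_mask(moved.shape, (3, 7, 2, 2))
```

In this toy run, the target tokens at canvas positions (3, 7) and (4, 8) now report the positions of reference tokens (0, 0) and (2, 2). The target region therefore shares rotary phases with the reference object, while every other token keeps its ordinary grid position. How the paper actually schedules the transplant across denoising steps, and how its layout differs from this side-by-side arrangement, is not specified in the abstract.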