🤖 AI Summary
Existing generative image composition methods struggle to simultaneously achieve high authenticity—ensuring semantic and pose consistency between foreground and background—and high fidelity in preserving foreground details. This work proposes a two-stage generative framework that decouples these objectives: the first stage synthesizes a foreground shape compatible with the background to ensure authenticity, while the second stage uses that shape as a condition to reconstruct high-fidelity foreground details. Experiments on the MureCOM dataset verify that the approach achieves both natural scene integration and faithful detail preservation. The code and models are publicly released.
📝 Abstract
Generative image composition aims to regenerate a given foreground object in a background image to produce a realistic composite image. Some high-authenticity methods can adjust the foreground pose/view to be compatible with the background, while some high-fidelity methods can preserve the foreground details accurately. However, existing methods can hardly achieve both goals at the same time. In this work, we propose a two-stage strategy to achieve both goals. In the first stage, we use a high-authenticity method to generate a reasonable foreground shape, which serves as the condition for a high-fidelity method in the second stage. Experiments on the MureCOM dataset verify the effectiveness of our two-stage strategy. The code and model have been released at https://github.com/bcmi/OSInsert-Image-Composition.
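The two-stage control flow can be sketched as follows. This is a minimal structural sketch only: the function names, signatures, and return values are placeholders for illustration, not the released OSInsert API or the authors' actual models.

```python
# Hypothetical sketch of the two-stage composition pipeline: stage 1
# (high-authenticity) proposes a background-compatible foreground shape,
# which conditions stage 2 (high-fidelity) detail regeneration.
# All names below are stand-ins, not the actual implementation.

def stage1_generate_shape(foreground, background):
    """High-authenticity stage: adjust foreground pose/view to fit the
    background and return a coarse shape to use as a condition."""
    return {"shape": "pose_adjusted_mask", "fg": foreground, "bg": background}

def stage2_refine_details(foreground, background, shape_condition):
    """High-fidelity stage: regenerate the foreground inside the background,
    conditioned on the stage-1 shape, preserving foreground details."""
    return {
        "composite": f"{foreground}+{background}",
        "condition": shape_condition["shape"],
    }

def compose(foreground, background):
    # Decoupled objectives: authenticity first, then fidelity.
    shape = stage1_generate_shape(foreground, background)
    return stage2_refine_details(foreground, background, shape)

result = compose("fg.png", "bg.png")
```

The key design point is that the second stage never has to invent a plausible pose itself; it only fills in details under the shape proposed by the first stage.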