A Diffusion-Based Framework for Occluded Object Movement

📅 2025-04-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Seamless object relocation in real-world images with occlusion requires both occlusion removal and precise object repositioning. This paper introduces an occlusion-aware, end-to-end diffusion-based editing framework for the task. The method employs a dual-branch parallel architecture: one branch performs mask-guided de-occlusion reconstruction, while the other achieves accurate object localization via localized text guidance and latent-space optimization. The framework integrates background color filling, dynamic mask updating, and pre-trained diffusion models. Quantitative evaluations and user studies across diverse, complex occlusion scenarios show state-of-the-art performance in visual realism, geometric consistency, and semantic plausibility, enabling high-fidelity, semantically coherent object editing under heavy occlusion.

📝 Abstract
Seamlessly moving objects within a scene is a common requirement for image editing, but it remains a challenge for existing editing methods. For real-world images especially, occlusion further increases the difficulty: the occluded portion of the object must be completed before movement can proceed. To leverage the real-world knowledge embedded in pre-trained diffusion models, we propose a diffusion-based framework specifically designed for occluded object movement, named DiffOOM. DiffOOM consists of two parallel branches that perform object de-occlusion and movement simultaneously. The de-occlusion branch uses a background color-fill strategy and a continuously updated object mask to focus the diffusion process on completing the obscured portion of the target object. Concurrently, the movement branch employs latent optimization to place the completed object at the target location and adopts local text-conditioned guidance to integrate the object into its new surroundings. Extensive evaluations demonstrate the superior performance of our method, which is further validated by a comprehensive user study.
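The de-occlusion branch described above can be illustrated with a toy sketch. All names here are hypothetical, and the real method operates on the latents of a pre-trained diffusion model rather than raw pixels; this only shows the color-fill and mask-update logic in spirit:

```python
import numpy as np

def de_occlusion_step(image, object_mask, bg_color):
    """Toy stand-in for the color-fill strategy: replace everything
    outside the (continuously updated) object mask with a flat
    background color, so generation focuses on completing the object."""
    out = image.copy()
    out[~object_mask] = bg_color
    return out

def update_mask(object_mask, completed_region):
    """Dynamic mask update: grow the object mask as newly
    de-occluded pixels are recovered."""
    return object_mask | completed_region

# Hypothetical 8x8 grayscale scene: a 3x3 object, with one
# previously occluded row recovered in this step.
image = np.random.rand(8, 8)
object_mask = np.zeros((8, 8), dtype=bool)
object_mask[2:5, 2:5] = True
completed_region = np.zeros((8, 8), dtype=bool)
completed_region[5:6, 2:5] = True  # newly de-occluded row

focused = de_occlusion_step(image, object_mask, bg_color=0.5)
object_mask = update_mask(object_mask, completed_region)
```

In the actual framework, the fill and the mask update are interleaved with diffusion denoising steps; the toy version simply makes the two operations explicit.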
Problem

Research questions and friction points this paper is trying to address.

Handling occluded object movement in image editing
Completing occluded portions before object relocation
Integrating moved objects naturally into new surroundings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-based framework for occluded object movement
Parallel branches for de-occlusion and movement
Latent optimization and local text-conditioned guidance
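The latent-optimization idea from the movement branch can be sketched as a simple L2 objective: nudge the scene latent so the target region matches the completed object's latent. This is a minimal numpy stand-in with hypothetical names; the paper's method optimizes intermediate diffusion latents and additionally applies local text-conditioned guidance, which is omitted here:

```python
import numpy as np

def place_object(latent, object_latent, target_slice, steps=100, lr=0.1):
    """Toy latent optimization: gradient descent on
    ||z[target] - z_obj||^2, leaving the rest of the latent untouched."""
    z = latent.copy()
    for _ in range(steps):
        grad = np.zeros_like(z)
        grad[target_slice] = 2.0 * (z[target_slice] - object_latent)
        z -= lr * grad
    return z

rng = np.random.default_rng(0)
latent = rng.normal(size=(16, 16))          # scene latent
object_latent = rng.normal(size=(4, 4))     # completed object latent
target = (slice(10, 14), slice(10, 14))     # desired new location
z_moved = place_object(latent, object_latent, target)
```

Because the gradient is zero outside the target region, the surrounding scene is preserved while the object latent converges onto the target location.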
Zheng-Peng Duan
Nankai University
Computer Vision
Jiawei Zhang
SenseTime Research
Siyu Liu
VCIP, CS, Nankai University
Zheng Lin
BNRist, Department of Computer Science and Technology, Tsinghua University
Chun-Le Guo
VCIP, CS, Nankai University
Dongqing Zou
NKIARI, Shenzhen Futian
Jimmy Ren
SenseTime Research
Chongyi Li
Professor, Nankai University
Computer Vision, Computational Imaging, Computational Photography, Underwater Imaging