🤖 AI Summary
Existing object removal methods occlude the target region before processing, so the model must infer the missing content from the surrounding pixels alone. This inaccurate context modeling often leads to severe artifacts, especially in scenes with complex textures and overlapping objects. To address this, we propose a “masked-region-guided” paradigm: instead of treating the user-provided coarse mask as an occlusion input, we use it as a structured semantic guidance signal, which improves localization accuracy and semantic consistency. We introduce Syn4Removal, the first large-scale synthetic dataset tailored for object removal, generated via instance-segmentation-driven copy-paste synthesis. We further design an end-to-end network that integrates mask-guided conditioning with a context-aware diffusion/generative inpainting architecture. Experiments show that our approach achieves state-of-the-art performance across PSNR, LPIPS, and user studies, with better structural integrity and visual naturalness in multi-object interactions and high-texture regions.
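The difference between the two paradigms can be illustrated at the level of the model input. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, and appending the mask as an extra channel is an assumption made here for illustration, since the text does not specify how the mask is supplied to the network.

```python
import numpy as np


def mask_and_inpaint_input(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Conventional input: pixels under the mask are blanked out.

    image: (H, W, 3) float array; mask: (H, W) bool array, True = remove.
    Returns an (H, W, 4) array (blanked RGB + mask channel).
    The 4-channel layout is an illustrative assumption.
    """
    visible = image * (~mask)[..., None]  # zero out the masked region
    mask_channel = mask[..., None].astype(image.dtype)
    return np.concatenate([visible, mask_channel], axis=-1)


def masked_region_guided_input(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Guided input: the full image is kept, the mask only marks the target.

    Same shapes as above; the object pixels stay visible to the model.
    """
    mask_channel = mask[..., None].astype(image.dtype)
    return np.concatenate([image, mask_channel], axis=-1)
```

In the first function the model never sees the object it is asked to remove; in the second it does, and the mask channel tells it where to look. Outside the mask the two inputs are identical.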
📝 Abstract
Object removal has so far been dominated by the mask-and-inpaint paradigm, where the masked region is excluded from the input, leaving models to rely on unmasked areas to inpaint the missing region. However, this approach lacks contextual information for the masked area, often resulting in unstable performance. In this work, we introduce SmartEraser, built on a new removal paradigm called Masked-Region Guidance. This paradigm retains the masked region in the input, using it as guidance for the removal process. It offers two distinct advantages: (a) it guides the model to accurately identify the object to be removed, preventing its regeneration in the output; (b) since the user mask often extends beyond the object itself, it aids in preserving the surrounding context in the final result. Leveraging this new paradigm, we present Syn4Removal, a large-scale object removal dataset in which instance segmentation data is used to copy and paste objects onto images as removal targets, with the original images serving as ground truths. Experimental results demonstrate that SmartEraser significantly outperforms existing methods, achieving superior performance in object removal, especially in complex scenes with intricate compositions.
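The copy-paste construction described for Syn4Removal can be sketched as follows. This is an illustrative example under stated assumptions, not the dataset's actual pipeline: `paste_object` and its parameters are hypothetical names, the paste position is supplied by the caller, and any object filtering, blending, or mask coarsening the authors may apply is omitted.

```python
import numpy as np


def paste_object(background, obj_rgb, obj_mask, top, left):
    """Paste a segmented object onto a background image.

    background: (H, W, 3) array, the original image (becomes the ground truth).
    obj_rgb:    (h, w, 3) array, crop containing the object.
    obj_mask:   (h, w) bool array, instance mask of the object in the crop.
    top, left:  paste position of the crop's upper-left corner (hypothetical
                parameters; the source does not say how positions are chosen).

    Returns (synthetic_image, removal_mask, ground_truth).
    """
    h, w = obj_mask.shape
    H, W = background.shape[:2]
    if top < 0 or left < 0 or top + h > H or left + w > W:
        raise ValueError("object does not fit at the requested position")

    synthetic = background.copy()
    region = synthetic[top:top + h, left:left + w]  # view into `synthetic`
    region[obj_mask] = obj_rgb[obj_mask]            # overwrite object pixels only

    removal_mask = np.zeros((H, W), dtype=bool)
    removal_mask[top:top + h, left:left + w] = obj_mask

    # The untouched background is what a perfect removal should produce.
    return synthetic, removal_mask, background
```

Each call yields one training triplet: the synthetic image with the pasted object, the mask marking the removal target, and the original image as the expected output. Because the object was pasted in, the ground truth behind it is known exactly.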