🤖 AI Summary
Current object removal methods often introduce artifacts or spurious objects in masked regions due to the absence of paired training data and the difficulty of constraining generated content. This work proposes YOEO, the first approach to achieve high-quality object erasure using only unpaired real images. YOEO combines an entity-segmentation-based clutter detector with a context coherence loss to guide a diffusion model toward semantically plausible and structurally coherent inpainted content. It further incorporates a diffusion distillation strategy to significantly accelerate inference. Experimental results demonstrate that YOEO outperforms state-of-the-art methods across multiple metrics, effectively preventing unintended content generation while preserving strong contextual consistency and visual realism.
📝 Abstract
We present YOEO, an approach for object erasure. Recent diffusion-based methods struggle to erase target objects without generating unexpected content within the masked regions, owing to the lack of sufficient paired training data and of explicit constraints on content generation. In contrast, our method produces high-quality object erasure results free of unwanted objects or artifacts while faithfully preserving contextual coherence with the surrounding content. We achieve this by training an object erasure diffusion model on unpaired data consisting only of large-scale real-world images, under the supervision of a sundries detector and a context coherence loss, both built upon an entity segmentation model. To enable more efficient training and inference, a diffusion distillation strategy is employed to obtain a few-step erasure diffusion model. Extensive experiments show that our method outperforms state-of-the-art object erasure methods. Code will be available at https://zyxunh.github.io/YOEO-ProjectPage/.
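The abstract describes combining a standard denoising objective with two auxiliary signals: a sundries-detector penalty on the erased region and a context coherence term tying the inpainted content to its surroundings. The sketch below illustrates how such a composite loss could be assembled; all function names, feature representations, and weights are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def sundries_penalty(entity_scores):
    """Penalize entities detected inside the erased region.

    entity_scores: confidence scores (0..1) of entities an entity
    segmentation model finds within the mask (hypothetical interface)."""
    return float(np.sum(entity_scores))

def context_coherence(feat_inpainted, feat_context):
    """Mean squared distance between pooled features of the inpainted
    region and its surrounding context (smaller = more coherent)."""
    return float(np.mean((feat_inpainted - feat_context) ** 2))

def total_loss(denoise_loss, entity_scores, feat_in, feat_ctx,
               w_sundries=1.0, w_context=0.5):
    """Composite objective: denoising loss plus the two auxiliary terms.
    The weights w_sundries and w_context are placeholder values."""
    return (denoise_loss
            + w_sundries * sundries_penalty(entity_scores)
            + w_context * context_coherence(feat_in, feat_ctx))
```

A training step would then backpropagate `total_loss`, pushing the model away from hallucinating new objects (first term) and toward content that blends with the unmasked surroundings (second term).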