🤖 AI Summary
Diffusion-based image inpainting and extension often suffer from boundary artifacts and unnatural fusion due to color mismatch and structural discontinuity across the mask boundary. To address this, we propose a two-stage collaborative optimization framework. First, we design a color-bias-correcting variational autoencoder (VAE) that explicitly models and compensates for cross-boundary chromatic shifts. Second, we introduce an appearance-structure disentangled two-stage diffusion training paradigm: pixel-level appearance alignment is enforced in the image space, while geometric consistency is preserved via latent-space constraints. The framework enables end-to-end joint optimization, significantly reducing both pixel-wise L1 error and perceptual discontinuities. Evaluated on benchmarks including Places2 and COCO-Stuff, our method achieves state-of-the-art visual quality, producing seamless, semantically coherent results with imperceptible boundaries and no visible stitching artifacts.
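The cross-boundary chromatic shift described above can be illustrated with a minimal sketch. Note this is an assumption for illustration only, not the paper's VAE-based correction: it simply matches the per-channel mean and standard deviation of the generated (masked) region to those of the known region. The function name `correct_color_shift` and the affine matching rule are hypothetical.

```python
import numpy as np

def correct_color_shift(image, mask):
    """Illustrative per-channel color correction (not the paper's method):
    align the mean/std of the generated region with the known region.

    image: float array of shape (H, W, 3)
    mask:  bool array of shape (H, W), True where content was generated
    """
    corrected = image.copy()
    for c in range(3):
        known = image[..., c][~mask]   # pixels kept from the original image
        gen = image[..., c][mask]      # pixels produced by the generator
        # Affine correction: rescale spread, then shift to the known mean.
        scale = known.std() / (gen.std() + 1e-8)
        corrected[..., c][mask] = (gen - gen.mean()) * scale + known.mean()
    return corrected
```

A correction like this operates purely on first- and second-order color statistics; the summary's point is that learning the compensation inside the VAE lets it adapt to content-dependent shifts that a fixed statistical rule cannot capture.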
📝 Abstract
Image inpainting is the task of reconstructing missing or damaged regions of an image so that they blend seamlessly with the surrounding content. With the advent of advanced generative models, especially diffusion models and generative adversarial networks, inpainting has achieved remarkable improvements in visual quality and coherence. However, seamless continuity across the mask boundary remains a significant challenge. In this work, we propose two novel methods to address boundary discrepancies in diffusion-based inpainting models. First, we introduce a modified Variational Autoencoder that corrects color imbalances, ensuring that the final inpainted results are free of color mismatches. Second, we propose a two-stage training strategy that improves the blending of generated and existing image content during the diffusion process. Through extensive experiments, we demonstrate that our methods effectively reduce discontinuity and produce high-quality inpainting results that are coherent and visually appealing.
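As a toy illustration of splitting appearance and structure across the two training signals (an assumption for exposition, not the paper's actual objective), the sketch below combines a pixel-space L1 term restricted to the inpainted region with an L2 constraint on latent features. The function name and the weights `w_pix`/`w_lat` are hypothetical.

```python
import numpy as np

def inpainting_objective(pred_img, target_img, pred_lat, target_lat,
                         mask, w_pix=1.0, w_lat=0.5):
    """Toy two-term objective (illustrative only): a pixel-space L1 loss on
    the masked region drives appearance alignment, while a latent-space L2
    loss encourages structural consistency. Weights are hypothetical."""
    # Mean absolute error over masked pixels only (all channels).
    n = mask.sum() * pred_img.shape[-1] + 1e-8
    pix_l1 = np.abs((pred_img - target_img) * mask[..., None]).sum() / n
    # Mean squared error between latent feature maps.
    lat_l2 = ((pred_lat - target_lat) ** 2).mean()
    return w_pix * pix_l1 + w_lat * lat_l2
```

Restricting the L1 term to the mask keeps the appearance signal focused on generated content, while the latent term sees the whole feature map and so can penalize structural breaks that straddle the boundary.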