🤖 AI Summary
Image inpainting faces the challenge of aligning missing regions with the original content both spatially and semantically; misalignment produces local inconsistencies and visual artifacts. To address this, we propose ConFill, a diffusion-based framework whose Context-Adaptive Discrepancy (CAD) model explicitly captures context-aware distribution discrepancies between known and unknown regions, enabling adaptive latent-space alignment throughout the diffusion process. We further introduce a Dynamic Sampling mechanism that adjusts sampling step sizes according to local structural complexity, coupled with a local-feature-driven reconstruction strategy. ConFill achieves significant improvements in detail fidelity and global naturalness on standard benchmarks including CelebA-HQ and Places2. Quantitative evaluations demonstrate consistent superiority over state-of-the-art methods across multiple metrics (e.g., FID, LPIPS, and PSNR), while qualitative results confirm enhanced structural coherence and texture realism. This work establishes a principled paradigm for high-fidelity image inpainting through discrepancy modeling and adaptive inference.
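To make the discrepancy-alignment idea concrete, here is a minimal sketch of one diffusion step that pastes the known region at the matching noise level and then nudges the unknown region's statistics toward the known region's. The channel-statistics discrepancy measure, the function name `cad_step`, and the blending weight `lam` are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def cad_step(x_gen, x_known_noised, mask, lam=0.5):
    """Hypothetical simplified CAD-style alignment step.

    x_gen: current denoised latent estimate, shape (H, W, C)
    x_known_noised: original image noised to the same diffusion level
    mask: shape (H, W, 1); 1 for known pixels, 0 for missing pixels
    lam: strength of the discrepancy correction (assumed hyperparameter)
    """
    # Paste known-region content at the matching noise level, a standard
    # replacement trick in diffusion-based inpainting.
    x = mask * x_known_noised + (1 - mask) * x_gen
    # Measure the known-vs-unknown discrepancy via channel-wise
    # statistics (an illustrative choice of discrepancy measure).
    known = x[mask[..., 0] > 0.5]
    unknown = x[mask[..., 0] <= 0.5]
    if len(known) == 0 or len(unknown) == 0:
        return x
    mu_k, sd_k = known.mean(0), known.std(0) + 1e-6
    mu_u, sd_u = unknown.mean(0), unknown.std(0) + 1e-6
    # Re-standardize the latent so the unknown region's channel
    # statistics move toward the known region's.
    aligned = (x - mu_u) / sd_u * sd_k + mu_k
    # Blend the correction into the unknown region only.
    return np.where(mask > 0.5, x, (1 - lam) * x + lam * aligned)
```

Running one such step per denoising iteration is what would progressively shrink the known/unknown gap as sampling proceeds.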
📝 Abstract
Image completion is a challenging task, particularly when ensuring that generated content seamlessly integrates with existing parts of an image. While recent diffusion models have shown promise, they often struggle with maintaining coherence between known and unknown (missing) regions. This issue arises from the lack of explicit spatial and semantic alignment during the diffusion process, resulting in content that does not smoothly integrate with the original image. Additionally, diffusion models typically rely on global learned distributions rather than localized features, leading to inconsistencies between the generated and existing image parts. In this work, we propose ConFill, a novel framework that introduces a Context-Adaptive Discrepancy (CAD) model to ensure that intermediate distributions of known and unknown regions are closely aligned throughout the diffusion process. By incorporating CAD, our model progressively reduces discrepancies between generated and original images at each diffusion step, leading to contextually aligned completion. Moreover, ConFill uses a new Dynamic Sampling mechanism that adaptively increases the sampling rate in regions with high reconstruction complexity. This approach enables precise adjustments, enhancing detail and integration in restored areas. Extensive experiments demonstrate that ConFill outperforms current methods, setting a new benchmark in image completion.
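The Dynamic Sampling mechanism can be sketched as a scheduler that allocates more sampling effort where reconstruction is harder. The sketch below uses local gradient energy as a stand-in for "reconstruction complexity" and maps it linearly to a per-pixel step budget; the complexity proxy, the function names, and the step bounds are all assumptions for illustration, not the paper's actual mechanism.

```python
import numpy as np

def local_complexity(img):
    """Simple gradient-energy proxy for local structural complexity
    (an assumed stand-in for the paper's complexity measure)."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

def dynamic_step_schedule(complexity_map, base_steps=20, max_steps=50):
    """Map normalized local complexity to a per-region number of
    sampling steps: flat areas get base_steps, the most structurally
    complex areas get max_steps."""
    c = complexity_map.astype(float)
    c = (c - c.min()) / (c.max() - c.min() + 1e-8)  # normalize to [0, 1]
    steps = base_steps + np.round(c * (max_steps - base_steps))
    return steps.astype(int)
```

A sampler could then consult this map to take smaller (more frequent) denoising steps inside high-complexity masked areas while coarsely stepping through smooth regions.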