InpaintDPO: Mitigating Spatial Relationship Hallucinations in Foreground-conditioned Inpainting via Diverse Preference Optimization

📅 2025-12-16

📈 Citations: 0

✨ Influential: 0

career value

173K/year

🤖 AI Summary

In foreground-guided image inpainting, hallucinations in spatial relationships—such as scale, position, and viewpoint—between foreground and background are difficult to quantify, impeding the application of conventional RLHF. To address this, we introduce Direct Preference Optimization (DPO) for spatial plausibility modeling—the first such effort. Our method features three innovations: (1) MaskDPO, which performs background-specific optimization via masked preference learning; (2) conditional asymmetric preference optimization to enhance foreground-background boundary consistency; and (3) shared commonality preference optimization to improve cross-scene generalization of spatial relationships. The framework integrates mask constraints, differential cropping-based sampling, global preference modeling, and common-feature distillation. Experiments demonstrate substantial mitigation of spatial distortion: FID improves by 12.3% and LPIPS by 18.7% across multiple benchmarks, while human evaluation yields a 92.4% acceptance rate for spatial plausibility.

Technology Category

Application Category

📝 Abstract

Foreground-conditioned inpainting, which aims at generating a harmonious background for a given foreground subject based on the text prompt, is an important subfield in controllable image generation. A common challenge in current methods, however, is the occurrence of Spatial Relationship Hallucinations between the foreground subject and the generated background, including inappropriate scale, positional relationships, and viewpoints. Critically, the subjective nature of spatial rationality makes it challenging to quantify, hindering the use of traditional reward-based RLHF methods. To address this issue, we propose InpaintDPO, the first Direct Preference Optimization (DPO) based framework dedicated to spatial rationality in foreground-conditioned inpainting, ensuring plausible spatial relationships between foreground and background elements. To resolve the gradient conflicts in standard DPO caused by identical foreground in win-lose pairs, we propose MaskDPO, which confines preference optimization exclusively to the background to enhance background spatial relationships, while retaining the inpainting loss in the foreground region for robust foreground preservation. To enhance coherence at the foreground-background boundary, we propose Conditional Asymmetric Preference Optimization, which samples pairs with differentiated cropping operations and applies global preference optimization to promote contextual awareness and enhance boundary coherence. Finally, based on the observation that winning samples share a commonality in plausible spatial relationships, we propose Shared Commonality Preference Optimization to enhance the model's understanding of spatial commonality across high-quality winning samples, further promoting shared spatial rationality.

Problem

Research questions and friction points this paper is trying to address.

Addresses spatial relationship hallucinations in foreground-conditioned image inpainting

Mitigates inappropriate scale, position, and viewpoint issues between foreground and background

Overcomes subjective spatial rationality challenges for preference-based optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

DPO framework for spatial rationality in inpainting

MaskDPO confines optimization to background region

Asymmetric and shared commonality preference optimization enhance coherence

🔎 Similar Papers

No similar papers found.