OmniRefiner: Reinforcement-Guided Local Diffusion Refinement

📅 2025-11-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing reference-image-guided diffusion models suffer fine-grained texture degradation because VAE latent-space compression weakens identity and attribute cues; subsequent local post-editing often introduces inconsistencies in illumination, texture, or shape. Method: a two-stage refinement framework: (1) globally coherent refinement that preserves structural fidelity, followed by (2) reinforcement-learning-driven local diffusion editing that jointly optimizes fine-grained texture restoration and semantic consistency, thereby mitigating the VAE compression bottleneck. The method fine-tunes a single-image diffusion editor to jointly ingest the draft image and the reference image, and introduces a custom reward function to guide detail accuracy and contextual coherence. Contribution/Results: Extensive evaluation demonstrates significant improvements over both open-source and commercial baselines across multiple benchmarks, achieving state-of-the-art performance in reference alignment, texture fidelity, and local consistency.
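The paper does not disclose the exact form of its custom reward, only that it jointly targets detail accuracy and contextual (semantic) coherence. A minimal sketch of how such a composite reward might be assembled, assuming both terms are already normalized scores in [0, 1] (the function name, weights, and score inputs are all hypothetical stand-ins, not the paper's implementation):

```python
# Hypothetical composite reward for RL-guided local refinement.
# detail_score and semantic_score stand in for learned metrics
# (e.g. a patch-level texture-similarity score and an
# embedding-space consistency score); the paper does not
# specify its actual reward terms or weights.

def composite_reward(detail_score: float, semantic_score: float,
                     w_detail: float = 0.6, w_semantic: float = 0.4) -> float:
    """Weighted sum of detail accuracy and semantic consistency, in [0, 1]."""
    assert 0.0 <= detail_score <= 1.0 and 0.0 <= semantic_score <= 1.0
    return w_detail * detail_score + w_semantic * semantic_score
```

A weighted sum is the simplest joint objective; in practice such rewards are often clipped or combined multiplicatively to prevent one term from dominating the policy gradient.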

📝 Abstract
Reference-guided image generation has progressed rapidly, yet current diffusion models still struggle to preserve fine-grained visual details when refining a generated image using a reference. This limitation arises because VAE-based latent compression inherently discards subtle texture information, causing identity- and attribute-specific cues to vanish. Moreover, post-editing approaches built on existing methods to amplify local details often produce results inconsistent with the original image in terms of lighting, texture, or shape. To address this, we introduce OmniRefiner, a detail-aware refinement framework that performs two consecutive stages of reference-driven correction to enhance pixel-level consistency. We first adapt a single-image diffusion editor by fine-tuning it to jointly ingest the draft image and the reference image, enabling globally coherent refinement while maintaining structural fidelity. We then apply reinforcement learning to further strengthen localized editing capability, explicitly optimizing for detail accuracy and semantic consistency. Extensive experiments demonstrate that OmniRefiner significantly improves reference alignment and fine-grained detail preservation, producing faithful and visually coherent edits that surpass both open-source and commercial models on challenging reference-guided restoration benchmarks.
Problem

Research questions and friction points this paper is trying to address.

Preserving fine-grained visual details in reference-guided image refinement
Addressing texture loss from VAE-based latent compression in diffusion models
Resolving lighting and shape inconsistencies in post-editing approaches
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage reference-driven correction for pixel consistency
Fine-tuned diffusion editor ingests draft and reference images
Reinforcement learning optimizes detail accuracy and semantic consistency
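The two-stage structure in the bullets above (a global structure-preserving pass, then localized detail editing) can be sketched as control flow. All names here are hypothetical placeholders: in the actual system both editors are diffusion models and the local editor is the RL-fine-tuned one, but the paper does not expose this interface.

```python
# Hypothetical two-stage refinement loop mirroring the paper's
# pipeline shape. global_editor and local_editor are stand-ins
# for the fine-tuned diffusion editors; regions is a list of
# detail-critical patches selected for local correction.

def refine(draft, reference, global_editor, local_editor, regions):
    """Stage 1: globally coherent refinement of the draft against
    the reference; Stage 2: per-region local detail editing."""
    refined = global_editor(draft, reference)       # structure-preserving pass
    for region in regions:                          # localized corrections
        refined = local_editor(refined, reference, region)
    return refined
```

Separating the passes this way lets the global stage fix layout and lighting before the local stage spends its capacity on texture, which matches the paper's motivation for refining in two consecutive stages.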