Towards Source-Aware Object Swapping with Initial Noise Perturbation

📅 2026-02-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes SourceSwap, a self-supervised, source-aware framework for zero-shot object swapping. It targets the main limitations of existing methods: per-object fine-tuning, slow inference, and reliance on extra paired data that mostly shows the same object in different contexts. SourceSwap synthesizes high-quality pseudo pairs from a single image by applying a frequency-separated perturbation in the initial-noise space, which changes appearance while preserving pose, coarse shape, and scene layout, so no videos, multi-view data, or additional images are needed. A dual U-Net with full-source conditioning and a noise-free reference encoder is then trained on these pairs to learn cross-object alignment. According to the authors' experiments, the method improves object fidelity, scene preservation, and object-scene harmony over existing approaches, and it transfers to subject-driven refinement and face swapping. The authors also introduce SourceBench, a high-quality object-swapping benchmark with higher resolution, more categories, and richer interactions.

📝 Abstract

Object swapping aims to replace a source object in a scene with a reference object while preserving object fidelity, scene fidelity, and object-scene harmony. Existing methods either require per-object finetuning and slow inference or rely on extra paired data that mostly depict the same object across contexts, forcing models to rely on background cues rather than learning cross-object alignment. We propose SourceSwap, a self-supervised and source-aware framework that learns cross-object alignment. Our key insight is to synthesize high-quality pseudo pairs from any image via a frequency-separated perturbation in the initial-noise space, which alters appearance while preserving pose, coarse shape, and scene layout, requiring no videos, multi-view data, or additional images. We then train a dual U-Net with full-source conditioning and a noise-free reference encoder, enabling direct inter-object alignment, zero-shot inference without per-object finetuning, and lightweight iterative refinement. We further introduce SourceBench, a high-quality benchmark with higher resolution, more categories, and richer interactions. Experiments demonstrate that SourceSwap achieves superior fidelity, stronger scene preservation, and more natural harmony, and it transfers well to edits such as subject-driven refinement and face swapping.
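The frequency-separated perturbation can be illustrated with a minimal sketch. The abstract does not say which frequency band is perturbed or how the bands are split, so the code below rests on an assumption: the low-frequency band of the initial noise is kept (on the premise that it carries layout and coarse shape), and the high-frequency band is resampled from fresh Gaussian noise. The names `low_pass_mask`, `perturb_initial_noise`, `cutoff`, and `seed` are hypothetical and do not come from the paper.

```python
import numpy as np


def low_pass_mask(height: int, width: int, cutoff: float) -> np.ndarray:
    """Boolean mask, True where the spatial-frequency radius is <= cutoff.

    Frequencies are in cycles per pixel, as returned by np.fft.fftfreq.
    """
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    return np.sqrt(fx**2 + fy**2) <= cutoff


def perturb_initial_noise(noise: np.ndarray, cutoff: float = 0.1, seed: int = 0) -> np.ndarray:
    """Keep the low-frequency band of `noise` and resample the rest.

    ASSUMPTION: which band is kept and the hard circular cutoff are
    illustrative choices, not the procedure described by the authors.
    `noise` has shape (..., H, W); the FFT runs over the last two axes.
    """
    rng = np.random.default_rng(seed)
    fresh = rng.standard_normal(noise.shape)
    mask = low_pass_mask(noise.shape[-2], noise.shape[-1], cutoff)
    # Low frequencies come from the source noise, high ones from fresh noise.
    mixed = np.where(mask, np.fft.fft2(noise), np.fft.fft2(fresh))
    # The mask is symmetric under frequency negation, so the mixed spectrum
    # stays Hermitian and the inverse transform is real up to rounding error.
    return np.fft.ifft2(mixed).real


# Example: perturb a 4-channel 32x32 latent-sized noise tensor.
source_noise = np.random.default_rng(42).standard_normal((4, 32, 32))
perturbed_noise = perturb_initial_noise(source_noise, cutoff=0.1, seed=1)
```

Because a hard frequency mask acts as an orthogonal projection, mixing two independent standard Gaussian samples this way yields noise that is again standard Gaussian, which is why the sketch uses an ideal mask rather than a blur.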

Problem

Research questions and friction points this paper is trying to address.

object swapping

cross-object alignment

source-aware

scene fidelity

object-scene harmony

Innovation

Methods, ideas, or system contributions that make the work stand out.

source-aware object swapping

initial noise perturbation

cross-object alignment

self-supervised generation

dual U-Net
